This course follows on from the introductory course in familiarising you with the core R language.
In this course we focus on extending your language knowledge to include more advanced filtering and manipulation. We also look at data restructuring and the summarisation of datasets with repeated values, and at how to deal with awkward data: files that don't import cleanly, have missing values or are inconsistently annotated. All of these problems can cause trouble when trying to perform an analysis.
10X has become the dominant platform for the creation of high throughput single cell RNA-Seq data. In this course we look at the processing, exploration and analysis of this data using both desktop tools such as the Loupe Browser, and R packages such as Seurat. As well as standard analyses we also look at the types of artefacts and problems which affect this data and how to identify and remedy them.
DNA methylation is one of the most important epigenetic signals in higher organisms. It is a major determinant of gene expression and is the mechanism by which genomic imprinting occurs. DNA methylation can be measured in a high throughput manner by the use of bisulphite sequencing.
In this course we look at the processing, exploration, visualisation and analysis of bisulphite sequencing datasets. We go through the different library types which are commonly used and show how their processing differs. We also look at artefacts and biases which might affect the analysis and show how to effectively visualise these before formulating sensible experimental questions and analyses.
SeqMonk is a desktop program which can visualise, quantitate and analyse large sets of mapped genomic positions. It is used in the analysis of many different types of data including RNA-Seq, ChIP-Seq and Bisulphite-Seq.
In this course we go through the main principles of the program, looking at creating new projects, ways to visualise raw data, options for quantitation and plotting of quantitated data, filtering and statistical analysis and then finally tracking and reporting.
This course provides the foundations for using SeqMonk for all types of data and is a useful introduction to a program which is used widely throughout our more application-specific courses.
GGPlot (ggplot2) is the R tidyverse package for drawing figures and graphs. This course follows on from the introductory R course and looks in more detail at how to construct and customise graphs using GGPlot. By the end of the course you will be able to produce and save complex multi-layered plots with custom annotation and colouring.
The endpoint of many different high throughput experiments is a list of interesting genes, often accompanied by metrics such as p-values. Making biological sense of these lists can be challenging but is crucial if we are to target the most relevant aspects of biology.
In this course we look at the data sources and analysis techniques which allow us to take a shortcut to finding the interesting biology from gene set results. We look at both functional gene set analysis and sequence motif-based approaches, and look at the choices we get in these techniques, the artefacts and biases which can mislead us, and the options for how to present the results we get.
R is a specialised programming language whose main purpose is the manipulation, visualisation and analysis of datasets. It is one of the core tools used in many numerical disciplines and is very popular in the field of bioinformatics.
Although R has highly specialised extensions to provide tools to analyse very specific types of data (genomics, proteomics, flow cytometry etc.) there is also a core set of concepts and functionality which you will need to know whatever you want to do with the language. This course focuses on equipping students with a solid understanding of the core R language which will stand them in good stead whatever type of data they eventually wish to go on to analyse.
The course will introduce the RStudio development environment and will show how this can be used to develop R code. It will look at importing, filtering, restructuring and summarising data, as well as using the ggplot library for producing publication quality figures and graphs.
As well as learning the language we will also cover good development practices, debugging, generating reports and managing your code in a source code repository to ensure you have the full range of skills necessary to make effective use of R.
This course starts from scratch, with no assumption about prior knowledge in statistics. At the start, we will revisit fundamental statistical principles and progressively build up towards more complex approaches. The course is about building up the confidence of scientists in their statistical skills, in their ability to use statistical tests as tools to quantify their confidence in the answers given by their data.
During the course, we will address a wide range of data related issues, from experimental design to analysis of quantitative and qualitative data.
The processing of many modern datasets requires the use of a Unix or Linux environment, and many people use this as their preferred operating system. In this course we look at how you can use the Unix command line to control the running of individual programs, to manage your data and to perform some basic automation to make large scale processing easier.
In next generation sequencing (NGS) you want to be able to identify problems in your data as early as possible to save wasted effort and to allow you to apply corrections. This course looks at the ways you can assess the quality of NGS data, the different types of failure you can have, and the options for correcting any issues you identify. It covers both the theory of how Illumina sequencing works, and the practical early stage analysis of this data.
ChIP-Seq is a high throughput sequencing technology which allows you to identify the position on the genome of anything you can target with an antibody. It was originally used to find transcription factor binding sites, but has now been extended to many other epigenetic and regulatory marks. There are also other related techniques, such as ATAC-Seq or Cut-n-Run, which adopt the same analysis procedures as more traditional ChIP.
In this course we look at the processing and analysis of this data, looking carefully at how to evaluate data quality and identify artefacts which might confound your analysis. We go through options for visualisation and exploration of the data as well as more formal analyses such as peak calling and differential enrichment.
RNA-Seq is a very common high throughput sequencing technique used to measure the transcriptome of a biological sample. This course goes through the whole process of RNA-Seq data processing, visualisation and analysis, from raw fastq data to a validated and annotated set of hits.
We look at the way RNA-Seq libraries are constructed and the effect this has on downstream steps in the analysis. For each part of the pipeline we look at what can go wrong, how to assess that things are working, and remedial steps you can take to correct or explore problems.
After this course you should have the confidence to be able to process and analyse your own data for a simple RNA-Seq experimental design.
Figures and graphs are most commonly the way we communicate our data to others. Producing attractive figures which are easy to interpret is the best way to make a convincing case for a scientific story you are trying to tell.
In this course we look at a number of ways to help you produce better figures. We look at the underlying theory of why humans find some figures easier to interpret than others, we extend this to look at choices for using common plot types, and then we bring in relevant aspects of graphical design to make the appearance of figures as appealing as possible.
Finally we look at the use of the open source Inkscape program to allow the drawing of novel illustrations, as well as customising existing graphs, and assembling multiple figures into panels.