Processing and Analysing RNA-Seq Data

Available Dates

No public dates currently available for this course

RNA-Seq is a very common high-throughput sequencing technique used to measure the transcriptome of a biological sample. This course goes through the whole process of RNA-Seq data processing, visualisation and analysis, from raw FastQ data to a validated and annotated set of hits.

We look at the way RNA-Seq libraries are constructed and the effect this has on downstream steps in the analysis. For each part of the pipeline we look at what can go wrong, how to assess that things are working, and remedial steps you can take to correct or explore problems.

After this course you should have the confidence to process and analyse your own data for a simple RNA-Seq experimental design.

Pre-Course Requirements & Suggestions

Whilst not required, it may be useful to attend the following courses to supplement the knowledge you'll get from this one.

Introduction to Linux and Bash

Course Content


We start by looking at how RNA-Seq libraries are produced and the challenges this introduces into their processing. We then go through all of the steps in a standard data processing pipeline going from raw FastQ files through to aligned reads in BAM files.
We next come to the exploration of our data, looking at some basic quality control steps and setting up an analysis project within the SeqMonk package. We look at how we can quantitate, explore and visualise our data to understand what's going on before moving to a statistical analysis.
We look at the most common statistical methods to detect differential expression in a two-condition experiment. We examine the DESeq2 package in more detail to see how an initial p-value is calculated, and look at options for subsequent filtering of results to get to a sensible number of candidate hits which we can examine.
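
As a rough illustration of the filtering step described above, the sketch below applies a Benjamini-Hochberg multiple testing correction to a list of raw p-values and then keeps only genes which pass both an adjusted p-value cutoff and a minimum fold change. This is not DESeq2's API (DESeq2 is an R package); the function names, the example genes and values, and the thresholds of 0.05 and 1.0 are all assumptions made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def filter_hits(results, max_padj=0.05, min_abs_log2fc=1.0):
    """Keep genes passing both cutoffs (thresholds are illustrative only).

    results: list of (gene, log2_fold_change, raw_pvalue) tuples.
    Returns a list of (gene, log2_fold_change, adjusted_pvalue) tuples.
    """
    padj = benjamini_hochberg([r[2] for r in results])
    return [
        (gene, lfc, q)
        for (gene, lfc, _), q in zip(results, padj)
        if q <= max_padj and abs(lfc) >= min_abs_log2fc
    ]


# Hypothetical results table: (gene, log2 fold change, raw p-value)
example = [
    ("geneA", 2.5, 0.001),
    ("geneB", 0.2, 0.002),
    ("geneC", -1.8, 0.03),
    ("geneD", 1.2, 0.6),
]

hits = filter_hits(example)
```

In this made-up table, geneB is significant but changes too little to pass the fold change filter, and geneD changes enough but is not significant, so neither is kept as a candidate hit.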