BioTrain - Bioinformatics and Statistical Training

The R language is an environment designed for statistical analysis. This course follows on from our introductory R courses to look in much more detail at the statistical aspects of R. It introduces core statistical concepts and common tests, and shows how to implement and visualise them in practice in R.

The course goes through all common aspects of statistics, from power analysis and experimental design to summary statistics and the analysis of both quantitative and qualitative data.

After attending this course you should have the theoretical knowledge of how to select and correctly apply a statistical test, and the R skills to put this into practice.

Pre-Course Requirements & Suggestions

This course assumes that you have knowledge or skills equivalent to those taught in the following courses.

Introduction to R (with tidyverse)

Please ask us if you're unsure if you have the necessary knowledge or skills for this course.

Whilst not required, it may be useful to attend the following courses to supplement the knowledge you'll get from this one.

Advanced R (with tidyverse)

Creating Complex Figures with GGPlot

Course Content


The concept of power is central in statistics and pivotal for the correct interpretation of p-values. Power analysis leads to sample size estimation, a key aspect of experimental design. This session goes through the concepts of power and their implementation for different types of data.

Statistics relies on methods to summarise large datasets to understand their key properties. In this section we look at common metrics for the summarisation of quantitative data, such as the mean, standard deviation and standard error of the mean. We also see how these form the basis for some of the most common statistical tests for quantitative data. We show how the initial exploration and visualisation of your data is one of the most important steps in data analysis, even before statistical tests are run.

Quantitative values are probably the most commonly encountered data type. This section looks at the properties of these types of data. We cover how to evaluate and, where necessary, transform the data initially, and then go through the statistics which apply to quantitative values. We look at the Student's t-test for simple two-condition comparisons and both one-way and two-way ANOVA for more complex experiments with multiple conditions.

Continuing the work on quantitative data, we look at relationships between independent quantitative values. We start by showing how correlation tests can both quantify and formalise the linkage between pairs of values. We then extend this to linear regression, which provides a more comprehensive framework for exploring more complex relationships between variables.

In this module, we cover the main non-parametric tests. Classic parametric tests such as the t-test or correlation have non-parametric counterparts that we should use when our data fail to meet the assumptions the parametric tests make about the behaviour of the data. We look at the non-parametric equivalents and the circumstances under which these would be a better choice for your analysis.

In the final section we move away from quantitative data to the analysis of categorical values, where we are looking at the counts of different categorical options. As well as illustrating different ways to visualise these data, we look at the chi-square and Fisher's exact tests for the comparison of proportional shifts in the amounts of different categories in a data set.

Introduction to Statistics with R

Power analysis for sample size estimation

Descriptive statistics and data exploration

Analysis of quantitative data

Correlation and Linear regression

Analysis of quantitative data: Non-parametric statistics

Analysing qualitative data