BioTrain - Bioinformatics and Statistical Training

Course	Location	Starting Date
Introduction to Machine Learning with R	Online	25 March 2026

Machine learning is a powerful set of techniques which can help in the understanding of complex datasets. At the same time, these techniques can produce confusing or misleading results if used without care.

In this course we cover the concepts of machine learning and look at some of the most common types of model. We go through best practices for data preparation and modelling, and show how to build your own models in R using the tidymodels package.

Pre-Course Requirements & Suggestions

This course assumes that you have knowledge or skills equivalent to those taught in the following courses.

Introduction to R (with tidyverse)

Please ask us if you're unsure whether you have the necessary knowledge or skills for this course.

Whilst not required, it may be useful to attend the following courses to supplement the knowledge you'll get from this one.

Advanced R (with tidyverse)

Creating Complex Figures with GGPlot

Introduction to Statistics with R

Course Content

What is machine learning

We start by defining what machine learning is, and the types of questions it can and cannot answer. We cover the main principles and terminology used in the field and show examples of how it can apply to biological questions.

Types of machine learning model

Machine learning is an umbrella term for many different types of analytical technique. We go through the theory of some of the more common types of model, from linear regression to decision trees to neural networks, to show how they can be used to make predictions and the different requirements they have.

Model Evaluation

One of the most important parts of machine learning is being able to objectively see whether the model you have built is producing useful results. In this section we go through how to build a test dataset for evaluation and the different metrics you can use. We show the problems with overfitting and why robust testing is important.
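As a rough illustration of why a held-out test set matters, the sketch below fits a simple and a deliberately over-flexible model to simulated data and compares their errors on the training and test rows. The data, the helper `rmse_of` and the choice of a degree-15 polynomial are all assumptions made for this example, not course material.

```r
# Simulated data (an assumption for illustration): y is linear in x plus noise
set.seed(42)
n <- 60
dat <- data.frame(x = runif(n, 0, 10))
dat$y <- 2 * dat$x + rnorm(n, sd = 3)

# Hold out a third of the rows as a test set the models never see during fitting
test_rows <- sample(n, size = n / 3)
train <- dat[-test_rows, ]
test  <- dat[test_rows, ]

# Hypothetical helper: root mean squared error of a fitted model on a dataset
rmse_of <- function(model, data) {
  sqrt(mean((data$y - predict(model, newdata = data))^2))
}

simple_fit  <- lm(y ~ x, data = train)            # matches the true structure
complex_fit <- lm(y ~ poly(x, 15), data = train)  # far more flexible than needed

# Compare error on the data used for fitting against error on unseen data
data.frame(
  model      = c("linear", "degree-15 polynomial"),
  train_rmse = c(rmse_of(simple_fit, train), rmse_of(complex_fit, train)),
  test_rmse  = c(rmse_of(simple_fit, test),  rmse_of(complex_fit, test))
)
```

The flexible model can never do worse than the linear one on the training rows, because it contains the linear model as a special case. Only the test column tells you whether that extra flexibility generalises.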

Preparing Data for Modelling

Having high quality, clean input data is one of the best ways to maximise the likelihood of success in machine learning. We talk about common problems seen during modelling and how we can avoid these with data preparation and filtering.

Building models in tidymodels

The tidymodels framework is a convenient and user-friendly way to build machine learning models in R. It provides a structure which allows you to easily switch between different models without having to learn lots of different package-specific code. We cover the basic structure of tidymodels and use this to build and evaluate our first model.
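To give a flavour of that structure, here is a minimal sketch, assuming the tidymodels package (and rpart, for the tree) is installed and R is version 4.1 or later for the `|>` pipe. The built-in `iris` data, the formula and the helper `evaluate` are choices made for this example only.

```r
library(tidymodels)

# Split the data once; every model is trained and tested on the same rows
set.seed(1)
split <- initial_split(iris, prop = 0.75)
train <- training(split)
test  <- testing(split)

# Model specifications: this is the only part that changes between model types
lm_spec   <- linear_reg() |> set_engine("lm")
tree_spec <- decision_tree(mode = "regression") |> set_engine("rpart")

# Hypothetical helper: fit a specification and score it on the test set
evaluate <- function(spec) {
  fitted <- fit(spec, Sepal.Length ~ Petal.Length + Petal.Width, data = train)
  predict(fitted, new_data = test) |>
    bind_cols(test) |>
    rmse(truth = Sepal.Length, estimate = .pred)
}

evaluate(lm_spec)    # linear regression
evaluate(tree_spec)  # decision tree, same code path
```

Swapping a linear regression for a decision tree changes one line of specification; the fitting, prediction and scoring code stays the same.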

Automation and Recipes within tidymodels

If you want to build a robust, reusable model then it can be useful to know how to automate various steps in data preparation and model building. This can make a model easier to re-use on new data, and is essential if you want to optimise a model by exploring different sets of parameters used in the model construction. We therefore look at how we can use tidymodels recipes to automate models.
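The sketch below shows one way this kind of automation can look, again assuming tidymodels and rpart are installed. The dataset, the preprocessing steps and the three candidate tree depths are assumptions chosen to keep the example small, not a recommended setup.

```r
library(tidymodels)

set.seed(1)
split <- initial_split(iris, prop = 0.75)
train <- training(split)

# Recipe: preparation steps that are learned on training data and
# re-applied automatically to any new data passed to the workflow
rec <- recipe(Sepal.Length ~ ., data = train) |>
  step_normalize(all_numeric_predictors()) |>
  step_dummy(all_nominal_predictors())

# tune() marks a parameter to be explored instead of fixed in advance
tree_spec <- decision_tree(mode = "regression", tree_depth = tune()) |>
  set_engine("rpart")

# A workflow bundles the recipe and the model into one reusable object
wf <- workflow() |>
  add_recipe(rec) |>
  add_model(tree_spec)

# Try each candidate depth with 5-fold cross-validation on the training set
folds   <- vfold_cv(train, v = 5)
results <- tune_grid(
  wf,
  resamples = folds,
  grid      = tibble(tree_depth = c(1L, 2L, 4L)),
  metrics   = metric_set(rmse)
)

# Keep the best-scoring depth, refit on all training rows, predict on test rows
best      <- select_best(results, metric = "rmse")
final_fit <- wf |> finalize_workflow(best) |> fit(data = train)
predict(final_fit, new_data = testing(split))
```

Because the recipe travels with the model inside the workflow, the same preparation is applied during resampling, final fitting and prediction without any repeated code.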
