Getting started with scikit-learn
Following on from the Introduction to Machine Learning course, this series of hands-on workshops will get you started with applying supervised and unsupervised machine learning methods in Python, using the popular scikit-learn package.
After completing these workshops, you will be better able to:
- Prepare a dataset for machine learning in Python
- Select a scikit-learn method appropriate for a particular learning task
- Construct your own workflows for model training and testing
- Evaluate the performance of a model
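As a rough preview of how these four steps fit together, here is a minimal sketch. It is not taken from the workshop notebooks: the iris dataset, the logistic regression classifier, the 25% test split and the accuracy metric are all illustrative assumptions, chosen only because they are small and ship with scikit-learn.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Prepare: load a small built-in dataset as a feature matrix X and labels y.
# (Illustrative choice; the workshop may use different data.)
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples for testing; stratify keeps the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Select and construct: feature scaling followed by a classifier,
# combined into a single pipeline so both are fitted on training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Evaluate: accuracy on the held-out test samples.
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {test_accuracy:.2f}")
```

Wrapping the scaler and classifier in one pipeline is a design choice that keeps the test data out of the preprocessing step; other estimators can be swapped in without changing the rest of the sketch.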
We will be working with Python using Jupyter notebooks. The easiest way to access Jupyter is via the Anaconda platform.
Please install Anaconda from https://www.anaconda.com in advance of the first session.
Please ensure that you have an up-to-date scikit-learn package installed prior to starting the first session. General installation instructions are available here: https://scikit-learn.org/stable/install.html#installation-instructions
scikit-learn is part of the default installation of Anaconda, so you may already have everything you need.
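One way to check whether scikit-learn is already installed, and which version you have, is sketched below. It uses only the Python standard library; the helper name `installed_version` is hypothetical and introduced here purely for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# "scikit-learn" is the distribution name; the import name is "sklearn".
sklearn_version = installed_version("scikit-learn")
if sklearn_version is None:
    print("scikit-learn is not installed; see the installation instructions.")
else:
    print(f"scikit-learn {sklearn_version} is installed.")
```

Run this in a notebook cell or a Python prompt inside the environment you plan to use for the workshop, since each environment can have its own set of packages.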
Download this repository to your computer as a ZIP file and unpack it.
Open JupyterLab (within Anaconda) and navigate to the unpacked directory to work with the .ipynb notebooks.
Alternatively, you can run the notebooks online using Binder.
We will be working with a variety of real and synthetic data sets to illustrate various methods. For your own work between classes, you will be asked to identify a suitable data set from your own research or from other work within your field.
You can start thinking about this before the course, but the main requirements for a machine learning data set will be discussed more during the first session.
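To give a sense of what a synthetic data set looks like in practice, the sketch below generates one with scikit-learn. The generator `make_blobs` and all parameter values are illustrative assumptions, not necessarily what the workshop notebooks use.

```python
from sklearn.datasets import make_blobs

# Generate 200 two-dimensional points scattered around 3 cluster centres.
# X holds the features (one row per sample); labels holds the cluster index.
X, labels = make_blobs(n_samples=200, centers=3, n_features=2, random_state=42)

print("Feature matrix shape:", X.shape)
print("Label vector shape:", labels.shape)
```

The same layout, a two-dimensional array of samples by features plus an optional vector of labels, is what scikit-learn estimators expect, so it can be a useful reference point when looking at a data set from your own field.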