Machine Learning Process

The two main processes that we aim to cover with MyAutoML are training and predicting. They are separate processes: one trains a model, and the other uses a trained model to make predictions. Each process is executed by running a Python script, e.g. train.py and predict.py. How you run them can be as simple or as complex as you like: you can run the scripts manually (or even run the code from a Jupyter notebook), or run them as automated jobs in Docker containers on a Kubernetes platform, scheduled by Airflow.
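One common way to support all of these execution modes with the same script is to put the logic in a main(argv) function that parses its own command-line arguments. The sketch below assumes this pattern; it is not MyAutoML's actual train.py, and the flags --data and --model-out are hypothetical.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    # Hypothetical flags: where the training data lives and where to write the model.
    parser = argparse.ArgumentParser(description="Train a model (hypothetical train.py).")
    parser.add_argument("--data", required=True, help="Path to the training data")
    parser.add_argument("--model-out", required=True, help="Where to write the trained model")
    # argv=None makes argparse fall back to sys.argv[1:], i.e. the real command line.
    return parser.parse_args(argv)


def main(argv: Optional[Sequence[str]] = None) -> int:
    args = parse_args(argv)
    # The actual training logic would go here; this sketch only reports its inputs.
    print(f"training on {args.data}, writing model to {args.model_out}")
    return 0


# In train.py, the script entry point would be:
#     if __name__ == "__main__":
#         raise SystemExit(main())
```

In a notebook you can call main(["--data", "train.csv", "--model-out", "model.json"]) directly, while a container command or an Airflow task would run python train.py with the same flags, so both paths exercise identical code.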

Training

The purpose of the training process is to start from some data, process it with a learning algorithm, and produce a model that captures the relevant patterns in the training data.

[Figure: training process]
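As a minimal illustration of "data in, model out", the sketch below uses ordinary least-squares linear regression as a stand-in algorithm and stores the resulting model as a small JSON document so that a separate prediction process can load it. The algorithm choice, the function names train and save_model, and the {"slope", "intercept"} model format are all assumptions for illustration, not MyAutoML's implementation.

```python
import json
from typing import List


def train(xs: List[float], ys: List[float]) -> dict:
    """Fit y = slope * x + intercept by ordinary least squares (illustrative algorithm)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The "model" is just the learned parameters.
    return {"slope": slope, "intercept": intercept}


def save_model(model: dict, path: str) -> None:
    # Persist the model so that a separate prediction process can load it later.
    with open(path, "w") as f:
        json.dump(model, f)
```

For example, train([0, 1, 2, 3], [1, 3, 5, 7]) returns {"slope": 2.0, "intercept": 1.0}. Writing the model to a file is what lets training and prediction run as independent scripts.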

Predicting

The goal of the prediction process is to apply a trained model to new data to make predictions. A prediction script can score a batch of items in one run, or it can serve an API that returns predictions on demand in real time.

[Figure: prediction process]
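The sketch below shows how both prediction modes can share one scoring function. It assumes a hypothetical linear model stored as JSON with "slope" and "intercept" keys; load_model, predict_batch, and handle_request are illustrative names, and handle_request is only the framework-agnostic core of a request handler, not a complete HTTP server.

```python
import json
from typing import Iterable, List


def load_model(path: str) -> dict:
    # Load a model artifact written by the (separate) training process.
    with open(path) as f:
        return json.load(f)


def predict_batch(model: dict, xs: Iterable[float]) -> List[float]:
    # Batch mode: score many items in one run, e.g. from a scheduled job.
    return [model["slope"] * x + model["intercept"] for x in xs]


def handle_request(model: dict, body: str) -> str:
    # On-demand mode: the core of a request handler that a web framework
    # could wrap. It expects a JSON body such as {"x": 4.0}.
    x = float(json.loads(body)["x"])
    return json.dumps({"prediction": predict_batch(model, [x])[0]})
```

Keeping a single predict_batch function behind both entry points means batch and real-time predictions cannot drift apart.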

Calibrating

In some classification use cases we need to calibrate the outputs of our models so that they can be read as actual probabilities rather than arbitrary scores. Sometimes this can be done directly in the training process, but in other cases it is more pragmatic to train the model first and perform the calibration separately, using the following process:

[Figure: calibrating process]
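One common way to calibrate an already-trained classifier separately is Platt-style sigmoid calibration: fit p = sigmoid(a * score + b) on held-out (score, label) pairs that were not used for training. The sketch below assumes this technique (isotonic regression is a frequent alternative) and omits Platt's original target smoothing; it fits a and b by plain gradient descent on the mean log loss. The function names and the learning-rate and epoch defaults are hypothetical.

```python
import math
from typing import List, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_sigmoid_calibrator(
    scores: List[float], labels: List[int], lr: float = 0.1, epochs: int = 5000
) -> Tuple[float, float]:
    """Fit p = sigmoid(a * score + b) by minimizing mean log loss with gradient descent."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            # Derivative of the log loss with respect to the logit a * s + b.
            err = _sigmoid(a * s + b) - y
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score: float, a: float, b: float) -> float:
    # Map a raw model score to a calibrated probability estimate.
    return _sigmoid(a * score + b)
```

Because the calibrator only needs raw scores and labels, it can be fitted after training, without retraining the underlying model.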