Experiment tracking in machine learning is the process of saving all experiment metadata in one central place (a database or a repository). This includes model hyperparameters, model performance metrics, run logs, model artifacts, data artifacts, and so on.
Experiment logging can be implemented in many different ways. It can be something as simple as a spreadsheet (rarely done these days!), or a Git repository such as GitHub used to track experiment configurations and results.
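To make the spreadsheet idea concrete, here is a minimal sketch of such a tracker using Python's standard csv module; the file name, column names, and values are illustrative, not taken from any particular tool:

```python
import csv
import os
from datetime import datetime, timezone

LOG_FILE = "experiments.csv"  # illustrative file name
FIELDS = ["timestamp", "run_id", "learning_rate", "n_estimators", "accuracy"]

def log_run(run_id, params, metrics, path=LOG_FILE):
    """Append one experiment run as a row in a CSV 'spreadsheet'."""
    row = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "run_id": run_id,
        **params,   # hyperparameters, e.g. learning_rate
        **metrics,  # results, e.g. accuracy
    }
    # Write the header only when the file is created for the first time.
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)

# Log one hypothetical run.
log_run("run-001", {"learning_rate": 0.1, "n_estimators": 200}, {"accuracy": 0.93})
```

Even this tiny sketch hints at why dedicated tools exist: there is no UI for comparing runs, no artifact storage, and concurrent writes would corrupt the file.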
The easiest way to achieve experiment logging is to either use an open-source library or framework such as MLflow, or buy an enterprise platform offering these capabilities, such as Weights & Biases or Comet. This article lists some really useful experiment logging tools for data scientists.
This is not a sponsored post.
MLflow is an open source platform to manage the machine learning lifecycle, including experimentation, reproducibility, deployment, and a central model registry. MLflow currently offers four components:
Tracking experiments to record and compare parameters and results (MLflow Tracking).
Packaging ML code in a reusable, reproducible form in order to share with other data scientists or transfer to production (MLflow Projects).
Managing and deploying models from a variety of ML libraries to a variety of model serving and inference platforms (MLflow Models).
Providing a central model store to collaboratively manage the full lifecycle of an MLflow Model, including model versioning, stage transitions, and annotations (MLflow Model Registry).
MLflow is library-agnostic. You can use it with any machine learning library, and in any programming language, since all functions are accessible through a REST API and CLI. For convenience, the project also includes a Python API, R API, and Java API.
Example code using PyCaret:
# install pycaret
pip install pycaret