Build a simple MLOps stack using PyCaret and DagsHub
Introduction
Using PyCaret with DagsHub, you can now log your experiments and artifacts on remote DagsHub servers without changing any code.
With this integration, you can use MLflow on a remote server that DagsHub manages and hosts for free. Multiple people can access and work with the same MLflow experiment runs, enabling better collaboration on projects.
Also, by using a remote MLflow server, your data is backed up and will not be lost in the event of a local machine failure. You can access your experiment runs from anywhere, as long as you have an internet connection. This can be especially useful if you are working on a project from multiple locations.
PyCaret
PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that speeds up the experiment cycle exponentially and makes you more productive.
Compared with other open-source machine learning libraries, PyCaret is a low-code alternative that can replace hundreds of lines of code with just a few, making experiments faster and more efficient.
The design and simplicity of PyCaret are inspired by the emerging role of citizen data scientists, a term first used by Gartner.
To learn more about PyCaret, check out the official documentation.
DagsHub
DagsHub is a platform for data scientists and machine learning engineers to version their data, models, experiments, and code. It allows data science teams to easily share, review, and reuse their work, providing a GitHub experience for machine learning.
DagsHub is built on popular open-source tools and formats, making it easy to integrate with the tools you already use. To learn more about DagsHub, check out the official documentation.
Integration: PyCaret and DagsHub
PyCaret provides an out-of-the-box integration with MLflow, enabling users to log experiment metrics, parameters, artifacts, and data locally that can be accessed using the MLflow UI. This is great if you are working alone, but not so much if you want to coordinate with other data scientists and engineers on your team.
To collaborate on MLflow experiments, you must set up a remote URI, configure a database to store model metrics and parameters, and use a file storage system like AWS S3, Azure Blob, etc. Setting up a remote URI and managing the MLflow service on your own can be difficult, time-consuming, and costly in terms of run time and storage.
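For context, a self-managed setup typically means pointing MLflow at a tracking server and artifact store you run yourself before anything is logged. The sketch below illustrates the idea; the server address, database, and S3 bucket are placeholders, not real endpoints:

```python
import os

# Hypothetical self-managed setup: this endpoint is a placeholder, not a
# real server. MLflow clients read the tracking URI from this variable.
os.environ["MLFLOW_TRACKING_URI"] = "http://my-mlflow-server.example.com:5000"

# The tracking server itself must also be launched separately with a
# backend database and an artifact store, e.g. (shell, shown as a comment):
#   mlflow server --backend-store-uri postgresql://user:pw@host/db \
#                 --default-artifact-root s3://my-mlflow-artifacts

print(os.environ["MLFLOW_TRACKING_URI"])
```

All of this provisioning and upkeep is exactly what the DagsHub-hosted server removes.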
This is where integration with DagsHub comes in.
DagsHub provides a remote MLflow server for each repository, enabling users to log experiments with MLflow and view and manage the results and trained models from the built-in UI.
The DagsHub repository also has fully configured object storage for storing data, models, and any large files. These files can be diffed, allowing users to see the differences between different versions of their data and models and better understand the impact of those changes on their results.
With this integration, PyCaret users will be able to log their experiments on a DagsHub-hosted remote MLflow server and easily compare and share them with others.
Additionally, users can use DVC to version raw and processed data, which can then be pushed to DagsHub for viewing, comparison, and sharing.
All of this without changing a single line of code.
How to log experiments using DagsHub?
To use the DagsHub Logger with PyCaret, set log_experiment = 'dagshub' in the setup function.
# install libraries (run in a shell)
pip install --pre pycaret
pip install mlflow dagshub

# load dataset
from pycaret.datasets import get_data
data = get_data('iris')

# initialize setup
from pycaret.classification import *
s = setup(data, target = 'species', session_id = 123,
          log_experiment = 'dagshub', experiment_name = 'project_iris', log_data = True)

# compare base models
best = compare_models()

# save best model
save_model(best, 'best_iris_model')
On running the setup function, it will print a link to authorize your DagsHub account. Click on that link to authorize the connection. Once that is done, you will be asked to enter owner_name/repo_name to complete the setup.
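If you prefer to skip the interactive prompt (for example in a CI job), plain MLflow can also be pointed directly at the repository's tracking endpoint through environment variables. This is a sketch under assumptions: owner_name, repo_name, and the token below are placeholders you would replace with your own values.

```python
import os

# Placeholders -- substitute your own DagsHub user, repo, and access token.
owner, repo = "owner_name", "repo_name"

# Each DagsHub repository exposes its MLflow server at <repo URL>.mlflow
os.environ["MLFLOW_TRACKING_URI"] = f"https://dagshub.com/{owner}/{repo}.mlflow"

# MLflow reads basic-auth credentials from these standard variables.
os.environ["MLFLOW_TRACKING_USERNAME"] = owner
os.environ["MLFLOW_TRACKING_PASSWORD"] = "<your-dagshub-token>"

print(os.environ["MLFLOW_TRACKING_URI"])
```

With these variables set, any MLflow-logging code (including PyCaret's) will write to the remote server without prompting.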
At this point, the repo is initialized on DagsHub. You can go to https://www.dagshub.com/moez.ali/project_iris to check out the project.
Click on the Experiments tab to see all the model runs. This is the built-in DagsHub logger (quite similar to MLflow).
You can click on the runs to see the parameter and metric details:
What about the MLflow logger? Well, it is also there.
Go to https://www.dagshub.com/moez.ali/project_iris.mlflow and you will be able to see the MLflow dashboard.
BOOM! The MLflow server is fully managed and hosted for you by DagsHub for free.
You can now share this link with other people on your team and also work collaboratively with them on the same experiment (if you give them permission to write).
Isn’t it amazing?
Check out this Colab Notebook for a full demo.
To learn more about this integration, you can also read DagsHub official announcement.
Liked the blog? Connect with Moez Ali
Moez Ali is an innovator and technologist. A data scientist turned product manager dedicated to creating modern and cutting-edge data products and growing vibrant open-source communities around them.
Creator of PyCaret, 100+ publications with 500+ citations, keynote speaker and globally recognized for open-source contributions in Python.
Let’s be friends! Connect with me:
👉 LinkedIn
👉 Twitter
👉 Medium
👉 YouTube
🔥 Check out my brand new personal website: https://www.moez.ai.
To learn more about my open-source work on PyCaret, check out the GitHub repo or follow PyCaret’s Official LinkedIn page.
Listen to my talk on Time Series Forecasting with PyCaret in DATA+AI SUMMIT 2022 by Databricks.