Clustering Analysis in Power BI using PyCaret 3.0

Moez Ali
5 min read · Mar 28

Uncover Actionable Insights with PyCaret 3.0: How to Build a Clustering Model in Power BI

Clustering is a technique that groups data points with similar characteristics. These groupings are useful for exploring data, identifying patterns, and analyzing a subset of data.

Organizing data into clusters helps reveal its underlying structure, and the technique is applied across many industries. Some common business use cases for clustering are:

  • Customer segmentation for marketing.
  • Customer purchasing behavior analysis for promotions and discounts.
  • Identifying geo-clusters in an epidemic outbreak such as COVID-19.

Types of Clustering

Because clustering tasks are subjective, different algorithms suit different types of problems. Each algorithm has its own rules and its own mathematics for how clusters are calculated.

This tutorial is about implementing a clustering analysis in Power BI using a Python library called PyCaret. The algorithmic details and mathematics behind these algorithms are out of scope for this tutorial.

Ghosal A., Nandy A., Das A.K., Goswami S., Panday M. (2020) A Short Review on Different Clustering Techniques and Their Applications.

In this tutorial, we will use the K-Means algorithm, which is one of the simplest and most popular unsupervised machine learning algorithms. If you would like to learn more about K-Means, you can read this paper.
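To give a feel for why K-Means is considered simple, here is a minimal pure-Python sketch of the standard K-Means loop (Lloyd's algorithm). It is an illustration only and not PyCaret's implementation; the function name `kmeans`, its parameters, and the sample points are all made up for this example.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster 2-D points with Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k data points chosen at random (hypothetical init strategy).
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
            for x, y in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # No centroid moved, so the assignments are stable.
        centroids = new_centroids
    return centroids, labels


# Made-up data: two visually separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The two steps inside the loop, assigning points to the nearest centroid and then recomputing each centroid as a mean, are the whole algorithm. In practice a library handles initialization, scaling, and edge cases that this sketch ignores.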

Setting up Python

If you have used Python before, you likely already have Anaconda Distribution installed on your computer. If not, click here to download Anaconda Distribution with Python 3.10.

Setting up the Environment

Before we start using PyCaret’s machine learning capabilities in Power BI, we have to create a virtual…

Moez Ali

Data Scientist, Founder & Creator of PyCaret
