Python Exploratory Data Analysis (EDA) libraries

Moez Ali
7 min read · Jun 4, 2022
Photo by Luke Chesser on Unsplash

Introduction

A typical machine learning workflow consists of six tasks, each critical to the success of the project:

  1. Problem Definition
  2. Data Acquisition and ETL
  3. Exploratory Data Analysis
  4. Data Preparation
  5. Modeling (Model training and selection)
  6. Deployment and Monitoring
Machine Learning Life Cycle — Image by Author

Exploratory Data Analysis

Exploratory Data Analysis is the process of performing initial investigations on data to discover patterns, identify anomalies, and test business hypotheses and assumptions with the help of summary statistics and visualizations. In short, it is the process of getting to know your data in depth.
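To make the definition concrete, here is a minimal sketch that uses only Python's standard library. The `daily_orders` data and the helper names `summarize` and `find_outliers` are made up for illustration, and the 1.5 × IQR rule is just one common rule of thumb for flagging anomalies, not the only way to do it.

```python
import statistics

# Hypothetical sample: daily order counts with one suspicious spike.
daily_orders = [12, 15, 14, 13, 16, 15, 14, 95, 13, 15]


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": median,
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
    }


def find_outliers(values):
    """Flag values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


summary = summarize(daily_orders)
outliers = find_outliers(daily_orders)

print(summary)
print("Possible anomalies:", outliers)
```

Comparing the mean with the median is often the first hint that something is off: a single extreme value pulls the mean up while the median barely moves.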

There are three ways you can do EDA:

  • Using libraries/frameworks in Python / R
  • Using automated EDA libraries in Python / R
  • Using licensed software such as Microsoft Power BI or Tableau

This article lists libraries and automated EDA frameworks in Python. The list is in no particular order and is not intended as a ranking.

Matplotlib

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible. A large number of packages extend and build on Matplotlib’s functionality, including several higher-level plotting interfaces (seaborn, HoloViews, ggplot, etc.).

It lets you visualize data with many different types of plots, including scatterplots, histograms, bar charts, error charts, and boxplots, in just a few lines of code.
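As a rough illustration of "a few lines of code", the sketch below draws a histogram, a scatterplot, and a boxplot side by side. The height and weight values are randomly generated for the example, and the output file name `eda_overview.png` is arbitrary. The `Agg` backend is selected only so the script runs without a display; drop that line and call `plt.show()` if you are working interactively.

```python
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; the figure is saved, not shown
import matplotlib.pyplot as plt

# Hypothetical data: 200 made-up height/weight pairs.
random.seed(0)
heights = [random.gauss(170, 10) for _ in range(200)]
weights = [0.9 * h - 85 + random.gauss(0, 8) for h in heights]

fig, (ax_hist, ax_scatter, ax_box) = plt.subplots(1, 3, figsize=(12, 4))

# Histogram: distribution of a single variable.
ax_hist.hist(heights, bins=20)
ax_hist.set_title("Histogram")
ax_hist.set_xlabel("Height (cm)")
ax_hist.set_ylabel("Count")

# Scatterplot: relationship between two variables.
ax_scatter.scatter(heights, weights, s=10)
ax_scatter.set_title("Scatterplot")
ax_scatter.set_xlabel("Height (cm)")
ax_scatter.set_ylabel("Weight (kg)")

# Boxplot: spread and potential outliers for each variable.
ax_box.boxplot([heights, weights])
ax_box.set_xticklabels(["Height", "Weight"])
ax_box.set_title("Boxplot")

fig.tight_layout()
fig.savefig("eda_overview.png")
```

Each plot type answers a different question about the data, which is why they are so often used together during EDA.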

Learning Matplotlib is easy. You can get started with the official user guide, the tutorials, or the code examples in the gallery.