Exploratory Data Analysis | EDA

Written by
Caleb Hayes
Updated on: May 30, 2025

 

Exploratory data analysis (EDA) is the process of taking raw data and using analytical tools to understand it better, extract useful features, and build preliminary models.

 

This article will introduce how data is categorized and specific methods for visualizing different types of data.

 

What is Exploratory Data Analysis?

When it comes to basketball, everyone knows that height and arm span are key characteristics of athletes.

What about handball? Most people probably could not say.

When you encounter an unfamiliar field, you need to build up some understanding of it quickly.

There are 2 ways to help us understand unfamiliar territory:

  1. Consult an industry insider. Senior industry insiders will pass on some of their experience.

  2. Study the data from the unfamiliar field. We can take the physical and performance data of handball players and analyze it to see what characteristics the best players share. Even without any industry experience, the data itself can reveal useful patterns.

 

The second path above is exploratory data analysis (EDA).

 

Exploratory Data Analysis is a data analysis methodology and philosophy that explores the internal structure and patterns of data using a variety of techniques (mostly using data visualization).

The purpose of exploratory data analysis is to gain as much insight into the data set as possible, discover the internal structure of the data, extract important features, detect outliers, test basic hypotheses, and build preliminary models.
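As one illustration of the "detect outliers" goal, here is a minimal sketch using the common 1.5 x IQR rule. The rule, the function name `iqr_outliers`, and the sample heights are assumptions chosen for illustration; the article does not prescribe a specific outlier method.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical player heights in cm, invented for this example.
heights_cm = [188, 190, 192, 193, 195, 196, 198, 150]
print(iqr_outliers(heights_cm))  # prints [150]
```

A flagged value is only a candidate for closer inspection: it may be a data-entry error or a genuinely unusual observation.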

 

3-Step Approach to Exploratory Data Analysis

 

The process of exploratory data analysis is broadly divided into 3 steps:

  1. Data Classification

  2. Data Visualization

  3. Insight into the data

 

Step 1: Data Classification

When we get the data, the first step is to categorize that data and then use different methods for different types of data.

Data can be categorized from coarse to fine in the following way:

 

Structured Data VS Unstructured Data

Structured data: Data that can be organized in tables is considered structured data.

For example: Data in Excel, data in MySQL...

Unstructured data: Any data that is not organized in tables is considered unstructured.

For example: text, images, video...
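The difference can be seen in a small sketch: the same facts are easy to query when stored as rows and columns, while free text has to be parsed before any field can be used. The sample names and values are invented for illustration.

```python
import csv
import io

# Structured: facts laid out as rows with named columns.
table_text = "name,height_cm,position\nAnna,178,wing\nBen,196,pivot\n"
rows = list(csv.DictReader(io.StringIO(table_text)))
heights = [int(row["height_cm"]) for row in rows]

# Unstructured: free text with no columns; fields must be extracted first.
note = "Anna plays on the wing and is 178 cm tall; Ben, the pivot, is 196 cm."

print(heights)            # column access works directly
print(len(note.split()))  # the text only offers raw tokens
```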

 

Quantitative Data VS Qualitative Data

Quantitative data: numerical type, measuring the quantity of something.

Example: 1985

Qualitative data: categories that describe the nature of something.

Example: "post-80s" (a label for the generation born in the 1980s)
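A rough first pass at this split can be automated by checking whether every value in a column is numeric. This is only a heuristic sketch (the function name `kind_of` and the sample columns are assumptions): numeric codes such as postal codes or jersey numbers would still need to be treated as qualitative by hand.

```python
from numbers import Number

def kind_of(values):
    """Label a column 'quantitative' if every value is a number, else 'qualitative'."""
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if is_numeric else "qualitative"

# Hypothetical columns, invented for this example.
columns = {
    "birth_year": [1985, 1990, 1978],
    "blood_type": ["A", "O", "AB"],
}
for name, values in columns.items():
    print(name, kind_of(values))
```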

 

4 Levels of Data

Nominal level: the first level of data, with the weakest structure. Values can only be categorized by name.

For example: blood type (A, B, AB, O), name, color.

Ordinal level: The ordinal level adds a natural ordering on top of the nominal level, so that we can compare different values.

For example: restaurant's star rating, company's appraisal level.

Interval level: Interval-level data must be numerical, and the values can be used not only for sorting, but also for addition and subtraction.

For example: degrees Fahrenheit, degrees Celsius (the zero point is arbitrary rather than a true absence of temperature, so multiplication and division are not meaningful: 20 °C is not "twice as hot" as 10 °C).

Ratio level: Building on the interval level, the ratio level adds an absolute zero, so values support multiplication and division as well as addition and subtraction.

For example: money, weight
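The gap between the interval and ratio levels can be checked with a short calculation. The sketch below compares two temperatures in Celsius (interval) and in kelvin (ratio, since 0 K is an absolute zero). Kelvin is not mentioned in the article and is brought in here only for illustration.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvin."""
    return c + 273.15

a_c, b_c = 10.0, 20.0

# Differences are meaningful on an interval scale: the gap is the same in both units.
diff_c = b_c - a_c
diff_k = celsius_to_kelvin(b_c) - celsius_to_kelvin(a_c)

# Ratios are not: 20 / 10 gives 2 in Celsius, but the same two
# temperatures expressed in kelvin give a ratio of about 1.035.
ratio_c = b_c / a_c
ratio_k = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

print(diff_c, round(diff_k, 2))
print(ratio_c, round(ratio_k, 3))
```

Because the ratio changes with the choice of zero point, statements like "twice as hot" only make sense on a scale with a true zero.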

 

Step 2: Data Visualization

To gain better insight into the data, we can visualize it so that its characteristics are easier to observe.

Several commonly used visualizations are listed below:

 

Each of the 4 data levels above calls for different visualization methods, and the table below is meant to help you choose a suitable one.

These are basic visualization options. In practice, more complex, combined charts can also be used.


Data Level | Attribute                         | Descriptive Statistics                             | Charts
-----------|-----------------------------------|----------------------------------------------------|---------------------------------------------
Nominal    | Discrete, unordered               | Frequency, percentage, mode                        | Bar charts, pie charts
Ordinal    | Ordered categories, comparisons   | Frequency, mode, median, percentile                | Bar charts, pie charts
Interval   | Meaningful differences in numbers | Frequency, mode, median, mean, standard deviation  | Bar charts, pie charts, box plots
Ratio      | Continuous                        | Mean, standard deviation                           | Bar charts, line charts, pie charts, box plots
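The pairing of data level and descriptive statistics above can be sketched as a small helper. The function name `describe`, the lowercase level labels, and the sample data are assumptions made for this example. Ordinal values are assumed to be encoded as ranks so they can be sorted.

```python
import statistics
from collections import Counter

def describe(values, level):
    """Return summary statistics suited to the given data level."""
    summary = {}
    if level in ("nominal", "ordinal", "interval"):
        summary["frequency"] = dict(Counter(values))
        summary["mode"] = statistics.mode(values)
    if level == "ordinal":
        # median_low always returns an actual data point, so no averaging of ranks.
        summary["median"] = statistics.median_low(values)
    if level == "interval":
        summary["median"] = statistics.median(values)
    if level in ("interval", "ratio"):
        summary["mean"] = statistics.mean(values)
        summary["stdev"] = statistics.stdev(values)
    return summary

# Hypothetical samples, invented for this example.
blood_types = ["A", "O", "A", "B", "A"]
star_ratings = [3, 5, 4, 4, 2]          # ordinal ranks, 1 to 5 stars
weights_kg = [70.0, 82.5, 91.0, 77.5]

print(describe(blood_types, "nominal"))
print(describe(star_ratings, "ordinal"))
print(describe(weights_kg, "ratio"))
```

The helper deliberately omits the mean for nominal and ordinal data, since averaging category labels or ranks has no clear meaning.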

 

 

Step 3: Insight into the data

Visualizing data helps us gain better insight into it: we can more efficiently find out which data is more important, what relationships may exist between different variables, and which variables affect each other.

It is called exploratory data analysis precisely because there is no fixed routine, so this step has no prescribed procedure to describe.
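Although there is no fixed routine, one common first probe for "possible relationships between different data" is to compute a correlation coefficient between each candidate feature and an outcome. The sketch below uses the Pearson coefficient on made-up handball figures; the data, the column names, and the choice of Pearson correlation are all assumptions for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical handball data, invented for illustration only.
players = {
    "height_cm": [178, 185, 190, 194, 199],
    "goals_per_game": [2.1, 2.8, 3.0, 3.9, 4.2],
    "shirt_number": [7, 23, 4, 11, 9],
}
target = players["goals_per_game"]
for name, values in players.items():
    if name != "goals_per_game":
        print(name, round(pearson(values, target), 2))
```

A coefficient near 1 or -1 suggests a strong linear relationship worth plotting, while a value near 0 suggests none. Correlation alone does not show that one variable causes the other.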

 

Summary

Exploratory data analysis is a data analysis methodology and philosophy that uses a variety of techniques (mostly data visualization) to explore the internal structure and patterns of data.

The process of exploratory data analysis is roughly divided into 3 steps:

  1. Data Classification

  2. Data Visualization

  3. Insight into the data