Data visualization with different Charts in Python

Data visualization is the presentation of data in graphical form. By summarizing large amounts of data in a simple, easy-to-understand format, it helps people grasp the significance of the data and communicates information clearly and effectively.

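Consider a data set of ten employee records with the columns EMPID, Gender, Age, Sales, BMI, and Income, for which we will plot different charts. The records are listed in full in the first code listing below.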


Different Types of Charts for Analyzing & Presenting Data

1. Histogram :
A histogram represents how often values occur within specific ranges; the ranges (bins) are consecutive, fixed-width intervals.

In the code below, histograms are plotted for Age, Income, and Sales. Each plot shows how many values of that attribute fall into each interval.

# import pandas and matplotlib
import pandas as pd
import matplotlib.pyplot as plt
# create 2D array of table given above
data = [['E001', 'M', 34, 123, 'Normal', 350],
        ['E002', 'F', 40, 114, 'Overweight', 450],
        ['E003', 'F', 37, 135, 'Obesity', 169],
        ['E004', 'M', 30, 139, 'Underweight', 189],
        ['E005', 'F', 44, 117, 'Underweight', 183],
        ['E006', 'M', 36, 121, 'Normal', 80],
        ['E007', 'M', 32, 133, 'Obesity', 166],
        ['E008', 'F', 26, 140, 'Normal', 120],
        ['E009', 'M', 32, 133, 'Normal', 75],
        ['E010', 'M', 36, 133, 'Underweight', 40] ]
# dataframe created with
# the above data array
df = pd.DataFrame(data, columns = ['EMPID', 'Gender',
                                    'Age', 'Sales',
                                    'BMI', 'Income'] )
# create histogram for numeric data
df.hist()
# show plot
plt.show()

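To make the idea of consecutive, fixed intervals concrete, the following sketch computes the counts that a histogram of the Age column would draw as bars. The choice of five bins and the variable names are assumptions made for illustration; `numpy.histogram` is used here only to expose the numbers behind the plot.

```python
import numpy as np

# Age values copied from the employee table above
ages = [34, 40, 37, 30, 44, 36, 32, 26, 32, 36]

# Split the range [min, max] into 5 equal-width bins (an arbitrary choice)
counts, edges = np.histogram(ages, bins=5)

# Print each interval with the number of values that fall inside it
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {count}")
```

Every value lands in exactly one bin, so the counts always add up to the number of rows.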

2. Column Chart :
A column chart is used to show a comparison among different attributes, or it can show a comparison of items over time.

# Dataframe of previous code is used here

# Plot the bar chart for numeric values;
# a comparison will be shown between
# all 3: Age, Income, Sales
df.plot.bar()
plt.show()

# plot between 2 attributes
plt.bar(df['Age'], df['Sales'])
plt.show()

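Column charts are often easier to read when they compare aggregated values rather than raw rows. As a minimal sketch, assuming we want to compare the mean Income of the two Gender groups in the table above, the snippet below aggregates with `groupby` and draws one column per group. The names `income_df` and `mean_income` are illustrative and not part of the original listing.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Gender and Income columns copied from the employee table above
income_df = pd.DataFrame({
    "Gender": ["M", "F", "F", "M", "F", "M", "M", "F", "M", "M"],
    "Income": [350, 450, 169, 189, 183, 80, 166, 120, 75, 40],
})

# Aggregate: mean income per gender (the result is indexed by group label)
mean_income = income_df.groupby("Gender")["Income"].mean()

# Draw one column per group
fig, ax = plt.subplots()
ax.bar(list(mean_income.index), mean_income.to_numpy())
ax.set_xlabel("Gender")
ax.set_ylabel("Mean income")
ax.set_title("Mean income by gender")
```

Call `plt.show()` afterwards to display the figure when running this as a script.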

3. Box plot chart :
A box plot is a graphical representation of statistical data based on the minimum, first quartile, median, third quartile, and maximum. The term “box plot” comes from the fact that the graph looks like a rectangle with lines extending from the top and bottom. Because of the extending lines, this type of graph is sometimes called a box-and-whisker plot.

# For each numeric attribute of dataframe
df.plot.box()
plt.show()
# individual attribute box plot
plt.boxplot(df['Income'])
plt.show()

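The five numbers behind a box plot can also be computed directly. The sketch below does this for the Income column with `numpy.percentile`; the variable names are illustrative. It also applies the common 1.5 × IQR rule, which matplotlib uses by default (`whis=1.5`) to decide which points are drawn as individual outliers beyond the whiskers instead of being covered by them.

```python
import numpy as np

# Income values copied from the employee table above
income = np.array([350, 450, 169, 189, 183, 80, 166, 120, 75, 40])

# Quartiles, using NumPy's default linear interpolation
q1, median, q3 = np.percentile(income, [25, 50, 75])
iqr = q3 - q1  # interquartile range: the height of the box

# Points farther than 1.5 * IQR from the box are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = income[(income < lower_fence) | (income > upper_fence)]

print("min:", income.min(), "Q1:", q1, "median:", median,
      "Q3:", q3, "max:", income.max())
print("outliers:", sorted(outliers.tolist()))
```

For this data the two largest incomes fall above the upper fence, so the upper whisker of the Income box plot stops short of the maximum.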

4. Pie Chart :
A pie chart shows how categories make up parts of a whole, that is, the composition of something. It represents numbers as percentages, and the segments must sum to 100%.

# one label per row of the dataframe; a list keeps
# the labels in order (a set has no defined order)
labels = ["A", "B", "C", "D", "E",
          "F", "G", "H", "I", "J"]

# pie chart of Age
plt.pie(df['Age'], labels = labels,
        autopct ='%1.1f%%', shadow = True)
plt.show()

# pie chart of Income
plt.pie(df['Income'], labels = labels,
        autopct ='%1.1f%%', shadow = True)
plt.show()

# pie chart of Sales
plt.pie(df['Sales'], labels = labels,
        autopct ='%1.1f%%', shadow = True)
plt.show()

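A pie chart is usually clearest for a categorical column, where each slice is one category's share of all rows. As a small sketch, assuming we are interested in the BMI column of the table above, the shares can be computed with `value_counts(normalize=True)`; the names `bmi` and `shares` are illustrative.

```python
import pandas as pd

# BMI categories copied from the employee table above
bmi = pd.Series(["Normal", "Overweight", "Obesity", "Underweight",
                 "Underweight", "Normal", "Obesity", "Normal",
                 "Normal", "Underweight"])

# Share of each category as a percentage of all rows
shares = bmi.value_counts(normalize=True) * 100
print(shares.round(1))
```

Passing `shares` as the values and `shares.index` as the labels to `plt.pie` would draw these four slices.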

5. Scatter plot :
A scatter chart shows the relationship between two different variables and it can reveal the distribution trends. It should be used when there are many different data points, and you want to highlight similarities in the data set. This is useful when looking for outliers and for understanding the distribution of your data.

# scatter plot between income and age
plt.scatter(df['Income'], df['Age'])
plt.show()

# scatter plot between income and sales
plt.scatter(df['Income'], df['Sales'])
plt.show()

# scatter plot between sales and age
plt.scatter(df['Sales'], df['Age'])
plt.show()

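The relationship that a scatter plot shows visually can be summarized with a correlation coefficient. The sketch below, which uses illustrative variable names, computes the Pearson correlation between Age and Sales with `Series.corr`. This is a complement to the plot rather than a replacement: a single number can hide outliers and non-linear patterns that the scatter plot makes visible.

```python
import pandas as pd

# Age and Sales columns copied from the employee table above
age = pd.Series([34, 40, 37, 30, 44, 36, 32, 26, 32, 36])
sales = pd.Series([123, 114, 135, 139, 117, 121, 133, 140, 133, 133])

# Pearson correlation coefficient, always in the range [-1, 1]
r = age.corr(sales)
print(f"Pearson r between Age and Sales: {r:.2f}")
```

A value near -1 or 1 suggests the points lie close to a straight line, while a value near 0 suggests no linear relationship.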

This article is attributed to GeeksforGeeks.org
