Tagged: Data science, Data visualization, Matplotlib, Python
Posted by Idowu (@idowu) on February 7, 2020 at 7:16 am
There are several ways of visualizing data, and presenting it as a bar graph is a good way of telling stories with it.
In this post, I will share techniques for making bar graphs with Python's matplotlib library.
Matplotlib is a library for making 2-D plots in Python. Apart from bar graphs, it provides many other plot types that are commonly used for everyday visualization.
Matplotlib also lets us customize plots to suit their purpose, which is usually to communicate what they represent in layman's terms.
When to use a Bar Graph
Bar graphs are used to visualize the frequency or magnitude of categorical or discrete variables. They are also useful for monitoring changes or fluctuations over time.
Good examples include the frequency or percentage occurrence of categorical variables such as gender and age range, or of discrete variables such as the number of movies watched per month or the number of books read over time.
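As a minimal sketch of the "value count" idea, assuming a hypothetical pandas Series of survey answers called `gender` (the values are made up for illustration), `value_counts()` tallies each category and `value_counts(normalize=True)` gives the share of each, which can then be drawn with one bar per category:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical categorical data: made-up survey answers, for illustration only
gender = pd.Series(['Female', 'Male', 'Female', 'Female', 'Male', 'Other'])

# Frequency of each category (sorted from most to least common)
counts = gender.value_counts()
# Percentage occurrence of each category
percentages = gender.value_counts(normalize=True) * 100

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(counts.index.tolist(), counts.values)
ax.set_xlabel('Gender')
ax.set_ylabel('Count')
plt.show()
```

Swapping `counts.values` for `percentages.values` plots the percentage occurrence instead of the raw counts.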
Bar graphs are also used in construction projects and site work, to monitor and compare the activities and performance of different sites, and in public health, to monitor disease prevalence trends, compare prevalence across geographical locations, and identify hot zones. These are just a few of their many applications.
One axis of a bar graph holds continuous values, and the other holds the discrete or categorical variable (or variables) being measured. The length of each bar along the continuous axis is the measure of its category.
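To make that relationship concrete, here is a small sketch with made-up values: each bar returned by `bar()` is a rectangle placed at its category on one axis, and its height is the value on the continuous axis.

```python
import matplotlib.pyplot as plt

# Made-up example: a categorical variable and its continuous measure
categories = ['A', 'B', 'C']
values = [4.5, 2.0, 7.25]

fig, ax = plt.subplots()
bars = ax.bar(categories, values)  # one Rectangle per category

# The height of each bar equals the value it measures
heights = [bar.get_height() for bar in bars]
print(heights)  # [4.5, 2.0, 7.25]
```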
Types of Bar Graphs
Depending on what you want to show, you can choose among different types of bar graphs, each of which serves a specific purpose.
The three main types are:
- Single or simple bar graph
- Multiple bar graph
- Component or stacked bar graph
Single Bar Graph
A single bar graph, as the name implies, consists of single bars, each representing a category or discrete value and reaching up to its value on the axis that holds the continuous measure.
I will be examining a data set containing the sales of a product over a period of six months (January to June). As we move through this tutorial, you can create your own data set in Microsoft Excel and save it on your machine, or work with real data if you have some at your disposal.
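If you don't have a spreadsheet at hand, a sketch like the one below builds a stand-in DataFrame with the same `Months` and `Sales` columns the code expects. The sales figures are made-up placeholders, not the data used in this post. Writing them to `Sales_month.xlsx` with `DataFrame.to_excel()` assumes the `openpyxl` package is installed, so that line is left commented out.

```python
import pandas as pd

# Made-up placeholder sales figures (not the data used in this post),
# in the layout the code below expects: a 'Months' and a 'Sales' column
df = pd.DataFrame({
    'Months': ['January', 'February', 'March', 'April', 'May', 'June'],
    'Sales': [420, 310, 380, 290, 510, 350],
})

# Optionally save it as a workbook; this assumes openpyxl is installed
# df.to_excel('Sales_month.xlsx', index=False)

print(df)
```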
```python
# Import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt

# Reading .xlsx files requires the openpyxl package
df = pd.read_excel(r'Sales_month.xlsx')
print(df)
```

```python
# Declare the size of your image
fig = plt.figure(figsize=(8, 8))
plt.xlabel('Months')
plt.ylabel('Sales')

# Define your X and Y axes
x = df.Months
y = df.Sales

# Rotate the X-axis labels to save space
plt.xticks(rotation=90)

# Make the plot
plt.bar(x, y)
plt.show()
```

From the chart, the month with the highest sales was clearly May, followed by January, March, June, February and April, in that order.
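For comparison, the same single bar graph can also be drawn with matplotlib's object-oriented interface, where `plt.subplots()` returns the figure and the axes explicitly. This is a sketch: to keep it self-contained, it uses a small in-memory DataFrame with made-up numbers in place of `pd.read_excel('Sales_month.xlsx')`.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data with made-up values; replace with pd.read_excel('Sales_month.xlsx')
df = pd.DataFrame({
    'Months': ['January', 'February', 'March', 'April', 'May', 'June'],
    'Sales': [420, 310, 380, 290, 510, 350],
})

# fig is the whole image, ax is the plotting area inside it
fig, ax = plt.subplots(figsize=(8, 8))
ax.bar(df['Months'], df['Sales'])
ax.set_xlabel('Months')
ax.set_ylabel('Sales')
# Rotate the month names, as plt.xticks(rotation=90) does with pyplot
ax.tick_params(axis='x', labelrotation=90)
plt.show()
```

Holding on to `ax` is convenient once a figure has several subplots, because each call says which axes it acts on.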
First off, after loading the data set, we declared the size of the output image by using the plt.figure function. We then set the labels for the X- and Y-axes.
We then defined the variables `x` and `y` and used `plt.xticks()` to rotate the month names by 90 degrees (45 degrees works too, if you prefer); this was only done to save space on the X-axis. Finally, the bars were drawn with the `plt.bar()` function and displayed with `plt.show()`.
We can further beautify the bars by giving each one its own colour. Replace the last two lines of the previous snippet with these:
```python
# State the colours you want in a variable (one per bar)
colors = ('blue', 'red', 'yellow', 'indigo', 'cyan', 'magenta')

# Apply the colours to the bars
plt.bar(x, y, color=colors)
plt.show()
```

Multiple Bar Graph
Multiple bar graphs are useful when you want to compare more than one variable at a time. For instance, if you need to compare the sales of different products over six months using a bar graph, a multiple bar graph is a perfect fit.
Let's see how to achieve this with a few lines of code:
```python
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_excel(r'Sales_month.xlsx')
print(df)
```

```python
# Declare the size of your image
fig = plt.figure(figsize=(8, 8))
plt.xlabel('Months')
plt.ylabel('Sales')

# Define your X axis
x = df.Months

# Your Ys will be
product_1 = df.Product_1
product_2 = df.Product_2
product_3 = df.Product_3

# One integer position per month (0, 1, ..., 5) to build the clusters on
e = np.arange(len(product_2))

# The bar width must be defined before it is used below
bar_width = 0.29

# Put each month name under the middle bar of its cluster, rotated to save space
plt.xticks(e + bar_width*2, ['January', 'February', 'March', 'April', 'May', 'June'], rotation=90)

plt.bar(e + bar_width, product_1, width=bar_width, color='orange', zorder=2)
plt.bar(e + bar_width*2, product_2, width=bar_width, color='magenta', zorder=2)
plt.bar(e + bar_width*3, product_3, width=bar_width, color='red', zorder=2)
plt.show()
```

Although the plot above clusters the three bars for each month, with each bar pointing to a value on the Y-axis, it doesn't make much sense yet, because nothing shows which bar represents which product.
To fix this, we need a legend. Here, the legend entries are created as patches with matplotlib's patches module. Replace the final `plt.show()` call above with these lines:
```python
# Create a legend to show the bar representing each product
import matplotlib.patches as mpatches

# Patch colours must match the colours used for the bars
orange_patch = mpatches.Patch(color='orange', label='Sales of product 1 in $')
magenta_patch = mpatches.Patch(color='magenta', label='Sales of product 2 in $')
red_patch = mpatches.Patch(color='red', label='Sales of product 3 in $')

# loc=(0, 1) puts the legend's lower-left corner at the top-left of the axes
plt.legend(handles=[orange_patch, magenta_patch, red_patch], loc=(0, 1))
plt.show()
```

With the legend in place, the bar graph not only tells you which months had good sales for each product, it also lets you compare the products and see which performed better than the others.
For instance, we can conclude that Product_2 generally sold better than the other products, and that while Product_1 and Product_2 had their highest sales in May, Product_3 recorded its highest sales in April.
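Readings like these can also be checked numerically rather than by eye. A minimal sketch, assuming the spreadsheet has `Months`, `Product_1`, `Product_2` and `Product_3` columns; the numbers below are invented placeholders so the snippet runs on its own:

```python
import pandas as pd

# Hypothetical sales figures for three products (made-up numbers,
# not the author's data), laid out like the spreadsheet used above
df = pd.DataFrame({
    'Months': ['January', 'February', 'March', 'April', 'May', 'June'],
    'Product_1': [120, 90, 110, 100, 150, 95],
    'Product_2': [200, 180, 190, 170, 240, 210],
    'Product_3': [80, 85, 95, 130, 100, 90],
}).set_index('Months')

# Month with the highest sales for each product
best_month = df.idxmax()
print(best_month)

# Total sales per product over the six months
totals = df.sum()
print(totals.idxmax())  # Product_2
```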
Compared with the single bar graph, the multiple bar graph uses the three products as the continuous variables on the Y-axis. We also created a variable `e` (any other name works) with `np.arange(len(product_2))`, which gives one integer position per month (0 to 5); all three products cover the same months, so using the length of Product_2 is an arbitrary choice. Each product's bars are then shifted from these positions by a different multiple of `bar_width` (1, 2 and 3 times), which places the three bars side by side in a cluster. We set the month ticks ourselves with `plt.xticks()`, placing each label under the middle bar of its cluster; since the labels are written out in the code, `df.Months` is no longer needed for the X-axis. Finally, the patches were used to define the colour and label of each product, and they were applied to the legend at the top of the graph.
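As an alternative to building patches by hand, matplotlib can assemble the legend itself if each bar call is given a `label`. The sketch below also centres each cluster on its tick by offsetting the three bars by -1, 0 and +1 bar widths, so the tick positions are simply `e`. The data are invented placeholders; with the real spreadsheet, use the columns read by `pd.read_excel()` instead.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up placeholder sales for three products over six months
months = ['January', 'February', 'March', 'April', 'May', 'June']
sales = {
    'Product_1': [120, 90, 110, 100, 150, 95],
    'Product_2': [200, 180, 190, 170, 240, 210],
    'Product_3': [80, 85, 95, 130, 100, 90],
}
colors = ['orange', 'magenta', 'red']

e = np.arange(len(months))  # one integer position per month: 0, 1, ..., 5
bar_width = 0.29

fig, ax = plt.subplots(figsize=(8, 8))
for i, (name, color) in enumerate(zip(sales, colors)):
    # With three products, offsets of -1, 0 and +1 bar widths centre each cluster
    offset = (i - 1) * bar_width
    ax.bar(e + offset, sales[name], width=bar_width, color=color,
           label=f'Sales of {name} in $', zorder=2)

ax.set_xticks(e)
ax.set_xticklabels(months, rotation=90)
ax.set_xlabel('Months')
ax.set_ylabel('Sales')
ax.legend()  # built from the label= arguments above
plt.show()
```

Because the legend entries come from the bars themselves, their colours cannot drift out of sync with the bars.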
Component or Stacked Bar Graph
If you want to show the numbers of good and bad commodities over six months, with each bar showing both the parts and their total, a component or stacked bar graph is a good choice.
To see how it works, let's run through the following code:
```python
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_excel(r'Sales_month.xlsx')
print(df)
```

```python
# Declare the size of your image
fig = plt.figure(figsize=(8, 8))
plt.xlabel('Months')
plt.ylabel('Measure')
plt.xticks(rotation=90)

# You can set the title of the graph
plt.title('Good and bad commodities')

bar_width = 0.34
x = df.Months

# Your Ys will be
bad = df.Bad
good = df.Good

# Make the plot: each red bar starts where the green bar below it ends
plt.bar(x, good, width=bar_width, color='g')
plt.bar(x, bad, width=bar_width, bottom=df.Good, color='r')

# Provide a legend
import matplotlib.patches as mpatches
red_patch = mpatches.Patch(color='red', label='Bad commodities')
green_patch = mpatches.Patch(color='green', label='Good commodities')
plt.legend(handles=[green_patch, red_patch], loc=(0, 1))
plt.show()
```

Take note that in the second `plt.bar()` call we set `bottom=df.Good`, so each bar for the bad commodities starts at the top of the bar for the good ones; this keeps the good commodities at the bottom of each stack. To put the bad commodities at the bottom instead, draw `bad` first and pass `bottom=df.Bad` to the bar for `good`.
To interpret the chart, let's look at the first bar together: we can conclude that 60 of the commodities were good, while 30 were bad.
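With more than two components, each new layer has to sit on the running total of the layers below it, so `bottom` becomes a cumulative sum. A minimal sketch, assuming a hypothetical third category called `Damaged` alongside `Good` and `Bad` (all numbers are made up):

```python
import numpy as np
import matplotlib.pyplot as plt

months = ['January', 'February', 'March', 'April', 'May', 'June']
# Made-up counts; 'Damaged' is a hypothetical third component
layers = {
    'Good': np.array([50, 45, 60, 55, 40, 65]),
    'Bad': np.array([20, 25, 15, 30, 35, 10]),
    'Damaged': np.array([5, 10, 5, 5, 10, 5]),
}
colors = ['g', 'r', 'grey']

fig, ax = plt.subplots(figsize=(8, 8))
bottom = np.zeros(len(months))
for (name, values), color in zip(layers.items(), colors):
    # Each layer starts where the layers drawn before it ended
    ax.bar(months, values, bottom=bottom, color=color, label=name)
    bottom = bottom + values

ax.set_xlabel('Months')
ax.set_ylabel('Measure')
ax.tick_params(axis='x', labelrotation=90)
ax.legend()
plt.show()

print(bottom)  # total height of each stacked bar
```

If the data are already in a DataFrame, pandas offers a shortcut for the same layout, for example `df.set_index('Months')[['Good', 'Bad']].plot(kind='bar', stacked=True)`.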
Horizontal Component Bar Graph
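To make the bars horizontal, use `plt.barh()` instead of `plt.bar()` and replace `bottom` with `left`. Also remember to switch the axis labels, since the months now run along the Y-axis:

```python
# Assumes df has been loaded as in the previous example
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

fig = plt.figure(figsize=(8, 8))
# The labels are switched: months on the Y-axis, the measure on the X-axis
plt.xlabel('Measure')
plt.ylabel('Months')
plt.title('Good and bad commodities')

x = df.Months
bad = df.Bad
good = df.Good

# Use plt.barh() rather than plt.bar(), and left= rather than bottom=
plt.barh(x, good, color='g')
plt.barh(x, bad, left=df.Good, color='r')

# Provide a legend
red_patch = mpatches.Patch(color='red', label='Bad commodities')
green_patch = mpatches.Patch(color='green', label='Good commodities')
plt.legend(handles=[green_patch, red_patch], loc=(0, 1))
plt.show()
```

Summary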
We've looked at how to use matplotlib to plot the three main types of bar graph (single, multiple, and component or stacked), which can be used to visualize real-world situations.