Tagged: Data science, Data visualization, Matplotlib, Python
Posted by Idowu (@idowu) on February 9, 2020 at 3:12 pm
In my recent article, I outlined step-by-step methods for creating and customizing different bar graphs with Python's matplotlib module, which we described as a library that houses most of the 2D plots available in Python. A few months back, I wanted to tell good stories with line plots and pie charts using matplotlib. I stumbled upon some easy strategies that need only a few lines of code, so I decided to share them in this article.
Line Plots
Line plots are a great way to tell stories about data and to visualize trends in it. As the name implies, they are lines that show how a measured quantity changes with respect to another variable, usually a continuous or ordered one such as time.
Real-world applications include, but are not limited to, monitoring and studying disease prevalence, and research settings such as measuring the performance of equipment over time.
Line plots are also used in comparative studies, where the aim is to compare how two or more variables change over the same period.
In finance and business, they are used to draw market-intelligence insights, for example to tell stories about the buying rates of a cluster of customers or of individual customers.
The applications of line plots are practically limitless and depend only on the creativity of the storyteller.
Please note that I won't be providing a data set for this article. You can easily make one up yourself, for example in Microsoft Excel.
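If you would rather not use Excel, you can also build a small made-up data set directly with pandas. The sketch below is only an illustration. The column names `Years`, `Area_A`, `Area_B` and `Area_C` are assumptions chosen to mirror the names used later in this article, and the numbers are random placeholders, not real buying rates.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder data: random integers, not real buying rates
rng = np.random.default_rng(seed=0)
years = list(range(2010, 2020))

data = pd.DataFrame({
    "Years": years,
    "Area_A": rng.integers(20, 100, size=len(years)),  # values in 20..99
    "Area_B": rng.integers(20, 100, size=len(years)),
    "Area_C": rng.integers(20, 100, size=len(years)),
})

print(data)
```

To reuse it with the snippets below, you could save it with `data.to_excel('Buying_rate.xlsx', index=False)`. This requires an Excel writer such as openpyxl to be installed.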
The first data set we'll work with contains people's buying rates over time in three different areas. Our aim is to view the trend in just one of those areas.
Let's now take a look at how we can tell some easy-to-interpret stories with line plots:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Reading .xlsx files with pandas requires the openpyxl package
data = pd.read_excel(r'Buying_rate.xlsx')

# Let's view the structure of the data
print(data)

# Plotting a single line
fig = plt.figure(figsize=(8, 8))
plt.xlabel('Years (2010 - 2019)')
plt.ylabel('Buying rate (Area A)')

# Years as strings, so that every year gets its own tick on the X-axis
years = ['2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019']
y = data.Area_A

# Make the plot: a red ('r') line, 5 points wide
plt.plot(years, y, 'r', linewidth=5.0)
plt.show()
In the code snippet, we used plt.figure(figsize=(width, height)) to set the size of the image we want to produce, in inches (you can play around with it). We also set the labels for the X- and Y-axes with plt.xlabel and plt.ylabel. We did not use the data.Years column because matplotlib may skip some years when it places ticks on a numeric axis automatically. Instead, we made a list of the years as strings, so that each year gets its own tick. Finally, we declared y = data.Area_A, which supplies the Y-axis values.
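Another option is to keep the years numeric and tell matplotlib explicitly where to put the ticks. The sketch below uses the object-oriented interface (`fig, ax = plt.subplots()`) instead of the `plt.*` calls above. The values are made up and stand in for `data.Area_A`. The call `ax.set_xticks(years)` places one tick per year, so no year is skipped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove it and call plt.show() to see the window
import matplotlib.pyplot as plt

years = list(range(2010, 2020))
# Hypothetical values standing in for data.Area_A
area_a = [55, 48, 60, 52, 35, 58, 80, 74, 66, 50]

fig, ax = plt.subplots(figsize=(8, 8))
ax.plot(years, area_a, "r", linewidth=5.0)
ax.set_xlabel("Years (2010 - 2019)")
ax.set_ylabel("Buying rate (Area A)")
ax.set_xticks(years)  # one tick per year, so no year is omitted
```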
You may also play around with the line width by changing the value of linewidth.
The buying trend of Area_A is easy to interpret from the plot above.
Area_A had a close-to-average buying rate in 2011 and its lowest buying rate in 2014. It peaked in 2016 and then declined until 2019.
Comparing Variables with Line Plots
Sometimes you'll be given a data set that holds a categorical variable in a single column, alongside other parameters.
For instance, a data set might hold Areas A, B and C in a single column, with Years and Buying_rates in their own separate columns.
Let's take a look at such data:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import matplotlib.patches as mpatches

data = pd.read_excel(r'Buying_rate.xlsx')
print(data)
Now we're going to make line plots from the data above. Each area will be extracted into its own independent DataFrame, as the following code snippets show:
# Extract each area as an independent DataFrame from the data set
Region1 = data[data.Areas == 'A']
print(Region1)

Region2 = data[data.Areas == 'B']
print(Region2)

Region3 = data[data.Areas == 'C']
print(Region3)
We've now isolated each region from the data, and each one is held in its own DataFrame.
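Take note of how we isolated each region with the == operator. The expression data.Areas == 'A' produces a True/False value for every row. Indexing data with it keeps only the rows where the value is True, which are the rows that belong to that region.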
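To see what the `==` comparison actually produces, the sketch below builds a tiny made-up long-format table. The `Areas`, `Years` and `Buying_rates` column names follow the article, and the values are placeholders. The comparison returns a Boolean Series, and indexing the DataFrame with it keeps only the rows marked True.

```python
import pandas as pd

# Hypothetical long-format data: one row per (area, year)
data = pd.DataFrame({
    "Areas": ["A", "B", "C", "A", "B", "C"],
    "Years": [2010, 2010, 2010, 2011, 2011, 2011],
    "Buying_rates": [50, 42, 61, 47, 45, 66],
})

mask = data.Areas == "A"   # Boolean Series: True where the area is "A"
print(mask.tolist())       # [True, False, False, True, False, False]

Region1 = data[mask]       # keeps only the rows where mask is True
print(Region1)
```

Note that the filtered DataFrame keeps the original row labels (0 and 3 here). This is why `print(Region1.Buying_rates)` shows the original row numbers next to the values.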
Next, we define the Y-axis values for each region by taking them from the DataFrames we isolated earlier:
years = ['2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019']

# Take the Y-axis values from each of the isolated DataFrames
a = Region1.Buying_rates
b = Region2.Buying_rates
c = Region3.Buying_rates
Writing Region1.Buying_rates is not a function call. It is attribute access, which selects the Buying_rates column of that DataFrame and is equivalent to Region1['Buying_rates'].
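If all you need is one column of buying rates per area, pandas can also reshape the long-format table in a single step with `DataFrame.pivot`. This is an alternative to the three filters above, not the approach the article uses. The tiny data set is again made up.

```python
import pandas as pd

# Hypothetical long-format data, same layout as the article's table
data = pd.DataFrame({
    "Areas": ["A", "B", "C", "A", "B", "C"],
    "Years": [2010, 2010, 2010, 2011, 2011, 2011],
    "Buying_rates": [50, 42, 61, 47, 45, 66],
})

# One row per year, one column per area
wide = data.pivot(index="Years", columns="Areas", values="Buying_rates")
print(wide)
```

Each column of `wide` then plays the role of `a`, `b` or `c`. For a quick look, `wide.plot()` draws one line per area and builds the legend from the column names.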
To confirm that a, b and c hold the values we expect, print one of them, for example print(Region1.Buying_rates). This shows the buying rates of region A together with their row index. Let's move on to making our line plots:
# Making the line plots
# Select a size for your figure
fig, ax = plt.subplots(figsize=(8, 6))

# Set the labels for the X- and Y-axes
plt.xlabel('Years', size=17)
plt.ylabel('Buying rates', size=17)

# Set the title for the plot
plt.title('Buying rate comparison', size=20)

# Make the plot: blue ('b'), cyan ('c') and red ('r') lines for regions 1, 2 and 3
plt.plot(years, a, 'b', years, b, 'c', years, c, 'r', linewidth=3.0)

# Use patches to build the legend, using the same colour codes as in plt.plot()
# ('c' is matplotlib's darker cyan; 'cyan' would be a different shade)
blue_patch = mpatches.Patch(color='b', label='Buying rate of region 1')
cyan_patch = mpatches.Patch(color='c', label='Buying rate of region 2')
red_patch = mpatches.Patch(color='r', label='Buying rate of region 3')

# Place the legend: loc is the (x, y) of its lower-left corner in axes coordinates
plt.legend(handles=[blue_patch, cyan_patch, red_patch], loc=(0.8, -0.5))

# Show the plots
plt.tight_layout()
plt.show()
Take note of the line plt.plot(years, a, 'b', years, b, 'c', years, c, 'r', linewidth=3.0) in the snippet. We passed all three (x, y, format) groups to a single plt.plot() call, which draws the three lines in one plot and keeps the code short and easy to read. The linewidth=3.0 keyword applies to all three lines. We also changed the font sizes in plt.xlabel, plt.ylabel and plt.title with the size= keyword argument.
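We also moved the legend to the bottom of the plot. When loc is given as an (x, y) pair, it sets the position of the legend's lower-left corner in axes coordinates, where (0, 0) is the bottom-left corner and (1, 1) the top-right corner of the axes. A negative y therefore places the legend below the axes.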
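Instead of building the legend from `mpatches.Patch` objects, you can give each line its own `label=`. `ax.legend()` (or `plt.legend()`) then picks up the entries automatically, so the legend colours can never get out of sync with the lines. The sketch below shows this variation with made-up values. Combining `loc='upper center'` with `bbox_to_anchor` is one common way to place the legend below the axes. It is an alternative to the article's patches, not a replacement for them.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running without a display
import matplotlib.pyplot as plt

years = [str(y) for y in range(2010, 2020)]
# Hypothetical buying rates for three regions (placeholder values)
rates = {
    "Buying rate of region 1": [50, 47, 52, 49, 40, 55, 61, 58, 70, 45],
    "Buying rate of region 2": [42, 45, 44, 50, 48, 53, 57, 60, 68, 41],
    "Buying rate of region 3": [61, 66, 58, 63, 59, 62, 65, 69, 75, 52],
}
colors = ["b", "c", "r"]

fig, ax = plt.subplots(figsize=(8, 6))
for (label, values), color in zip(rates.items(), colors):
    # label= is what the legend will display for this line
    ax.plot(years, values, color, linewidth=3.0, label=label)

ax.set_xlabel("Years", size=17)
ax.set_ylabel("Buying rates", size=17)
ax.set_title("Buying rate comparison", size=20)
# Anchor the legend's upper centre just below the axes
legend = ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.15), ncol=3)
fig.tight_layout()
```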
Sometimes you might want to remove the top and right spines (the border lines) of the axes. You can do this by placing the following lines anywhere after ax is created and before plt.show():
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
Setting the visibility of ax.spines['top'] and ax.spines['right'] to False removes those two spines.
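If a figure has several subplots, the same two lines have to be repeated for each `Axes`. The helper below is hypothetical (it is not part of matplotlib) and simply loops over the axes. It only uses `ax.spines[...]` and `set_visible`, as in the snippet above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running without a display
import matplotlib.pyplot as plt

def hide_top_right_spines(axes):
    """Hypothetical helper: hide the top and right spines of every Axes given."""
    for ax in axes:
        for side in ("top", "right"):
            ax.spines[side].set_visible(False)

fig, axs = plt.subplots(1, 2, figsize=(8, 4))
hide_top_right_spines(axs.flat)  # axs.flat iterates over every subplot
```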
From the generated plots, we can now compare the three regions and tell our stories about which performed better than the other(s) in terms of buying rates.
Even a brief look at the output shows that the buying rates of all the regions declined rapidly after peaking in 2018.
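Reading peaks off a chart by eye is quick, but you can also check the same story numerically. The sketch assumes the long-format table from this section (columns `Areas`, `Years`, `Buying_rates`). `groupby` plus `idxmax` returns the row holding the highest buying rate in each area, and from that row you get its year. The values below are made up.

```python
import pandas as pd

# Hypothetical long-format data (placeholder values)
data = pd.DataFrame({
    "Areas": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "Years": [2017, 2018, 2019] * 3,
    "Buying_rates": [58, 70, 45, 60, 68, 41, 69, 75, 52],
})

# Row label of the maximum rate within each area, then look up those rows
peak_rows = data.groupby("Areas")["Buying_rates"].idxmax()
peak_years = data.loc[peak_rows, ["Areas", "Years", "Buying_rates"]]
print(peak_years)
```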
The Pie Chart
Presenting information in a pie chart is also one of the best ways to gain insight into a particular object or group of objects.
When you want to see at a glance what fraction of a whole (for example, a population) behaves in a particular way, a pie chart is usually the visualization of choice.
Suppose we want to summarize the watch time per movie category (Cartoons, Action movies, Scifi) for a cinema. We need each category's watch time to gain insight into customers' choices, so that our client can serve them better. A pie chart is a good choice for this as well.
We will now take a look at how this is implemented with matplotlib:
# Import the following libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# These are the values of the watch time per movie category for the cinema
Values = [586, 40, 53]

# Define the labels
labels = ['Cartoons', 'Action movies', 'Scifi']

plt.figure(figsize=(8, 8))
colors = ['red', 'green', 'magenta']
plt.title('Total watch time per movie category', size=20, color='brown')
plt.pie(Values, colors=colors, startangle=10)
plt.legend(labels)
plt.show()
The startangle argument sets the starting orientation of the pie. The first wedge starts that many degrees counter-clockwise from the positive x-axis. You can play around with it to see its effect.
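To see what `startangle` does without guessing, you can inspect the wedges that `pie` returns. With the default `counterclock=True`, the first wedge's `theta1` equals `startangle`, measured in degrees counter-clockwise from the positive x-axis. The sketch reuses the watch-time values from the snippet above. `startangle=90` is an arbitrary choice for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running without a display
import matplotlib.pyplot as plt

Values = [586, 40, 53]
labels = ['Cartoons', 'Action movies', 'Scifi']

fig, ax = plt.subplots(figsize=(8, 8))
# pie returns (wedges, texts) when autopct is not set
wedges, texts = ax.pie(Values, labels=labels, startangle=90)

for label, wedge in zip(labels, wedges):
    # theta1/theta2 are each wedge's start and end angles in degrees
    print(f"{label}: {wedge.theta1:.1f} -> {wedge.theta2:.1f} degrees")
```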
Although we now have a pie chart, it still doesn't communicate much, because the count and percentage of each watch time are missing from the plot.
Let's make our pie chart more informative by creating an autopct function:
# Use the following function to show the count and the percentage inside each wedge
def make_autopct(Values):
    def my_autopct(pct):
        total = sum(Values)
        # matplotlib passes each wedge's percentage; convert it back to a count
        Val = int(round(pct * total / 100))
        return '{v:d} ({p:.2f}%) '.format(p=pct, v=Val)
    return my_autopct

# Reuses Values, colors and labels from the previous snippet
plt.pie(Values, colors=colors, shadow=True, autopct=make_autopct(Values), startangle=10)
plt.legend(labels, loc=(0, 0.9))
plt.tight_layout()
plt.show()
Within the function, total holds the sum of all the values. matplotlib calls my_autopct once per wedge with that wedge's percentage (pct), and Val converts that percentage back into the original count.
Finally, the function returns a label showing the count ({v:d}, an integer) followed by the percentage to two decimal places ({p:.2f}%) in parentheses.
We also added a shadow by setting shadow=True. Make sure you also pass autopct=make_autopct(Values) to plt.pie.
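Because `make_autopct` returns an ordinary function, you can call the inner function yourself to check the label text before plotting. matplotlib passes it each wedge's percentage. For the Cartoons slice that is 586 / 679 × 100 ≈ 86.30, where 679 = 586 + 40 + 53. The loop below imitates that call for each value. It is a check, not part of the plotting code.

```python
def make_autopct(Values):
    def my_autopct(pct):
        total = sum(Values)
        # Convert the percentage back into the original count
        Val = int(round(pct * total / 100))
        return '{v:d} ({p:.2f}%) '.format(p=pct, v=Val)
    return my_autopct

Values = [586, 40, 53]
label_for = make_autopct(Values)

total = sum(Values)  # 679
for v in Values:
    pct = v / total * 100  # the percentage matplotlib would pass in
    print(label_for(pct))
# 586 (86.30%)
# 40 (5.89%)
# 53 (7.81%)
```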
We can beautify the pie further by "exploding" its slices:
fig, ax = plt.subplots(figsize=(8, 6))
# explode: how far each wedge is offset from the centre, as a fraction of the radius
plt.pie(Values, colors=colors, shadow=True, explode=(0.1, 0.2, 0.3),
        autopct=make_autopct(Values), startangle=10)
plt.legend(labels, loc=(-0.1, 0.8))
plt.tight_layout()
plt.show()
To make the explosion, we passed the explode argument to plt.pie with one offset per wedge. Each offset is a fraction of the pie's radius (you can tweak these values on your own).
Summary
We've just looked at the usefulness of line plots and pie charts, their applications to real-life situations, and how to create customized visualizations with them using Python's matplotlib library.