Introduction to Matplotlib for Data Analysis


This talk covers what matplotlib is, why to use it and how to install it. It works through simple examples that will get you started while also showing the strengths of the matplotlib package. I spend most of my day writing queries using Jade and SQL, and I use matplotlib as part of my work for exploratory data analysis and for displaying and explaining complicated data. I presented this topic at my local Linux group; this month, after my talk, someone showed me a graph they had made using matplotlib.
Ten slides of an OpenOffice presentation, mixed with demonstrations using pylab interactively and running a script to output a chart.
- Introduction: Why I use matplotlib.
- Installation: Which packages are required and where to obtain them.
- Website: URL and description.
- Short display using pylab interactively: Start up pylab in ipython and bring up a simple scatter chart. Explanation of the functionality of the show window.
- Basic bar graph script: Explanation of the code. Difference between axes, axis and figure.
- Adding labels: Adding title, axis labels, legend, ticks and tick labels.
- How to import data: genfromtxt; splitting the imported data.
- Multiple plots on the same figure: Using subplot; using GridSpec, which requires matplotlib 1.0.0.
- Twin axes: How to plot two different datasets on the same plot with different scales.

[Presentation by Catherine Thwaites, uploaded by ewblen on behalf of kiwipycon]

Published in: Technology, Business

Introduction to Matplotlib for Data Analysis

  1. 1. Introduction to Matplotlib for Data Analysis
        - What is matplotlib?
  2. 2. Why do I use it?
  3. 3. Installation
        What you need:
        - Python version 2.5, 2.6 or 2.7
        - Numpy version 1.3+
        - Matplotlib version 1.0.1
        Linux: Matplotlib 0.99 is the latest version in the Debian repositories; the latest release, 1.0.1, needs to be installed from source (see the installation instructions).
        Windows: Download and install.
  4. 4. Website for documentation. The gallery has a large number of examples.
  5. 5. Ways to run matplotlib
        - Interactively using pylab and ipython
  6. 6. Interactively in shell
  7. 7. File
  8. 8. As part of a larger program
        Interactively using pylab in ipython
            catherine@catherine-HP-Mini-110-3100:~$ ipython -pylab
        - Imports modules required to plot in one namespace
        - Chart is updated as you enter commands
  9. 9. In [1]: plot([1,2,3,4],[56,45,58,32])
        Out[1]: [<matplotlib.lines.Line2D object at 0xa9a3a2c>]
        Show window:
        - Save in various open formats
        - Change plot size in window
        - Zoom to inspect
        - Pan to move along
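The same plot can also be produced from a script rather than the interactive prompt. The following is a minimal sketch, assuming a non-interactive backend (`Agg`) and a hypothetical output file name, `simple_plot.png`, written to the system temporary directory; neither appears in the slides.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: render to a file, no display window needed
import matplotlib.pyplot as plt

# Same data as the interactive plot() call on the slide.
fig = plt.figure()
ax = fig.add_subplot(111)
lines = ax.plot([1, 2, 3, 4], [56, 45, 58, 32])

# Hypothetical output location; any writable path would do.
out_path = os.path.join(tempfile.gettempdir(), "simple_plot.png")
fig.savefig(out_path)
```

Calling `plt.show()` instead of `fig.savefig(...)` with an interactive backend would bring up the window with the save, zoom and pan controls.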
  10. 10. Simple Bargraph: Using bar
            import numpy as np
            import matplotlib.pyplot as plt

            data1 = [12, 23, 38, 42, 41]
            fig = plt.figure(1, (6, 6))
            fig.clf()
            ax = fig.add_subplot(111)
            ind = np.arange(len(data1))
            # The start of this call was lost from the slide text; ax.bar(ind, ...) is assumed.
            rects = ax.bar(ind, data1, width=0.75, color='thistle')
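The talk description mentions the difference between figure, axes and axis. As a rough illustration (not taken from the slides): a figure is the whole canvas, an axes is one plotting area inside it, and each axes owns an x axis and a y axis object that hold the ticks and labels. The `Agg` backend here is an assumption so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available
import matplotlib.axes
import matplotlib.axis
import matplotlib.figure
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 6))   # the figure: the whole canvas
ax = fig.add_subplot(111)          # an axes: one plotting area in the figure

# The figure keeps a list of the axes it contains.
figure_axes = fig.axes

# Each axes has two axis objects, which manage ticks and tick labels.
x_axis = ax.xaxis
y_axis = ax.yaxis
```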
  11. 11. Titles and labels
        Add title
            ax.set_title('Simple bar graph', size=20)
        Change the plot range
            ax.set_ylim(0, 180)
        Axis labels
            ax.set_xlabel('Data', size=14)
            ax.set_ylabel('Places', size=14)
        Axis ticks and labels
            ax.set_xticks(ind+0.5)
            labels = ['west', 'east', 'centre', 'north', 'south']
            ax.set_xticklabels(labels, size=14)
        Add bar labels
            def bar_label(rects):
                above = 1.05 * min([r.get_height() for r in rects])
                for rect in rects:
                    height = rect.get_height()
                    # Write each bar's value just above the bar, centred on it.
                    ax.text(rect.get_x()+rect.get_width()/2., 1.05*height,
                            '%d'%int(height), ha='center', va='bottom')
            bar_label(rects)
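Putting the bar graph and the labelling steps together gives a script along the following lines. This is a sketch rather than the presenter's exact file: it passes `align='center'` explicitly so the ticks can sit at `ind` regardless of the Matplotlib version's default alignment, passes the axes into `bar_label` as an argument, and assumes the `Agg` backend.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available
import matplotlib.pyplot as plt
import numpy as np

data1 = [12, 23, 38, 42, 41]
labels = ['west', 'east', 'centre', 'north', 'south']
ind = np.arange(len(data1))

fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(111)

# align='center' is stated explicitly so each bar is centred on its tick.
rects = ax.bar(ind, data1, width=0.75, color='thistle', align='center')

ax.set_title('Simple bar graph', size=20)
ax.set_xlabel('Data', size=14)
ax.set_ylabel('Places', size=14)
ax.set_xticks(ind)
ax.set_xticklabels(labels, size=14)


def bar_label(ax, rects):
    """Write each bar's height as text just above the bar."""
    for rect in rects:
        height = rect.get_height()
        ax.text(rect.get_x() + rect.get_width() / 2.0, 1.05 * height,
                '%d' % int(height), ha='center', va='bottom')


bar_label(ax, rects)
bar_texts = [t.get_text() for t in ax.texts]
```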
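  12. 12. Two datasets on same axes
        Side by side
            # The start of each call (including the x positions) was lost from the slide text;
            # ax.bar(ind + offset, ...) with offsets of one bar width is assumed.
            ax.bar(ind, data1, width=0.25, color='pink', label='A1')
            ax.bar(ind+0.25, data2, width=0.25, color='thistle', label='A2')
            ax.bar(ind+0.5, data3, width=0.25, color='salmon', label='A3')
            ax.legend(loc='upper left')
        Cumulative
            # bottom=data1 stacks the second dataset on top of the first.
            rects1 = ax.bar(ind, data1, width=0.75, color='lightblue', label='A1')
            rects2 = ax.bar(ind, data2, width=0.75, bottom=data1, color='thistle', label='A2')
            ax.legend(loc='upper left')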
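A self-contained sketch of both layouts follows. The values in `data2` and `data3` are made up for illustration (the slides do not show them), the x offsets are an assumption, and `align='edge'` is given explicitly so each bar starts at the stated x position.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available
import matplotlib.pyplot as plt
import numpy as np

data1 = [12, 23, 38, 42, 41]
data2 = [20, 15, 30, 25, 10]   # hypothetical values
data3 = [5, 18, 22, 30, 35]    # hypothetical values
ind = np.arange(len(data1))

fig = plt.figure(figsize=(10, 5))
ax_side = fig.add_subplot(121)
ax_stack = fig.add_subplot(122)

# Side by side: shift each dataset right by one bar width.
width = 0.25
side = []
datasets = [(data1, 'pink', 'A1'), (data2, 'thistle', 'A2'), (data3, 'salmon', 'A3')]
for i, (data, colour, label) in enumerate(datasets):
    side.append(ax_side.bar(ind + i * width, data, width=width,
                            color=colour, label=label, align='edge'))
ax_side.legend(loc='upper left')

# Cumulative: the second dataset starts where the first one ends.
rects1 = ax_stack.bar(ind, data1, width=0.75, color='lightblue', label='A1')
rects2 = ax_stack.bar(ind, data2, width=0.75, bottom=data1,
                      color='thistle', label='A2')
ax_stack.legend(loc='upper left')
```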
  13. 13. Importing Data: Using numpy genfromtxt
            import numpy as np
            import matplotlib.pyplot as plt

            infile = open("data.csv", "r")
            data = np.genfromtxt(infile, delimiter=",",
                                 dtype=("S20,S20,f8"), names=True)
            infile.close()

            # --- Split into colours
            yellow = data[data['Colour']=='Yellow']
            blue = data[data['Colour']!='Yellow']

            # --- Plot histogram of data
            fig = plt.figure(1, figsize=(12,8))
            ax = fig.add_subplot(111)
            ax.hist(yellow['Length'], color='gold')
            ax.tick_params('both', labelsize=16)
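To try the import and split steps without a `data.csv` on disk, the file can be stood in for by an in-memory buffer. In this sketch the rows, the `Name` column and the `encoding` argument are assumptions; only the `Colour` and `Length` column names come from the slide. It uses the Unicode dtype `U20` instead of `S20` so that string comparisons also work on Python 3.

```python
import io

import numpy as np

# Hypothetical stand-in for data.csv; only 'Colour' and 'Length' appear on the slide.
csv_text = (
    "Name,Colour,Length\n"
    "a,Yellow,1.5\n"
    "b,Blue,2.0\n"
    "c,Yellow,3.5\n"
)

# names=True takes the field names from the header row.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                     dtype="U20,U20,f8", names=True, encoding="utf-8")

# Boolean masks select the matching rows of the structured array.
yellow = data[data['Colour'] == 'Yellow']
blue = data[data['Colour'] != 'Yellow']
```

`yellow['Length']` can then be passed straight to `ax.hist(...)` as on the slide.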
  14. 14. Multiple plots on the same figure: Using add_subplot
        The three digits in add_subplot(231) are numrows, numcols and fignum.
            ax1 = fig.add_subplot(231)
            ax2 = fig.add_subplot(232)
            ax3 = fig.add_subplot(233)
            ax4 = fig.add_subplot(234)
            ax5 = fig.add_subplot(235)
            ax6 = fig.add_subplot(236)
            ax1.plot([12,13,25.5,15.2,19], 'bo-')
            ax2.plot([13,18.5,1.5,2,21], 'ro-')
            ax3.plot([10,12,11.5,16,23], 'go-')
            ax4.plot([6,11,5,12,21,32], 'ko-')
            ax5.plot([1.9,13,19.5,16.2,5], 'mo-')
            ax6.plot([13,13.2,26,18,14], 'yo-')
        Use to compare measurements across different categories.
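The six `add_subplot` calls can also be written as a loop, using the three-argument form `add_subplot(numrows, numcols, fignum)`. This is a sketch with the data from the slide; the figure size and the `Agg` backend are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available
import matplotlib.pyplot as plt

series = [
    ([12, 13, 25.5, 15.2, 19], 'bo-'),
    ([13, 18.5, 1.5, 2, 21], 'ro-'),
    ([10, 12, 11.5, 16, 23], 'go-'),
    ([6, 11, 5, 12, 21, 32], 'ko-'),
    ([1.9, 13, 19.5, 16.2, 5], 'mo-'),
    ([13, 13.2, 26, 18, 14], 'yo-'),
]

fig = plt.figure(figsize=(12, 8))
axes = []
# fignum counts from 1, left to right and then top to bottom.
for fignum, (values, fmt) in enumerate(series, start=1):
    ax = fig.add_subplot(2, 3, fignum)
    ax.plot(values, fmt)
    axes.append(ax)
```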
  15. 15. Multiple plots on the same figure: Using Gridspec
            import matplotlib.pyplot as plt
            import matplotlib.gridspec as gridspec

            fig = plt.figure(1, (6,6))
            # 3 by 2 grid, with double height for the bottom row
            gs = gridspec.GridSpec(3, 2, width_ratios=[1,1],
                                   height_ratios=[1,1,2],
                                   hspace=0.2, bottom=0.1)
            ax1 = fig.add_subplot(gs[0,0])
            ax2 = fig.add_subplot(gs[0,1])
            ax3 = fig.add_subplot(gs[1,0])
            ax4 = fig.add_subplot(gs[1,1])
            # Span the bottom row
            ax5 = fig.add_subplot(gs[2,:])
        - Easier to use for complicated plot layouts
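One way to check that the layout comes out as described is to compare the axes positions after building the grid. The sketch below repeats the GridSpec setup from the slide (with the `Agg` backend as an assumption) and reads back each position in figure coordinates.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available
import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 6))
gs = gridspec.GridSpec(3, 2, width_ratios=[1, 1], height_ratios=[1, 1, 2],
                       hspace=0.2, bottom=0.1)

ax1 = fig.add_subplot(gs[0, 0])
ax2 = fig.add_subplot(gs[0, 1])
ax3 = fig.add_subplot(gs[1, 0])
ax4 = fig.add_subplot(gs[1, 1])
ax5 = fig.add_subplot(gs[2, :])   # slice over both columns

# Positions are bounding boxes in figure coordinates (0 to 1).
top_left = ax1.get_position()
top_right = ax2.get_position()
bottom = ax5.get_position()
height_ratio = bottom.height / top_left.height
```

With `height_ratios=[1, 1, 2]` the bottom axes should be about twice as tall as a top-row axes, and it should stretch from the left edge of `ax1` to the right edge of `ax2`.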
  16. 16. Multiple datasets on the same axes: Using Twin Axes
            import numpy as np
            import matplotlib.pyplot as plt

            fig = plt.figure()
            ax = fig.add_subplot(111)
            # Second y axis sharing the same x axis
            twin_ax = ax.twinx()
            sales = [45,69,60,67]
            returns = [82,91,89,78.5]
            ind = np.arange(len(sales))
            # The start of this call was lost from the slide text; ax.bar(ind, ...) is assumed.
            rects1 = ax.bar(ind, sales, width=0.75, color='thistle')
            p1 = twin_ax.plot(ind+0.5, returns, 'gs-')
            ax.set_ylim(0, 75)
            twin_ax.set_ylim(0, 100)
            ax.set_xticks(ind+0.5)
            ax.set_xticklabels(['North','South','East','West'])
            ax.set_ylabel('Sales')
            twin_ax.set_ylabel('% Returned')
            ax.set_title('Sales v Returns')
            # plot() returns a list of lines, so pass the line itself to the legend.
            plt.figlegend((rects1[0], p1[0]), ('Sales', '% Returned'),
                          loc='upper left')
  17. 17. Questions?