Analyzing MLB data with ggplot
Greg Lamp
ggplot
● What is it?
● Alternatives
● How it works
● Why should I use it?
● Brief case study
● Questions
Here I am on
the Internet.
Founder/CTO @ Yhat
Hi, I’m Greg!
What is
ggplot?
DSL for graphics
DSL for graphics
scatterplot
histogram
labels
color
shape
What about
matplotlib?
a quick example
matplotlib ggplot
it’s not all bad!
matplotlib
syntax, api,
default themes,
learning curve
matplotlib
maturity, ipython,
customization, community
syntax, api,
default themes,
learning curve
What about
d3.js?
d3.js
ggplot
ggplot d3.js
How it works
Format
ggplot
data frame
“aesthetics”
Aesthetics
color
shape
size
...fill, alpha, slope,
intercept, ymin,
ymax, ...
Geoms,
Stats, &
Scales
geom_point
geom_area
...there are many
stat_smooth
...there are a few
scale_color_brewer
scale_color_gradient
...there are many
Layers
ggplot() +
geom_point() +
stat_smooth()
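
One way to picture how `+` stacks layers is a toy model in plain Python. This is an illustration only, not the ggplot package's actual code: the `Plot` and `Layer` classes and the `describe` method are hypothetical names invented for this sketch.

```python
class Layer:
    """A named plot component, such as a geom or a stat (toy stand-in)."""

    def __init__(self, name):
        self.name = name


class Plot:
    """Holds data, aesthetic mappings, and an ordered tuple of layers."""

    def __init__(self, data, aesthetics, layers=()):
        self.data = data
        self.aesthetics = aesthetics
        self.layers = tuple(layers)

    def __add__(self, layer):
        # Adding a layer returns a new Plot, leaving the original untouched.
        if not isinstance(layer, Layer):
            return NotImplemented
        return Plot(self.data, self.aesthetics, self.layers + (layer,))

    def describe(self):
        # Layers are listed in the order they were added.
        return " + ".join(["plot"] + [layer.name for layer in self.layers])


base = Plot(data=[{"x": 1, "y": 2}], aesthetics={"x": "x", "y": "y"})
chart = base + Layer("geom_point") + Layer("stat_smooth")
print(chart.describe())
```

Because each `+` returns a new object, `base` can be reused with a different set of layers.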
Why is this
good?
Makes “reasonable
assumptions”
not real colors
matplotlib freaks
still not real colors
...but i can guess
what you mean
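
A rough sketch of that kind of guess, in plain Python: when the values mapped to color are not color names, give each distinct value its own color from a palette. The palette, the `guess_colors` helper, and the sample pitch-type codes are all assumptions made for illustration; this is not how the ggplot package is implemented.

```python
from itertools import cycle

# Hypothetical palette; any list of valid color codes would do.
PALETTE = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a"]


def guess_colors(values, palette=PALETTE):
    """Map each distinct value to a palette color, in order of first appearance."""
    assigned = {}
    colors = cycle(palette)  # wrap around if there are more values than colors
    for value in values:
        if value not in assigned:
            assigned[value] = next(colors)
    return [assigned[value] for value in values]


# Illustrative values only: category labels, not color names.
pitch_types = ["FF", "CU", "FF", "SL", "CU"]
print(guess_colors(pitch_types))
```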
Concise yet
expressive
Looks pretty good
(and is easy to customize)
Seaborn
github.com/mwaskom/seaborn
Case Study
pitch speed
103.4 mph
Load ggplot and pandas
Read in our pitch f/x data
define the x-axis
pass in your data frame
add a histogram
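
These steps might look like the following with the yhat ggplot package. This is a minimal sketch, not the code from the slides: the file name `pitchfx.csv` and the column name `start_speed` are assumptions, and the package is old enough that it may need an older pandas to import cleanly. The third-party imports sit inside the function so the definition loads even when they are missing.

```python
def pitch_speed_histogram(csv_path="pitchfx.csv", speed_column="start_speed"):
    """Return a ggplot histogram of pitch speeds (sketch; names are assumed)."""
    # Load ggplot and pandas.
    import pandas as pd
    from ggplot import ggplot, aes, geom_histogram

    # Read in the pitch f/x data (hypothetical CSV export).
    pitches = pd.read_csv(csv_path)

    # Define the x-axis with aes(), pass in the data frame, add a histogram.
    return ggplot(aes(x=speed_column), data=pitches) + geom_histogram()
```

Displaying the returned object, for example as the last expression in an IPython cell, should render the plot.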
How does fatigue
impact velocity?
...not helpful
What about at the
individual level?
Justin
Verlander
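
Narrowing to one pitcher could be sketched as below. The slides do not say how fatigue was measured, so using pitch count as the proxy is an assumption, as are the column names `pitcher_name`, `pitch_count`, and `start_speed`. `pitches` is expected to be a pandas data frame.

```python
def velocity_by_pitch_count(pitches, pitcher="Justin Verlander"):
    """Scatter speed against pitch count for one pitcher, with a smoothed trend."""
    from ggplot import ggplot, aes, geom_point, stat_smooth

    # Column names are assumptions about the data frame's layout.
    one_pitcher = pitches[pitches["pitcher_name"] == pitcher]

    # Same layering pattern as before: base plot, then points, then a smoother.
    return (
        ggplot(aes(x="pitch_count", y="start_speed"), data=one_pitcher)
        + geom_point()
        + stat_smooth()
    )
```

Swapping the `pitcher` argument re-runs the same plot for someone else, which is what makes a failed idea cheap to try.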
ggplot lets you
fail quicker
Finding Help
/tagged/python-ggplot
http://ggplot.yhathq.com
What’s next?
Thanks!
@theglamp
greg@yhathq.com

Analyzing MLB data with ggplot

Making basic, good-looking plots in Python is tough. Matplotlib gives you great control, but at the cost of verbose, low-level code. The rise of pandas has made Python the go-to language for data wrangling and munging, but many people are still reluctant to leave R because of its outstanding data viz packages.


ggplot is a port of the popular R package ggplot2 to Python. It provides a high-level grammar that allows users to quickly and easily make good-looking plots. An example may be found here:
http://blog.yhathq.com/posts/ggplot-for-python.html

Greg will show you how to use ggplot to analyze data from MLB's open data source, PITCHf/x. He will take you through the basics of ggplot and show how easy it is to create histograms, plot smoothed curves, and customize colors and shapes.

http://www.meetup.com/PyData-Boston/events/184382092/

Published in: Technology, Education