Analyzing MLB data with ggplot

Making basic, good-looking plots in Python is tough. Matplotlib gives you fine-grained control, but at the cost of verbose code. The rise of pandas has made Python the go-to language for data wrangling and munging, but many people are still reluctant to leave R because of its outstanding data visualization packages.


ggplot is a port of the popular R package ggplot2 to Python. It provides a high-level grammar that allows users to quickly and easily make good-looking plots. An example may be found here:
http://blog.yhathq.com/posts/ggplot-for-python.html

Greg will show you how to use ggplot to analyze data from MLB's open data source, PITCHf/x. He will take you through the basics of ggplot and show how easy it is to create histograms, plot smoothed curves, and customize colors and shapes.

http://www.meetup.com/PyData-Boston/events/184382092/
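
To make the histogram idea concrete: a histogram layer ultimately reduces one column of values to a count per bin. The following is a minimal standard-library sketch of that binning step only. It is not ggplot's implementation, the function name `bin_counts` is hypothetical, and the speeds are placeholder values invented for illustration rather than PITCHf/x data.

```python
import math
from collections import Counter


def bin_counts(values, binwidth):
    """Count values per bin; each key is the left edge of a bin of width binwidth."""
    counts = Counter(math.floor(v / binwidth) * binwidth for v in values)
    return dict(sorted(counts.items()))


# Placeholder pitch speeds in mph, invented for illustration (not PITCHf/x data).
speeds = [88.1, 91.5, 92.3, 94.8, 95.0, 95.2, 97.9]

# Print a crude text histogram, one row per 5 mph bin.
for left_edge, count in bin_counts(speeds, binwidth=5).items():
    print(f"{left_edge}-{left_edge + 5} mph: {'#' * count}")
```

With the library itself, this step is normally hidden behind a single histogram layer (for example `geom_histogram()`), so the user only names the column to plot.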

Published in: Technology, Education


  1. Analyzing MLB data with ggplot, Greg Lamp
  2. ggplot ● What is it? ● Alternatives ● How it works ● Why should I use it? ● Brief case study ● Questions
  3. Hi, I’m Greg! Founder/CTO @ Yhat. Here I am on the Internet.
  4. What is ggplot?
  5. DSL for graphics
  6. DSL for graphics: scatterplot, histogram, labels, color, shape
  7. What about matplotlib?
  8. a quick example
  9. matplotlib, ggplot
  10. it’s not all bad!
  11. matplotlib: syntax, API, default themes, learning curve
  12. matplotlib: maturity, IPython, customization, community | syntax, API, default themes, learning curve
  13. What about d3.js?
  14. d3.js
  15. ggplot
  16. ggplot, d3.js
  17. How it works
  18. Format
  19. ggplot
  20. data frame
  21. “aesthetics”
  22. Aesthetics
  23. color
  24. shape
  25. size
  26. ...fill, alpha, slope, intercept, ymin, ymax, ...
  27. Geoms, Stats, & Scales
  28. geom_point
  29. geom_area
  30. ...there are many
  31. stat_smooth
  32. ...there are a few
  33. scale_color_brewer
  34. scale_color_gradient
  35. ...there are many
  36. Layers
  37. ggplot()
  38. ggplot() + geom_point()
  39. ggplot() + geom_point() + stat_smooth()
  40. + + ggplot() geom_point() stat_smooth()+ +
  41. ggplot() + geom_point() + stat_smooth()
  42. Why is this good?
  43. Makes “reasonable assumptions”
  44. not real colors
  45. matplotlib freaks
  46. still not real colors ...but I can guess what you mean
  47. Concise yet expressive
  48. Looks pretty good (and is easy to customize)
  49. Seaborn: github.com/mwaskom/seaborn
  50. Case Study
  51. pitch speed
  52. 103.4 mph
  53. Load ggplot and pandas
  54. Read in our PITCHf/x data
  55. define the x-axis; pass in your data frame
  56. add a histogram
  57. How does fatigue impact velocity?
  58. ...not helpful
  59. What about at the individual level?
  60. Justin Verlander
  61. ggplot lets you fail quicker
  62. Finding Help
  63. /tagged/python-ggplot
  64. http://ggplot.yhathq.com
  65. What’s next?
  66. Thanks! @theglamp greg@yhathq.com
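
The "Layers" slides build a plot by adding components with `+`. One way such a syntax can be supported in Python is operator overloading: each `+` returns a new plot object carrying one more layer. The sketch below illustrates that idea with the standard library only. It is a toy model and not the ggplot package's code; the class names `ToyPlot` and `ToyLayer`, the column names, and the data rows are all hypothetical.

```python
class ToyLayer:
    """One plot component such as a geom or a stat (hypothetical stand-in)."""

    def __init__(self, name, **params):
        self.name = name
        self.params = params

    def __repr__(self):
        args = ", ".join(f"{key}={value!r}" for key, value in sorted(self.params.items()))
        return f"{self.name}({args})"


class ToyPlot:
    """Holds the data, the aesthetic mappings, and an ordered list of layers."""

    def __init__(self, aesthetics, data, layers=()):
        self.aesthetics = dict(aesthetics)
        self.data = data
        self.layers = list(layers)

    def __add__(self, layer):
        # Return a new plot so that the left-hand operand is left unchanged.
        return ToyPlot(self.aesthetics, self.data, self.layers + [layer])

    def describe(self):
        return " + ".join(["plot"] + [repr(layer) for layer in self.layers])


# Placeholder rows, invented for illustration (not PITCHf/x data).
rows = [
    {"pitch_number": 1, "speed_mph": 95.0},
    {"pitch_number": 2, "speed_mph": 94.2},
]

base = ToyPlot({"x": "pitch_number", "y": "speed_mph"}, rows)
layered = base + ToyLayer("geom_point", color="steelblue") + ToyLayer("stat_smooth")

print(base.describe())
print(layered.describe())
```

Because `+` returns a new object each time, a base plot can be reused with different layers, which fits the quick try-and-discard workflow the talk describes.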