PyData: Past, Present, Future (PyData SV 2014 Keynote)

From the closing keynote: a look back at the last two years of PyData, a discussion of Python's role in the growing and changing data analytics landscape, and encouragement of ways to grow the community.

Published in: Data & Analytics, Technology

  1. 1. PyData: Past, Present, Future. Peter Wang (@pwang), Continuum Analytics. PyData SV 2014
  2. 2. How did we get here?
  3. 3. “Python Data Workshop” March 3, 2012, Google HQ
  4. 4. “Guido, please help us convince core dev to work with us to solve the packaging problem!”
  5. 5. “Guido, please help us convince core dev to work with us to solve the packaging problem!” “Meh. Feel free to solve it yourselves.”
  7. 7. “What Packaging Problem?”
  8. 8. “What Packaging Problem?” “I just use….”
  9. 9. “What Packaging Problem?” “I just use….” • pip & virtualenv
  10. 10. “What Packaging Problem?” “I just use….” • pip & virtualenv • homebrew
  11. 11. “What Packaging Problem?” “I just use….” • pip & virtualenv • homebrew • rpm
  12. 12. “What Packaging Problem?” “I just use….” • pip & virtualenv • homebrew • rpm • apt-get
  13. 13. “What Packaging Problem?” “I just use….” • pip & virtualenv • homebrew • rpm • apt-get • emerge
  14. 14. “What Packaging Problem?” “I just use….” • pip & virtualenv • homebrew • rpm • apt-get • emerge • tar -zxf
  15. 15. “What Packaging Problem?” “I just use….” • pip & virtualenv • homebrew • rpm • apt-get • emerge • tar -zxf • double-click MSI
  16. 16. “What Packaging Problem?” “I just use….” • pip & virtualenv • homebrew • rpm • apt-get • emerge • tar -zxf • double-click MSI • configure ; make ; make install
  17. 17. “What Packaging Problem?” “I just use….” • pip & virtualenv • homebrew • rpm • apt-get • emerge • tar -zxf • double-click MSI • configure ; make ; make install • export PYTHONPATH=…
  19. 19. “What Packaging Problem?” “I just use….” • pip & virtualenv • homebrew • rpm • apt-get • emerge • tar -zxf • double-click MSI • configure ; make ; make install • export PYTHONPATH=… from python import technical_debt
  20. 20. This Packaging Problem
  25. 25. PyData: The First 2 Years • Oct 2012: First PyData Conf, NYC • March 2013: PyData SV (PyCon) • July 2013: PyData Boston (Microsoft) • Oct 2013: PyData NYC (JP Morgan) • Feb 2014: PyData UK (Level39) • May 2014: PyData SV (Facebook) • July 2014: PyData Berlin (EuroPython) • October 2014: NYC (Strata NYC) • October 2014: NYC (YOUR COMPANY HERE)
  26. 26. PyData: The First 10 Years
  27. 27. PyData: The First 10 Years • IPython Notebook: 2005-2011 • pandas: 2008-2009 • scikit-learn: 2007 • NumPy: 2006
  28. 28. PyData: The First 15 Years • IPython Notebook: 2005-2011 • pandas: 2008-2009 • scikit-learn: 2007 • NumPy: 2006 • SciPy: 1999 • IPython: 2001 • matplotlib: 2002
  29. 29. PyData: The First 15 Years • IPython Notebook: 2005-2011 • pandas: 2008-2009 • scikit-learn: 2007 • NumPy: 2006 • SciPy: 1999 • IPython: 2001 • matplotlib: 2002 http://numfocus.org/johnhunter.html
  30. 30. PyData: The First 20 Years • Numarray: 2001 • Numeric: 1995 • Matrix Obj: 1994 • IPython Notebook: 2005-2011 • pandas: 2008-2009 • scikit-learn: 2007 • NumPy: 2006 • IPython: 2001 • matplotlib: 2002
  31. 31. Way Way Back
  32. 32. Way Way Back • python: 1989-1991
  33. 33. Way Way Back • python: 1989-1991 • v1.0: 1994
  34. 34. Way Way Back • python: 1989-1991 • v1.0: 1994 • “ABC, SETL…
  35. 35. Way Way Back • python: 1989-1991 • v1.0: 1994 • “ABC, SETL… …That would appeal to UNIX/C hackers”
  36. 36. Way Way Back • python: 1989-1991 • v1.0: 1994 • “ABC, SETL… …That would appeal to UNIX/C hackers” $ conda create -n py10 python=1.0
  37. 37. Way Way Back • python: 1989-1991 • v1.0: 1994 • “ABC, SETL… …That would appeal to UNIX/C hackers” http://continuum.io/blog/python-1.0 $ conda create -n py10 python=1.0
  38. 38. Way Way Back It is interactive, structured, high-level, and intended to be used instead of BASIC, Pascal, or AWK. It is not meant to be a systems-programming language but is intended for teaching or prototyping.
  39. 39. “In June [1960] we were introduced to this tall college kid that always signed his name with lowercase letters. He was don knuth … don claimed that he could write the [Algol] compiler and a language manual all by himself during his three and a half month summer vacation.”
  40. 40. PyData NYC 2013 Keynote
  43. 43. http://tuulos.github.io/sf-python-meetup-sep-2013/#/ “One of the most exciting features in development is the Numba-based UDF compiler. Building UDFs for Impala currently requires writing C++ or Java code and registering them manually with the cluster. Writing C++/Java code is more difficult, time-consuming, and error-prone for many data analysts.” http://blog.cloudera.com/blog/2014/04/a-new-python-client-for-impala/
  44. 44. http://grokbase.com/t/python/python-list/01az9hmtf1/python-development-practices
  46. 46. Glue 2.0 Python’s legacy as a powerful glue language • manipulate files • call fast libraries Next-gen Glue: • Link data silos • Link disjoint memory & compute • Unify disparate runtime models • Transcend legacy models of computers
  47. 47. Hard Problems in Data Science • Lots of data • Messy data • Noisy data
  48. 48. Hard Problems in Data Science • Lots of data • Messy data • Noisy data • Lots of computers • Lots of tools • Lots of hacking
  49. 49. Hard Problems in Data Science • Lots of data • Messy data • Noisy data • Lots of computers • Lots of tools • Lots of hacking • More questions • More data • More people
  50. 50. The Hype & The Opportunity “Internet Revolution” True Believer, 1996: Businesses that build network capability into their core will outcompete and destroy their competition.
  51. 51. The Hype & The Opportunity “Internet Revolution” True Believer, 1996: Businesses that build network capability into their core will outcompete and destroy their competition. “Data Revolution” True Believer, 2014: Businesses that build data comprehension into their core will destroy their competition over the next 5-15 years.
  52. 52. The Hype & The Opportunity “Internet Revolution” True Believer, 1996: Businesses that build network capability into their core will outcompete and destroy their competition. “Data Revolution” True Believer, 2014: Businesses that build data comprehension into their core will destroy their competition over the next 5-15 years. (1993 == 2011?)
  53. 53. Soft Problems in Data Science
  54. 54. Soft Problems in Data Science Computers EE
  55. 55. Soft Problems in Data Science Computers EE Applications CS
  56. 56. Soft Problems in Data Science Computers EE Applications CS DATA Insights Math, Stats
  57. 57. Computers Applications Data Insights
  59. 59. Computers DATA Applications DataScientist
  60. 60. 2013 Data Science Salary Survey! http://www.oreilly.com/data/free/stratasurvey.csp
  61. 61. “Python is the second best language…” ...Because it blurs the lines between “user” and “maker”. We stand on the shoulders of Users who became Makers. Some people say: “R has a very strong user community.” I want people to say that “Python has a strong maker community.”
  62. 62. Standing Tall
  63. 63. Standing Tall • Science: Standing on the shoulders of giants
  64. 64. Standing Tall • Science: Standing on the shoulders of giants • Programming: Standing on each other’s toes
  65. 65. Standing Tall • Science: Standing on the shoulders of giants • Programming: Standing on each other’s toes • But in Python, we stand on each other’s shoulders: a community that bootstraps itself
  66. 66. “For there is but one veritable problem - the problem of human relations…” —Antoine de Saint-Exupéry
  67. 67. https://archive.org/details/Scipy2010-PeterWang-PythonEvangelism101
  68. 68. Participate • Submit issues and pull requests • Represent for the tools you love in social media conversations • Start PyData meetups • Come to PyData conferences and present • Encourage diversity!!
  69. 69. How did we get here? • Hard Work • By a community of people • Who cared • About code and people
  71. 71. Where do we go from here? • More hard work • More community • More caring • More code • More people Python is not just glue. Python and PyData are communities!
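
Slide 46 sums up Python's legacy as a glue language in two bullets: "manipulate files" and "call fast libraries". The following is only a minimal sketch of that idea, not something taken from the talk. It assumes the standard-library `zlib` module (a wrapper around the C zlib library) as the "fast library"; the helper names `compress_file` and `roundtrip_demo` and the file names are hypothetical, chosen for illustration.

```python
import tempfile
import zlib
from pathlib import Path
from typing import Tuple


def compress_file(src: Path, dst: Path) -> int:
    """Compress src into dst with zlib and return the compressed size in bytes."""
    raw = src.read_bytes()           # "manipulate files": plain Python file I/O
    packed = zlib.compress(raw, 9)   # "call fast libraries": the C zlib code does the work
    dst.write_bytes(packed)
    return len(packed)


def roundtrip_demo() -> Tuple[int, int]:
    """Write a sample file, compress it, and check that it decompresses unchanged."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "sample.txt"     # hypothetical file names
        dst = Path(tmp) / "sample.txt.z"
        payload = b"PyData " * 1000        # highly repetitive, so it compresses well
        src.write_bytes(payload)
        packed_size = compress_file(src, dst)
        restored = zlib.decompress(dst.read_bytes())
        if restored != payload:
            raise ValueError("round trip changed the data")
        return len(payload), packed_size


if __name__ == "__main__":
    original, packed = roundtrip_demo()
    print(f"{original} bytes -> {packed} bytes")
```

The Python code here only moves bytes between files and a compiled library, which is the "glue" role the slide describes; the heavy lifting happens in C.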