PyData: Past, Present, Future (PyData SV 2014 Keynote)
From the closing keynote: a look back at the last two years of PyData, a discussion of Python's role in the growing and changing data analytics landscape, and encouragement of ways to grow the community.

Transcript

  • 1. PyData: Past, Present, Future. Peter Wang (@pwang), Continuum Analytics, PyData SV 2014
  • 2. How did we get here?
  • 3. “Python Data Workshop” March 3, 2012, Google HQ
  • 4. “Guido, please help us convince core dev to work with us to solve the packaging problem!”
  • 5. “Guido, please help us convince core dev to work with us to solve the packaging problem!” “Meh. Feel free to solve it yourselves.”
  • 7. “What Packaging Problem?”
  • 8. “What Packaging Problem?” “I just use….”
  • 9. “What Packaging Problem?” “I just use….” • pip & virtualenv
  • 10. “What Packaging Problem?” “I just use….” • pip & virtualenv • homebrew
  • 11. “What Packaging Problem?” “I just use….” • pip & virtualenv • homebrew • rpm
  • 12. “What Packaging Problem?” “I just use….” • pip & virtualenv • homebrew • rpm • apt-get
  • 13. “What Packaging Problem?” “I just use….” • pip & virtualenv • homebrew • rpm • apt-get • emerge
  • 14. “What Packaging Problem?” “I just use….” • pip & virtualenv • homebrew • rpm • apt-get • emerge • tar -zxf
  • 15. “What Packaging Problem?” “I just use….” • pip & virtualenv • homebrew • rpm • apt-get • emerge • tar -zxf • double-click MSI
  • 16. “What Packaging Problem?” “I just use….” • pip & virtualenv • homebrew • rpm • apt-get • emerge • tar -zxf • double-click MSI • configure ; make ; make install
  • 17. “What Packaging Problem?” “I just use….” • pip & virtualenv • homebrew • rpm • apt-get • emerge • tar -zxf • double-click MSI • configure ; make ; make install • export PYTHONPATH=…
  • 19. “What Packaging Problem?” “I just use….” • pip & virtualenv • homebrew • rpm • apt-get • emerge • tar -zxf • double-click MSI • configure ; make ; make install • export PYTHONPATH=… from python import technical_debt
  • 20. This Packaging Problem
  • 25. PyData: The First 2 Years • Oct 2012: First PyData Conf, NYC • March 2013: PyData SV (PyCon) • July 2013: PyData Boston (Microsoft) • Oct 2013: PyData NYC (JP Morgan) • Feb 2014: PyData UK (Level39) • May 2014: PyData SV (Facebook) • July 2014: PyData Berlin (EuroPython) • October 2014: NYC (Strata NYC) • October 2014: NYC (YOUR COMPANY HERE)
  • 26. PyData: The First 10 years
  • 27. PyData: The First 10 years • IPython Notebook: 2005-2011 • pandas: 2008-2009 • scikit-learn: 2007 • NumPy: 2006
  • 28. PyData: The First 15 Years • IPython Notebook: 2005-2011 • pandas: 2008-2009 • scikit-learn: 2007 • NumPy: 2006 • SciPy: 1999 • IPython: 2001 • matplotlib: 2002
  • 29. PyData: The First 15 Years • IPython Notebook: 2005-2011 • pandas: 2008-2009 • scikit-learn: 2007 • NumPy: 2006 • SciPy: 1999 • IPython: 2001 • matplotlib: 2002 http://numfocus.org/johnhunter.html
  • 30. PyData: The First 20 Years • Numarray: 2001 • Numeric: 1995 • Matrix Obj: 1994 • IPython Notebook: 2005-2011 • pandas: 2008-2009 • scikit-learn: 2007 • NumPy: 2006 • IPython: 2001 • matplotlib: 2002
  • 31. Way Way Back
  • 32. Way Way Back • python: 1989-1991
  • 33. Way Way Back • python: 1989-1991 • v1.0: 1994
  • 34. Way Way Back • python: 1989-1991 • v1.0: 1994 • “ABC, SETL…
  • 35. Way Way Back • python: 1989-1991 • v1.0: 1994 • “ABC, SETL… …That would appeal to UNIX/C hackers”
  • 36. Way Way Back • python: 1989-1991 • v1.0: 1994 • “ABC, SETL… …That would appeal to UNIX/C hackers” $ conda create -n py10 python=1.0
  • 37. Way Way Back • python: 1989-1991 • v1.0: 1994 • “ABC, SETL… …That would appeal to UNIX/C hackers” http://continuum.io/blog/python-1.0 $ conda create -n py10 python=1.0
  • 38. Way Way Back It is interactive, structured, high-level, and intended to be used instead of BASIC, Pascal, or AWK. It is not meant to be a systems-programming language but is intended for teaching or prototyping.
  • 39. “In June [1960] we were introduced to this tall college kid that always signed his name with lowercase letters. He was don knuth … don claimed that he could write the [Algol] compiler and a language manual all by himself during his three and a half month summer vacation.”
  • 40. PyData NYC 2013 Keynote
  • 43. http://tuulos.github.io/sf-python-meetup-sep-2013/#/ “One of the most exciting features in development is the Numba-based UDF compiler. Building UDFs for Impala currently requires writing C++ or Java code and registering them manually with the cluster. Writing C++/Java code is more difficult, time-consuming, and error-prone for many data analysts.” http://blog.cloudera.com/blog/2014/04/a-new-python-client-for-impala/
  • 44. http://grokbase.com/t/python/python-list/01az9hmtf1/python-development-practices
  • 46. Glue 2.0 Python’s legacy as a powerful glue language • manipulate files • call fast libraries Next-gen Glue: • Link data silos • Link disjoint memory & compute • Unify disparate runtime models • Transcend legacy models of computers
  • 47. Hard Problems in Data Science Lots of data Messy data Noisy data
  • 48. Hard Problems in Data Science Lots of data Messy data Noisy data Lots of computers Lots of tools Lots of hacking
  • 49. Hard Problems in Data Science Lots of data Messy data Noisy data Lots of computers Lots of tools Lots of hacking More questions More data More people
  • 50. The Hype & The Opportunity “Internet Revolution” True Believer, 1996: Businesses that build network capability into their core will outcompete and destroy their competition.
  • 51. The Hype & The Opportunity “Internet Revolution” True Believer, 1996: Businesses that build network capability into their core will outcompete and destroy their competition. “Data Revolution” True Believer, 2014: Businesses that build data comprehension into their core will destroy their competition over the next 5-15 years.
  • 52. The Hype & The Opportunity “Internet Revolution” True Believer, 1996: Businesses that build network capability into their core will outcompete and destroy their competition. “Data Revolution” True Believer, 2014: Businesses that build data comprehension into their core will destroy their competition over the next 5-15 years. (1993 == 2011?)
  • 53. Soft Problems in Data Science
  • 54. Soft Problems in Data Science Computers EE
  • 55. Soft Problems in Data Science Computers EE Applications CS
  • 56. Soft Problems in Data Science Computers EE Applications CS DATA Insights Math, Stats
  • 57. Computers Applications Data Insights
  • 59. Computers DATA Applications DataScientist
  • 60. 2013 Data Science Salary Survey http://www.oreilly.com/data/free/stratasurvey.csp
  • 61. “Python is the second best language…” ...Because it blurs the lines between “user” and “maker”. We stand on the shoulders of Users who became Makers. Some people say: “R has a very strong user community.” I want people to say that “Python has a strong maker community.”
  • 62. Standing Tall
  • 63. Standing Tall • Science: Standing on the shoulders of giants
  • 64. Standing Tall • Science: Standing on the shoulders of giants • Programming: Standing on each other’s toes
  • 65. Standing Tall • Science: Standing on the shoulders of giants • Programming: Standing on each other’s toes • But in Python, we stand on each other’s shoulders - community that bootstraps itself
  • 66. “For there is but one veritable problem - the problem of human relations…” —Antoine de Saint-Exupéry
  • 67. https://archive.org/details/Scipy2010-PeterWang-PythonEvangelism101
  • 68. Participate • Submit issues and pull requests • Represent for the tools you love in social media conversations • Start PyData meetups • Come to PyData conferences and present • Encourage diversity!!
  • 69. How did we get here? • Hard Work • By a community of people • Who cared • About code and people
  • 70. Where do we go from here? • More hard work • More community • More caring • More code • More people Python is not just glue. Python and PyData are communities!