Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)

A talk about the Python/PyData ecosystem, the origins of Anaconda, and the future of open source tools in data science.

Published in: Software

  1. Python for Data: Past, Present, and Future
     Peter Wang, CTO, Co-founder, Anaconda / Continuum Analytics
     © 2016 Continuum Analytics - Confidential & Proprietary
  2. Agenda
     • Our Journey with Anaconda
     • Why Python for Data?
     • The Future
     © 2017 Anaconda, Inc.
  3. My Journey with Anaconda
  4. About Peter
     • Degree in Physics (Cornell Univ.)
     • Computer graphics developer (C, C++)
     • Scientific Python developer and consultant (Chaco, Traits, …)
     • Founded Continuum Analytics in 2012 with Travis Oliphant
     • Launched / Created: PyData conferences and community, Anaconda distribution, conda package manager, Bokeh web visualization, Blaze data library
     • Think a lot about future of Python for data+science, machine learning
  5. When we started 5 years ago…
  6. The birth of conda…
     “Guido, please help convince core dev to work with us to solve the packaging problem!”
     “Meh. Feel free to solve it yourselves.”
  7. • 500+ Popular Python Packages
     • Optimized & Compiled
     • Free for Everyone
     • Extensible via Conda Package Manager
     • Sandbox Packages & Libraries
     • Cross-Platform – Windows, Linux, Mac
     • Not just Python - over 230 R packages
  8. [Chart: Anaconda & Miniconda Downloads, in thousands (axis 0 to 3,500), monthly from 2015/1 to 2017/7; series: Anaconda, Miniconda. Over 20 Million Downloads.]
  9. The Growth of Data Science - Python Leading the Way
     https://stackoverflow.blog/2017/09/06/incredible-growth-python/
  10. Other Problems in 2012…
      • Performance: You had to choose between a vectorized system like NumPy, or going to Cython or wrapping C code. No nice JIT like Julia.
        • We created Numba.
      • No system for building simple data-driven web apps, like Shiny for R.
        • We created Bokeh, to serve as both Shiny and D3 for Python.
      • No easy parallelism, or intrinsic parallel primitives like Spark.
        • We created Dask, which has parallel arrays and dataframes.
        • Also solves the “data doesn't fit in RAM” problem.
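The performance trade-off on this slide can be illustrated with a small sketch. It is not taken from the talk: the function names and sample data are hypothetical, and the fallback decorator is an assumption added so the example still runs when Numba is not installed. The same root-mean-square computation is written once as vectorized NumPy and once as a plain loop that Numba can JIT-compile.

```python
import numpy as np

try:
    from numba import njit  # optional: JIT-compiles the decorated loop
except ImportError:
    def njit(func):
        # Assumed fallback: without Numba the loop runs as ordinary Python.
        return func


def rms_vectorized(x):
    # Whole-array NumPy operations; allocates a temporary array for x * x.
    return float(np.sqrt(np.mean(x * x)))


@njit
def rms_loop(x):
    # Element-by-element loop with no temporaries; this is the style of
    # code that used to require Cython or C to run fast.
    total = 0.0
    for i in range(x.shape[0]):
        total += x[i] * x[i]
    return (total / x.shape[0]) ** 0.5


data = np.arange(1.0, 6.0)  # hypothetical sample values 1.0 through 5.0
```

Both functions should agree on `data` to within floating-point rounding; which one is faster depends on array size and on whether Numba is actually available.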
  11. Python in 2017
      • Everyone is learning it, major universities are teaching it
      • Proven in production at Serious Places, not merely hip startups
      • Vastly outstrips scripting language rivals like Ruby, Perl
      • Growing faster than pure analysis langs like R, SAS, Matlab
      • Data science and machine learning applications are taking off like a rocket
      • Python is the most popular language for Deep Learning, the most rapidly-innovating area of machine learning
      • Python 2 vs 3 rift is less of an issue for most people
  12. https://www.youtube.com/watch?v=nU09j2gGHYg
  13. Why Python for Data?
  14. [Timeline figure; surviving labels: 1968, 1973, 1974, 1981, 1991, 1993, 1996, 2005, SQL, Numeric]
  15. Python & ABC
      It is interactive, structured, high-level, and intended to be used instead of BASIC, Pascal, or AWK. It is not meant to be a systems-programming language but is intended for teaching or prototyping.
  16. Analyst
      • Uses graphical tools
      • Can call functions, cut & paste code
      • Can change some variables
      Gets paid for: Insight
      Excel, VB, Tableau, Python

      Analyst / Data Developer
      • Builds simple apps & workflows
      • Used to be "just an analyst"
      • Likes coding to solve problems
      • Doesn't want to be a "full-time programmer"
      Gets paid (like a rock star) for: Code that produces insight
      SAS, R, Matlab, Python

      Programmer
      • Creates frameworks & compilers
      • Uses IDEs
      • Degree in CompSci
      • Knows multiple languages
      Gets paid for: Code
      C, C++, Java, JS, Python
  17. Data science != Software Development
      • VERY common misconception
      • Python is probably the most misunderstood language
      • There are “tribes” and ecosystems in Python: web dev, scipy, pydata, embedded, scripting, 3D graphics, etc.
      • But businesses tend to pigeonhole it:
        • IT/software/data engineering view: competes with Java, C#, Ruby…
        • Analytics, stats, data science view: competes with R, SAS, Matlab, SPSS, BI systems
  18. Era of Data Literacy
      • Data exploration and analysis are going to be a new kind of literacy that will be required to do great work in any field.
      • Language is a human instinct and is a natural path to insight. We see this in our interaction with Python/PyData users, whose passion chiefly stems from this expressiveness and agility.
      • An analytical language is “thoughtware”, not “software”.
  19. [Slide with no recoverable text]
  20. What’s Next?
  21. A Few Predictions
      • Python will become a preferred way to develop cognitive applications: online model learning and training
      • There will be a steady income stream for people who want to maintain Python 2.x codebases
      • Multi-language interoperability will be greatly improved once people adopt the Apache Arrow format for storing data. This means Python code running alongside Java/Scala/JVM will not be a second-class citizen.
      • Constant improvements in memory and storage, as well as GPUs, mean that people will continue doing lots of Python locally on big workstations.
  22. Open Source and Developers
      • Not about licenses
      • Empowering people & communities to innovate
      • Aligns us with users, customers, innovators
      • “Software is eating the world”
      • Open source is eating software
  23. Open Source and Businesses
      • Not about cost of software (“capital expense”)
      • Not even about maintenance of software (“operational expense”)
      • Core business goals:
        • Avoid lock-in
        • Harness innovation
  24. 5 Years
      25+ Conferences
      100s of talks
  25. Questions?
