
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"

Talk in Karlsruhe, Germany, on October 25, 2018

  1. Looking backward, looking forward ● Wes McKinney (@wesmckinn) ● PyCon DE / PyData Karlsruhe 2018
  2. Motivations
  3. Guiding questions
  4. How to make data analysis “easier”?
  5. Making individuals more productive
  6. More fruitful open source collaborations
  7. Better hardware utilization
  8. Examining the status quo
  9. Change is difficult
  10. From one existential crisis to another
  11. April 2008 - Avant-garde PyData ● Socializing Python inside AQR, a quantitative hedge fund ● scipy.stats.models enabled some R -> Python workload migration
  12. Dec 2009 - pandas 0.1 ● First open source release after ~18 months of internal-only use
  13. May 2011 - “PyData” core dev meetings ● "Need a toolset that is robust, fast, and suitable for a production environment..."
  14. May 2011 - “PyData” core dev meetings ● "Need a toolset that is robust, fast, and suitable for a production environment..." ● "... but also good for interactive research..."
  15. May 2011 - “PyData” core dev meetings ● "Need a toolset that is robust, fast, and suitable for a production environment..." ● "... but also good for interactive research..." ● "... and easy / intuitive for non-software engineers to use"
  16. May 2011 - “PyData” core dev meetings ● * also, we need to fix packaging
  17. July 2011 - Concerns ● "... the current state of affairs has me rather anxious … these tools [e.g. pandas] have largely not been integrated with any other tools because of the community's collective commitment anxiety" ● http://wesmckinney.com/blog/a-roadmap-for-rich-scientific-data-structures-in-python/
  18. July 2011 - Concerns ● "Fragmentation is killing us" ● http://wesmckinney.com/blog/a-roadmap-for-rich-scientific-data-structures-in-python/
  19. Reading CSV files
  20. Python for Data Analysis book - 2012 ● A primer on data manipulation in Python ● Focus: NumPy, IPython / Jupyter, pandas, matplotlib ● 2 editions (2012, 2017) ● 8 translations so far
  21. 2013-2014 - An Entrepreneurial Detour ● DataPad: Python-powered business analytics ● Backend built with the PyData stack + custom analytics ● Goal to contribute tech back to the OSS ecosystem
  22. DataPad learnings ● 200 ms threshold for interactivity ● Multitenant query execution, resource management ● pandas performance / memory use problems
  23. PyData NYC 2013: 10 Things I Hate About pandas ● November 2013 ● Summary: “pandas is not designed like, or intended to be used as, a database query engine”
  24. Vertical Integration: The Good ● Control ● Development speed ● Releases
  25. Vertical Integration: The Bad ● Large scope of code ownership ● Lack of code reuse ● Bit rot
  26. Fall 2014: Python in a Big Data World ● Task: helping Python become a first-class technology for Big Data ● Some problems: file formats, JVM interop, non-array-oriented interfaces
  27. Fragmentation of data and code
  28. Apache Arrow: Defragmenting data systems ● Language-independent open standard in-memory representation for columnar data (i.e. data frames) ● Easily reuse code targeting Arrow memory ● Efficient memory interchange ● (Diagram: Arrow memory shared by the JVM data ecosystem, database systems, and data science libraries; see the sketch after the transcript)
  29. Apache Arrow: Defragmenting data systems ● https://github.com/apache/arrow ● Over 200 unique contributors ● Some level of support for 11 programming languages
  30. Funding ambitious new open source projects
  31. Early Partners ● https://ursalabs.org ● Apache Arrow-powered data science tools ● Funded by corporate partners ● Built in collaboration with RStudio
  32. Looking forward
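
The memory-interchange idea on slide 28 can be illustrated with a minimal sketch, assuming the pandas and pyarrow packages are installed; the column names, values, and file name below are purely illustrative and not taken from the talk. A pandas DataFrame is converted to an Arrow table, the same columnar data is written to and read back from Parquet by Arrow-aware code, and the result is converted back to pandas.

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Start from an ordinary pandas DataFrame.
df = pd.DataFrame({"group": ["a", "a", "b"], "value": [1.0, 2.5, 3.0]})

# Convert it to an Arrow table: the language-independent columnar
# representation that Arrow-aware systems share.
table = pa.Table.from_pandas(df)

# Any Arrow-aware library can consume the table directly; here it is
# written to a Parquet file and read back without going through pandas.
pq.write_table(table, "example.parquet")
roundtrip = pq.read_table("example.parquet")

# Convert back to pandas for interactive analysis.
print(roundtrip.to_pandas())

In the same spirit, the table could be handed to other Arrow-aware consumers (JVM tools, database systems, other data science libraries) rather than Parquet, which is the code-reuse and interchange point the slide is making.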
