Raising the Tides: Open Source Analytics for Data Science

Given March 2, 2017 at Newsweek AI/Data Science in Capital Markets conference

Published in: Technology

  1. 1. Raising the Tides: Open Source Analytics for Data Science Wes McKinney @wesmckinn NEWSWEEK AI & DATA SCIENCE CONFERENCE – CAPITAL MARKETS 2 MARCH 2017
  2. 2. Wes McKinney @wesmckinn Me
  3. 3. Wes McKinney @wesmckinn Important Legal Information • The information presented here is offered for informational purposes only and should not be used for any other purpose (including, without limitation, the making of investment decisions). Examples provided herein are for illustrative purposes only and are not necessarily based on actual data. Nothing herein constitutes: an offer to sell or the solicitation of any offer to buy any security or other interest; tax advice; or investment advice. This presentation shall remain the property of Two Sigma Investments, LP (“Two Sigma”) and Two Sigma reserves the right to require the return of this presentation at any time. • Some of the images, logos or other material used herein may be protected by copyright and/or trademark. If so, such copyrights and/or trademarks are most likely owned by the entity that created the material and are used purely for identification and comment as fair use under international copyright and/or trademark laws. Use of such image, copyright or trademark does not imply any association with such organization (or endorsement of such organization) by Two Sigma, nor vice versa. • Copyright © 2017 TWO SIGMA INVESTMENTS, LP. All rights reserved
  4. 4. Wes McKinney @wesmckinn In the next 20 minutes ∞ Important trends in the industry ∞ Two Sigma involvement in open source ∞ Growing the community
  5. 5. WHAT I’M SEEING TODAY
  6. 6. Wes McKinney @wesmckinn Industry giants open source core AI and machine learning technology
  7. 7. Wes McKinney @wesmckinn Open source “disruption” in data science languages and supporting technologies
  8. 8. Wes McKinney @wesmckinn Observation #1: User Mindshare is a Key Asset
  9. 9. Wes McKinney @wesmckinn Observation #2: Tools may be less important than human capital and data
  10. 10. Wes McKinney @wesmckinn Two Sigma Building a state-of-the-art, collaborative data science platform
  11. 11. Wes McKinney @wesmckinn Scaling data science in many dimensions ∞ Access to diverse data sets
  12. 12. Wes McKinney @wesmckinn Scaling data science in many dimensions ∞ Access to diverse data sets ∞ Enhancing individual productivity
  13. 13. Wes McKinney @wesmckinn Scaling data science in many dimensions ∞ Access to diverse data sets ∞ Enhancing individual productivity ∞ Computational capabilities: larger and more complex data sets
  14. 14. Wes McKinney @wesmckinn Scaling data science in many dimensions ∞ Access to diverse data sets ∞ Enhancing individual productivity ∞ Computational capabilities: larger and more complex data sets ∞ Collaboration within and across teams
  15. 15. TOOLS AND THE “DATA SCIENTIST SHORTAGE”
  16. 16. WHY WE PARTICIPATE IN OPEN SOURCE
  17. 17. Wes McKinney @wesmckinn Why we participate in Open Source 1. Drive progress and innovation in foundational technologies
  18. 18. Wes McKinney @wesmckinn Why we participate in Open Source 1. Drive progress and innovation in foundational technologies 2. Increase the overall value, interoperability, and sustainability of our closed source systems
  19. 19. Wes McKinney @wesmckinn Why we participate in Open Source 1. Drive progress and innovation in foundational technologies 2. Increase the overall value, interoperability, and sustainability of our closed source systems 3. Raise awareness of problems faced at scale on real world data
  20. 20. Wes McKinney @wesmckinn Why we participate in Open Source 1. Drive progress and innovation in foundational technologies 2. Increase the overall value, interoperability, and sustainability of our closed source systems 3. Raise awareness of problems faced at scale on real world data 4. Benefit sooner from open source innovations
  21. 21. Wes McKinney @wesmckinn Why we participate in Open Source 1. Drive progress and innovation in foundational technologies 2. Increase the overall value, interoperability, and sustainability of our closed source systems 3. Raise awareness of problems faced at scale on real world data 4. Benefit sooner from open source innovations 5. Attract and retain the best engineering talent
  22. 22. Wes McKinney @wesmckinn Where we are investing Collaboration and Publishing Cluster Resource Management Scalable / Distributed Computing High Performance Data Processing
  23. 23. Wes McKinney @wesmckinn Core data infrastructure technologies Apache Arrow • Efficient columnar in-memory data processing • High-speed, interoperable data messaging for Java, C++, Python Apache Parquet • Industry-standard columnar file format for distributed storage • Efficient IO for Spark, Python, etc.
  24. 24. Wes McKinney @wesmckinn Open source in-memory and distributed analytics • Popular Python analytics library • Powerful and easy-to-use data cleaning, analytics, and time series processing • Flint: scalable time series analytics for Spark • Enhanced Python integration
  25. 25. Wes McKinney @wesmckinn Cluster resource management • Scalable cluster resource manager • Native container support Cook • Fair job scheduler for Mesos • Managing multi-tenant Spark clusters
  26. 26. Wes McKinney @wesmckinn Collaboration and publishing • Notebook “kernels” for polyglot research and development • Inter-language data exchange • Leading web notebook & reproducible research development platform • Interactive widgets framework
  27. 27. TOWARD HIGH TIDE: Preserving competitive advantage and building common knowledge
  28. 28. Thank you Wes McKinney @wesmckinn
