Submit Search
Upload
An Incomplete Data Tools Landscape for Hackers in 2015
•
24 likes
•
8,136 views
Wes McKinney
Follow
Overview talk for a mostly Python/R crowd in Minneapolis
Read less
Read more
Technology
Report
Share
Report
Share
1 of 22
Download now
Download to read offline
Recommended
Ibis: Scaling the Python Data Experience
Ibis: Scaling the Python Data Experience
Wes McKinney
My Data Journey with Python (SciPy 2015 Keynote)
My Data Journey with Python (SciPy 2015 Keynote)
Wes McKinney
Python Data Ecosystem: Thoughts on Building for the Future
Python Data Ecosystem: Thoughts on Building for the Future
Wes McKinney
PyData: The Next Generation
PyData: The Next Generation
Wes McKinney
Improving data interoperability in Python and R
Improving data interoperability in Python and R
Wes McKinney
DataFrames: The Extended Cut
DataFrames: The Extended Cut
Wes McKinney
Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)
Wes McKinney
Data Science Languages and Industry Analytics
Data Science Languages and Industry Analytics
Wes McKinney
Recommended
Ibis: Scaling the Python Data Experience
Ibis: Scaling the Python Data Experience
Wes McKinney
My Data Journey with Python (SciPy 2015 Keynote)
My Data Journey with Python (SciPy 2015 Keynote)
Wes McKinney
Python Data Ecosystem: Thoughts on Building for the Future
Python Data Ecosystem: Thoughts on Building for the Future
Wes McKinney
PyData: The Next Generation
PyData: The Next Generation
Wes McKinney
Improving data interoperability in Python and R
Improving data interoperability in Python and R
Wes McKinney
DataFrames: The Extended Cut
DataFrames: The Extended Cut
Wes McKinney
Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)
Wes McKinney
Data Science Languages and Industry Analytics
Data Science Languages and Industry Analytics
Wes McKinney
Apache Arrow and Python: The latest
Apache Arrow and Python: The latest
Wes McKinney
Next-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache Arrow
Wes McKinney
High Performance Python on Apache Spark
High Performance Python on Apache Spark
Wes McKinney
DataFrames: The Good, Bad, and Ugly
DataFrames: The Good, Bad, and Ugly
Wes McKinney
How Apache Arrow and Parquet boost cross-language interoperability
How Apache Arrow and Parquet boost cross-language interoperability
Uwe Korn
Apache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory data
Wes McKinney
Memory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine Learning
Wes McKinney
Enabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data Citizen
Wes McKinney
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018
Wes McKinney
Apache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory Data
Wes McKinney
Python Data Wrangling: Preparing for the Future
Python Data Wrangling: Preparing for the Future
Wes McKinney
Apache Arrow - An Overview
Apache Arrow - An Overview
Dremio Corporation
Improving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and Interoperability
Wes McKinney
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Wes McKinney
Ibis: Scaling Python Analytics on Hadoop and Impala
Ibis: Scaling Python Analytics on Hadoop and Impala
Wes McKinney
Apache Spark Briefing
Apache Spark Briefing
Thomas W. Dinsmore
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
P. Taylor Goetz
PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015
Cloudera, Inc.
Improving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache Arrow
Julien Le Dem
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
DataWorks Summit/Hadoop Summit
Hacking
Hacking
Karan Poshattiwar
User Experience for Business Analysts
User Experience for Business Analysts
Carol Smith
More Related Content
What's hot
Apache Arrow and Python: The latest
Apache Arrow and Python: The latest
Wes McKinney
Next-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache Arrow
Wes McKinney
High Performance Python on Apache Spark
High Performance Python on Apache Spark
Wes McKinney
DataFrames: The Good, Bad, and Ugly
DataFrames: The Good, Bad, and Ugly
Wes McKinney
How Apache Arrow and Parquet boost cross-language interoperability
How Apache Arrow and Parquet boost cross-language interoperability
Uwe Korn
Apache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory data
Wes McKinney
Memory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine Learning
Wes McKinney
Enabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data Citizen
Wes McKinney
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018
Wes McKinney
Apache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory Data
Wes McKinney
Python Data Wrangling: Preparing for the Future
Python Data Wrangling: Preparing for the Future
Wes McKinney
Apache Arrow - An Overview
Apache Arrow - An Overview
Dremio Corporation
Improving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and Interoperability
Wes McKinney
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Wes McKinney
Ibis: Scaling Python Analytics on Hadoop and Impala
Ibis: Scaling Python Analytics on Hadoop and Impala
Wes McKinney
Apache Spark Briefing
Apache Spark Briefing
Thomas W. Dinsmore
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
P. Taylor Goetz
PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015
Cloudera, Inc.
Improving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache Arrow
Julien Le Dem
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
DataWorks Summit/Hadoop Summit
What's hot
(20)
Apache Arrow and Python: The latest
Apache Arrow and Python: The latest
Next-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache Arrow
High Performance Python on Apache Spark
High Performance Python on Apache Spark
DataFrames: The Good, Bad, and Ugly
DataFrames: The Good, Bad, and Ugly
How Apache Arrow and Parquet boost cross-language interoperability
How Apache Arrow and Parquet boost cross-language interoperability
Apache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory data
Memory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine Learning
Enabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data Citizen
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory Data
Python Data Wrangling: Preparing for the Future
Python Data Wrangling: Preparing for the Future
Apache Arrow - An Overview
Apache Arrow - An Overview
Improving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and Interoperability
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Ibis: Scaling Python Analytics on Hadoop and Impala
Ibis: Scaling Python Analytics on Hadoop and Impala
Apache Spark Briefing
Apache Spark Briefing
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015
Improving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache Arrow
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Viewers also liked
Hacking
Hacking
Karan Poshattiwar
User Experience for Business Analysts
User Experience for Business Analysts
Carol Smith
Riding the Enterprise Integration train
Riding the Enterprise Integration train
Dominopoint - Italian Lotus User Group
Salesforce DX Pilot Product Overview
Salesforce DX Pilot Product Overview
Salesforce Partners
Productive Data Tools for Quants
Productive Data Tools for Quants
Wes McKinney
How To Be A Hacker
How To Be A Hacker
Paul Tarjan
Hacking For Innovation Delhi
Hacking For Innovation Delhi
Christian Heilmann
Success Community Wizard Overview
Success Community Wizard Overview
David Giller
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
DataWorks Summit/Hadoop Summit
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
Julien Le Dem
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
Cloudera, Inc.
pandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for Python
Wes McKinney
Python for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandas
Wes McKinney
AI for IA's: Machine Learning Demystified at IA Summit 2017 - IAS17
AI for IA's: Machine Learning Demystified at IA Summit 2017 - IAS17
Carol Smith
Understanding deep learning requires rethinking generalization (2017) 2 2(2)
Understanding deep learning requires rethinking generalization (2017) 2 2(2)
정훈 서
Startup Pitch Decks
Startup Pitch Decks
Steve Schlafman
Viewers also liked
(16)
Hacking
Hacking
User Experience for Business Analysts
User Experience for Business Analysts
Riding the Enterprise Integration train
Riding the Enterprise Integration train
Salesforce DX Pilot Product Overview
Salesforce DX Pilot Product Overview
Productive Data Tools for Quants
Productive Data Tools for Quants
How To Be A Hacker
How To Be A Hacker
Hacking For Innovation Delhi
Hacking For Innovation Delhi
Success Community Wizard Overview
Success Community Wizard Overview
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
pandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for Python
Python for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandas
AI for IA's: Machine Learning Demystified at IA Summit 2017 - IAS17
AI for IA's: Machine Learning Demystified at IA Summit 2017 - IAS17
Understanding deep learning requires rethinking generalization (2017) 2 2(2)
Understanding deep learning requires rethinking generalization (2017) 2 2(2)
Startup Pitch Decks
Startup Pitch Decks
Similar to An Incomplete Data Tools Landscape for Hackers in 2015
Ibis: operating the Python data ecosystem at Hadoop scale by Wes McKinney
Ibis: operating the Python data ecosystem at Hadoop scale by Wes McKinney
Hakka Labs
Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)
Uri Laserson
PyData Boston 2013
PyData Boston 2013
Travis Oliphant
Improving Data Interoperability for Python and R
Improving Data Interoperability for Python and R
Work-Bench
High-Performance Python On Spark
High-Performance Python On Spark
Jen Aman
Data Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache Hadoop
Cloudera, Inc.
Pandas & Cloudera: Scaling the Python Data Experience
Pandas & Cloudera: Scaling the Python Data Experience
Turi, Inc.
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
huguk
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
Swiss Big Data User Group
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
Timothy Spann
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)
Todd Lipcon
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Sarah Guido
28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines
Timothy Spann
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
Amanda Casari
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Jen Aman
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Databricks
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Jen Aman
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Data Con LA
Keynote at Converge 2019
Keynote at Converge 2019
Travis Oliphant
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Codemotion
Similar to An Incomplete Data Tools Landscape for Hackers in 2015
(20)
Ibis: operating the Python data ecosystem at Hadoop scale by Wes McKinney
Ibis: operating the Python data ecosystem at Hadoop scale by Wes McKinney
Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)
PyData Boston 2013
PyData Boston 2013
Improving Data Interoperability for Python and R
Improving Data Interoperability for Python and R
High-Performance Python On Spark
High-Performance Python On Spark
Data Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache Hadoop
Pandas & Cloudera: Scaling the Python Data Experience
Pandas & Cloudera: Scaling the Python Data Experience
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at Bitly
28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Keynote at Converge 2019
Keynote at Converge 2019
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
More from Wes McKinney
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
Wes McKinney
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
Wes McKinney
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Wes McKinney
Apache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data Framework
Wes McKinney
New Directions for Apache Arrow
New Directions for Apache Arrow
Wes McKinney
Apache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data Transport
Wes McKinney
ACM TechTalks : Apache Arrow and the Future of Data Frames
ACM TechTalks : Apache Arrow and the Future of Data Frames
Wes McKinney
Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020
Wes McKinney
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
Wes McKinney
Apache Arrow: Leveling Up the Analytics Stack
Apache Arrow: Leveling Up the Analytics Stack
Wes McKinney
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Wes McKinney
Apache Arrow: Leveling Up the Data Science Stack
Apache Arrow: Leveling Up the Data Science Stack
Wes McKinney
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019
Wes McKinney
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
Wes McKinney
Shared Infrastructure for Data Science
Shared Infrastructure for Data Science
Wes McKinney
Data Science Without Borders (JupyterCon 2017)
Data Science Without Borders (JupyterCon 2017)
Wes McKinney
Raising the Tides: Open Source Analytics for Data Science
Raising the Tides: Open Source Analytics for Data Science
Wes McKinney
PyCon APAC 2016 Keynote
PyCon APAC 2016 Keynote
Wes McKinney
More from Wes McKinney
(18)
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data Framework
New Directions for Apache Arrow
New Directions for Apache Arrow
Apache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data Transport
ACM TechTalks : Apache Arrow and the Future of Data Frames
ACM TechTalks : Apache Arrow and the Future of Data Frames
Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
Apache Arrow: Leveling Up the Analytics Stack
Apache Arrow: Leveling Up the Analytics Stack
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow: Leveling Up the Data Science Stack
Apache Arrow: Leveling Up the Data Science Stack
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
Shared Infrastructure for Data Science
Shared Infrastructure for Data Science
Data Science Without Borders (JupyterCon 2017)
Data Science Without Borders (JupyterCon 2017)
Raising the Tides: Open Source Analytics for Data Science
Raising the Tides: Open Source Analytics for Data Science
PyCon APAC 2016 Keynote
PyCon APAC 2016 Keynote
Recently uploaded
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
Fwdays
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
Alan Dix
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
Rizwan Syed
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
NavinnSomaal
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
MounikaPolabathina
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
LoriGlavin3
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Mark Simos
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
Slibray Presentation
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
Stephanie Beckett
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
Addepto
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
Alex Barbosa Coqueiro
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
BookNet Canada
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
LoriGlavin3
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
Sergiu Bodiu
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
mohitsingh558521
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
Commit University
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
Raghuram Pandurangan
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
Kalema Edgar
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
BookNet Canada
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
Pixlogix Infotech
Recently uploaded
(20)
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
An Incomplete Data Tools Landscape for Hackers in 2015
1.
1 © Cloudera,
Inc. All rights reserved. A [Incomplete] Data Tools Landscape [for Hackers] in 2015 Wes McKinney @wesmckinn Data^3 MeeMng — Minneapolis, MN
2.
2 © Cloudera,
Inc. All rights reserved. This talk • A parMal look at different languages and tools • LimiMng scope to either: • Permissively licensed open source soSware, e.g. Apache-‐licensed (OSS) • Non-‐dual-‐licensed copyleS OSS (e.g. GPL) • i.e. “do you [the community] have any incenMve to create patches?” • Some trends (that I see, anyway) • Challenges and opportuniMes
3.
3 © Cloudera,
Inc. All rights reserved. Who am I? • Python data firestarter • Financial analyMcs in R / Python starMng 2007 • pandas project born of frustraMon in 2008 • 2010-‐2012 • Hiatus from gainful employment • Make pandas ready for primeMme • Write "Python for Data Analysis"
4.
4 © Cloudera,
Inc. All rights reserved. Who am I? (cont’d) • 2013-‐2014: Co-‐founder/CEO of DataPad (analyMcs startup, with early pandas collaborator Chang She) • Late 2014: DataPad team joins Cloudera • Now: backend systems and all-‐things-‐Python @ Cloudera
5.
5 © Cloudera,
Inc. All rights reserved. SQL: SMll a lingua franca • “SQL: the Fortran of AnalyMcs” • OSen a concise, declaraMve way to express data transforms, analyMcs, etc. • RelaMvely easy to parse, analyze • SQL recently has seen resurgence with focus on interacMve-‐speed SQL engines, especially on top of HDFS/Hadoop • Relevant and impaclul features (e.g. JSON support) sMll arriving in established RDBMS like PostgreSQL
6.
6 © Cloudera,
Inc. All rights reserved. Historical Python Context • ScienMfic / HPC compuMng focus in 1990s, 2000s • Python web community developed in parallel, matured faster! • NumPy became community standard in 2005, born from Numeric + Numarray • Pyrex, later Cython, easier C / C++ wrapping • f2py: easy Fortran wrapping • Anaconda distribuMon • Finally solving Python deployment for all
7.
7 © Cloudera,
Inc. All rights reserved. EssenMal Python stack • NumPy: low-‐level array processing • SciPy: essenMal computaMonal algos • pandas: data wrangling • scikit-‐learn: machine learning • matplotlib (+ add-‐ons, like seaborn): visualizaMon • numba: numeric hotspot LLVM compiler • Domain-‐specific toolkits: nltk, scikit-‐image, statsmodels, Theano, PyCUDA/ PyOpenCL and many others
8.
8 © Cloudera,
Inc. All rights reserved. pandas • A Pythonic take on the classic R “data frame” data structure • CriMcal piece to make the Python stack useful in everyday work • Added axis metadata / labeling for represenMng mulMdimensional data • Focus on easy data wrangling, IO, ploung, and basic analyMcs
9.
9 © Cloudera,
Inc. All rights reserved. Jeff Reback’s “pandas as PyData middleware” diagram
10.
10 © Cloudera,
Inc. All rights reserved. Newer / Up-‐and-‐coming Python projects • Bokeh: interacMve / reacMve visualizaMon for the web • Blaze: uniform data expression API • Odo: easy data migraMon
11.
11 © Cloudera,
Inc. All rights reserved. R Project • Trusted base of staMsMcs libraries • Latest and greatest stats research oSen hits R first • RStudio • The "Hadley stack” • VisualizaMon: ggplot2 (staMc) and ggvis (interacMve) • Data Wrangling: dplyr • legacy: plyr / reshape2
12.
12 © Cloudera,
Inc. All rights reserved. dplyr • Started late 2012 by Hadley Wickham, supported by RStudio • Composable / chainable analyMcs and data wrangling expressions • In-‐memory and SQL backends • Has avracted folks back to R from Python in a lot of cases
13.
13 © Cloudera,
Inc. All rights reserved. Some other great R stuff • shiny: interacMve web apps in R • Rcpp • data.table • xts
14.
14 © Cloudera,
Inc. All rights reserved. IPython • IPython started out as a bever interacMve Python • Grew to include web-‐based computaMonal notebook, GUI console, and other components • (Google even integrated into Google Drive!) • IPython Notebook architecture enabled “kernel” processes to be wriven in nearly any language (even bash!) • How to build community beyond Python?
15.
15 © Cloudera,
Inc. All rights reserved. Enter Jupyter • hvp://jupyter.org • Breaking out notebook machinery into a standalone non-‐Python-‐specific project • Enable project components to evolve at own pace, without large monolithic releases • JupyterHub: upcoming mulM-‐user notebook server
16.
16 © Cloudera,
Inc. All rights reserved. A few words about Hadoop + Big Data
17.
17 © Cloudera,
Inc. All rights reserved. Apache Spark • Originated from Berkeley AMPLab • General purpose distributed memory-‐centric data processing framework • Official APIs: Scala, Java, Python Source: databricks.com
18.
18 © Cloudera,
Inc. All rights reserved. Spark 1.3: DataFrames! • R/pandas-‐inspired API for tabular data manipulaMon in Scala, Python, etc. • Logical operaMon graphs rewriven internally in more efficient form • Good interop with Spark SQL • Some interoperability with pandas • Will help close the semanMc gap between Spark and R/Python
19.
19 © Cloudera,
Inc. All rights reserved. Some problems in need of solving • A Shiny-‐like quick-‐and-‐dirty data app development framework for Python • IPython/Jupyter notebook collaboraMon • A community-‐standard, Apache-‐licensed C/C++ data frame library with best-‐in-‐ class performance • Ubiquitous support for emerging analyMcal on-‐disk storage standards like Parquet
20.
20 © Cloudera,
Inc. All rights reserved. Other interesMng stuff to look at • Torch7 / LuaJIT: high performance ML / deep learning on GPUs • Facebook AI group open sourced several ML modules • Apache Flink • Up-‐and-‐coming Scala-‐based data processing framework • Some overlap with Spark use cases
21.
21 © Cloudera,
Inc. All rights reserved. Some other interesMng industry trends • MicrosoS • Acquired RevoluMon AnalyMcs, leading commercial R vendor • Launched Azure ML: R, Python, and more on Azure cloud • Dato (ya GraphLab) • faster, more scalable machine learning, with Python interface (Paid commercial product, free for non-‐commercial/academic use) • Largest-‐ever VC investment in a data tools company beung big on Python • Databricks • Offering cloud Spark-‐notebook-‐as-‐a-‐service
22.
22 © Cloudera,
Inc. All rights reserved. Thank you @wesmckinn
Download now