SlideShare a Scribd company logo
1 of 46
Download to read offline
Building better
analytics workflows
Strata-Hadoop World 2013
Wes McKinney
@wesmckinn
• Former quant @ AQR (a hedge fund)
• Creator of Pandas project for Python
• Author of
Python for Data Analysis — O’Reilly

• Founder and CEO of DataPad

2

www.datapad.io
•
•

3

> 20k copies since Oct 2012
Bringing many new people
to Python and data analysis
with code

www.datapad.io
Why so many learning
to program?
• Increasing data scale
• More and more data munging/integration
• Need for Statistics and Predictive Analytics
• Building complex data visualizations
• Inadequacy of Excel or other UI-driven data
tools

4

www.datapad.io
The Analytics Workflow

Acquisition

5

Preparation

Visualization

www.datapad.io

Analysis

Sharing
The Analytics Workflow

6

www.datapad.io
What do we care about?

• Minimize time to answer
Ask more questions
•
Reduce friction between tools and
•
processes

• Team productivity
7

www.datapad.io
Data Tools for Humans
(TM?)

8

www.datapad.io
What can go wrong?

• Inefficient workflows lead to lower
quality analysis

•

9

Results may not be actionable in a
reasonable time-frame

www.datapad.io
10

www.datapad.io
Three type of problems

• Tooling
Workflow management
•
Collaboration
•
11

www.datapad.io
Big Notable Data Trends

12

www.datapad.io
Data Preparation:
an ongoing problem

13

www.datapad.io
14

www.datapad.io
For programmers, luckily
it’s not 2005 anymore

• R: Hadley Wickham’s packages
Python: pandas
•
Hadoop: Pig
•
www.datapad.io
Data preparation with
visual tools

• Google OpenRefine
Google Fusion Tables
•
Microsoft Excel
•
• Data Wrangler
www.datapad.io
Some new startups building
data preparation tools

www.datapad.io
Business Intelligence:
essential for doing business

www.datapad.io
BI macro-trends

• Self Service BI
Visual Discovery
•
SQL on Hadoop
•
www.datapad.io
It’s the hey-day for BI
startups

www.datapad.io
Predictive Analytics
is getting easier

www.datapad.io
Some predictive analytics
startups

www.datapad.io
Perils of “data science
in a box”

www.datapad.io
Predictive analytics
pitfalls

• Signal vs. Noise
Identify the right patterns
•
Uncertain ROI
•
www.datapad.io
Some analytics workflow
problems still need work

www.datapad.io
Friction between tools

www.datapad.io
Friction between tools:
a typical scenario

•
Tableau for visualization
•
• SPSS/R for modeling

Excel and SQL for data wrangling

www.datapad.io
Time series analytics

www.datapad.io
Large scale visualization

www.datapad.io
Data workflows as
dependency graphs?
E

A

C

D

B

30

F

www.datapad.io
Data workflows as
dependency graphs?

CHRONOS

31

www.datapad.io
Iterating on analysis

www.datapad.io
Versioning and provenance

www.datapad.io
Leveraging diverse
skill sets

• Within teams, different
competencies

•

Work together on a data project sharing code, data, tracking changes

www.datapad.io
The elusive
GitHub for Data Analysis?

www.datapad.io
...Google Docs for Data
Analysis?

www.datapad.io
Make an impact

• Getting results into the hands of
people who need it

•

Getting models "into production"

www.datapad.io
Some possible
solutions
www.datapad.io
Build more integrated
tool environments

www.datapad.io
www.datapad.io
Enhance collaboration

www.datapad.io
Accessible data science
...with training wheels

www.datapad.io
One more thing

www.datapad.io
• http://datapad.io
Founded in 2013, located in SF
•
In private beta, join us!
•
• Hiring for engineering
www.datapad.io
Q&A time

www.datapad.io
Thank you!

46

www.datapad.io

More Related Content

What's hot

Memory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine LearningMemory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine LearningWes McKinney
 
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"Wes McKinney
 
Scala: the unpredicted lingua franca for data science
Scala: the unpredicted lingua franca  for data scienceScala: the unpredicted lingua franca  for data science
Scala: the unpredicted lingua franca for data scienceAndy Petrella
 
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...Databricks
 
Using Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch dataUsing Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch dataDataWorks Summit/Hadoop Summit
 
Spark Summit EU 2015: Revolutionizing Big Data in the Enterprise with Spark
Spark Summit EU 2015: Revolutionizing Big Data in the Enterprise with SparkSpark Summit EU 2015: Revolutionizing Big Data in the Enterprise with Spark
Spark Summit EU 2015: Revolutionizing Big Data in the Enterprise with SparkDatabricks
 
Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonDremio Corporation
 
Uber's data science workbench
Uber's data science workbenchUber's data science workbench
Uber's data science workbenchRan Wei
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...Big Data Spain
 
Pandas/Data Analysis at Baypiggies
Pandas/Data Analysis at BaypiggiesPandas/Data Analysis at Baypiggies
Pandas/Data Analysis at BaypiggiesAndy Hayden
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezBig Data Spain
 
Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...DataWorks Summit/Hadoop Summit
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...Dremio Corporation
 
AI from your data lake: Using Solr for analytics
AI from your data lake: Using Solr for analyticsAI from your data lake: Using Solr for analytics
AI from your data lake: Using Solr for analyticsDataWorks Summit
 
Data Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoData Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoSpark Summit
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenDatabricks
 
Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with SparkKrishna Sankar
 
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...Databricks
 
Organising for Data Success
Organising for Data SuccessOrganising for Data Success
Organising for Data SuccessLars Albertsson
 

What's hot (19)

Memory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine LearningMemory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine Learning
 
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
 
Scala: the unpredicted lingua franca for data science
Scala: the unpredicted lingua franca  for data scienceScala: the unpredicted lingua franca  for data science
Scala: the unpredicted lingua franca for data science
 
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
 
Using Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch dataUsing Hadoop to build a Data Quality Service for both real-time and batch data
Using Hadoop to build a Data Quality Service for both real-time and batch data
 
Spark Summit EU 2015: Revolutionizing Big Data in the Enterprise with Spark
Spark Summit EU 2015: Revolutionizing Big Data in the Enterprise with SparkSpark Summit EU 2015: Revolutionizing Big Data in the Enterprise with Spark
Spark Summit EU 2015: Revolutionizing Big Data in the Enterprise with Spark
 
Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in London
 
Uber's data science workbench
Uber's data science workbenchUber's data science workbench
Uber's data science workbench
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
 
Pandas/Data Analysis at Baypiggies
Pandas/Data Analysis at BaypiggiesPandas/Data Analysis at Baypiggies
Pandas/Data Analysis at Baypiggies
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
 
Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
 
AI from your data lake: Using Solr for analytics
AI from your data lake: Using Solr for analyticsAI from your data lake: Using Solr for analytics
AI from your data lake: Using Solr for analytics
 
Data Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoData Science at Scale by Sarah Guido
Data Science at Scale by Sarah Guido
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with Amundsen
 
Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with Spark
 
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
 
Organising for Data Success
Organising for Data SuccessOrganising for Data Success
Organising for Data Success
 

Similar to Building Better Analytics Workflows (Strata-Hadoop World 2013)

The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)
The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)
The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)DataPad Inc.
 
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar ZecevicDataScienceConferenc1
 
NDC Oslo : A Practical Introduction to Data Science
NDC Oslo : A Practical Introduction to Data ScienceNDC Oslo : A Practical Introduction to Data Science
NDC Oslo : A Practical Introduction to Data ScienceMark West
 
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...Mark Rittman
 
Building the Modern Data Hub
Building the Modern Data HubBuilding the Modern Data Hub
Building the Modern Data HubDatavail
 
Using Schema-Less Data API for Building Self-Service Analytics
Using Schema-Less Data API for Building Self-Service AnalyticsUsing Schema-Less Data API for Building Self-Service Analytics
Using Schema-Less Data API for Building Self-Service AnalyticsDionizas Antipenkovas
 
Data Science with Python - WeCloudData
Data Science with Python - WeCloudDataData Science with Python - WeCloudData
Data Science with Python - WeCloudDataWeCloudData
 
Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Building the Modern Data Hub: Beyond the Traditional Enterprise Data WarehouseBuilding the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Building the Modern Data Hub: Beyond the Traditional Enterprise Data WarehouseFormant
 
Democratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseDemocratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseJesus Rodriguez
 
Architecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsArchitecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsCaserta
 
Breed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxBreed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxGautamPopli1
 
awari-ds-aula1.pdf
awari-ds-aula1.pdfawari-ds-aula1.pdf
awari-ds-aula1.pdfMarcos993896
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteCaserta
 
Data Science Overview
Data Science OverviewData Science Overview
Data Science OverviewDavide Mauri
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...Mark Rittman
 
Bbbt presentation 210415_final_2
Bbbt presentation 210415_final_2Bbbt presentation 210415_final_2
Bbbt presentation 210415_final_2Roland Bullivant
 
GraphTalks Rome - Selecting the right Technology
GraphTalks Rome - Selecting the right TechnologyGraphTalks Rome - Selecting the right Technology
GraphTalks Rome - Selecting the right TechnologyNeo4j
 
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02BIWUG
 

Similar to Building Better Analytics Workflows (Strata-Hadoop World 2013) (20)

The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)
The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)
The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)
 
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
 
Lean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science teamLean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science team
 
NDC Oslo : A Practical Introduction to Data Science
NDC Oslo : A Practical Introduction to Data ScienceNDC Oslo : A Practical Introduction to Data Science
NDC Oslo : A Practical Introduction to Data Science
 
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
 
Building the Modern Data Hub
Building the Modern Data HubBuilding the Modern Data Hub
Building the Modern Data Hub
 
Hadoop and SAP BI
Hadoop and SAP BI   Hadoop and SAP BI
Hadoop and SAP BI
 
Using Schema-Less Data API for Building Self-Service Analytics
Using Schema-Less Data API for Building Self-Service AnalyticsUsing Schema-Less Data API for Building Self-Service Analytics
Using Schema-Less Data API for Building Self-Service Analytics
 
Data Science with Python - WeCloudData
Data Science with Python - WeCloudDataData Science with Python - WeCloudData
Data Science with Python - WeCloudData
 
Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Building the Modern Data Hub: Beyond the Traditional Enterprise Data WarehouseBuilding the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
 
Democratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseDemocratizing Data Science in the Enterprise
Democratizing Data Science in the Enterprise
 
Architecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsArchitecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment Options
 
Breed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxBreed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptx
 
awari-ds-aula1.pdf
awari-ds-aula1.pdfawari-ds-aula1.pdf
awari-ds-aula1.pdf
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
 
Data Science Overview
Data Science OverviewData Science Overview
Data Science Overview
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
 
Bbbt presentation 210415_final_2
Bbbt presentation 210415_final_2Bbbt presentation 210415_final_2
Bbbt presentation 210415_final_2
 
GraphTalks Rome - Selecting the right Technology
GraphTalks Rome - Selecting the right TechnologyGraphTalks Rome - Selecting the right Technology
GraphTalks Rome - Selecting the right Technology
 
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
 

More from Wes McKinney

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowWes McKinney
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityWes McKinney
 
Apache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data FrameworkApache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data FrameworkWes McKinney
 
New Directions for Apache Arrow
New Directions for Apache ArrowNew Directions for Apache Arrow
New Directions for Apache ArrowWes McKinney
 
Apache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data TransportApache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data TransportWes McKinney
 
ACM TechTalks : Apache Arrow and the Future of Data Frames
ACM TechTalks : Apache Arrow and the Future of Data FramesACM TechTalks : Apache Arrow and the Future of Data Frames
ACM TechTalks : Apache Arrow and the Future of Data FramesWes McKinney
 
Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020Wes McKinney
 
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future Wes McKinney
 
Apache Arrow: Leveling Up the Analytics Stack
Apache Arrow: Leveling Up the Analytics StackApache Arrow: Leveling Up the Analytics Stack
Apache Arrow: Leveling Up the Analytics StackWes McKinney
 
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionApache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionWes McKinney
 
Apache Arrow: Leveling Up the Data Science Stack
Apache Arrow: Leveling Up the Data Science StackApache Arrow: Leveling Up the Data Science Stack
Apache Arrow: Leveling Up the Data Science StackWes McKinney
 
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Wes McKinney
 
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018Wes McKinney
 
Apache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory DataApache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory DataWes McKinney
 
Apache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory dataApache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory dataWes McKinney
 
Shared Infrastructure for Data Science
Shared Infrastructure for Data ScienceShared Infrastructure for Data Science
Shared Infrastructure for Data ScienceWes McKinney
 
Data Science Without Borders (JupyterCon 2017)
Data Science Without Borders (JupyterCon 2017)Data Science Without Borders (JupyterCon 2017)
Data Science Without Borders (JupyterCon 2017)Wes McKinney
 
Raising the Tides: Open Source Analytics for Data Science
Raising the Tides: Open Source Analytics for Data ScienceRaising the Tides: Open Source Analytics for Data Science
Raising the Tides: Open Source Analytics for Data ScienceWes McKinney
 
Improving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and InteroperabilityImproving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and InteroperabilityWes McKinney
 

More from Wes McKinney (20)

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
 
Apache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data FrameworkApache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data Framework
 
New Directions for Apache Arrow
New Directions for Apache ArrowNew Directions for Apache Arrow
New Directions for Apache Arrow
 
Apache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data TransportApache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data Transport
 
ACM TechTalks : Apache Arrow and the Future of Data Frames
ACM TechTalks : Apache Arrow and the Future of Data FramesACM TechTalks : Apache Arrow and the Future of Data Frames
ACM TechTalks : Apache Arrow and the Future of Data Frames
 
Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020
 
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
 
Apache Arrow: Leveling Up the Analytics Stack
Apache Arrow: Leveling Up the Analytics StackApache Arrow: Leveling Up the Analytics Stack
Apache Arrow: Leveling Up the Analytics Stack
 
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionApache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS Session
 
Apache Arrow: Leveling Up the Data Science Stack
Apache Arrow: Leveling Up the Data Science StackApache Arrow: Leveling Up the Data Science Stack
Apache Arrow: Leveling Up the Data Science Stack
 
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019
 
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018
 
Apache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory DataApache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory Data
 
Apache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory dataApache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory data
 
Shared Infrastructure for Data Science
Shared Infrastructure for Data ScienceShared Infrastructure for Data Science
Shared Infrastructure for Data Science
 
Data Science Without Borders (JupyterCon 2017)
Data Science Without Borders (JupyterCon 2017)Data Science Without Borders (JupyterCon 2017)
Data Science Without Borders (JupyterCon 2017)
 
Raising the Tides: Open Source Analytics for Data Science
Raising the Tides: Open Source Analytics for Data ScienceRaising the Tides: Open Source Analytics for Data Science
Raising the Tides: Open Source Analytics for Data Science
 
Improving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and InteroperabilityImproving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and Interoperability
 

Recently uploaded

UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Which standard is best for your content?
Which standard is best for your content?Which standard is best for your content?
Which standard is best for your content?Rustici Software
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneUiPathCommunity
 
Deliver Latency Free Customer Experience
Deliver Latency Free Customer ExperienceDeliver Latency Free Customer Experience
Deliver Latency Free Customer ExperienceOpsTree solutions
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
Dynamical Context introduction word sensibility orientation
Dynamical Context introduction word sensibility orientationDynamical Context introduction word sensibility orientation
Dynamical Context introduction word sensibility orientationBuild Intuit
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 

Recently uploaded (20)

UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Which standard is best for your content?
Which standard is best for your content?Which standard is best for your content?
Which standard is best for your content?
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyone
 
Deliver Latency Free Customer Experience
Deliver Latency Free Customer ExperienceDeliver Latency Free Customer Experience
Deliver Latency Free Customer Experience
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Dynamical Context introduction word sensibility orientation
Dynamical Context introduction word sensibility orientationDynamical Context introduction word sensibility orientation
Dynamical Context introduction word sensibility orientation
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 

Building Better Analytics Workflows (Strata-Hadoop World 2013)