SlideShare a Scribd company logo
Future of Pandas
Jeff Reback
PyData NYC
November 2017
• The information presented here is offered for informational purposes only and should not be used for any other
purpose (including, without limitation, the making of investment decisions). Examples provided herein are for
illustrative purposes only and are not necessarily based on actual data. Nothing herein constitutes: an offer to sell
or the solicitation of any offer to buy any security or other interest; tax advice; or investment advice. This
presentation shall remain the property of Two Sigma Investments, LP (“Two Sigma”) and Two Sigma reserves the
right to require the return of this presentation at any time.
• Some of the images, logos or other material used herein may be protected by copyright and/or trademark. If so,
such copyrights and/or trademarks are most likely owned by the entity that created the material and are used
purely for identification and comment as fair use under international copyright and/or trademark laws. Use of
such image, copyright or trademark does not imply any association with such organization (or endorsement of
such organization) by Two Sigma, nor vice versa.
• Copyright © 2017 TWO SIGMA INVESTMENTS, LP. All rights reserved
IMPORTANT LEGAL INFORMATION
@jreback
● Former quant
● Senior Engineer at Two Sigma, working on holistic approaches to
modeling
● Core committer to pandas for last 5 years
● Managed pandas since 2013
Kudos!
Kudos! Complaints!
Overview
● State of the Pandas
○ The Good
○ The Bad
○ The Ugly
Overview
● State of the Pandas
○ The Good
○ The Bad
○ The Ugly
● The Present
Overview
● State of the Pandas
○ The Good
○ The Bad
○ The Ugly
● The Present
● The Future
The Good
pandas’s role in the Python Data Ecosystem
pandas
Numerical
Computing
IO / Data
Access
Data
Visualization
Statistics +
Machine
Learning
Libraries
Users
The Bad
http://wesmckinney.com/blog/apache-arrow-pandas-internals/
@wesm "10 Things I Hate About pandas"
DataTypes - what are we Missing?
DataTypes - Missing values can cause dtype changes
● Complex groupby operations awkward and slow
● Copy Semantics
API
API - Opaque UDFs
API - Groupby Performance
API - Aggregation Syntax
API - Copy Semantics
● In-memory format that is custom
● Eager evaluation model, no query planning
● "Slow", limited multicore algorithms for large datasets
Performance
Data Tooling Spectrum
Small Data
“Medium” Data
“Big” Data
< 5GB 5-100GB > 100 GB
pandas starts to fail as an effective
tool somewhere around the 10 GB
mark
Block Storage
Block Storage
Float
1.0 2.0
1.0 2.0
1.0 2.0
1.0 2.0
Int
1
2
3
4
Index
RangeIndex
(0, 4, 1)
Columns
Index
[‘A’, ‘B’, ‘C’]
Block Manager
AxesBlocks
The Ugly
Big Data Unfriendly
Each system has its own internal memory format
70-80% computation wasted on serialization and deserialization
Similar functionality implemented in multiple projects
The Present
CategoricalDtype efficient memory & first class Categoricals
efficient IO
out-of-core and multi-core
In current pandas
The Future
pandas2 architecture
Arrow-optimized data connectors
Arrow in-memory format
Python user API, User-defined functions
Logical Data Frame Expression Graphs
Parallel Dataflow Execution Engine
Apache Arrow
Ibis
pandas2
DataFrame semantics & compatibility
Ibis
Python user API, User-defined functions
Logical Data Frame Expression Graphs
Ibis in a nutshell
Ibis
Python code
Compiler Back End
compiled code
Abstract
Syntax Tree
Apache Arrow
Arrow-optimized data connectors
Arrow in-memory format
Parallel Dataflow Execution Engine
Apache Arrow project
The Arrow supports zero-copy reads and is optimized for data locality.
Fast
Arrow acts as a new high-performance interface between various systems.
Flexible
Apache Arrow is backed by key developers of 13 major open source projects
Standard
Big Data friendly
All systems utilize the same memory format
No overhead for cross-system communication
Projects can share functionality (eg, Parquet-to-Arrow reader)
DataFrame
Computation
● Kernel functions: atomic units of computation
● Operator nodes: input/output types, operator
parallelism properties
Physical Operator Graphs
Log Add
b
a
(a + b).log().sum()
Sum
Parallel Evaluation of Operator Graphs
a b tmp
tmp
2
out
Add SumLog
Parallel Evaluation of Operator Graphs
Status & Links
pandas2
https://github.com/pandas-dev/pandas2
Ibis
https://github.com/ibis-project/ibis
0.12.0 released in October.
Arrow
https://github.com/apache/arrow
0.7.1 released in October.
What can the community do?
● We love contributions.
● We love donations (to NUMFocus).
Thanks

More Related Content

What's hot

Going Beyond Rows and Columns with Graph Analytics
Going Beyond Rows and Columns with Graph AnalyticsGoing Beyond Rows and Columns with Graph Analytics
Going Beyond Rows and Columns with Graph Analytics
Cambridge Semantics
 
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
GetInData
 
The Year of the Graph
The Year of the GraphThe Year of the Graph
The Year of the Graph
Cambridge Semantics
 
Satyam open analytics nyc
Satyam open analytics nycSatyam open analytics nyc
Satyam open analytics nycOpen Analytics
 
How to Build An AI Based Customer Data Platform: Learn the design patterns fo...
How to Build An AI Based Customer Data Platform: Learn the design patterns fo...How to Build An AI Based Customer Data Platform: Learn the design patterns fo...
How to Build An AI Based Customer Data Platform: Learn the design patterns fo...
TigerGraph
 
Graph-Based Identity Resolution at Scale
Graph-Based Identity Resolution at ScaleGraph-Based Identity Resolution at Scale
Graph-Based Identity Resolution at Scale
TigerGraph
 
Should a Graph Database Be in Your Next Data Warehouse Stack?
Should a Graph Database Be in Your Next Data Warehouse Stack?Should a Graph Database Be in Your Next Data Warehouse Stack?
Should a Graph Database Be in Your Next Data Warehouse Stack?
Cambridge Semantics
 
From Data Lakes to the Data Fabric: Our Vision for Digital Strategy
From Data Lakes to the Data Fabric: Our Vision for Digital StrategyFrom Data Lakes to the Data Fabric: Our Vision for Digital Strategy
From Data Lakes to the Data Fabric: Our Vision for Digital Strategy
Cambridge Semantics
 
Fraud prevention is better with TigerGraph inside
Fraud prevention is better with  TigerGraph insideFraud prevention is better with  TigerGraph inside
Fraud prevention is better with TigerGraph inside
TigerGraph
 
Platfora - An Analytics Sandbox In A World Of Big Data
Platfora - An Analytics Sandbox In A World Of Big DataPlatfora - An Analytics Sandbox In A World Of Big Data
Platfora - An Analytics Sandbox In A World Of Big Data
Mark Ginnebaugh
 
Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Informatio...
Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Informatio...Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Informatio...
Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Informatio...
Cambridge Semantics
 
Supply Chain and Logistics Management with Graph & AI
Supply Chain and Logistics Management with Graph & AISupply Chain and Logistics Management with Graph & AI
Supply Chain and Logistics Management with Graph & AI
TigerGraph
 
Modern Data Discovery and Integration in Insurance
Modern Data Discovery and Integration in InsuranceModern Data Discovery and Integration in Insurance
Modern Data Discovery and Integration in Insurance
Cambridge Semantics
 
Sustainability Investment Research Using Cognitive Analytics
Sustainability Investment Research Using Cognitive AnalyticsSustainability Investment Research Using Cognitive Analytics
Sustainability Investment Research Using Cognitive Analytics
Cambridge Semantics
 
Accelerating Insight - Smart Data Lake Customer Success Stories
Accelerating Insight - Smart Data Lake Customer Success StoriesAccelerating Insight - Smart Data Lake Customer Success Stories
Accelerating Insight - Smart Data Lake Customer Success Stories
Cambridge Semantics
 
Graph+AI for Fin. Services
Graph+AI for Fin. ServicesGraph+AI for Fin. Services
Graph+AI for Fin. Services
TigerGraph
 
Introduction to Anzo Unstructured
Introduction to Anzo UnstructuredIntroduction to Anzo Unstructured
Introduction to Anzo Unstructured
Cambridge Semantics
 
Finance and Audit Predictive Analytics
Finance and Audit Predictive AnalyticsFinance and Audit Predictive Analytics
Finance and Audit Predictive Analytics
Bob Samuels
 
TigerGraph UI Toolkits Financial Crimes
TigerGraph UI Toolkits Financial CrimesTigerGraph UI Toolkits Financial Crimes
TigerGraph UI Toolkits Financial Crimes
TigerGraph
 
Graph-based Discovery and Analytics at Enterprise Scale
Graph-based Discovery and Analytics at Enterprise ScaleGraph-based Discovery and Analytics at Enterprise Scale
Graph-based Discovery and Analytics at Enterprise Scale
Cambridge Semantics
 

What's hot (20)

Going Beyond Rows and Columns with Graph Analytics
Going Beyond Rows and Columns with Graph AnalyticsGoing Beyond Rows and Columns with Graph Analytics
Going Beyond Rows and Columns with Graph Analytics
 
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
 
The Year of the Graph
The Year of the GraphThe Year of the Graph
The Year of the Graph
 
Satyam open analytics nyc
Satyam open analytics nycSatyam open analytics nyc
Satyam open analytics nyc
 
How to Build An AI Based Customer Data Platform: Learn the design patterns fo...
How to Build An AI Based Customer Data Platform: Learn the design patterns fo...How to Build An AI Based Customer Data Platform: Learn the design patterns fo...
How to Build An AI Based Customer Data Platform: Learn the design patterns fo...
 
Graph-Based Identity Resolution at Scale
Graph-Based Identity Resolution at ScaleGraph-Based Identity Resolution at Scale
Graph-Based Identity Resolution at Scale
 
Should a Graph Database Be in Your Next Data Warehouse Stack?
Should a Graph Database Be in Your Next Data Warehouse Stack?Should a Graph Database Be in Your Next Data Warehouse Stack?
Should a Graph Database Be in Your Next Data Warehouse Stack?
 
From Data Lakes to the Data Fabric: Our Vision for Digital Strategy
From Data Lakes to the Data Fabric: Our Vision for Digital StrategyFrom Data Lakes to the Data Fabric: Our Vision for Digital Strategy
From Data Lakes to the Data Fabric: Our Vision for Digital Strategy
 
Fraud prevention is better with TigerGraph inside
Fraud prevention is better with  TigerGraph insideFraud prevention is better with  TigerGraph inside
Fraud prevention is better with TigerGraph inside
 
Platfora - An Analytics Sandbox In A World Of Big Data
Platfora - An Analytics Sandbox In A World Of Big DataPlatfora - An Analytics Sandbox In A World Of Big Data
Platfora - An Analytics Sandbox In A World Of Big Data
 
Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Informatio...
Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Informatio...Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Informatio...
Anzo Smart Data Lake 4.0 - a Data Lake Platform for the Enterprise Informatio...
 
Supply Chain and Logistics Management with Graph & AI
Supply Chain and Logistics Management with Graph & AISupply Chain and Logistics Management with Graph & AI
Supply Chain and Logistics Management with Graph & AI
 
Modern Data Discovery and Integration in Insurance
Modern Data Discovery and Integration in InsuranceModern Data Discovery and Integration in Insurance
Modern Data Discovery and Integration in Insurance
 
Sustainability Investment Research Using Cognitive Analytics
Sustainability Investment Research Using Cognitive AnalyticsSustainability Investment Research Using Cognitive Analytics
Sustainability Investment Research Using Cognitive Analytics
 
Accelerating Insight - Smart Data Lake Customer Success Stories
Accelerating Insight - Smart Data Lake Customer Success StoriesAccelerating Insight - Smart Data Lake Customer Success Stories
Accelerating Insight - Smart Data Lake Customer Success Stories
 
Graph+AI for Fin. Services
Graph+AI for Fin. ServicesGraph+AI for Fin. Services
Graph+AI for Fin. Services
 
Introduction to Anzo Unstructured
Introduction to Anzo UnstructuredIntroduction to Anzo Unstructured
Introduction to Anzo Unstructured
 
Finance and Audit Predictive Analytics
Finance and Audit Predictive AnalyticsFinance and Audit Predictive Analytics
Finance and Audit Predictive Analytics
 
TigerGraph UI Toolkits Financial Crimes
TigerGraph UI Toolkits Financial CrimesTigerGraph UI Toolkits Financial Crimes
TigerGraph UI Toolkits Financial Crimes
 
Graph-based Discovery and Analytics at Enterprise Scale
Graph-based Discovery and Analytics at Enterprise ScaleGraph-based Discovery and Analytics at Enterprise Scale
Graph-based Discovery and Analytics at Enterprise Scale
 

Similar to Future of Pandas - Jeff Reback

Future of pandas
Future of pandasFuture of pandas
Future of pandas
Jeff Reback
 
Improving Pandas and PySpark performance and interoperability with Apache Arrow
Improving Pandas and PySpark performance and interoperability with Apache ArrowImproving Pandas and PySpark performance and interoperability with Apache Arrow
Improving Pandas and PySpark performance and interoperability with Apache Arrow
PyData
 
Improving Pandas and PySpark interoperability with Apache Arrow
Improving Pandas and PySpark interoperability with Apache ArrowImproving Pandas and PySpark interoperability with Apache Arrow
Improving Pandas and PySpark interoperability with Apache Arrow
Li Jin
 
Improving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and InteroperabilityImproving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and Interoperability
Wes McKinney
 
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Spark Summit
 
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...
Sri Ambati
 
Improving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache ArrowImproving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache Arrow
Julien Le Dem
 
Gen Z BI Paradigm
Gen Z BI ParadigmGen Z BI Paradigm
Gen Z BI Paradigm
Deepikavalli A
 
Graph representation learning to prevent payment collusion fraud
Graph representation learning to prevent payment collusion fraudGraph representation learning to prevent payment collusion fraud
Graph representation learning to prevent payment collusion fraud
DataWorks Summit
 
DIY Analytics with Apache Spark
DIY Analytics with Apache SparkDIY Analytics with Apache Spark
DIY Analytics with Apache Spark
Adam Roberts
 
Data Lake na área da saúde- AWS
Data Lake na área da saúde- AWSData Lake na área da saúde- AWS
Data Lake na área da saúde- AWS
Amazon Web Services LATAM
 
Ibm info sphere datastage and hadoop two best-of-breed solutions together-f...
Ibm info sphere datastage and hadoop   two best-of-breed solutions together-f...Ibm info sphere datastage and hadoop   two best-of-breed solutions together-f...
Ibm info sphere datastage and hadoop two best-of-breed solutions together-f...
ArunshankarArjunan
 
Iod 2013 Jackman Schwenger
Iod 2013 Jackman SchwengerIod 2013 Jackman Schwenger
Iod 2013 Jackman Schwenger
MARIA N. SCHWENGER
 
Improving Python and Spark Performance and Interoperability with Apache Arrow...
Improving Python and Spark Performance and Interoperability with Apache Arrow...Improving Python and Spark Performance and Interoperability with Apache Arrow...
Improving Python and Spark Performance and Interoperability with Apache Arrow...
Databricks
 
Using big data_to_your_advantage
Using big data_to_your_advantageUsing big data_to_your_advantage
Using big data_to_your_advantage
John Repko
 
Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...
DataWorks Summit
 
Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML
Amazon Web Services
 
The Dynamic Information Model
The Dynamic Information ModelThe Dynamic Information Model
The Dynamic Information Model
georgebina
 
SPSNYC2019 - What is Common Data Model and how to use it?
SPSNYC2019 - What is Common Data Model and how to use it?SPSNYC2019 - What is Common Data Model and how to use it?
SPSNYC2019 - What is Common Data Model and how to use it?
Nicolas Georgeault
 
Vectorized UDF: Scalable Analysis with Python and PySpark with Li Jin
Vectorized UDF: Scalable Analysis with Python and PySpark with Li JinVectorized UDF: Scalable Analysis with Python and PySpark with Li Jin
Vectorized UDF: Scalable Analysis with Python and PySpark with Li Jin
Databricks
 

Similar to Future of Pandas - Jeff Reback (20)

Future of pandas
Future of pandasFuture of pandas
Future of pandas
 
Improving Pandas and PySpark performance and interoperability with Apache Arrow
Improving Pandas and PySpark performance and interoperability with Apache ArrowImproving Pandas and PySpark performance and interoperability with Apache Arrow
Improving Pandas and PySpark performance and interoperability with Apache Arrow
 
Improving Pandas and PySpark interoperability with Apache Arrow
Improving Pandas and PySpark interoperability with Apache ArrowImproving Pandas and PySpark interoperability with Apache Arrow
Improving Pandas and PySpark interoperability with Apache Arrow
 
Improving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and InteroperabilityImproving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and Interoperability
 
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
Improving Python and Spark Performance and Interoperability: Spark Summit Eas...
 
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...
 
Improving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache ArrowImproving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache Arrow
 
Gen Z BI Paradigm
Gen Z BI ParadigmGen Z BI Paradigm
Gen Z BI Paradigm
 
Graph representation learning to prevent payment collusion fraud
Graph representation learning to prevent payment collusion fraudGraph representation learning to prevent payment collusion fraud
Graph representation learning to prevent payment collusion fraud
 
DIY Analytics with Apache Spark
DIY Analytics with Apache SparkDIY Analytics with Apache Spark
DIY Analytics with Apache Spark
 
Data Lake na área da saúde- AWS
Data Lake na área da saúde- AWSData Lake na área da saúde- AWS
Data Lake na área da saúde- AWS
 
Ibm info sphere datastage and hadoop two best-of-breed solutions together-f...
Ibm info sphere datastage and hadoop   two best-of-breed solutions together-f...Ibm info sphere datastage and hadoop   two best-of-breed solutions together-f...
Ibm info sphere datastage and hadoop two best-of-breed solutions together-f...
 
Iod 2013 Jackman Schwenger
Iod 2013 Jackman SchwengerIod 2013 Jackman Schwenger
Iod 2013 Jackman Schwenger
 
Improving Python and Spark Performance and Interoperability with Apache Arrow...
Improving Python and Spark Performance and Interoperability with Apache Arrow...Improving Python and Spark Performance and Interoperability with Apache Arrow...
Improving Python and Spark Performance and Interoperability with Apache Arrow...
 
Using big data_to_your_advantage
Using big data_to_your_advantageUsing big data_to_your_advantage
Using big data_to_your_advantage
 
Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...
 
Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML
 
The Dynamic Information Model
The Dynamic Information ModelThe Dynamic Information Model
The Dynamic Information Model
 
SPSNYC2019 - What is Common Data Model and how to use it?
SPSNYC2019 - What is Common Data Model and how to use it?SPSNYC2019 - What is Common Data Model and how to use it?
SPSNYC2019 - What is Common Data Model and how to use it?
 
Vectorized UDF: Scalable Analysis with Python and PySpark with Li Jin
Vectorized UDF: Scalable Analysis with Python and PySpark with Li JinVectorized UDF: Scalable Analysis with Python and PySpark with Li Jin
Vectorized UDF: Scalable Analysis with Python and PySpark with Li Jin
 

More from Two Sigma

The State of Open Data on School Bullying
The State of Open Data on School BullyingThe State of Open Data on School Bullying
The State of Open Data on School Bullying
Two Sigma
 
Halite @ Google Cloud Next 2018
Halite @ Google Cloud Next 2018Halite @ Google Cloud Next 2018
Halite @ Google Cloud Next 2018
Two Sigma
 
BeakerX - Tiezheng Li
BeakerX - Tiezheng LiBeakerX - Tiezheng Li
BeakerX - Tiezheng Li
Two Sigma
 
Bringing Linux back to the Server BIOS with LinuxBoot - Trammel Hudson
Bringing Linux back to the Server BIOS with LinuxBoot - Trammel HudsonBringing Linux back to the Server BIOS with LinuxBoot - Trammel Hudson
Bringing Linux back to the Server BIOS with LinuxBoot - Trammel Hudson
Two Sigma
 
Waiter: An Open-Source Distributed Auto-Scaler
Waiter: An Open-Source Distributed Auto-ScalerWaiter: An Open-Source Distributed Auto-Scaler
Waiter: An Open-Source Distributed Auto-Scaler
Two Sigma
 
The Language of Compression - Leif Walsh
The Language of Compression - Leif WalshThe Language of Compression - Leif Walsh
The Language of Compression - Leif Walsh
Two Sigma
 
Identifying Emergent Behaviors in Complex Systems - Jane Adams
Identifying Emergent Behaviors in Complex Systems - Jane AdamsIdentifying Emergent Behaviors in Complex Systems - Jane Adams
Identifying Emergent Behaviors in Complex Systems - Jane Adams
Two Sigma
 
Algorithmic Data Science = Theory + Practice
Algorithmic Data Science = Theory + PracticeAlgorithmic Data Science = Theory + Practice
Algorithmic Data Science = Theory + Practice
Two Sigma
 
HUOHUA: A Distributed Time Series Analysis Framework For Spark
HUOHUA: A Distributed Time Series Analysis Framework For SparkHUOHUA: A Distributed Time Series Analysis Framework For Spark
HUOHUA: A Distributed Time Series Analysis Framework For Spark
Two Sigma
 
Improving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache ArrowImproving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache Arrow
Two Sigma
 
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
Two Sigma
 
Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...
Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...
Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...
Two Sigma
 
Graph Summarization with Quality Guarantees
Graph Summarization with Quality GuaranteesGraph Summarization with Quality Guarantees
Graph Summarization with Quality Guarantees
Two Sigma
 
Rademacher Averages: Theory and Practice
Rademacher Averages: Theory and PracticeRademacher Averages: Theory and Practice
Rademacher Averages: Theory and Practice
Two Sigma
 
Credit-Implied Volatility
Credit-Implied VolatilityCredit-Implied Volatility
Credit-Implied Volatility
Two Sigma
 
Principles of REST API Design
Principles of REST API DesignPrinciples of REST API Design
Principles of REST API Design
Two Sigma
 

More from Two Sigma (16)

The State of Open Data on School Bullying
The State of Open Data on School BullyingThe State of Open Data on School Bullying
The State of Open Data on School Bullying
 
Halite @ Google Cloud Next 2018
Halite @ Google Cloud Next 2018Halite @ Google Cloud Next 2018
Halite @ Google Cloud Next 2018
 
BeakerX - Tiezheng Li
BeakerX - Tiezheng LiBeakerX - Tiezheng Li
BeakerX - Tiezheng Li
 
Bringing Linux back to the Server BIOS with LinuxBoot - Trammel Hudson
Bringing Linux back to the Server BIOS with LinuxBoot - Trammel HudsonBringing Linux back to the Server BIOS with LinuxBoot - Trammel Hudson
Bringing Linux back to the Server BIOS with LinuxBoot - Trammel Hudson
 
Waiter: An Open-Source Distributed Auto-Scaler
Waiter: An Open-Source Distributed Auto-ScalerWaiter: An Open-Source Distributed Auto-Scaler
Waiter: An Open-Source Distributed Auto-Scaler
 
The Language of Compression - Leif Walsh
The Language of Compression - Leif WalshThe Language of Compression - Leif Walsh
The Language of Compression - Leif Walsh
 
Identifying Emergent Behaviors in Complex Systems - Jane Adams
Identifying Emergent Behaviors in Complex Systems - Jane AdamsIdentifying Emergent Behaviors in Complex Systems - Jane Adams
Identifying Emergent Behaviors in Complex Systems - Jane Adams
 
Algorithmic Data Science = Theory + Practice
Algorithmic Data Science = Theory + PracticeAlgorithmic Data Science = Theory + Practice
Algorithmic Data Science = Theory + Practice
 
HUOHUA: A Distributed Time Series Analysis Framework For Spark
HUOHUA: A Distributed Time Series Analysis Framework For SparkHUOHUA: A Distributed Time Series Analysis Framework For Spark
HUOHUA: A Distributed Time Series Analysis Framework For Spark
 
Improving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache ArrowImproving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache Arrow
 
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
 
Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...
Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...
Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...
 
Graph Summarization with Quality Guarantees
Graph Summarization with Quality GuaranteesGraph Summarization with Quality Guarantees
Graph Summarization with Quality Guarantees
 
Rademacher Averages: Theory and Practice
Rademacher Averages: Theory and PracticeRademacher Averages: Theory and Practice
Rademacher Averages: Theory and Practice
 
Credit-Implied Volatility
Credit-Implied VolatilityCredit-Implied Volatility
Credit-Implied Volatility
 
Principles of REST API Design
Principles of REST API DesignPrinciples of REST API Design
Principles of REST API Design
 

Recently uploaded

Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 

Recently uploaded (20)

Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 

Future of Pandas - Jeff Reback