Submit Search
Upload
DataFrames: The Good, Bad, and Ugly
•
18 likes
•
12,893 views
Wes McKinney
Follow
Thoughts on the status quo in "data frame" interfaces
Read less
Read more
Technology
Report
Share
Report
Share
1 of 24
Download now
Download to read offline
Recommended
DataFrames: The Extended Cut
DataFrames: The Extended Cut
Wes McKinney
My Data Journey with Python (SciPy 2015 Keynote)
My Data Journey with Python (SciPy 2015 Keynote)
Wes McKinney
Python Data Ecosystem: Thoughts on Building for the Future
Python Data Ecosystem: Thoughts on Building for the Future
Wes McKinney
Data Analysis and Statistics in Python using pandas and statsmodels
Data Analysis and Statistics in Python using pandas and statsmodels
Wes McKinney
PyData: The Next Generation
PyData: The Next Generation
Wes McKinney
Data Science Languages and Industry Analytics
Data Science Languages and Industry Analytics
Wes McKinney
Ibis: Scaling the Python Data Experience
Ibis: Scaling the Python Data Experience
Wes McKinney
Memory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine Learning
Wes McKinney
Recommended
DataFrames: The Extended Cut
DataFrames: The Extended Cut
Wes McKinney
My Data Journey with Python (SciPy 2015 Keynote)
My Data Journey with Python (SciPy 2015 Keynote)
Wes McKinney
Python Data Ecosystem: Thoughts on Building for the Future
Python Data Ecosystem: Thoughts on Building for the Future
Wes McKinney
Data Analysis and Statistics in Python using pandas and statsmodels
Data Analysis and Statistics in Python using pandas and statsmodels
Wes McKinney
PyData: The Next Generation
PyData: The Next Generation
Wes McKinney
Data Science Languages and Industry Analytics
Data Science Languages and Industry Analytics
Wes McKinney
Ibis: Scaling the Python Data Experience
Ibis: Scaling the Python Data Experience
Wes McKinney
Memory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine Learning
Wes McKinney
An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015
Wes McKinney
Next-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache Arrow
Wes McKinney
Building Better Analytics Workflows (Strata-Hadoop World 2013)
Building Better Analytics Workflows (Strata-Hadoop World 2013)
Wes McKinney
PyCon Singapore 2013 Keynote
PyCon Singapore 2013 Keynote
Wes McKinney
Improving data interoperability in Python and R
Improving data interoperability in Python and R
Wes McKinney
Python Data Wrangling: Preparing for the Future
Python Data Wrangling: Preparing for the Future
Wes McKinney
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Wes McKinney
Enabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data Citizen
Wes McKinney
Productive Data Tools for Quants
Productive Data Tools for Quants
Wes McKinney
Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)
Wes McKinney
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
Wes McKinney
Python for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandas
Wes McKinney
Apache Arrow and Python: The latest
Apache Arrow and Python: The latest
Wes McKinney
Ibis: Scaling Python Analytics on Hadoop and Impala
Ibis: Scaling Python Analytics on Hadoop and Impala
Wes McKinney
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
Dremio Corporation
Pandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySpark
Li Jin
Apache Arrow - An Overview
Apache Arrow - An Overview
Dremio Corporation
Uber's data science workbench
Uber's data science workbench
Ran Wei
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Databricks
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with Amundsen
Databricks
Raising the Tides: Open Source Analytics for Data Science
Raising the Tides: Open Source Analytics for Data Science
Wes McKinney
Reproducibility with Checkpoint & RRO - NYC R Conference
Reproducibility with Checkpoint & RRO - NYC R Conference
Revolution Analytics
More Related Content
What's hot
An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015
Wes McKinney
Next-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache Arrow
Wes McKinney
Building Better Analytics Workflows (Strata-Hadoop World 2013)
Building Better Analytics Workflows (Strata-Hadoop World 2013)
Wes McKinney
PyCon Singapore 2013 Keynote
PyCon Singapore 2013 Keynote
Wes McKinney
Improving data interoperability in Python and R
Improving data interoperability in Python and R
Wes McKinney
Python Data Wrangling: Preparing for the Future
Python Data Wrangling: Preparing for the Future
Wes McKinney
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Wes McKinney
Enabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data Citizen
Wes McKinney
Productive Data Tools for Quants
Productive Data Tools for Quants
Wes McKinney
Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)
Wes McKinney
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
Wes McKinney
Python for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandas
Wes McKinney
Apache Arrow and Python: The latest
Apache Arrow and Python: The latest
Wes McKinney
Ibis: Scaling Python Analytics on Hadoop and Impala
Ibis: Scaling Python Analytics on Hadoop and Impala
Wes McKinney
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
Dremio Corporation
Pandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySpark
Li Jin
Apache Arrow - An Overview
Apache Arrow - An Overview
Dremio Corporation
Uber's data science workbench
Uber's data science workbench
Ran Wei
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Databricks
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with Amundsen
Databricks
What's hot
(20)
An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015
Next-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache Arrow
Building Better Analytics Workflows (Strata-Hadoop World 2013)
Building Better Analytics Workflows (Strata-Hadoop World 2013)
PyCon Singapore 2013 Keynote
PyCon Singapore 2013 Keynote
Improving data interoperability in Python and R
Improving data interoperability in Python and R
Python Data Wrangling: Preparing for the Future
Python Data Wrangling: Preparing for the Future
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Enabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data Citizen
Productive Data Tools for Quants
Productive Data Tools for Quants
Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
Python for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandas
Apache Arrow and Python: The latest
Apache Arrow and Python: The latest
Ibis: Scaling Python Analytics on Hadoop and Impala
Ibis: Scaling Python Analytics on Hadoop and Impala
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
Pandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySpark
Apache Arrow - An Overview
Apache Arrow - An Overview
Uber's data science workbench
Uber's data science workbench
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with Amundsen
Viewers also liked
Raising the Tides: Open Source Analytics for Data Science
Raising the Tides: Open Source Analytics for Data Science
Wes McKinney
Reproducibility with Checkpoint & RRO - NYC R Conference
Reproducibility with Checkpoint & RRO - NYC R Conference
Revolution Analytics
A Step Towards Reproducibility in R
A Step Towards Reproducibility in R
Revolution Analytics
pandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for Python
Wes McKinney
What's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial users
Wes McKinney
Effective Data Analysis with Deedle
Effective Data Analysis with Deedle
Howard Mansell
F# and Financial Data Making Data Analysis Simple
F# and Financial Data Making Data Analysis Simple
Tomas Petricek
Structured Data Challenges in Finance and Statistics
Structured Data Challenges in Finance and Statistics
Wes McKinney
Ayasdi strata
Ayasdi strata
Alpine Data
Apache Spark Components
Apache Spark Components
Girish Khanzode
A look inside pandas design and development
A look inside pandas design and development
Wes McKinney
Scipy 2011 Time Series Analysis in Python
Scipy 2011 Time Series Analysis in Python
Wes McKinney
Data Structures for Statistical Computing in Python
Data Structures for Statistical Computing in Python
Wes McKinney
Introduction to real time big data with Apache Spark
Introduction to real time big data with Apache Spark
Taras Matyashovsky
SPARQL and Linked Data Benchmarking
SPARQL and Linked Data Benchmarking
Kristian Alexander
Papel quadriculado
Papel quadriculado
Crescendo EAprendendo
Viewers also liked
(16)
Raising the Tides: Open Source Analytics for Data Science
Raising the Tides: Open Source Analytics for Data Science
Reproducibility with Checkpoint & RRO - NYC R Conference
Reproducibility with Checkpoint & RRO - NYC R Conference
A Step Towards Reproducibility in R
A Step Towards Reproducibility in R
pandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for Python
What's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial users
Effective Data Analysis with Deedle
Effective Data Analysis with Deedle
F# and Financial Data Making Data Analysis Simple
F# and Financial Data Making Data Analysis Simple
Structured Data Challenges in Finance and Statistics
Structured Data Challenges in Finance and Statistics
Ayasdi strata
Ayasdi strata
Apache Spark Components
Apache Spark Components
A look inside pandas design and development
A look inside pandas design and development
Scipy 2011 Time Series Analysis in Python
Scipy 2011 Time Series Analysis in Python
Data Structures for Statistical Computing in Python
Data Structures for Statistical Computing in Python
Introduction to real time big data with Apache Spark
Introduction to real time big data with Apache Spark
SPARQL and Linked Data Benchmarking
SPARQL and Linked Data Benchmarking
Papel quadriculado
Papel quadriculado
Similar to DataFrames: The Good, Bad, and Ugly
PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015
Cloudera, Inc.
Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)
Uri Laserson
Productionizing Hadoop - New Lessons Learned
Productionizing Hadoop - New Lessons Learned
Cloudera, Inc.
Kudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast Data
michaelguia
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
Swiss Big Data User Group
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)
Todd Lipcon
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
huguk
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
Cloudera, Inc.
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Cloudera, Inc.
Impala tech-talk by Dimitris Tsirogiannis
Impala tech-talk by Dimitris Tsirogiannis
Felicia Haggarty
Ibis: operating the Python data ecosystem at Hadoop scale by Wes McKinney
Ibis: operating the Python data ecosystem at Hadoop scale by Wes McKinney
Hakka Labs
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Cloudera, Inc.
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current Market
Dremio Corporation
Data Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache Hadoop
Cloudera, Inc.
Microservices with Terraform, Docker and the Cloud. DevOps Wet 2018
Microservices with Terraform, Docker and the Cloud. DevOps Wet 2018
Derek Ashmore
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
Mike Percy
Not Your Father’s Data Warehouse: Breaking Tradition with Innovation
Not Your Father’s Data Warehouse: Breaking Tradition with Innovation
Inside Analysis
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Sam Palani
Introduction to Apache Kudu
Introduction to Apache Kudu
Jeff Holoman
Hadoop and Machine Learning
Hadoop and Machine Learning
joshwills
Similar to DataFrames: The Good, Bad, and Ugly
(20)
PyData: The Next Generation | Data Day Texas 2015
PyData: The Next Generation | Data Day Texas 2015
Large-Scale Data Science on Hadoop (Intel Big Data Day)
Large-Scale Data Science on Hadoop (Intel Big Data Day)
Productionizing Hadoop - New Lessons Learned
Productionizing Hadoop - New Lessons Learned
Kudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast Data
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Impala tech-talk by Dimitris Tsirogiannis
Impala tech-talk by Dimitris Tsirogiannis
Ibis: operating the Python data ecosystem at Hadoop scale by Wes McKinney
Ibis: operating the Python data ecosystem at Hadoop scale by Wes McKinney
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current Market
Data Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache Hadoop
Microservices with Terraform, Docker and the Cloud. DevOps Wet 2018
Microservices with Terraform, Docker and the Cloud. DevOps Wet 2018
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
Not Your Father’s Data Warehouse: Breaking Tradition with Innovation
Not Your Father’s Data Warehouse: Breaking Tradition with Innovation
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Introduction to Apache Kudu
Introduction to Apache Kudu
Hadoop and Machine Learning
Hadoop and Machine Learning
More from Wes McKinney
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
Wes McKinney
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
Wes McKinney
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Wes McKinney
Apache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data Framework
Wes McKinney
New Directions for Apache Arrow
New Directions for Apache Arrow
Wes McKinney
Apache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data Transport
Wes McKinney
ACM TechTalks : Apache Arrow and the Future of Data Frames
ACM TechTalks : Apache Arrow and the Future of Data Frames
Wes McKinney
Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020
Wes McKinney
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
Wes McKinney
Apache Arrow: Leveling Up the Analytics Stack
Apache Arrow: Leveling Up the Analytics Stack
Wes McKinney
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Wes McKinney
Apache Arrow: Leveling Up the Data Science Stack
Apache Arrow: Leveling Up the Data Science Stack
Wes McKinney
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019
Wes McKinney
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018
Wes McKinney
Apache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory Data
Wes McKinney
Apache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory data
Wes McKinney
Shared Infrastructure for Data Science
Shared Infrastructure for Data Science
Wes McKinney
Data Science Without Borders (JupyterCon 2017)
Data Science Without Borders (JupyterCon 2017)
Wes McKinney
Improving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and Interoperability
Wes McKinney
PyCon APAC 2016 Keynote
PyCon APAC 2016 Keynote
Wes McKinney
More from Wes McKinney
(20)
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data Framework
New Directions for Apache Arrow
New Directions for Apache Arrow
Apache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data Transport
ACM TechTalks : Apache Arrow and the Future of Data Frames
ACM TechTalks : Apache Arrow and the Future of Data Frames
Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
Apache Arrow: Leveling Up the Analytics Stack
Apache Arrow: Leveling Up the Analytics Stack
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow: Leveling Up the Data Science Stack
Apache Arrow: Leveling Up the Data Science Stack
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory data
Shared Infrastructure for Data Science
Shared Infrastructure for Data Science
Data Science Without Borders (JupyterCon 2017)
Data Science Without Borders (JupyterCon 2017)
Improving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and Interoperability
PyCon APAC 2016 Keynote
PyCon APAC 2016 Keynote
Recently uploaded
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
Memoori
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
Mark Billinghurst
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
Deakin University
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
Pixlogix Infotech
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Safe Software
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
Fwdays
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
Scott Keck-Warren
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Patryk Bandurski
Key Features Of Token Development (1).pptx
Key Features Of Token Development (1).pptx
LBM Solutions
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
comworks
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
jimielynbastida
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
Padma Pradeep
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
Softradix Technologies
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
Rizwan Syed
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
Ridwan Fadjar
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
carlostorres15106
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
null - The Open Security Community
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
BookNet Canada
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
Miki Katsuragi
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
Mattias Andersson
Recently uploaded
(20)
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Key Features Of Token Development (1).pptx
Key Features Of Token Development (1).pptx
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
DataFrames: The Good, Bad, and Ugly
1.
1 © Cloudera,
Inc. All rights reserved. DataFrames: The Good, Bad, and Ugly Wes McKinney @wesmckinn NY R Conference, 2015-‐04-‐25
2.
2 © Cloudera,
Inc. All rights reserved. Disclaimer: the views presented in this talk are my personal opinions and not necessarily those of Cloudera
3.
3 © Cloudera,
Inc. All rights reserved. This talk • Some commentary on all the data frame interfaces out there • Biased observaTons and cursory judgments • Thoughts on craVing high quality data tools
4.
4 © Cloudera,
Inc. All rights reserved. Disclaimer #2: This is a nuanced discussion
5.
5 © Cloudera,
Inc. All rights reserved. Who am I? • Father of pandas (2008 -‐ ) • Financial analyTcs in R / Python starTng 2007 • 2010-‐2012 • Hiatus from gainful employment • Make pandas ready for primeTme • Write "Python for Data Analysis” • 2013-‐2014: DataPad with Chang She & co • 2014 -‐ : Cloudera
6.
6 © Cloudera,
Inc. All rights reserved. What’s in a DataFrame?
7.
7 © Cloudera,
Inc. All rights reserved. What’s in a DataFrame? A table with some rows By any other name would analyze as sweet.
8.
8 © Cloudera,
Inc. All rights reserved. Got a table?
9.
9 © Cloudera,
Inc. All rights reserved. Got a table? Put a DataFrame (interface) on it!
10.
10 © Cloudera,
Inc. All rights reserved. What is this “data frame” that you speak of • A table-‐like data structure • An API / user interface for the table • SelecTng data • Math and relaTonal algebra (join, filter, etc.) • File / database IO • ad infinitum
11.
11 © Cloudera,
Inc. All rights reserved. Some axes of comparison • Data structure internals (types, in-‐memory representaTon, etc.) • Basic table API • RelaTonal algebra support • Group-‐by / split-‐apply-‐combine API • Performance, memory use, evaluaTon semanTcs • Missing data • Data Tdying / ETL tools • IO uTliTes • Domain specific tools (e.g. Tme series) • …
12.
12 © Cloudera,
Inc. All rights reserved. The Great Data Tool Decoupling™ • Thesis: over Tme, user interfaces, data storage, and execuTon engines will decouple and specialize • In fact, you should really want this to happen • Share systems among languages • Reduce fragmentaTon and “lock-‐in” • ShiV developer focus to usability • PredicTon: we’ll be there by 2025; sooner if we all get our act together
13.
13 © Cloudera,
Inc. All rights reserved. CraVing quality data tools • Quality / usefulness is usually forged by the fire of basle • Real world use cases and social proof trump theory • Eat that dog food • When in doubt? Look at the test suite.
14.
14 © Cloudera,
Inc. All rights reserved. R data frames • Thin layer on top of R list type • Sequence of named vectors • Can have row names (any R vector) • Simple column and row selecTon API • AnalyTcs, data transformaTon, etc. leV to base package and add-‐on libraries • Richness / usability comes largely from libraries
15.
15 © Cloudera,
Inc. All rights reserved. Some awesome R data frame stuff • “Hadley Stack” • dplyr, Tdyr • legacy: plyr, reshape2 • ggplot2 • data.table (data.frame + indices, fast algorithms) • xts : Tme series
16.
16 © Cloudera,
Inc. All rights reserved. R data frames: rough edges • Copy-‐on-‐write semanTcs • API fragmentaTon / inconsistency • Use the “Hadley stack” for improved sanity • Factor / String dichotomy • stringsAsFactors=FALSE a blessing and curse • Somewhat limited type system
17.
17 © Cloudera,
Inc. All rights reserved. dplyr • Composable table API • Good example of what the “decoupled” future might look like • New in-‐memory R/RCpp execuTon engine • SQL backends for large subset of API
18.
18 © Cloudera,
Inc. All rights reserved. Spark DataFrames • R/pandas-‐inspired API for tabular data manipulaTon in Scala, Python, etc. • Logical operaTon graphs rewrisen internally in more efficient form • Good interop with Spark SQL • Some interoperability with pandas • ParTal API Decoupling! (it sTll binds you to Spark)
19.
19 © Cloudera,
Inc. All rights reserved. pandas • Several key data structures, data frame among them • Considerably more complex internals than other data frame libraries • Some good things • Born of need • A “baseries included” approach • Hierarchical axis labeling: addresses some hard use cases at expense of semanTc complexity • Strong Tme series support
20.
20 © Cloudera,
Inc. All rights reserved. pandas: rough edges • Axis labelling can get in the way for folks needing “just a table” • Ceded control of its type system / data rep’n from day 1 to NumPy • Inefficient string handling (uses NumPy object arrays) • Missing data handling less precise than other tools
21.
21 © Cloudera,
Inc. All rights reserved. Julia: DataFrames.jl • Started by Harlan Harris & co • Part of broader JuliaStats iniTaTve • More R-‐like than pandas-‐like • Very acTve: > 50 contributors! • STll comparaTvely early • Less comprehensive API • More limited IO capabiliTes
22.
22 © Cloudera,
Inc. All rights reserved. Other data frames • Saddle (Scala) • Dev’d by Adam Klein (ex-‐AQR) at Novus Partners (fintech startup) • Designed and used for financial use cases • Deedle (F# / .NET) • Dev’d by AK’s colleagues at BlueMountain (hedge fund) • GraphLab / Dato • Really good C++ data frame with Python interface • Dual-‐licensed: AGPL + Commercial • That’s not all! Haskell, Go, usw…
23.
23 © Cloudera,
Inc. All rights reserved. We’re not done yet • The future is JSON-‐like • Support for nested types / semi-‐structured data is sTll weak • Wanted: Apache-‐licensed, community standard C/C++ data frame that we all use (R, Python, Julia) • Bring on the Great Decoupling
24.
24 © Cloudera,
Inc. All rights reserved. Thank you @wesmckinn
Download now