SlideShare a Scribd company logo
What’s new in pandas and
the SciPy stack for financial
           users
         Wes McKinney
Me
•   AQR: August 2007 - July 2010

•   Duke Statistics: 2010 - present (now on leave)

•   My plans

    •   Improving Python libs for statistics and finance

    •   Building a financial software + consulting business
        based on said tools
Core Python stack for finance
• NumPy, SciPy (heavy lifting)
• pandas (data handling / computation)
• IPython (dev and research env)
• Cython (perf optimization)
• matplotlib (visualization)
• statsmodels (statistics / econometrics)
General sentiments
•   Scientific Python growing solidly in finance and
    in many other fields

    •   Though good sci-pythonistas are still scarce

•   Important work happening in many of the core
    projects

•   Growing consensus: a new computational
    model is needed to better cope with “big data”
NumPy
• Significantly refactored C internals
• Great progress on native datetime64 type
 • Will significantly improve date-handling
    performance and usability
 • Extensible business day / holiday logic
    planned / in progress
• Addition of low-level missing data (NA)
  support in the works
IPython

• One of Python’s killer apps gets even better
• Rich Qt GUI console with inline plotting
• New and improved architecture for high perf
  parallel / distributed computing
• See Fernando Pérez’s SciPy 2011 talk / video
Cython
• Still the first tool you should reach for to get
  better performance
• New: OpenMP integration (for multi-core)
     with nogil:
         for i in prange(n):
             # do something in parallel

• Supports (almost) all of standard Python now
  (some things, like closures, used to not work)
statsmodels
•   Statistics and econometrics in Python

•   Major work in time series models over last year+

    •   VAR, SVAR models, eventually (V)ECM models
        for cointegrated time series

    •   AR/ARMA, Kalman Filter, various macro filters
        (e.g. Hodrick-Prescott) implemented

    •   Soon: Bayesian state space models (DLMs),
        ARCH/GARCH models, etc.
statsmodels
• Major criticism: weak user interface
 • No R-style formula framework
 • pandas not integrated (need to pass raw
    NumPy arrays)
• I have begun work on pandas integration,
  formulas have been implemented and will
  hopefully arrive within the next few months
pandas
• Still the Python data hacker’s best friend?
• Most recent release: 0.3.0 on 2/20/2011
• However, last 4 months have been the most
  active development period in the library’s
  history
• ~375 commits since 0.3.0 release (more than
  the entire prior open source history)
The state of data structures
Ambitious big picture

• I want to make pandas the cornerstone of the
  “next generation” statistical computing
  environment
• Ease-of-use, performance, flexibility all equally
  important
Ambitious big picture

• Taking the best features of other languages (R
  and friends) and making them better and
  easier to use
• See my recent blog article “A Roadmap for
  Rich Scientific Data Structures in Python”
pandas: under the hood
• Complete redesign of DataFrame internals
 • Now a single class for 2D data retaining
     optimal performance of old DataFrame and
     DataMatrix classes
 •   Significantly improved mixed-type and missing
     data handling
 •   Plan to use internal data structure to
     implement “NDFrame” for n-dimensional data
Fancy indexing
• Index a Series / DataFrame in a matrix-like
  way via special .ix attribute, use:
  • Slices with integers or labels
  • Lists of integers, labels, or boolean vecs
  • Integer or label locations
  df.ix[0]
  df.ix[date1:date2]
  df.ix[:5, ‘A’:’F’]

  df.ix[df[‘A’] > 0, [‘B’, ‘C’, ‘D’]] = nan
Misc new features
• “Sparse” (mostly NA) versions of Series,
  DataFrame, WidePanel
• Many new functions on Series/DataFrame
 • describe, quantile, select, drop, dropna,
    corrwith, ...
• New moving window methods: rolling_quantile
  and rolling_apply
Improved IO
• read_csv, read_table functions more
  flexible and robust, better type inferencing

 df = read_table(‘foo.txt’, skiprows=[0,1],
                     na_values=[‘#N/A’])


• ExcelFile class for reading multiple sheets
  out of .xls files
Improved IO
• HDFStore class provides a complete, tested
  dict-like PyTables storage container
       store = HDFStore(‘mydata.h5’)
       store[‘x’] = x
       store[‘y’] = y
       y = store[‘y’]

• Experimental: store as Table and query
      store.put('df', df, table=True)
      piece = store.select(‘df’,
          [{‘field’ : ‘index’, ‘op’ : ‘>=’,
            ‘value’ : date}])
Group by enhancements
• Can group by multiple columns or key
  functions, SQL-like but more general
• Syntactic sugar to invoke aggregation
  functions on groups
• Automatic exclusion of “nuisance”
  columns of DataFrames
• Various other usability enhancements
Very soon: hierarchical indexing

• Enable axis ticks to be identified by multiple
  labels instead of a single label
• Easily select subsets of data by “level”
• Create Excel-style pivot tables / cross-
  tabulations in a sensible way
• Will integrate naturally with groupby
Other misc things

• Flexible binary operators
 • a.add(b, fill_value=0.)
• Some timezone support in DateRange
• Numerous performance optimizations
• See the (long) release notes =)
Planned work

• Fast time series up/downsampling
• Improved support and perf for HF/tick data
• Even more sophisticated group by tools
• Better documentation, online screencast
  tutorials / examples
Thanks

• Email: wesmckinn@gmail.com
• Twitter: @wesmckinn
• Blog: http://blog.wesmckinney.com
• pandas: http://github.com/wesm/pandas
• statsmodels: http://statsmodels.sourceforge.net

More Related Content

What's hot

Building data pipelines
Building data pipelinesBuilding data pipelines
Building data pipelines
Jonathan Holloway
 
Visualizing big data in the browser using spark
Visualizing big data in the browser using sparkVisualizing big data in the browser using spark
Visualizing big data in the browser using spark
Databricks
 
Overview of the Hive Stinger Initiative
Overview of the Hive Stinger InitiativeOverview of the Hive Stinger Initiative
Overview of the Hive Stinger Initiative
Modern Data Stack France
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and Databricks
Databricks
 
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Wes McKinney
 
Spark what's new what's coming
Spark what's new what's comingSpark what's new what's coming
Spark what's new what's coming
Databricks
 
Cascalog
CascalogCascalog
Cascalog
nathanmarz
 
Strata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache SparkStrata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache Spark
Databricks
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
Modern Data Stack France
 
Introduction to Spark (Intern Event Presentation)
Introduction to Spark (Intern Event Presentation)Introduction to Spark (Intern Event Presentation)
Introduction to Spark (Intern Event Presentation)
Databricks
 
Python for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandasPython for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandas
Wes McKinney
 
Spark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science LondonSpark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science London
Databricks
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production users
Databricks
 
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Databricks
 
Enabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data CitizenEnabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data Citizen
Wes McKinney
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to Streaming
Databricks
 
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
BigMine
 
New directions for Apache Spark in 2015
New directions for Apache Spark in 2015New directions for Apache Spark in 2015
New directions for Apache Spark in 2015
Databricks
 
Intro to Python Data Analysis in Wakari
Intro to Python Data Analysis in WakariIntro to Python Data Analysis in Wakari
Intro to Python Data Analysis in Wakari
Karissa Rae McKelvey
 
New Directions for Apache Arrow
New Directions for Apache ArrowNew Directions for Apache Arrow
New Directions for Apache Arrow
Wes McKinney
 

What's hot (20)

Building data pipelines
Building data pipelinesBuilding data pipelines
Building data pipelines
 
Visualizing big data in the browser using spark
Visualizing big data in the browser using sparkVisualizing big data in the browser using spark
Visualizing big data in the browser using spark
 
Overview of the Hive Stinger Initiative
Overview of the Hive Stinger InitiativeOverview of the Hive Stinger Initiative
Overview of the Hive Stinger Initiative
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and Databricks
 
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
 
Spark what's new what's coming
Spark what's new what's comingSpark what's new what's coming
Spark what's new what's coming
 
Cascalog
CascalogCascalog
Cascalog
 
Strata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache SparkStrata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache Spark
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
 
Introduction to Spark (Intern Event Presentation)
Introduction to Spark (Intern Event Presentation)Introduction to Spark (Intern Event Presentation)
Introduction to Spark (Intern Event Presentation)
 
Python for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandasPython for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandas
 
Spark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science LondonSpark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science London
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production users
 
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
 
Enabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data CitizenEnabling Python to be a Better Big Data Citizen
Enabling Python to be a Better Big Data Citizen
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to Streaming
 
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
 
New directions for Apache Spark in 2015
New directions for Apache Spark in 2015New directions for Apache Spark in 2015
New directions for Apache Spark in 2015
 
Intro to Python Data Analysis in Wakari
Intro to Python Data Analysis in WakariIntro to Python Data Analysis in Wakari
Intro to Python Data Analysis in Wakari
 
New Directions for Apache Arrow
New Directions for Apache ArrowNew Directions for Apache Arrow
New Directions for Apache Arrow
 

Viewers also liked

My Data Journey with Python (SciPy 2015 Keynote)
My Data Journey with Python (SciPy 2015 Keynote)My Data Journey with Python (SciPy 2015 Keynote)
My Data Journey with Python (SciPy 2015 Keynote)
Wes McKinney
 
DataFrames: The Good, Bad, and Ugly
DataFrames: The Good, Bad, and UglyDataFrames: The Good, Bad, and Ugly
DataFrames: The Good, Bad, and Ugly
Wes McKinney
 
Python Data Ecosystem: Thoughts on Building for the Future
Python Data Ecosystem: Thoughts on Building for the FuturePython Data Ecosystem: Thoughts on Building for the Future
Python Data Ecosystem: Thoughts on Building for the Future
Wes McKinney
 
Python Data Wrangling: Preparing for the Future
Python Data Wrangling: Preparing for the FuturePython Data Wrangling: Preparing for the Future
Python Data Wrangling: Preparing for the Future
Wes McKinney
 
Structured Data Challenges in Finance and Statistics
Structured Data Challenges in Finance and StatisticsStructured Data Challenges in Finance and Statistics
Structured Data Challenges in Finance and Statistics
Wes McKinney
 
Productive Data Tools for Quants
Productive Data Tools for QuantsProductive Data Tools for Quants
Productive Data Tools for Quants
Wes McKinney
 
Improving data interoperability in Python and R
Improving data interoperability in Python and RImproving data interoperability in Python and R
Improving data interoperability in Python and R
Wes McKinney
 
A look inside pandas design and development
A look inside pandas design and developmentA look inside pandas design and development
A look inside pandas design and development
Wes McKinney
 
Ibis: Scaling the Python Data Experience
Ibis: Scaling the Python Data ExperienceIbis: Scaling the Python Data Experience
Ibis: Scaling the Python Data Experience
Wes McKinney
 
Next-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache ArrowNext-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache Arrow
Wes McKinney
 
Scipy 2011 Time Series Analysis in Python
Scipy 2011 Time Series Analysis in PythonScipy 2011 Time Series Analysis in Python
Scipy 2011 Time Series Analysis in Python
Wes McKinney
 
Data Analysis and Statistics in Python using pandas and statsmodels
Data Analysis and Statistics in Python using pandas and statsmodelsData Analysis and Statistics in Python using pandas and statsmodels
Data Analysis and Statistics in Python using pandas and statsmodels
Wes McKinney
 
pandas: a Foundational Python Library for Data Analysis and Statistics
pandas: a Foundational Python Library for Data Analysis and Statisticspandas: a Foundational Python Library for Data Analysis and Statistics
pandas: a Foundational Python Library for Data Analysis and Statistics
Wes McKinney
 
Raising the Tides: Open Source Analytics for Data Science
Raising the Tides: Open Source Analytics for Data ScienceRaising the Tides: Open Source Analytics for Data Science
Raising the Tides: Open Source Analytics for Data Science
Wes McKinney
 
Improving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and InteroperabilityImproving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and Interoperability
Wes McKinney
 
Road to Analytics
Road to AnalyticsRoad to Analytics
Road to Analytics
Datio Big Data
 
Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)
Wes McKinney
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Big Data Spain
 
High Performance Python on Apache Spark
High Performance Python on Apache SparkHigh Performance Python on Apache Spark
High Performance Python on Apache Spark
Wes McKinney
 

Viewers also liked (19)

My Data Journey with Python (SciPy 2015 Keynote)
My Data Journey with Python (SciPy 2015 Keynote)My Data Journey with Python (SciPy 2015 Keynote)
My Data Journey with Python (SciPy 2015 Keynote)
 
DataFrames: The Good, Bad, and Ugly
DataFrames: The Good, Bad, and UglyDataFrames: The Good, Bad, and Ugly
DataFrames: The Good, Bad, and Ugly
 
Python Data Ecosystem: Thoughts on Building for the Future
Python Data Ecosystem: Thoughts on Building for the FuturePython Data Ecosystem: Thoughts on Building for the Future
Python Data Ecosystem: Thoughts on Building for the Future
 
Python Data Wrangling: Preparing for the Future
Python Data Wrangling: Preparing for the FuturePython Data Wrangling: Preparing for the Future
Python Data Wrangling: Preparing for the Future
 
Structured Data Challenges in Finance and Statistics
Structured Data Challenges in Finance and StatisticsStructured Data Challenges in Finance and Statistics
Structured Data Challenges in Finance and Statistics
 
Productive Data Tools for Quants
Productive Data Tools for QuantsProductive Data Tools for Quants
Productive Data Tools for Quants
 
Improving data interoperability in Python and R
Improving data interoperability in Python and RImproving data interoperability in Python and R
Improving data interoperability in Python and R
 
A look inside pandas design and development
A look inside pandas design and developmentA look inside pandas design and development
A look inside pandas design and development
 
Ibis: Scaling the Python Data Experience
Ibis: Scaling the Python Data ExperienceIbis: Scaling the Python Data Experience
Ibis: Scaling the Python Data Experience
 
Next-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache ArrowNext-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache Arrow
 
Scipy 2011 Time Series Analysis in Python
Scipy 2011 Time Series Analysis in PythonScipy 2011 Time Series Analysis in Python
Scipy 2011 Time Series Analysis in Python
 
Data Analysis and Statistics in Python using pandas and statsmodels
Data Analysis and Statistics in Python using pandas and statsmodelsData Analysis and Statistics in Python using pandas and statsmodels
Data Analysis and Statistics in Python using pandas and statsmodels
 
pandas: a Foundational Python Library for Data Analysis and Statistics
pandas: a Foundational Python Library for Data Analysis and Statisticspandas: a Foundational Python Library for Data Analysis and Statistics
pandas: a Foundational Python Library for Data Analysis and Statistics
 
Raising the Tides: Open Source Analytics for Data Science
Raising the Tides: Open Source Analytics for Data ScienceRaising the Tides: Open Source Analytics for Data Science
Raising the Tides: Open Source Analytics for Data Science
 
Improving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and InteroperabilityImproving Python and Spark (PySpark) Performance and Interoperability
Improving Python and Spark (PySpark) Performance and Interoperability
 
Road to Analytics
Road to AnalyticsRoad to Analytics
Road to Analytics
 
Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)Apache Arrow (Strata-Hadoop World San Jose 2016)
Apache Arrow (Strata-Hadoop World San Jose 2016)
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
 
High Performance Python on Apache Spark
High Performance Python on Apache SparkHigh Performance Python on Apache Spark
High Performance Python on Apache Spark
 

Similar to What's new in pandas and the SciPy stack for financial users

Taming the shrew Power BI
Taming the shrew Power BITaming the shrew Power BI
Taming the shrew Power BI
Kellyn Pot'Vin-Gorman
 
Big data berlin
Big data berlinBig data berlin
Big data berlin
kammeyer
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Dmitry Anoshin
 
Python ml
Python mlPython ml
Python ml
Shubham Sharma
 
MWLUG 2016 : AD117 : Xpages & jQuery DataTables
MWLUG 2016 : AD117 : Xpages & jQuery DataTablesMWLUG 2016 : AD117 : Xpages & jQuery DataTables
MWLUG 2016 : AD117 : Xpages & jQuery DataTables
Michael Smith
 
An R primer for SQL folks
An R primer for SQL folksAn R primer for SQL folks
An R primer for SQL folks
Thomas Hütter
 
Power BI / AAS Data Model Optimization 101 v2
Power BI / AAS Data Model Optimization 101 v2Power BI / AAS Data Model Optimization 101 v2
Power BI / AAS Data Model Optimization 101 v2
Dan English
 
Week4
Week4Week4
Week4
Tony Hirst
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
Big Data Spain
 
Top 10 of Data & BI Summit Series: Power BI Tips & Tricks from the Trenches
Top 10 of Data & BI Summit Series: Power BI Tips & Tricks from the Trenches Top 10 of Data & BI Summit Series: Power BI Tips & Tricks from the Trenches
Top 10 of Data & BI Summit Series: Power BI Tips & Tricks from the Trenches
Rui Romano
 
Modern MySQL Monitoring and Dashboards.
Modern MySQL Monitoring and Dashboards.Modern MySQL Monitoring and Dashboards.
Modern MySQL Monitoring and Dashboards.
Mydbops
 
Power BI - 2016 - Public
Power BI - 2016 - PublicPower BI - 2016 - Public
Power BI - 2016 - Public
Julian Payne
 
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
DataStax
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
Travis Oliphant
 
FiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at ScaleFiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
Evan Chan
 
Kubeflow.pptx
Kubeflow.pptxKubeflow.pptx
Kubeflow.pptx
dhaferbenali1
 
SAP HANA_class1.pptx
SAP HANA_class1.pptxSAP HANA_class1.pptx
SAP HANA_class1.pptx
SudhaVukkalkar1
 
Levelling up your data infrastructure
Levelling up your data infrastructureLevelling up your data infrastructure
Levelling up your data infrastructure
Simon Belak
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
Amanda Casari
 
Open Day May 2016
Open Day May 2016Open Day May 2016
Open Day May 2016
Neil Lasrado
 

Similar to What's new in pandas and the SciPy stack for financial users (20)

Taming the shrew Power BI
Taming the shrew Power BITaming the shrew Power BI
Taming the shrew Power BI
 
Big data berlin
Big data berlinBig data berlin
Big data berlin
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
 
Python ml
Python mlPython ml
Python ml
 
MWLUG 2016 : AD117 : Xpages & jQuery DataTables
MWLUG 2016 : AD117 : Xpages & jQuery DataTablesMWLUG 2016 : AD117 : Xpages & jQuery DataTables
MWLUG 2016 : AD117 : Xpages & jQuery DataTables
 
An R primer for SQL folks
An R primer for SQL folksAn R primer for SQL folks
An R primer for SQL folks
 
Power BI / AAS Data Model Optimization 101 v2
Power BI / AAS Data Model Optimization 101 v2Power BI / AAS Data Model Optimization 101 v2
Power BI / AAS Data Model Optimization 101 v2
 
Week4
Week4Week4
Week4
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
 
Top 10 of Data & BI Summit Series: Power BI Tips & Tricks from the Trenches
Top 10 of Data & BI Summit Series: Power BI Tips & Tricks from the Trenches Top 10 of Data & BI Summit Series: Power BI Tips & Tricks from the Trenches
Top 10 of Data & BI Summit Series: Power BI Tips & Tricks from the Trenches
 
Modern MySQL Monitoring and Dashboards.
Modern MySQL Monitoring and Dashboards.Modern MySQL Monitoring and Dashboards.
Modern MySQL Monitoring and Dashboards.
 
Power BI - 2016 - Public
Power BI - 2016 - PublicPower BI - 2016 - Public
Power BI - 2016 - Public
 
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
 
FiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at ScaleFiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
 
Kubeflow.pptx
Kubeflow.pptxKubeflow.pptx
Kubeflow.pptx
 
SAP HANA_class1.pptx
SAP HANA_class1.pptxSAP HANA_class1.pptx
SAP HANA_class1.pptx
 
Levelling up your data infrastructure
Levelling up your data infrastructureLevelling up your data infrastructure
Levelling up your data infrastructure
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
 
Open Day May 2016
Open Day May 2016Open Day May 2016
Open Day May 2016
 

More from Wes McKinney

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
Wes McKinney
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
Wes McKinney
 
Apache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data FrameworkApache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data Framework
Wes McKinney
 
Apache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data TransportApache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data Transport
Wes McKinney
 
ACM TechTalks : Apache Arrow and the Future of Data Frames
ACM TechTalks : Apache Arrow and the Future of Data FramesACM TechTalks : Apache Arrow and the Future of Data Frames
ACM TechTalks : Apache Arrow and the Future of Data Frames
Wes McKinney
 
Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020
Wes McKinney
 
Apache Arrow: Leveling Up the Analytics Stack
Apache Arrow: Leveling Up the Analytics StackApache Arrow: Leveling Up the Analytics Stack
Apache Arrow: Leveling Up the Analytics Stack
Wes McKinney
 
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionApache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Wes McKinney
 
Apache Arrow: Leveling Up the Data Science Stack
Apache Arrow: Leveling Up the Data Science StackApache Arrow: Leveling Up the Data Science Stack
Apache Arrow: Leveling Up the Data Science Stack
Wes McKinney
 
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019
Wes McKinney
 
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
Wes McKinney
 
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018
Wes McKinney
 
Apache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory DataApache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory Data
Wes McKinney
 
Apache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory dataApache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory data
Wes McKinney
 
Shared Infrastructure for Data Science
Shared Infrastructure for Data ScienceShared Infrastructure for Data Science
Shared Infrastructure for Data Science
Wes McKinney
 
Data Science Without Borders (JupyterCon 2017)
Data Science Without Borders (JupyterCon 2017)Data Science Without Borders (JupyterCon 2017)
Data Science Without Borders (JupyterCon 2017)
Wes McKinney
 
Memory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine LearningMemory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine Learning
Wes McKinney
 
PyCon APAC 2016 Keynote
PyCon APAC 2016 KeynotePyCon APAC 2016 Keynote
PyCon APAC 2016 Keynote
Wes McKinney
 
Apache Arrow and Python: The latest
Apache Arrow and Python: The latestApache Arrow and Python: The latest
Apache Arrow and Python: The latest
Wes McKinney
 

More from Wes McKinney (19)

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
 
Apache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data FrameworkApache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data Framework
 
Apache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data TransportApache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data Transport
 
ACM TechTalks : Apache Arrow and the Future of Data Frames
ACM TechTalks : Apache Arrow and the Future of Data FramesACM TechTalks : Apache Arrow and the Future of Data Frames
ACM TechTalks : Apache Arrow and the Future of Data Frames
 
Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020
 
Apache Arrow: Leveling Up the Analytics Stack
Apache Arrow: Leveling Up the Analytics StackApache Arrow: Leveling Up the Analytics Stack
Apache Arrow: Leveling Up the Analytics Stack
 
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionApache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS Session
 
Apache Arrow: Leveling Up the Data Science Stack
Apache Arrow: Leveling Up the Data Science StackApache Arrow: Leveling Up the Data Science Stack
Apache Arrow: Leveling Up the Data Science Stack
 
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019
 
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
 
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018
 
Apache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory DataApache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory Data
 
Apache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory dataApache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory data
 
Shared Infrastructure for Data Science
Shared Infrastructure for Data ScienceShared Infrastructure for Data Science
Shared Infrastructure for Data Science
 
Data Science Without Borders (JupyterCon 2017)
Data Science Without Borders (JupyterCon 2017)Data Science Without Borders (JupyterCon 2017)
Data Science Without Borders (JupyterCon 2017)
 
Memory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine LearningMemory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine Learning
 
PyCon APAC 2016 Keynote
PyCon APAC 2016 KeynotePyCon APAC 2016 Keynote
PyCon APAC 2016 Keynote
 
Apache Arrow and Python: The latest
Apache Arrow and Python: The latestApache Arrow and Python: The latest
Apache Arrow and Python: The latest
 

Recently uploaded

Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptxDublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Kunal Gupta
 
Semantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software DevelopmentSemantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software Development
Baishakhi Ray
 
Zaitechno Handheld Raman Spectrometer.pdf
Zaitechno Handheld Raman Spectrometer.pdfZaitechno Handheld Raman Spectrometer.pdf
Zaitechno Handheld Raman Spectrometer.pdf
AmandaCheung15
 
How UiPath Discovery Suite supports identification of Agentic Process Automat...
How UiPath Discovery Suite supports identification of Agentic Process Automat...How UiPath Discovery Suite supports identification of Agentic Process Automat...
How UiPath Discovery Suite supports identification of Agentic Process Automat...
DianaGray10
 
Uncharted Together- Navigating AI's New Frontiers in Libraries
Uncharted Together- Navigating AI's New Frontiers in LibrariesUncharted Together- Navigating AI's New Frontiers in Libraries
Uncharted Together- Navigating AI's New Frontiers in Libraries
Brian Pichman
 
MAKE MONEY ONLINE Unlock Your Income Potential Today.pptx
MAKE MONEY ONLINE Unlock Your Income Potential Today.pptxMAKE MONEY ONLINE Unlock Your Income Potential Today.pptx
MAKE MONEY ONLINE Unlock Your Income Potential Today.pptx
janagijoythi
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
Priyanka Aash
 
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
bhumivarma35300
 
(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
Priyanka Aash
 
Data Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining DataData Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining Data
Safe Software
 
Mastering OnlyFans Clone App Development: Key Strategies for Success
Mastering OnlyFans Clone App Development: Key Strategies for SuccessMastering OnlyFans Clone App Development: Key Strategies for Success
Mastering OnlyFans Clone App Development: Key Strategies for Success
David Wilson
 
Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10
ankush9927
 
Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17
Bhajan Mehta
 
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
alexjohnson7307
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
Matthias Neugebauer
 
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
sunilverma7884
 
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite SolutionIPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Networks
 
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdfAcumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
BrainSell Technologies
 
Tailored CRM Software Development for Enhanced Customer Insights
Tailored CRM Software Development for Enhanced Customer InsightsTailored CRM Software Development for Enhanced Customer Insights
Tailored CRM Software Development for Enhanced Customer Insights
SynapseIndia
 
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
Priyanka Aash
 

Recently uploaded (20)

Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptxDublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
 
Semantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software DevelopmentSemantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software Development
 
Zaitechno Handheld Raman Spectrometer.pdf
Zaitechno Handheld Raman Spectrometer.pdfZaitechno Handheld Raman Spectrometer.pdf
Zaitechno Handheld Raman Spectrometer.pdf
 
How UiPath Discovery Suite supports identification of Agentic Process Automat...
How UiPath Discovery Suite supports identification of Agentic Process Automat...How UiPath Discovery Suite supports identification of Agentic Process Automat...
How UiPath Discovery Suite supports identification of Agentic Process Automat...
 
Uncharted Together- Navigating AI's New Frontiers in Libraries
Uncharted Together- Navigating AI's New Frontiers in LibrariesUncharted Together- Navigating AI's New Frontiers in Libraries
Uncharted Together- Navigating AI's New Frontiers in Libraries
 
MAKE MONEY ONLINE Unlock Your Income Potential Today.pptx
MAKE MONEY ONLINE Unlock Your Income Potential Today.pptxMAKE MONEY ONLINE Unlock Your Income Potential Today.pptx
MAKE MONEY ONLINE Unlock Your Income Potential Today.pptx
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
 
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
High Profile Girls call Service Pune 000XX00000 Provide Best And Top Girl Ser...
 
(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
 
Data Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining DataData Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining Data
 
Mastering OnlyFans Clone App Development: Key Strategies for Success
Mastering OnlyFans Clone App Development: Key Strategies for SuccessMastering OnlyFans Clone App Development: Key Strategies for Success
Mastering OnlyFans Clone App Development: Key Strategies for Success
 
Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10
 
Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17
 
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
 
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
Girls call Kolkata 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
 
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite SolutionIPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite Solution
 
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdfAcumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
 
Tailored CRM Software Development for Enhanced Customer Insights
Tailored CRM Software Development for Enhanced Customer InsightsTailored CRM Software Development for Enhanced Customer Insights
Tailored CRM Software Development for Enhanced Customer Insights
 
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
 

What's new in pandas and the SciPy stack for financial users

  • 1. What’s new in pandas and the SciPy stack for financial users Wes McKinney
  • 2. Me • AQR: August 2007 - July 2010 • Duke Statistics: 2010 - present (now on leave) • My plans • Improving Python libs for statistics and finance • Building a financial software + consulting business based on said tools
  • 3. Core Python stack for finance • NumPy, SciPy (heavy lifting) • pandas (data handling / computation) • IPython (dev and research env) • Cython (perf optimization) • matplotlib (visualization) • statsmodels (statistics / econometrics)
  • 4. General sentiments • Scientific Python growing solidly in finance and in many other fields • Though good sci-pythonistas are still scarce • Important work happening in many of the core projects • Growing consensus: a new computational model is needed to better cope with “big data”
  • 5. NumPy • Significantly refactored C internals • Great progress on native datetime64 type • Will significantly improve date-handling performance and usability • Extensible business day / holiday logic planned / in progress • Addition of low-level missing data (NA) support in the works
  • 6. IPython • One of Python’s killer apps gets even better • Rich Qt GUI console with inline plotting • New and improved architecture for high perf parallel / distributed computing • See Fernando Pérez’s SciPy 2011 talk / video
  • 7. Cython • Still the first tool you should reach for to get better performance • New: OpenMP integration (for multi-core) with nogil: for i in prange(n): # do something in parallel • Supports (almost) all of standard Python now (some things, like closures, used to not work)
  • 8. statsmodels • Statistics and econometrics in Python • Major work in time series models over last year+ • VAR, SVAR models, eventually (V)ECM models for cointegrated time series • AR/ARMA, Kalman Filter, various macro filters (e.g. Hodrick-Prescott) implemented • Soon: Bayesian state space models (DLMs), ARCH/GARCH models, etc.
  • 9. statsmodels • Major criticism: weak user interface • No R-style formula framework • pandas not integrated (need to pass raw NumPy arrays) • I have begun work on pandas integration, formulas have been implemented and will hopefully arrive within the next few months
  • 10. pandas • Still the Python data hacker’s best friend? • Most recent release: 0.3.0 on 2/20/2011 • However, last 4 months have been the most active development period in the library’s history • ~375 commits since 0.3.0 release (more than the entire prior open source history)
  • 11. The state of data structures
  • 12. Ambitious big picture • I want to make pandas the cornerstone of the “next generation” statistical computing environment • Ease-of-use, performance, flexibility all equally important
  • 13. Ambitious big picture • Taking the best features of other languages (R and friends) and making them better and easier to use • See my recent blog article “A Roadmap for Rich Scientific Data Structures in Python”
  • 14. pandas: under the hood • Complete redesign of DataFrame internals • Now a single class for 2D data retaining optimal performance of old DataFrame and DataMatrix classes • Significantly improved mixed-type and missing data handling • Plan to use internal data structure to implement “NDFrame” for n-dimensional data
  • 15. Fancy indexing • Index a Series / DataFrame in a matrix-like way via special .ix attribute, use: • Slices with integers or labels • Lists of integers, labels, or boolean vecs • Integer or label locations df.ix[0] df.ix[date1:date2] df.ix[:5, ‘A’:’F’] df.ix[df[‘A’] > 0, [‘B’, ‘C’, ‘D’]] = nan
  • 16. Misc new features • “Sparse” (mostly NA) versions of Series, DataFrame, WidePanel • Many new functions on Series/DataFrame • describe, quantile, select, drop, dropna, corrwith, ... • New moving window methods: rolling_quantile and rolling_apply
  • 17. Improved IO • read_csv, read_table functions more flexible and robust, better type inferencing df = read_table(‘foo.txt’, skiprows=[0,1], na_values=[‘#N/A’]) • ExcelFile class for reading multiple sheets out of .xls files
  • 18. Improved IO • HDFStore class provides a complete, tested dict-like PyTables storage container store = HDFStore(‘mydata.h5’) store[‘x’] = x store[‘y’] = y y = store[‘y’] • Experimental: store as Table and query store.put('df', df, table=True) piece = store.select(‘df’, [{‘field’ : ‘index’, ‘op’ : ‘>=’, ‘value’ : date}])
  • 19. Group by enhancements • Can group by multiple columns or key functions, SQL-like but more general • Syntactic sugar to invoke aggregation functions on groups • Automatic exclusion of “nuisance” columns of DataFrames • Various other usability enhancements
  • 20. Very soon: hierarchical indexing • Enable axis ticks to be identified by multiple labels instead of a single label • Easily select subsets of data by “level” • Create Excel-style pivot tables / cross- tabulations in a sensible way • Will integrate naturally with groupby
  • 21. Other misc things • Flexible binary operators • a.add(b, fill_value=0.) • Some timezone support in DateRange • Numerous performance optimizations • See the (long) release notes =)
  • 22. Planned work • Fast time series up/downsampling • Improved support and perf for HF/tick data • Even more sophisticated group by tools • Better documentation, online screencast tutorials / examples
  • 23. Thanks • Email: wesmckinn@gmail.com • Twitter: @wesmckinn • Blog: http://blog.wesmckinney.com • pandas: http://github.com/wesm/pandas • statsmodels: http://statsmodels.sourceforge.net