SlideShare a Scribd company logo
1 of 20
Download to read offline
Getting started with
                             Pandas

                                    Maik Röder
                          Barcelona Python Meetup Group
                                    17.05.2012




Friday, May 18, 2012
Pandas
                       • Python data analysis library
                       • Built on top of Numpy
                       • Panel Data System
                       • Open Sourced by AQR Capital
                         Management, LLC in late 2009
                       • 30.000 lines of tested Python/Cython code
                       • Used in production in many companies

Friday, May 18, 2012
The ideal tool for data
                             scientists
                       • Munging data
                       • Cleaning data
                       • Analyzing
                       • Modeling data
                       • Organizing the results of the analysis into a
                         form suitable for plotting or tabular display


Friday, May 18, 2012
Installation
                       • Install Python 2.6.8 or later
                       • Current versions:
                        • Numpy 1.6.1 and Pandas 0.7.3
                       • Recommendation: Install with pip
                         pip install numpy
                         pip install pandas



Friday, May 18, 2012
Axis Indexing

                       • Every axis has an index
                       • Highly optimized data structure
                       • Hierarchical indexing
                       • group by and join-type operations


Friday, May 18, 2012
Series data structure
              • 1-dimensional
                       import numpy as np
                       randn = np.random.randn
                       from pandas import *
                       s = Series(randn(3),
                                  index=['a','b','c'])
                       s
                       a   -0.889880
                       b    1.102135
                       c   -2.187296


Friday, May 18, 2012
Series to/from dict
                       d = dict(s)
                       {'a': -0.88988001423312313,
                         'c': -2.1872960440695666,
                         'b': 1.1021347373670938}
                       Series(d)
                       a    -0.889880
                       b     1.102135
                       c    -2.187296
                 • Index comes from sorted dictionary keys
Friday, May 18, 2012
Reindexing labels
                       >>>   s
                       a     -0.496848
                       b       0.607173
                       c     -1.570596
                       >>>   s.reindex(['c','b','a'])
                       c     -1.570596
                       b       0.607173
                       a     -0.496848


Friday, May 18, 2012
Vectorization
                       >>> s + s
                       a   -1.779760
                       b    2.204269
                       c   -4.374592
                       >>> np.exp(s)
                       a    0.410705
                       b    3.010586
                       c    0.112220
                 • Series work with Numpy
Friday, May 18, 2012
Structured Data
          • Data that can be represented as tables
           • rows and columns
          • Each row is a different object
          • Columns represent attributes of the object




Friday, May 18, 2012
Structured data
                       • Like SQL Table or Excel Sheet
                       • Heterogeneous columns, but each column
                         homogeneously typed
                       • Row and column-oriented operations
                       • Axis meta data
                       • Seamless integration with Python data
                         structures and Numpy


Friday, May 18, 2012
DataFrame data structure

                       • Like data.frame in R
                       • 2-dimensional tabular data structure
                       • Data manipulation with integrated indexing
                       • Support heterogeneous columns
                       • Homogeneous columns

Friday, May 18, 2012
DataFrame

                       >>> d = {'one': s*s,
                                'two': s+s}
                       >>> DataFrame(d)
                               one       two
                       a 0.791886 -1.779760
                       b 1.214701 2.204269
                       c 4.784264 -4.374592



Friday, May 18, 2012
Dataframe add column
                       >>> s
                       a   -0.889880
                       b     1.102135
                       c   -2.187296
                       >>> df['three'] = s * 3
                       >>> df
                               one      two     three
                       a 0.791886 -1.779760 -2.669640
                       b 1.214701 2.204269 3.306404
                       c 4.784264 -4.374592 -6.561888
Friday, May 18, 2012
Select row by label
                 >>> row = df.xs('a')
                 one      0.791886
                 two     -1.779760
                 three   -2.669640
                 Name: a
                 >>> type(row)
                 <class'pandas.core.series.Series'>
                 >>> df.dtypes
                 one      float64
                 two      float64
                 three    float64
Friday, May 18, 2012
Descriptive statistics
                       >>> df.mean()
                       one      2.263617
                       two     -1.316694
                       three   -1.975041
                 • Also: count, sum, median, min, max, abs, prod,
                       std, var, skew, kurt, quantile, cumsum,
                       cumprod, cummax, cummin


Friday, May 18, 2012
Computational Tools

                 • Covariance
                       >>> s1 = Series(randn(1000))
                       >>> s2 = Series(randn(1000))
                       >>> s1.cov(s2)
                       0.013973709323221539
                 • Also: pearson, kendall, spearman


Friday, May 18, 2012
This and much more...
                       • Group by: split-apply-combine
                       • Merge, join and aggregate
                       • Reshaping and Pivot Tables
                       • Time Series / Date functionality
                       • Plotting with matplotlib
                       • IO Tools (Text, CSV, HDF5, ...)
                       • Sparse data structures
Friday, May 18, 2012
Resources


                       • http://pypi.python.org/pypi/pandas
                       • http://code.google.com/p/pandas


Friday, May 18, 2012
Book coming soon...




Friday, May 18, 2012

More Related Content

What's hot

Introduction to NumPy
Introduction to NumPyIntroduction to NumPy
Introduction to NumPyHuy Nguyen
 
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization Sourabh Sahu
 
Data Analysis in Python-NumPy
Data Analysis in Python-NumPyData Analysis in Python-NumPy
Data Analysis in Python-NumPyDevashish Kumar
 
Introduction to numpy Session 1
Introduction to numpy Session 1Introduction to numpy Session 1
Introduction to numpy Session 1Jatin Miglani
 
Python Pandas
Python PandasPython Pandas
Python PandasSunil OS
 
Cryptography & Steganography
Cryptography & SteganographyCryptography & Steganography
Cryptography & SteganographyAnimesh Shaw
 
Data science life cycle
Data science life cycleData science life cycle
Data science life cycleManoj Mishra
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsDatamining Tools
 
Lecture 10 intruders
Lecture 10 intrudersLecture 10 intruders
Lecture 10 intrudersrajakhurram
 
Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)PyData
 
Introduction to Python
Introduction to Python Introduction to Python
Introduction to Python amiable_indian
 
Python NumPy Tutorial | NumPy Array | Edureka
Python NumPy Tutorial | NumPy Array | EdurekaPython NumPy Tutorial | NumPy Array | Edureka
Python NumPy Tutorial | NumPy Array | EdurekaEdureka!
 
OSINT: Open Source Intelligence - Rohan Braganza
OSINT: Open Source Intelligence - Rohan BraganzaOSINT: Open Source Intelligence - Rohan Braganza
OSINT: Open Source Intelligence - Rohan BraganzaNSConclave
 

What's hot (20)

Introduction to NumPy
Introduction to NumPyIntroduction to NumPy
Introduction to NumPy
 
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization
 
Data Analysis in Python-NumPy
Data Analysis in Python-NumPyData Analysis in Python-NumPy
Data Analysis in Python-NumPy
 
Introduction to numpy Session 1
Introduction to numpy Session 1Introduction to numpy Session 1
Introduction to numpy Session 1
 
NumPy
NumPyNumPy
NumPy
 
Python Pandas
Python PandasPython Pandas
Python Pandas
 
Cryptography & Steganography
Cryptography & SteganographyCryptography & Steganography
Cryptography & Steganography
 
NumPy/SciPy Statistics
NumPy/SciPy StatisticsNumPy/SciPy Statistics
NumPy/SciPy Statistics
 
Data science life cycle
Data science life cycleData science life cycle
Data science life cycle
 
Python programming : Strings
Python programming : StringsPython programming : Strings
Python programming : Strings
 
Pandas
PandasPandas
Pandas
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlations
 
Machine Learning with R
Machine Learning with RMachine Learning with R
Machine Learning with R
 
Lecture 10 intruders
Lecture 10 intrudersLecture 10 intruders
Lecture 10 intruders
 
Final Report(SuddhasatwaSatpathy)
Final Report(SuddhasatwaSatpathy)Final Report(SuddhasatwaSatpathy)
Final Report(SuddhasatwaSatpathy)
 
Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)
 
Introduction to Python
Introduction to Python Introduction to Python
Introduction to Python
 
Python NumPy Tutorial | NumPy Array | Edureka
Python NumPy Tutorial | NumPy Array | EdurekaPython NumPy Tutorial | NumPy Array | Edureka
Python NumPy Tutorial | NumPy Array | Edureka
 
OSINT: Open Source Intelligence - Rohan Braganza
OSINT: Open Source Intelligence - Rohan BraganzaOSINT: Open Source Intelligence - Rohan Braganza
OSINT: Open Source Intelligence - Rohan Braganza
 
Python Scipy Numpy
Python Scipy NumpyPython Scipy Numpy
Python Scipy Numpy
 

Similar to Getting started with pandas

Pandas data transformational data structure patterns and challenges final
Pandas   data transformational data structure patterns and challenges  finalPandas   data transformational data structure patterns and challenges  final
Pandas data transformational data structure patterns and challenges finalRajesh M
 
5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...
5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...
5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...Kangaroot
 
Introduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesIntroduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesSơn Còm Nhom
 
Feature Engineering in H2O Driverless AI - Dmitry Larko - H2O AI World London...
Feature Engineering in H2O Driverless AI - Dmitry Larko - H2O AI World London...Feature Engineering in H2O Driverless AI - Dmitry Larko - H2O AI World London...
Feature Engineering in H2O Driverless AI - Dmitry Larko - H2O AI World London...Sri Ambati
 
A look inside pandas design and development
A look inside pandas design and developmentA look inside pandas design and development
A look inside pandas design and developmentWes McKinney
 
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!Daniel Cousineau
 
Effective Named Entity Recognition for Idiosyncratic Web Collections
Effective Named Entity Recognition for Idiosyncratic Web CollectionsEffective Named Entity Recognition for Idiosyncratic Web Collections
Effective Named Entity Recognition for Idiosyncratic Web CollectionseXascale Infolab
 
Lens: Data exploration with Dask and Jupyter widgets
Lens: Data exploration with Dask and Jupyter widgetsLens: Data exploration with Dask and Jupyter widgets
Lens: Data exploration with Dask and Jupyter widgetsVíctor Zabalza
 
ClickHouse 2018. How to stop waiting for your queries to complete and start ...
ClickHouse 2018.  How to stop waiting for your queries to complete and start ...ClickHouse 2018.  How to stop waiting for your queries to complete and start ...
ClickHouse 2018. How to stop waiting for your queries to complete and start ...Altinity Ltd
 
Data science in Node.js
Data science in Node.jsData science in Node.js
Data science in Node.jsSean Byrnes
 
Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1khairulhuda242
 
Using the python_data_toolkit_timbers_slides
Using the python_data_toolkit_timbers_slidesUsing the python_data_toolkit_timbers_slides
Using the python_data_toolkit_timbers_slidesTiffany Timbers
 
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...Rodney Joyce
 
A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)Toshiyuki Shimono
 
Quick Wikipedia Mining using Elastic Map Reduce
Quick Wikipedia Mining using Elastic Map ReduceQuick Wikipedia Mining using Elastic Map Reduce
Quick Wikipedia Mining using Elastic Map Reduceohkura
 
pandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for Pythonpandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for PythonWes McKinney
 

Similar to Getting started with pandas (20)

Pandas data transformational data structure patterns and challenges final
Pandas   data transformational data structure patterns and challenges  finalPandas   data transformational data structure patterns and challenges  final
Pandas data transformational data structure patterns and challenges final
 
5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...
5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...
5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...
 
Introduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesIntroduction to Datamining Concept and Techniques
Introduction to Datamining Concept and Techniques
 
ggplotcourse.pptx
ggplotcourse.pptxggplotcourse.pptx
ggplotcourse.pptx
 
Feature Engineering in H2O Driverless AI - Dmitry Larko - H2O AI World London...
Feature Engineering in H2O Driverless AI - Dmitry Larko - H2O AI World London...Feature Engineering in H2O Driverless AI - Dmitry Larko - H2O AI World London...
Feature Engineering in H2O Driverless AI - Dmitry Larko - H2O AI World London...
 
A look inside pandas design and development
A look inside pandas design and developmentA look inside pandas design and development
A look inside pandas design and development
 
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
 
Effective Named Entity Recognition for Idiosyncratic Web Collections
Effective Named Entity Recognition for Idiosyncratic Web CollectionsEffective Named Entity Recognition for Idiosyncratic Web Collections
Effective Named Entity Recognition for Idiosyncratic Web Collections
 
Lens: Data exploration with Dask and Jupyter widgets
Lens: Data exploration with Dask and Jupyter widgetsLens: Data exploration with Dask and Jupyter widgets
Lens: Data exploration with Dask and Jupyter widgets
 
Data Exploration in R.pptx
Data Exploration in R.pptxData Exploration in R.pptx
Data Exploration in R.pptx
 
ClickHouse 2018. How to stop waiting for your queries to complete and start ...
ClickHouse 2018.  How to stop waiting for your queries to complete and start ...ClickHouse 2018.  How to stop waiting for your queries to complete and start ...
ClickHouse 2018. How to stop waiting for your queries to complete and start ...
 
Data science in Node.js
Data science in Node.jsData science in Node.js
Data science in Node.js
 
Quick dive to pandas
Quick dive to pandasQuick dive to pandas
Quick dive to pandas
 
Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1
 
Using the python_data_toolkit_timbers_slides
Using the python_data_toolkit_timbers_slidesUsing the python_data_toolkit_timbers_slides
Using the python_data_toolkit_timbers_slides
 
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
 
A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)
 
Quick Wikipedia Mining using Elastic Map Reduce
Quick Wikipedia Mining using Elastic Map ReduceQuick Wikipedia Mining using Elastic Map Reduce
Quick Wikipedia Mining using Elastic Map Reduce
 
Quick Wins
Quick WinsQuick Wins
Quick Wins
 
pandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for Pythonpandas: Powerful data analysis tools for Python
pandas: Powerful data analysis tools for Python
 

More from maikroeder

Encode RNA Dashboard
Encode RNA DashboardEncode RNA Dashboard
Encode RNA Dashboardmaikroeder
 
Introduction to ggplot2
Introduction to ggplot2Introduction to ggplot2
Introduction to ggplot2maikroeder
 
Repoze Bfg - presented by Rok Garbas at the Python Barcelona Meetup October 2...
Repoze Bfg - presented by Rok Garbas at the Python Barcelona Meetup October 2...Repoze Bfg - presented by Rok Garbas at the Python Barcelona Meetup October 2...
Repoze Bfg - presented by Rok Garbas at the Python Barcelona Meetup October 2...maikroeder
 
Cms - Content Management System Utilities for Django
Cms - Content Management System Utilities for DjangoCms - Content Management System Utilities for Django
Cms - Content Management System Utilities for Djangomaikroeder
 
Plone Conference 2007: Acceptance Testing In Plone Using Funittest - Maik Röder
Plone Conference 2007: Acceptance Testing In Plone Using Funittest - Maik RöderPlone Conference 2007: Acceptance Testing In Plone Using Funittest - Maik Röder
Plone Conference 2007: Acceptance Testing In Plone Using Funittest - Maik Rödermaikroeder
 

More from maikroeder (6)

Google charts
Google chartsGoogle charts
Google charts
 
Encode RNA Dashboard
Encode RNA DashboardEncode RNA Dashboard
Encode RNA Dashboard
 
Introduction to ggplot2
Introduction to ggplot2Introduction to ggplot2
Introduction to ggplot2
 
Repoze Bfg - presented by Rok Garbas at the Python Barcelona Meetup October 2...
Repoze Bfg - presented by Rok Garbas at the Python Barcelona Meetup October 2...Repoze Bfg - presented by Rok Garbas at the Python Barcelona Meetup October 2...
Repoze Bfg - presented by Rok Garbas at the Python Barcelona Meetup October 2...
 
Cms - Content Management System Utilities for Django
Cms - Content Management System Utilities for DjangoCms - Content Management System Utilities for Django
Cms - Content Management System Utilities for Django
 
Plone Conference 2007: Acceptance Testing In Plone Using Funittest - Maik Röder
Plone Conference 2007: Acceptance Testing In Plone Using Funittest - Maik RöderPlone Conference 2007: Acceptance Testing In Plone Using Funittest - Maik Röder
Plone Conference 2007: Acceptance Testing In Plone Using Funittest - Maik Röder
 

Recently uploaded

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 

Recently uploaded (20)

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 

Getting started with pandas

  • 1. Getting started with Pandas Maik Röder Barcelona Python Meetup Group 17.05.2012 Friday, May 18, 2012
  • 2. Pandas • Python data analysis library • Built on top of Numpy • Panel Data System • Open Sourced by AQR Capital Management, LLC in late 2009 • 30.000 lines of tested Python/Cython code • Used in production in many companies Friday, May 18, 2012
  • 3. The ideal tool for data scientists • Munging data • Cleaning data • Analyzing • Modeling data • Organizing the results of the analysis into a form suitable for plotting or tabular display Friday, May 18, 2012
  • 4. Installation • Install Python 2.6.8 or later • Current versions: • Numpy 1.6.1 and Pandas 0.7.3 • Recommendation: Install with pip pip install numpy pip install pandas Friday, May 18, 2012
  • 5. Axis Indexing • Every axis has an index • Highly optimized data structure • Hierarchical indexing • group by and join-type operations Friday, May 18, 2012
  • 6. Series data structure • 1-dimensional import numpy as np randn = np.random.randn from pandas import * s = Series(randn(3), index=['a','b','c']) s a -0.889880 b 1.102135 c -2.187296 Friday, May 18, 2012
  • 7. Series to/from dict d = dict(s) {'a': -0.88988001423312313, 'c': -2.1872960440695666, 'b': 1.1021347373670938} Series(d) a -0.889880 b 1.102135 c -2.187296 • Index comes from sorted dictionary keys Friday, May 18, 2012
  • 8. Reindexing labels >>> s a -0.496848 b 0.607173 c -1.570596 >>> s.reindex(['c','b','a']) c -1.570596 b 0.607173 a -0.496848 Friday, May 18, 2012
  • 9. Vectorization >>> s + s a -1.779760 b 2.204269 c -4.374592 >>> np.exp(s) a 0.410705 b 3.010586 c 0.112220 • Series work with Numpy Friday, May 18, 2012
  • 10. Structured Data • Data that can be represented as tables • rows and columns • Each row is a different object • Columns represent attributes of the object Friday, May 18, 2012
  • 11. Structured data • Like SQL Table or Excel Sheet • Heterogeneous columns, but each column homogeneously typed • Row and column-oriented operations • Axis meta data • Seamless integration with Python data structures and Numpy Friday, May 18, 2012
  • 12. DataFrame data structure • Like data.frame in R • 2-dimensional tabular data structure • Data manipulation with integrated indexing • Support heterogeneous columns • Homogeneous columns Friday, May 18, 2012
  • 13. DataFrame >>> d = {'one': s*s, 'two': s+s} >>> DataFrame(d) one two a 0.791886 -1.779760 b 1.214701 2.204269 c 4.784264 -4.374592 Friday, May 18, 2012
  • 14. Dataframe add column >>> s a -0.889880 b 1.102135 c -2.187296 >>> df['three'] = s * 3 >>> df one two three a 0.791886 -1.779760 -2.669640 b 1.214701 2.204269 3.306404 c 4.784264 -4.374592 -6.561888 Friday, May 18, 2012
  • 15. Select row by label >>> row = df.xs('a') one 0.791886 two -1.779760 three -2.669640 Name: a >>> type(row) <class'pandas.core.series.Series'> >>> df.dtypes one float64 two float64 three float64 Friday, May 18, 2012
  • 16. Descriptive statistics >>> df.mean() one 2.263617 two -1.316694 three -1.975041 • Also: count, sum, median, min, max, abs, prod, std, var, skew, kurt, quantile, cumsum, cumprod, cummax, cummin Friday, May 18, 2012
  • 17. Computational Tools • Covariance >>> s1 = Series(randn(1000)) >>> s2 = Series(randn(1000)) >>> s1.cov(s2) 0.013973709323221539 • Also: pearson, kendall, spearman Friday, May 18, 2012
  • 18. This and much more... • Group by: split-apply-combine • Merge, join and aggregate • Reshaping and Pivot Tables • Time Series / Date functionality • Plotting with matplotlib • IO Tools (Text, CSV, HDF5, ...) • Sparse data structures Friday, May 18, 2012
  • 19. Resources • http://pypi.python.org/pypi/pandas • http://code.google.com/p/pandas Friday, May 18, 2012