SlideShare a Scribd company logo
1 of 19
Download to read offline
Princeton Research
Data Management
Workshop 2020
Co-sponsored by the Center for Digital Humanities, the Center for Statistics and Machine Learning, the Office of
the Dean for Research, and Data-Driven Social Science Initiative
Organized by Princeton University Library’s Princeton Research
Data Service, Princeton Institute for Computational Science and
Engineering, and OIT Research Computing
Day Two:
Break-out Session:
Python, Numpy, Pandas
Python, Numpy, and Pandas
Henry Schreiner, PICSiE/PHY
henryfs@princeton.edu
2020 Research Data Management Workshop
Python for data science
● Second most popular
language on GitHub
● General purpose
● Only Data Science
language in top 10
● Over 200K PyPI
packages, 1.6 billion
releases
Python for data science
● Another metric (PYPL, Google-based) has it #1
● Data Science languages shown below
● Python fastest growing
● R peaked around 2017
● Others also in decline
● Note the log scale!
Timeline
● 1994: Python 1.0 released
● 1995: First array package: Numeric
● 2003: Matplotlib
● 2005: Numeric and numarray merged into Numpy
● 2008: Pandas introduced
● 2012: The Anaconda python distribution
Timeline
● 2012: Numba JIT compiler
● 2014: IPython becomes Jupyter project & notebook
● 2016: LIGO's discovery: Jupyter Notebook + Python
● 2017: Google releases TensorFlow (Python)
● Now: All Machine Learning libraries are primarily or
exclusively used via Python
Why Python?
What makes Python
special?
● Great interactivity
● General purpose
● Weaknesses filled
by libraries and
services
Python: the language
● Simple
● Easy to
learn
● Flexible and
powerful
● Object
Oriented
def square(x):
return x**2
print(square(4))
# Prints 4
IPython
● Adds interactive features to
Python
○ Timing chunks of code
○ Shell-like features
○ Fancy display system
%cd my_dir
%%timeit
run_long()
! ./program
Jupyter Notebooks
● Cell-based HTML
document
● Supports many
kernels (IPython was
first and is the most
popular)
● Interleave
documentation, code,
and output
Jupyter Lab
● Holds multiple
views of
○ Notebooks
○ Output
○ Editors
○ Terminals
Jupyter Hub
● Multiuser notebook or lab instances
● Available at mybinder.org or through Princeton Research
Computing
Example: Runge-Kutta static notebook, runnable mybinder
Libraries
PyPI
● The core service for
Python libraries
● Uses pip to install
● Environment
management separate
Anaconda
● Can package Python
and complex libraries
● Uses conda to install
● Environment manager
too (reproducible)
● conda-forge is
community effort
Numpy
● Adds an array type
● Fast computations
array-at-a-time
● Python and Numpy now
define a standard protocol
for arrays
● A library that replaces
langagues like ADL
import numpy as np
v = np.array([1,2,3])
print(v**2)
# Prints 1, 4, 9
Pandas
● Tabular data
○ A library that replaces languages like R and Excel
○ Designed with interactivity in mind
● Other libraries mimic Pandas’ API
Numba
● Adds full JIT (just in time) compiler to Python
● Compiles normal python functions into LLVM
● Growing subset of Python and Numpy
● Can be as fast as any compiled language
● Supports parallel computation, GPUs, and more
Other libraries of note
● CuPY: CUDA with a numpy interface
● TensorFlow/PyTorch: Machine learning libraries
● Matplotlib: The plotting library for Python
● PyQt/PySide: Bindings to Qt Graphical User Interface
● PyBind11: Easy C++ bindings
Summary
● Python is wildly popular, simple to learn, and well
supported
● Python has an impressive collection of tools
○ Interactivity: IPython, Jupyter
○ Package delivery: PyPI (pip), Conda
○ Libraries: Numpy, Pandas, and many more
Demo
● The second half is devoted to a Pandas demo session

More Related Content

What's hot

Introduction to python
Introduction to pythonIntroduction to python
Introduction to pythonMohammed Rafi
 
Introduction to Python Pandas for Data Analytics
Introduction to Python Pandas for Data AnalyticsIntroduction to Python Pandas for Data Analytics
Introduction to Python Pandas for Data AnalyticsPhoenix
 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to pythonManishJha237
 
Python variables and data types.pptx
Python variables and data types.pptxPython variables and data types.pptx
Python variables and data types.pptxAkshayAggarwal79
 
Introduction to IPython & Jupyter Notebooks
Introduction to IPython & Jupyter NotebooksIntroduction to IPython & Jupyter Notebooks
Introduction to IPython & Jupyter NotebooksEueung Mulyana
 
Chapter 1 - INTRODUCTION TO PYTHON -MAULIK BORSANIYA
Chapter 1 - INTRODUCTION TO PYTHON -MAULIK BORSANIYAChapter 1 - INTRODUCTION TO PYTHON -MAULIK BORSANIYA
Chapter 1 - INTRODUCTION TO PYTHON -MAULIK BORSANIYAMaulik Borsaniya
 
Python Variable Types, List, Tuple, Dictionary
Python Variable Types, List, Tuple, DictionaryPython Variable Types, List, Tuple, Dictionary
Python Variable Types, List, Tuple, DictionarySoba Arjun
 
Presentation on data preparation with pandas
Presentation on data preparation with pandasPresentation on data preparation with pandas
Presentation on data preparation with pandasAkshitaKanther
 
Introduction to numpy Session 1
Introduction to numpy Session 1Introduction to numpy Session 1
Introduction to numpy Session 1Jatin Miglani
 
Chapitre1: Langage Python
Chapitre1: Langage PythonChapitre1: Langage Python
Chapitre1: Langage PythonAziz Darouichi
 
Intro to Python Programming Language
Intro to Python Programming LanguageIntro to Python Programming Language
Intro to Python Programming LanguageDipankar Achinta
 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to pythonAgung Wahyudi
 
Data Analysis with Python Pandas
Data Analysis with Python PandasData Analysis with Python Pandas
Data Analysis with Python PandasNeeru Mittal
 
Introduction to python programming
Introduction to python programmingIntroduction to python programming
Introduction to python programmingSrinivas Narasegouda
 

What's hot (20)

Introduction to python
Introduction to pythonIntroduction to python
Introduction to python
 
Introduction to Python Pandas for Data Analytics
Introduction to Python Pandas for Data AnalyticsIntroduction to Python Pandas for Data Analytics
Introduction to Python Pandas for Data Analytics
 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to python
 
Python variables and data types.pptx
Python variables and data types.pptxPython variables and data types.pptx
Python variables and data types.pptx
 
Introduction to IPython & Jupyter Notebooks
Introduction to IPython & Jupyter NotebooksIntroduction to IPython & Jupyter Notebooks
Introduction to IPython & Jupyter Notebooks
 
Chapter 1 - INTRODUCTION TO PYTHON -MAULIK BORSANIYA
Chapter 1 - INTRODUCTION TO PYTHON -MAULIK BORSANIYAChapter 1 - INTRODUCTION TO PYTHON -MAULIK BORSANIYA
Chapter 1 - INTRODUCTION TO PYTHON -MAULIK BORSANIYA
 
Programming with Python
Programming with PythonProgramming with Python
Programming with Python
 
Introduction to numpy
Introduction to numpyIntroduction to numpy
Introduction to numpy
 
Python Variable Types, List, Tuple, Dictionary
Python Variable Types, List, Tuple, DictionaryPython Variable Types, List, Tuple, Dictionary
Python Variable Types, List, Tuple, Dictionary
 
Presentation on data preparation with pandas
Presentation on data preparation with pandasPresentation on data preparation with pandas
Presentation on data preparation with pandas
 
Python - the basics
Python - the basicsPython - the basics
Python - the basics
 
Introduction to numpy Session 1
Introduction to numpy Session 1Introduction to numpy Session 1
Introduction to numpy Session 1
 
Chapitre1: Langage Python
Chapitre1: Langage PythonChapitre1: Langage Python
Chapitre1: Langage Python
 
Pandas
PandasPandas
Pandas
 
Python
PythonPython
Python
 
Intro to Python Programming Language
Intro to Python Programming LanguageIntro to Python Programming Language
Intro to Python Programming Language
 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to python
 
Matplotlib
MatplotlibMatplotlib
Matplotlib
 
Data Analysis with Python Pandas
Data Analysis with Python PandasData Analysis with Python Pandas
Data Analysis with Python Pandas
 
Introduction to python programming
Introduction to python programmingIntroduction to python programming
Introduction to python programming
 

Similar to RDM 2020: Python, Numpy, and Pandas

Python workshop
Python workshopPython workshop
Python workshopShiraz LUG
 
Python Introduction its a oop language and easy to use
Python Introduction its a oop language and easy to usePython Introduction its a oop language and easy to use
Python Introduction its a oop language and easy to useSrajanCollege1
 
Open Chemistry, JupyterLab and data: Reproducible quantum chemistry
Open Chemistry, JupyterLab and data: Reproducible quantum chemistryOpen Chemistry, JupyterLab and data: Reproducible quantum chemistry
Open Chemistry, JupyterLab and data: Reproducible quantum chemistryMarcus Hanwell
 
Data analysis with Pandas and Spark
Data analysis with Pandas and SparkData analysis with Pandas and Spark
Data analysis with Pandas and SparkFelix Crisan
 
Python, the Language of Science and Engineering for Engineers
Python, the Language of Science and Engineering for EngineersPython, the Language of Science and Engineering for Engineers
Python, the Language of Science and Engineering for EngineersBoey Pak Cheong
 
A Comprehensive Guide of Python Final Year Projects with Source Code.pdf
A Comprehensive Guide of Python Final Year Projects with Source Code.pdfA Comprehensive Guide of Python Final Year Projects with Source Code.pdf
A Comprehensive Guide of Python Final Year Projects with Source Code.pdfjagan477830
 
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning StudioIntroduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning StudioMuralidharan Deenathayalan
 
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning StudioIntroduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning StudioMuralidharan Deenathayalan
 
Programming for data science in python
Programming for data science in pythonProgramming for data science in python
Programming for data science in pythonUmmeSalmaM1
 
Python in geospatial analysis
Python in geospatial analysisPython in geospatial analysis
Python in geospatial analysisSakthivel R
 
Python in Industry
Python in IndustryPython in Industry
Python in IndustryDharmit Shah
 
An overview of data and web-application development with Python
An overview of data and web-application development with PythonAn overview of data and web-application development with Python
An overview of data and web-application development with PythonSivaranjan Goswami
 
Python and big data : a good match?
Python and big data : a good match?Python and big data : a good match?
Python and big data : a good match?PyDataParis
 
Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019Travis Oliphant
 
SoC Python Discussion Group
SoC Python Discussion GroupSoC Python Discussion Group
SoC Python Discussion Groupkrishna_dubba
 

Similar to RDM 2020: Python, Numpy, and Pandas (20)

Python workshop
Python workshopPython workshop
Python workshop
 
Python workshop
Python workshopPython workshop
Python workshop
 
Python Introduction its a oop language and easy to use
Python Introduction its a oop language and easy to usePython Introduction its a oop language and easy to use
Python Introduction its a oop language and easy to use
 
London level39
London level39London level39
London level39
 
Open Chemistry, JupyterLab and data: Reproducible quantum chemistry
Open Chemistry, JupyterLab and data: Reproducible quantum chemistryOpen Chemistry, JupyterLab and data: Reproducible quantum chemistry
Open Chemistry, JupyterLab and data: Reproducible quantum chemistry
 
Presentation.pptx
Presentation.pptxPresentation.pptx
Presentation.pptx
 
Presentation.pptx
Presentation.pptxPresentation.pptx
Presentation.pptx
 
Why learn python in 2017?
Why learn python in 2017?Why learn python in 2017?
Why learn python in 2017?
 
Data analysis with Pandas and Spark
Data analysis with Pandas and SparkData analysis with Pandas and Spark
Data analysis with Pandas and Spark
 
Python, the Language of Science and Engineering for Engineers
Python, the Language of Science and Engineering for EngineersPython, the Language of Science and Engineering for Engineers
Python, the Language of Science and Engineering for Engineers
 
A Comprehensive Guide of Python Final Year Projects with Source Code.pdf
A Comprehensive Guide of Python Final Year Projects with Source Code.pdfA Comprehensive Guide of Python Final Year Projects with Source Code.pdf
A Comprehensive Guide of Python Final Year Projects with Source Code.pdf
 
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning StudioIntroduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
 
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning StudioIntroduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
 
Programming for data science in python
Programming for data science in pythonProgramming for data science in python
Programming for data science in python
 
Python in geospatial analysis
Python in geospatial analysisPython in geospatial analysis
Python in geospatial analysis
 
Python in Industry
Python in IndustryPython in Industry
Python in Industry
 
An overview of data and web-application development with Python
An overview of data and web-application development with PythonAn overview of data and web-application development with Python
An overview of data and web-application development with Python
 
Python and big data : a good match?
Python and big data : a good match?Python and big data : a good match?
Python and big data : a good match?
 
Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019
 
SoC Python Discussion Group
SoC Python Discussion GroupSoC Python Discussion Group
SoC Python Discussion Group
 

More from Henry Schreiner

Software Quality Assurance Tooling - Wintersession 2024
Software Quality Assurance Tooling - Wintersession 2024Software Quality Assurance Tooling - Wintersession 2024
Software Quality Assurance Tooling - Wintersession 2024Henry Schreiner
 
Princeton RSE Peer network first meeting
Princeton RSE Peer network first meetingPrinceton RSE Peer network first meeting
Princeton RSE Peer network first meetingHenry Schreiner
 
Software Quality Assurance Tooling 2023
Software Quality Assurance Tooling 2023Software Quality Assurance Tooling 2023
Software Quality Assurance Tooling 2023Henry Schreiner
 
Princeton Wintersession: Software Quality Assurance Tooling
Princeton Wintersession: Software Quality Assurance ToolingPrinceton Wintersession: Software Quality Assurance Tooling
Princeton Wintersession: Software Quality Assurance ToolingHenry Schreiner
 
What's new in Python 3.11
What's new in Python 3.11What's new in Python 3.11
What's new in Python 3.11Henry Schreiner
 
Everything you didn't know you needed
Everything you didn't know you neededEverything you didn't know you needed
Everything you didn't know you neededHenry Schreiner
 
SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...
SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...
SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...Henry Schreiner
 
PyCon 2022 -Scikit-HEP Developer Pages: Guidelines for modern packaging
PyCon 2022 -Scikit-HEP Developer Pages: Guidelines for modern packagingPyCon 2022 -Scikit-HEP Developer Pages: Guidelines for modern packaging
PyCon 2022 -Scikit-HEP Developer Pages: Guidelines for modern packagingHenry Schreiner
 
PyCon2022 - Building Python Extensions
PyCon2022 - Building Python ExtensionsPyCon2022 - Building Python Extensions
PyCon2022 - Building Python ExtensionsHenry Schreiner
 
boost-histogram / Hist: PyHEP Topical meeting
boost-histogram / Hist: PyHEP Topical meetingboost-histogram / Hist: PyHEP Topical meeting
boost-histogram / Hist: PyHEP Topical meetingHenry Schreiner
 
Digital RSE: automated code quality checks - RSE group meeting
Digital RSE: automated code quality checks - RSE group meetingDigital RSE: automated code quality checks - RSE group meeting
Digital RSE: automated code quality checks - RSE group meetingHenry Schreiner
 
HOW 2019: Machine Learning for the Primary Vertex Reconstruction
HOW 2019: Machine Learning for the Primary Vertex ReconstructionHOW 2019: Machine Learning for the Primary Vertex Reconstruction
HOW 2019: Machine Learning for the Primary Vertex ReconstructionHenry Schreiner
 
HOW 2019: A complete reproducible ROOT environment in under 5 minutes
HOW 2019: A complete reproducible ROOT environment in under 5 minutesHOW 2019: A complete reproducible ROOT environment in under 5 minutes
HOW 2019: A complete reproducible ROOT environment in under 5 minutesHenry Schreiner
 
ACAT 2019: A hybrid deep learning approach to vertexing
ACAT 2019: A hybrid deep learning approach to vertexingACAT 2019: A hybrid deep learning approach to vertexing
ACAT 2019: A hybrid deep learning approach to vertexingHenry Schreiner
 
2019 CtD: A hybrid deep learning approach to vertexing
2019 CtD: A hybrid deep learning approach to vertexing2019 CtD: A hybrid deep learning approach to vertexing
2019 CtD: A hybrid deep learning approach to vertexingHenry Schreiner
 
2019 IRIS-HEP AS workshop: Boost-histogram and hist
2019 IRIS-HEP AS workshop: Boost-histogram and hist2019 IRIS-HEP AS workshop: Boost-histogram and hist
2019 IRIS-HEP AS workshop: Boost-histogram and histHenry Schreiner
 
IRIS-HEP: Boost-histogram and Hist
IRIS-HEP: Boost-histogram and HistIRIS-HEP: Boost-histogram and Hist
IRIS-HEP: Boost-histogram and HistHenry Schreiner
 

More from Henry Schreiner (20)

Software Quality Assurance Tooling - Wintersession 2024
Software Quality Assurance Tooling - Wintersession 2024Software Quality Assurance Tooling - Wintersession 2024
Software Quality Assurance Tooling - Wintersession 2024
 
Princeton RSE Peer network first meeting
Princeton RSE Peer network first meetingPrinceton RSE Peer network first meeting
Princeton RSE Peer network first meeting
 
Software Quality Assurance Tooling 2023
Software Quality Assurance Tooling 2023Software Quality Assurance Tooling 2023
Software Quality Assurance Tooling 2023
 
Princeton Wintersession: Software Quality Assurance Tooling
Princeton Wintersession: Software Quality Assurance ToolingPrinceton Wintersession: Software Quality Assurance Tooling
Princeton Wintersession: Software Quality Assurance Tooling
 
What's new in Python 3.11
What's new in Python 3.11What's new in Python 3.11
What's new in Python 3.11
 
Everything you didn't know you needed
Everything you didn't know you neededEverything you didn't know you needed
Everything you didn't know you needed
 
SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...
SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...
SciPy22 - Building binary extensions with pybind11, scikit build, and cibuild...
 
SciPy 2022 Scikit-HEP
SciPy 2022 Scikit-HEPSciPy 2022 Scikit-HEP
SciPy 2022 Scikit-HEP
 
PyCon 2022 -Scikit-HEP Developer Pages: Guidelines for modern packaging
PyCon 2022 -Scikit-HEP Developer Pages: Guidelines for modern packagingPyCon 2022 -Scikit-HEP Developer Pages: Guidelines for modern packaging
PyCon 2022 -Scikit-HEP Developer Pages: Guidelines for modern packaging
 
PyCon2022 - Building Python Extensions
PyCon2022 - Building Python ExtensionsPyCon2022 - Building Python Extensions
PyCon2022 - Building Python Extensions
 
boost-histogram / Hist: PyHEP Topical meeting
boost-histogram / Hist: PyHEP Topical meetingboost-histogram / Hist: PyHEP Topical meeting
boost-histogram / Hist: PyHEP Topical meeting
 
Digital RSE: automated code quality checks - RSE group meeting
Digital RSE: automated code quality checks - RSE group meetingDigital RSE: automated code quality checks - RSE group meeting
Digital RSE: automated code quality checks - RSE group meeting
 
CMake best practices
CMake best practicesCMake best practices
CMake best practices
 
Pybind11 - SciPy 2021
Pybind11 - SciPy 2021Pybind11 - SciPy 2021
Pybind11 - SciPy 2021
 
HOW 2019: Machine Learning for the Primary Vertex Reconstruction
HOW 2019: Machine Learning for the Primary Vertex ReconstructionHOW 2019: Machine Learning for the Primary Vertex Reconstruction
HOW 2019: Machine Learning for the Primary Vertex Reconstruction
 
HOW 2019: A complete reproducible ROOT environment in under 5 minutes
HOW 2019: A complete reproducible ROOT environment in under 5 minutesHOW 2019: A complete reproducible ROOT environment in under 5 minutes
HOW 2019: A complete reproducible ROOT environment in under 5 minutes
 
ACAT 2019: A hybrid deep learning approach to vertexing
ACAT 2019: A hybrid deep learning approach to vertexingACAT 2019: A hybrid deep learning approach to vertexing
ACAT 2019: A hybrid deep learning approach to vertexing
 
2019 CtD: A hybrid deep learning approach to vertexing
2019 CtD: A hybrid deep learning approach to vertexing2019 CtD: A hybrid deep learning approach to vertexing
2019 CtD: A hybrid deep learning approach to vertexing
 
2019 IRIS-HEP AS workshop: Boost-histogram and hist
2019 IRIS-HEP AS workshop: Boost-histogram and hist2019 IRIS-HEP AS workshop: Boost-histogram and hist
2019 IRIS-HEP AS workshop: Boost-histogram and hist
 
IRIS-HEP: Boost-histogram and Hist
IRIS-HEP: Boost-histogram and HistIRIS-HEP: Boost-histogram and Hist
IRIS-HEP: Boost-histogram and Hist
 

Recently uploaded

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 

Recently uploaded (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 

RDM 2020: Python, Numpy, and Pandas

  • 1. Princeton Research Data Management Workshop 2020 Co-sponsored by the Center for Digital Humanities, the Center for Statistics and Machine Learning, the Office of the Dean for Research, and Data-Driven Social Science Initiative Organized by Princeton University Library’s Princeton Research Data Service, Princeton Institute for Computational Science and Engineering, and OIT Research Computing Day Two: Break-out Session: Python, Numpy, Pandas
  • 2. Python, Numpy, and Pandas Henry Schreiner, PICSiE/PHY henryfs@princeton.edu 2020 Research Data Management Workshop
  • 3. Python for data science ● Second most popular language on GitHub ● General purpose ● Only Data Science language in top 10 ● Over 200K PyPI packages, 1.6 billion releases
  • 4. Python for data science ● Another metric (PYPL, Google-based) has it #1 ● Data Science languages shown below ● Python fastest growing ● R peaked around 2017 ● Others also in decline ● Note the log scale!
  • 5. Timeline ● 1994: Python 1.0 released ● 1995: First array package: Numeric ● 2003: Matplotlib ● 2005: Numeric and numarray merged into Numpy ● 2008: Pandas introduced ● 2012: The Anaconda python distribution
  • 6. Timeline ● 2012: Numba JIT compiler ● 2014: IPython becomes Jupyter project & notebook ● 2016: LIGO's discovery: Jupyter Notebook + Python ● 2017: Google releases TensorFlow (Python) ● Now: All Machine Learning libraries are primarily or exclusively used via Python
  • 7. Why Python? What makes Python special? ● Great interactivity ● General purpose ● Weaknesses filled by libraries and services
  • 8. Python: the language ● Simple ● Easy to learn ● Flexible and powerful ● Object Oriented def square(x): return x**2 print(square(4)) # Prints 4
  • 9. IPython ● Adds interactive features to Python ○ Timing chunks of code ○ Shell-like features ○ Fancy display system %cd my_dir %%timeit run_long() ! ./program
  • 10. Jupyter Notebooks ● Cell-based HTML document ● Supports many kernels (IPython was first and is the most popular) ● Interleave documentation, code, and output
  • 11. Jupyter Lab ● Holds multiple views of ○ Notebooks ○ Output ○ Editors ○ Terminals
  • 12. Jupyter Hub ● Multiuser notebook or lab instances ● Available at mybinder.org or through Princeton Research Computing Example: Runge-Kutta static notebook, runnable mybinder
  • 13. Libraries PyPI ● The core service for Python libraries ● Uses pip to install ● Environment management separate Anaconda ● Can package Python and complex libraries ● Uses conda to install ● Environment manager too (reproducible) ● conda-forge is community effort
  • 14. Numpy ● Adds an array type ● Fast computations array-at-a-time ● Python and Numpy now define a standard protocol for arrays ● A library that replaces langagues like ADL import numpy as np v = np.array([1,2,3]) print(v**2) # Prints 1, 4, 9
  • 15. Pandas ● Tabular data ○ A library that replaces languages like R and Excel ○ Designed with interactivity in mind ● Other libraries mimic Pandas’ API
  • 16. Numba ● Adds full JIT (just in time) compiler to Python ● Compiles normal python functions into LLVM ● Growing subset of Python and Numpy ● Can be as fast as any compiled language ● Supports parallel computation, GPUs, and more
  • 17. Other libraries of note ● CuPY: CUDA with a numpy interface ● TensorFlow/PyTorch: Machine learning libraries ● Matplotlib: The plotting library for Python ● PyQt/PySide: Bindings to Qt Graphical User Interface ● PyBind11: Easy C++ bindings
  • 18. Summary ● Python is wildly popular, simple to learn, and well supported ● Python has an impressive collection of tools ○ Interactivity: IPython, Jupyter ○ Package delivery: PyPI (pip), Conda ○ Libraries: Numpy, Pandas, and many more
  • 19. Demo ● The second half is devoted to a Pandas demo session