Python and HDF5
Andrew Collette
University of Colorado
What makes scientific data special?

It’s meant to be shared - collaborative
Ad-hoc or changing structure - flexible
Archived and preserved - robust

Python and HDF5 together address all three
Python (the language)
High-level language
Fully object-oriented
Almost no “boilerplate” code
Readable
Free
“Exception” error handling
Self-documenting
First-class module/namespace support

Python (the platform)
Mature numerical, plotting and scientific modules
Hundreds of specialized science packages
Thousands more general-purpose packages
Python itself is “batteries included”
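“Exception” error handling means a failed operation raises an error that can be caught in one place, instead of returning a status code that must be checked after every call. A minimal sketch, assuming h5py is installed; `try_open` and the file name are hypothetical:

```python
import h5py


def try_open(path):
    """Open an HDF5 file read-only; return None instead of crashing if it fails."""
    try:
        return h5py.File(path, "r")
    except OSError as err:
        # h5py reports missing or unreadable files as OSError
        # (FileNotFoundError is a subclass), so one except clause covers both.
        print(f"Could not open {path!r}: {err}")
        return None


# Hypothetical path that does not exist: prints a message, returns None.
f = try_open("missing_dir/no_such_file.hdf5")
```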
Core analysis packages
NumPy - Array objects and basic operations
SciPy - Advanced science & engineering library
Matplotlib - Publication-quality plots (both rendered and interactive)
Thousands of others
Distribution - distutils/pip single-command installs
Unit testing - unittest module in stdlib
Interface: F2PY (Fortran), Cython (C), ctypes, others
Web servers and development - literally hundreds
Only need to write code for your problem
Python highlights
Readable
Iteration
[Figure: the same iteration code written in C, IDL and Python]
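The original slide compared the same loop in C, IDL and Python, but that code did not survive. As an illustrative stand-in (not the slide's own code), this is what iteration looks like in Python; the measurement values are made up:

```python
# Summing the squares of a list of measurements.
measurements = [1.5, 2.0, 3.5]

# Iterate directly over the items: no index variable or length bookkeeping.
total = 0.0
for value in measurements:
    total += value ** 2

# When the index is needed too, enumerate() supplies it.
for i, value in enumerate(measurements):
    print(i, value)

# The same sum as a one-line generator expression.
total_again = sum(value ** 2 for value in measurements)
```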
Speed
FFTs and optimized routines built into NumPy/SciPy
ctypes and Cython
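As a minimal illustration of leaning on NumPy's built-in compiled routines, the sketch below finds the dominant frequency of a synthetic signal with `numpy.fft`; the sample rate and tone frequency are arbitrary choices:

```python
import numpy as np

# Sample a 5 Hz sine wave at 100 samples/s for 1 s (illustrative values).
fs = 100.0
t = np.arange(100) / fs
signal = np.sin(2 * np.pi * 5 * t)

# Real-input FFT: the heavy lifting runs in compiled code inside NumPy.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# Frequency bin with the largest magnitude.
peak = freqs[np.argmax(np.abs(spectrum))]
print(peak)
```

Because the signal contains a whole number of cycles, all of its energy falls into a single bin, which makes the peak easy to pick out.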
ctypes
Advanced foreign function interface
Call C libraries from pure Python code
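A minimal sketch of calling a C library from pure Python with ctypes, here the C math library's `cos`. It assumes a POSIX system (Linux or macOS); on Windows the library lookup differs:

```python
import ctypes
import ctypes.util
import math

# Locate the C math library. find_library returns a platform-specific name
# (or None); on POSIX systems CDLL(None) falls back to the symbols already
# loaded into the Python process, which normally include libm.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C prototype  double cos(double)  so ctypes converts correctly.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

result = libm.cos(math.pi)
print(result)
```

Declaring `argtypes` and `restype` matters: without them ctypes assumes C `int` for the return value, and the result would be garbage.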
Cython
[Figure: example from the HDF5 C library]
HDF5
Hierarchical Data Format
3 things:
File specification and object model
C library
Ecosystem of users and developers
Objects
Datasets: homogeneous arrays of data
Groups: containers holding datasets and groups
Attributes: arbitrary metadata on groups & datasets

Standard constructs using these, or make your own!
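The three object types map directly onto h5py calls. A minimal sketch, assuming h5py and NumPy are installed; the file, group, dataset and attribute names are hypothetical:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "objects_demo.hdf5")  # throwaway file

with h5py.File(path, "w") as f:
    # Group: a container that can hold datasets and other groups.
    run = f.create_group("run1")

    # Dataset: a homogeneous array (here 3 x 4 float64 values).
    dset = run.create_dataset("voltage", data=np.zeros((3, 4)))

    # Attributes: small named metadata attached to groups or datasets.
    run.attrs["operator"] = "A. Researcher"  # hypothetical metadata
    dset.attrs["units"] = "V"

with h5py.File(path, "r") as f:
    shape = f["run1/voltage"].shape
    units = f["run1/voltage"].attrs["units"]
    print(shape, units)
```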
Dataset features
Partial I/O: read and write just what you want
(In Python, we even use the array-access syntax!)
Automatic type conversion
On-the-fly compression
Parallel reads & writes with MPI
(Directly from Python!)
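A minimal sketch of the first three features with h5py (parallel MPI I/O needs an MPI-enabled h5py build and is not shown). It assumes h5py and NumPy are installed; names and sizes are hypothetical:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "features_demo.hdf5")

with h5py.File(path, "w") as f:
    # On-the-fly compression: gzip-compressed, chunked storage (h5py picks a
    # chunk shape automatically when compression is requested).
    # Automatic type conversion: float64 input is stored as float32 on disk.
    dset = f.create_dataset(
        "image",
        data=np.arange(500 * 500, dtype="f8").reshape(500, 500),
        dtype="f4",
        compression="gzip",
    )
    stored_dtype = dset.dtype
    compression = dset.compression

    # Partial I/O with array syntax: overwrite one row only...
    dset[0, :] = -1.0
    # ...and read back a 10 x 10 corner without loading the rest.
    corner = dset[0:10, 0:10]

print(stored_dtype, compression, corner.shape)
```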
Metadata & Organization
Groups form a POSIX-style “filesystem” in the file
Attributes can store arbitrary data on arbitrary objects
How should the file be organized?
You decide!

Thousands of domain-specific “application formats”
Anyone can read them because HDF5 is self-describing!
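A minimal sketch of both ideas, assuming h5py and NumPy are installed: POSIX-style paths create nested groups, and a reader can walk the file and discover its layout and metadata without knowing it in advance. The layout and the `probe_x_cm` attribute are hypothetical:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "layout_demo.hdf5")

with h5py.File(path, "w") as f:
    # Intermediate groups are created automatically from the path.
    f["/experiments/2011/shot_001/density"] = np.ones(10)
    f["/experiments/2011/shot_001"].attrs["probe_x_cm"] = 12.5

names = []


def describe(name, obj):
    """Record and print every object found while walking the file."""
    kind = "dataset" if isinstance(obj, h5py.Dataset) else "group"
    names.append(name)
    print(f"{kind:8} /{name}", dict(obj.attrs))


# The reader needs no prior knowledge of the layout: the file describes itself.
with h5py.File(path, "r") as f:
    f.visititems(describe)
```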
Example
Open an HDF5 file
Extract a particular dataset
Read the data
Make an interactive plot
Close the file
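The five steps above can be sketched with h5py and matplotlib. This is an illustrative version, not the code from the talk's demo; it assumes h5py and NumPy are installed (plotting additionally needs matplotlib and a display), and the file and dataset names are hypothetical:

```python
import os
import sys
import tempfile

import h5py
import numpy as np


def read_dataset(path, name):
    """Open an HDF5 file, extract one dataset, read it, and close the file."""
    with h5py.File(path, "r") as f:  # open; closed automatically on exit
        dset = f[name]               # extract a particular dataset
        data = dset[...]             # read it into a NumPy array
    return data                      # the array stays valid after closing


def plot(data):
    """Make an interactive plot (needs matplotlib and a display)."""
    import matplotlib.pyplot as plt
    plt.plot(data)
    plt.show()


# Self-contained demo: write a small file to a temporary directory, read it back.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.hdf5")
with h5py.File(demo_path, "w") as f:
    f["signal"] = np.sin(np.linspace(0, 2 * np.pi, 100))
signal = read_dataset(demo_path, "signal")

if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g.  python show.py data.hdf5 /run1/voltage   (hypothetical names)
    plot(read_dataset(sys.argv[1], sys.argv[2]))
```

Using `with` covers the "close the file" step even if reading raises an exception.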
Demo
Real-world use
UCLA Large Plasma Device

Image credit: Basic Plasma Science Facility
Laser Experiment

Image credit: Basic Plasma Science Facility
LAPD Data Products
Acquisition file - “Planes” of data in HDF5
Metadata: timestamps, digitizer settings, probe positions, background plasma conditions…
Packaged into HDF5 following “lab layout”
Users take their data back home and analyze
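One common way to store acquisition “planes” as they arrive is a resizable dataset that grows along its first axis, with per-plane metadata in a parallel dataset and run-wide settings as attributes. This is a minimal sketch under that assumption, not the actual LAPD “lab layout”; all names, sizes and values are hypothetical, and h5py and NumPy are assumed installed:

```python
import os
import tempfile
import time

import h5py
import numpy as np

NX, NT = 4, 256  # hypothetical: 4 probe positions per plane, 256 time samples

path = os.path.join(tempfile.mkdtemp(), "acquisition_demo.hdf5")

with h5py.File(path, "w") as f:
    # maxshape=(None, ...) lets the first axis grow as planes are acquired
    # (h5py chooses a chunk shape automatically for resizable datasets).
    planes = f.create_dataset("planes", shape=(0, NX, NT),
                              maxshape=(None, NX, NT), dtype="f4")
    stamps = f.create_dataset("timestamps", shape=(0,),
                              maxshape=(None,), dtype="f8")
    # Run-wide setting stored as an attribute (hypothetical name and value).
    planes.attrs["digitizer_rate_hz"] = 1.0e8

    for i in range(3):  # stand-in for the real acquisition loop
        new_plane = np.random.default_rng(i).normal(size=(NX, NT))
        planes.resize(i + 1, axis=0)
        stamps.resize(i + 1, axis=0)
        planes[i] = new_plane
        stamps[i] = time.time()

with h5py.File(path, "r") as f:
    n_planes = f["planes"].shape[0]
    rate = f["planes"].attrs["digitizer_rate_hz"]
    print(n_planes, rate)
```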
Visualization
Python 2D plotting

A. Collette et al. Phys. Rev. Lett 105, 195003 (2010)
Only 160 lines of code!

Python does 3D too!
“MayaVi” 3D visualizer
Development sponsored by Enthought
Both offline (scripted) and interactive modes

A. Collette et al. Phys. Plasmas 18, 055705 (2011)
CU Accelerator
[Figure: data pipeline with stages: raw data, HDF5 shot file, automated speed/mass calculation, MySQL, data search, HDF5 file for user]
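The search-and-package stages listed above can be sketched as follows. This is a hypothetical illustration, not the accelerator group's code: it uses Python's built-in `sqlite3` as a stand-in for the MySQL server, skips the speed/mass calculation by storing precomputed masses, and assumes h5py and NumPy are installed:

```python
import os
import sqlite3
import tempfile

import h5py
import numpy as np

workdir = tempfile.mkdtemp()

# 1. Hypothetical shot files, each indexed in the database with its mass.
db = sqlite3.connect(":memory:")  # stand-in for the MySQL server
db.execute("CREATE TABLE shots (id INTEGER, path TEXT, mass REAL)")
for shot_id, mass in [(1, 2.0), (2, 5.5), (3, 7.0)]:
    path = os.path.join(workdir, f"shot_{shot_id}.hdf5")
    with h5py.File(path, "w") as f:
        f["waveform"] = np.full(16, float(shot_id))
        f["waveform"].attrs["mass"] = mass
    db.execute("INSERT INTO shots VALUES (?, ?, ?)", (shot_id, path, mass))

# 2. Data search: find shots in a mass window (illustrative query).
hits = db.execute(
    "SELECT id, path FROM shots WHERE mass BETWEEN ? AND ? ORDER BY id",
    (5.0, 8.0),
).fetchall()

# 3. Package the matching datasets into a single HDF5 file for the user.
user_path = os.path.join(workdir, "user_selection.hdf5")
with h5py.File(user_path, "w") as out:
    for shot_id, path in hits:
        with h5py.File(path, "r") as src:
            # Group.copy copies the data and its attributes between files.
            src.copy("waveform", out, name=f"shot_{shot_id}")
    selected = sorted(out.keys())

print(selected)
```

Keeping only small, searchable values in the database and the bulk data in HDF5 lets each tool do what it is good at.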
Where to get Python
Distributions are the best way to get started
(they include HDF5/h5py!)
Anaconda (Windows, Mac, Linux):
http://continuum.io
PythonXY (Windows)
http://pythonxy.googlecode.com
Questions?

Python and HDF5: Overview