1
© 2016 The MathWorks, Inc.
MATLAB, Big Data, and HDF Server
Ellen Johnson
MathWorks
2
Overview
▪ MATLAB capabilities and domain areas
▪ Scientific data in MATLAB
▪ HDF5 interface
▪ NetCDF interface
▪ Big Data in MATLAB
▪ MATLAB data analytics workflows
▪ RESTful web service access
▪ Demo: Programmatically access HDF5 data served on HDF Server
3
CUSTOMERS IN
▪ Aerospace and defense
▪ Automotive
▪ Biotech and pharmaceutical
▪ Communications
▪ Education
▪ Electronics and semiconductors
▪ Energy production
▪ Financial services
▪ Industrial automation and machinery
▪ Medical devices
▪ Software
▪ Internet
DESIGNED FOR
▪ Embedded system development
▪ Engineering Education
▪ Aircraft and missile guidance systems
▪ Control system design
▪ Communications system design
▪ Earth Sciences
▪ Engineering research
▪ Robotics
▪ Online trading systems
▪ System optimization
▪ Computational Biology
4
Scientific Data in MATLAB
▪ Scientific data formats
• HDF5, HDF4, HDF-EOS2
• NetCDF (with OPeNDAP!)
• FITS, CDF, BIL, BIP, BSQ
▪ Image file formats
• TIFF, JPEG, HDR, PNG, JPEG2000, and more
▪ Vector data file formats
• ESRI Shapefiles, KML, GPS, and more
▪ Raster data file formats
• GeoTIFF, NITF, USGS and SDTS DEM, NIMA DTED, and more
▪ Web Map Service (WMS)
5
HDF5 in MATLAB
▪ High Level Interface (h5read, h5write, h5disp, h5info)
h5disp('example.h5','/g4/lat');
data = h5read('example.h5','/g4/lat');
▪ Low Level Interface (Wraps HDF5 C APIs)
fid = H5F.open('example.h5');
dset_id = H5D.open(fid,'/g4/lat');
data = H5D.read(dset_id);
H5D.close(dset_id);
H5F.close(fid);
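The high-level functions also accept start and count arguments, so a slice of a dataset can be read without loading all of it. A minimal sketch, assuming a throwaway file in tempdir and made-up latitude values (the file name and data are illustrative, not taken from the talk):

```matlab
% Hypothetical scratch file; the slides use 'example.h5' instead.
fname = fullfile(tempdir, 'sketch_example.h5');
if exist(fname, 'file'), delete(fname); end

lat = (-90:10:90)';                       % made-up 19x1 latitude vector
h5create(fname, '/g4/lat', size(lat));    % creates group /g4 as needed
h5write(fname, '/g4/lat', lat);

% Read 3 elements starting at index 5 (start and count are per dimension).
subset = h5read(fname, '/g4/lat', [5 1], [3 1]);

% Inspect the dataset size without reading the data.
info = h5info(fname, '/g4/lat');
disp(info.Dataspace.Size)
```

Reading by start and count is what keeps memory use bounded when the dataset is much larger than this toy vector.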
6
NetCDF in MATLAB
▪ High Level Interface (ncdisp, ncread, ncwrite, ncinfo)
url = 'http://oceanwatch.pifsc.noaa.gov/thredds/dodsC/goes-poes/2day';
ncdisp(url);
data = ncread(url,'sst');
▪ Low Level Interface (Wraps netCDF C APIs)
ncid = netcdf.open(url);
varid = netcdf.inqVarID(ncid,'sst');
data = netcdf.getVar(ncid,varid,'double');
netcdf.close(ncid);
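The same high-level calls work on local files, and ncread can take start and count arguments to pull only part of a variable. A small sketch under stated assumptions: the file name, dimension names, and values below are invented for illustration and do not come from the OPeNDAP dataset above.

```matlab
% Hypothetical scratch file with a made-up 4x3 'sst' grid.
fname = fullfile(tempdir, 'sketch_sst.nc');
if exist(fname, 'file'), delete(fname); end

sst = reshape(1:12, 4, 3);                               % illustrative values
nccreate(fname, 'sst', 'Dimensions', {'lon', 4, 'lat', 3});
ncwrite(fname, 'sst', sst);

% Read a 2x2 block starting at row 2, column 1.
block = ncread(fname, 'sst', [2 1], [2 2]);
disp(block)
```

Against a remote OPeNDAP URL, the same start and count arguments limit how much data is transferred.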
7
Big Data in MATLAB
8
Scale Data
Memory and Data Access
▪ 64-bit processors
▪ Memory Mapped Variables
▪ Disk Variables
▪ Databases
▪ Datastores
Programming Constructs
▪ Streaming
▪ Block Processing
▪ Parallel-for loops
▪ GPU Arrays
▪ SPMD and Distributed Arrays
▪ MapReduce
Platforms
▪ Desktop (Multicore, GPU)
▪ Clusters
▪ Cloud Computing (MDCS for EC2)
▪ Hadoop
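As one illustration of the memory-mapped variables and block processing items above, the sketch below maps a binary file with memmapfile and sums it one block at a time. The file name, data, and block size are assumptions chosen for the example.

```matlab
% Hypothetical scratch file holding 10,000 doubles.
fname = fullfile(tempdir, 'sketch_mm.bin');
fid = fopen(fname, 'w');
fwrite(fid, 1:10000, 'double');
fclose(fid);

% Map the file; data is paged in on access rather than loaded up front.
m = memmapfile(fname, 'Format', 'double');
n = numel(m.Data);

blockSize = 2500;                 % illustrative block size
total = 0;
for k = 1:blockSize:n
    idx = k:min(k + blockSize - 1, n);
    total = total + sum(m.Data(idx));   % process one block at a time
end
clear m                           % release the mapping
```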
9
Hadoop with MATLAB
Production Hadoop
• Create applications or components that execute on Hadoop
10
Access Big Data
datastore
▪ datastore for accessing large data sets
– Text or image files
– Single file or collection of files
▪ Preview data structure and format
▪ Select data to import using column names
▪ Incrementally read subsets of the data
▪ Access data stored in HDFS
airdata = datastore('*.csv');
airdata.SelectedVariableNames = {'Distance', 'ArrDelay'};
data = read(airdata);
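Incremental reading typically looks like a hasdata/read loop that accumulates a running result. The sketch below assumes a small made-up CSV written to tempdir; the Carrier column, the values, and the ReadSize setting are illustrative only.

```matlab
% Build a tiny hypothetical CSV so the example is self-contained.
fname = fullfile(tempdir, 'sketch_air.csv');
T = table((100:100:1000)', (1:10)', repmat({'AA'}, 10, 1), ...
    'VariableNames', {'Distance', 'ArrDelay', 'Carrier'});
writetable(T, fname);

ds = datastore(fname);
ds.SelectedVariableNames = {'Distance', 'ArrDelay'};   % import two columns only
ds.ReadSize = 4;                                       % rows per read (assumed)

total = 0;
n = 0;
while hasdata(ds)
    chunk = read(ds);                    % next block of rows as a table
    total = total + sum(chunk.ArrDelay);
    n = n + height(chunk);
end
meanDelay = total / n;
```

Only one chunk is in memory at a time, which is the point of the datastore when the files are too large to load whole.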
11
Analyze Big Data
mapreduce
▪ mapreduce uses datastore to process data in chunks
– Intermediate analysis results do not fit in memory
– Processing multiple keys
– Data resides in Hadoop
********************************
* MAPREDUCE PROGRESS *
********************************
Map 0% Reduce 0%
Map 20% Reduce 0%
Map 40% Reduce 0%
Map 60% Reduce 0%
Map 80% Reduce 0%
Map 100% Reduce 25%
Map 100% Reduce 50%
Map 100% Reduce 75%
Map 100% Reduce 100%
Work on the desktop
• Local data exploration, analysis, and algorithm development
Scale to Hadoop
• Interactive use with MATLAB Distributed Computing Server
• Deploy to production Hadoop instances using MATLAB Compiler
12
Data Analytics with MATLAB
▪ Symbolic Computing
▪ Neural Networks
▪ Optimization
▪ Signal Processing
▪ Image Processing
▪ Control Systems
▪ Financial Modeling
▪ Apps
▪ Language
▪ Machine Learning
▪ Statistics
13
Enterprise-Scale Data Analytics
▪ Presentation Layer
▪ Analytics Layer
▪ Data Layer
▪ Databases
▪ Data Warehouses
▪ Data Visualization
▪ Computation Layer
▪ Cloud
▪ MathWorks Cloud
14
Combining Big Data, RESTful Web Services, and MATLAB
▪ Big Data
– mapreduce and datastore functions
– table, categorical, and datetime data types are powerful in conjunction with big data analysis
▪ RESTful web service access
– webread, webwrite, and weboptions
– JSON objects represented as struct arrays
– struct2table converts a struct array into a table, which can hold heterogeneous data
Data import into appropriate data types
Data Exploration
Data Visualization
Data Analysis
Combine to support MATLAB data analytics workflow
15
webread Example: Read historical temperature data
Read historical temperature data from the World Bank Climate Data API
>> api = 'http://climatedataapi.worldbank.org/climateweb/rest/v1/';
>> url = [api 'country/cru/tas/year/USA'];
>> S = webread(url)
S =
112x1 struct array with fields:
year
data
>> S(1)
ans =
year: 1901
data: 6.6187
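A struct array like S can be converted to a table for analysis. Because the web service may not be reachable, the sketch below decodes a small JSON string instead of calling webread; the 1901 value is the one shown above, the 1902 value is invented, and jsondecode requires a MATLAB release that provides it.

```matlab
% Stand-in for the JSON a REST service might return (1902 value is made up).
json = '[{"year":1901,"data":6.6187},{"year":1902,"data":6.5}]';

S = jsondecode(json);          % 2x1 struct array with fields year and data
T = struct2table(S);           % one row per struct element

% Add a datetime column so the table can be used for time-based analysis.
T.date = datetime(T.year, 1, 1);

meanTemp = mean(T.data);
disp(T)
```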
16
Demo: Using MATLAB to programmatically access and
analyze data hosted on HDF Server
▪ HDF Server: A RESTful API providing remote access to HDF5 data
▪ Responses are JSON-formatted text
▪ webread with weboptions provides data access
▪ table and datetime data types enable data analysis
▪ Example: Coral Reef Temperature Anomaly Database (CoRTAD)
• Version 3 CoRTAD products in HDF5 format
• 1.8 GB dataset hosted on h5serv running on Amazon AWS
thermStress = sortrows(thermStress,'ThermalStressAnomaly','descend');
thermStress(1:10,:)
ans =
    Latitude    Longitude    ThermalStressAnomaly
    ________    _________    ____________________
    -8.2839     137.53       52
    -2.0874     146.67       51
    -8.2399     137.49       50
    -8.2399     137.53       50
    -15.447     145.22       50
    -15.491     145.22       50
    -10.13      148.34       50
    -4.5924     135.99       49
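The ranking step can be tried on a small table without access to the server. This sketch rebuilds three of the rows shown above in shuffled order and sorts them; how thermStress was originally assembled from the HDF Server responses is not shown in the slides, so the table construction here is an assumption.

```matlab
% Three rows copied from the output above, deliberately out of order.
thermStress = table([-2.0874; -8.2839; -4.5924], ...
                    [146.67; 137.53; 135.99], ...
                    [51; 52; 49], ...
    'VariableNames', {'Latitude', 'Longitude', 'ThermalStressAnomaly'});

% Sort so the largest anomaly comes first, then take the top row.
thermStress = sortrows(thermStress, 'ThermalStressAnomaly', 'descend');
top = thermStress(1, :);
disp(top)
```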
17
Questions?
▪ www.mathworks.com
▪ www.mathworks.com/matlabcentral
▪ Examples:
• Using the high-level HDF5 Functions to Import Data
• Tackling Big Data with MATLAB
• Performing Numerical Simulation of an Oil Spill
• Reading Content from RESTful Web Service
Thank you!
18
References
▪ www.hdfgroup.org
▪ https://hdfgroup.org/wp/2015/04/hdf5-for-the-web-hdf-server/
▪ http://data.worldbank.org/developers/climate-data-api
▪ https://data.nasa.gov/data
▪ http://visibleearth.nasa.gov/
▪ http://www.nodc.noaa.gov/sog/cortad/
▪ http://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.nodc:0068999

More Related Content

What's hot

What's hot (20)

SPD and KEA: HDF5 based file formats for Earth Observation
SPD and KEA: HDF5 based file formats for Earth ObservationSPD and KEA: HDF5 based file formats for Earth Observation
SPD and KEA: HDF5 based file formats for Earth Observation
 
HDFCloud Workshop: HDF5 in the Cloud
HDFCloud Workshop: HDF5 in the CloudHDFCloud Workshop: HDF5 in the Cloud
HDFCloud Workshop: HDF5 in the Cloud
 
HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)
 
Incorporating ISO Metadata Using HDF Product Designer
Incorporating ISO Metadata Using HDF Product DesignerIncorporating ISO Metadata Using HDF Product Designer
Incorporating ISO Metadata Using HDF Product Designer
 
Efficiently serving HDF5 via OPeNDAP
Efficiently serving HDF5 via OPeNDAPEfficiently serving HDF5 via OPeNDAP
Efficiently serving HDF5 via OPeNDAP
 
MATLAB, netCDF, and OPeNDAP
MATLAB, netCDF, and OPeNDAPMATLAB, netCDF, and OPeNDAP
MATLAB, netCDF, and OPeNDAP
 
Improved Methods for Accessing Scientific Data for the Masses
Improved Methods for Accessing Scientific Data for the MassesImproved Methods for Accessing Scientific Data for the Masses
Improved Methods for Accessing Scientific Data for the Masses
 
Moving form HDF4 to HDF5/netCDF-4
Moving form HDF4 to HDF5/netCDF-4Moving form HDF4 to HDF5/netCDF-4
Moving form HDF4 to HDF5/netCDF-4
 
ArcGIS and Multi-D: Tools & Roadmap
ArcGIS and Multi-D: Tools & RoadmapArcGIS and Multi-D: Tools & Roadmap
ArcGIS and Multi-D: Tools & Roadmap
 
Pilot Project for HDF5 Metadata Structures for SWOT
Pilot Project for HDF5 Metadata Structures for SWOTPilot Project for HDF5 Metadata Structures for SWOT
Pilot Project for HDF5 Metadata Structures for SWOT
 
Open-source Scientific Computing and Data Analytics using HDF
Open-source Scientific Computing and Data Analytics using HDFOpen-source Scientific Computing and Data Analytics using HDF
Open-source Scientific Computing and Data Analytics using HDF
 
Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)
Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)
Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)
 
Summary of HDF-EOS5 Files, Data Model and File Format
Summary of HDF-EOS5 Files, Data Model and File FormatSummary of HDF-EOS5 Files, Data Model and File Format
Summary of HDF-EOS5 Files, Data Model and File Format
 
Product Designer Hub - Taking HPD to the Web
Product Designer Hub - Taking HPD to the WebProduct Designer Hub - Taking HPD to the Web
Product Designer Hub - Taking HPD to the Web
 
Working with HDF and netCDF Data in ArcGIS: Tools and Case Studies
Working with HDF and netCDF Data in ArcGIS: Tools and Case StudiesWorking with HDF and netCDF Data in ArcGIS: Tools and Case Studies
Working with HDF and netCDF Data in ArcGIS: Tools and Case Studies
 
Scientific Computing and Visualization using HDF
Scientific Computing and Visualization using HDFScientific Computing and Visualization using HDF
Scientific Computing and Visualization using HDF
 
Bridging ICESat and ICESat-2 Standard Data Products
Bridging ICESat and ICESat-2 Standard Data ProductsBridging ICESat and ICESat-2 Standard Data Products
Bridging ICESat and ICESat-2 Standard Data Products
 
GDAL Enhancement for ESDIS Project
GDAL Enhancement for ESDIS ProjectGDAL Enhancement for ESDIS Project
GDAL Enhancement for ESDIS Project
 
HDF Product Designer: Using Templates to Achieve Interoperability
HDF Product Designer: Using Templates to Achieve InteroperabilityHDF Product Designer: Using Templates to Achieve Interoperability
HDF Product Designer: Using Templates to Achieve Interoperability
 
Hierarchical Data Formats (HDF) Update
Hierarchical Data Formats (HDF) UpdateHierarchical Data Formats (HDF) Update
Hierarchical Data Formats (HDF) Update
 

Viewers also liked (15)

HDF Cloud Services
HDF Cloud ServicesHDF Cloud Services
HDF Cloud Services
 
Breakthrough Listen
Breakthrough ListenBreakthrough Listen
Breakthrough Listen
 
Matlab netcdf guide
Matlab netcdf guideMatlab netcdf guide
Matlab netcdf guide
 
Using visualization tools to access HDF data via OPeNDAP
Using visualization tools to access HDF data via OPeNDAP Using visualization tools to access HDF data via OPeNDAP
Using visualization tools to access HDF data via OPeNDAP
 
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
Introduction to HDF5
 
Advanced HDF5 Features
Advanced HDF5 FeaturesAdvanced HDF5 Features
Advanced HDF5 Features
 
Hdf5 current future
Hdf5 current futureHdf5 current future
Hdf5 current future
 
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
 
Unidata's Approach to Community Broadening through Data and Technology Sharing
Unidata's Approach to Community Broadening through Data and Technology SharingUnidata's Approach to Community Broadening through Data and Technology Sharing
Unidata's Approach to Community Broadening through Data and Technology Sharing
 
Implementing HDF5 in MATLAB
Implementing HDF5 in MATLABImplementing HDF5 in MATLAB
Implementing HDF5 in MATLAB
 
HDF5 Tools
HDF5 ToolsHDF5 Tools
HDF5 Tools
 
Redis Labs and SQL Server
Redis Labs and SQL ServerRedis Labs and SQL Server
Redis Labs and SQL Server
 
Hortonworks Data In Motion Series Part 3 - HDF Ambari
Hortonworks Data In Motion Series Part 3 - HDF Ambari Hortonworks Data In Motion Series Part 3 - HDF Ambari
Hortonworks Data In Motion Series Part 3 - HDF Ambari
 
Big Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must KnowBig Data: The 4 Layers Everyone Must Know
Big Data: The 4 Layers Everyone Must Know
 
Integrate Hive and R
Integrate Hive and RIntegrate Hive and R
Integrate Hive and R
 

Similar to Matlab, Big Data, and HDF Server

Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSStéphane Fréchette
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010nzhang
 
Federated Queries Across Both Different Storage Mediums and Different Data En...
Federated Queries Across Both Different Storage Mediums and Different Data En...Federated Queries Across Both Different Storage Mediums and Different Data En...
Federated Queries Across Both Different Storage Mediums and Different Data En...VMware Tanzu
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Cloudera, Inc.
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...Debraj GuhaThakurta
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...Debraj GuhaThakurta
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry PerspectiveCloudera, Inc.
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟datastack
 
AnalyticsConf2016 - Zaawansowana analityka na platformie Azure HDInsight
AnalyticsConf2016 - Zaawansowana analityka na platformie Azure HDInsightAnalyticsConf2016 - Zaawansowana analityka na platformie Azure HDInsight
AnalyticsConf2016 - Zaawansowana analityka na platformie Azure HDInsightŁukasz Grala
 
Microsoft R - ScaleR Overview
Microsoft R - ScaleR OverviewMicrosoft R - ScaleR Overview
Microsoft R - ScaleR OverviewKhalid Salama
 
Hpdw 2015-v10-paper
Hpdw 2015-v10-paperHpdw 2015-v10-paper
Hpdw 2015-v10-paperrestassure
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 

Similar to Matlab, Big Data, and HDF Server (20)

Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APS
 
Datalake Architecture
Datalake ArchitectureDatalake Architecture
Datalake Architecture
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
Federated Queries Across Both Different Storage Mediums and Different Data En...
Federated Queries Across Both Different Storage Mediums and Different Data En...Federated Queries Across Both Different Storage Mediums and Different Data En...
Federated Queries Across Both Different Storage Mediums and Different Data En...
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
hadoop resume
hadoop resumehadoop resume
hadoop resume
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry Perspective
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
AnalyticsConf2016 - Zaawansowana analityka na platformie Azure HDInsight
AnalyticsConf2016 - Zaawansowana analityka na platformie Azure HDInsightAnalyticsConf2016 - Zaawansowana analityka na platformie Azure HDInsight
AnalyticsConf2016 - Zaawansowana analityka na platformie Azure HDInsight
 
Microsoft R - ScaleR Overview
Microsoft R - ScaleR OverviewMicrosoft R - ScaleR Overview
Microsoft R - ScaleR Overview
 
HDF Data in the Cloud
HDF Data in the CloudHDF Data in the Cloud
HDF Data in the Cloud
 
Hpdw 2015-v10-paper
Hpdw 2015-v10-paperHpdw 2015-v10-paper
Hpdw 2015-v10-paper
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Data Science
Data ScienceData Science
Data Science
 

More from The HDF-EOS Tools and Information Center

STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...The HDF-EOS Tools and Information Center
 

More from The HDF-EOS Tools and Information Center (20)

Cloud-Optimized HDF5 Files
Cloud-Optimized HDF5 FilesCloud-Optimized HDF5 Files
Cloud-Optimized HDF5 Files
 
Accessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDSAccessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDS
 
The State of HDF
The State of HDFThe State of HDF
The State of HDF
 
Highly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance FeaturesHighly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance Features
 
Creating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 FilesCreating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 Files
 
HDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance DiscussionHDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance Discussion
 
Hyrax: Serving Data from S3
Hyrax: Serving Data from S3Hyrax: Serving Data from S3
Hyrax: Serving Data from S3
 
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLABAccessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
 
HDF - Current status and Future Directions
HDF - Current status and Future DirectionsHDF - Current status and Future Directions
HDF - Current status and Future Directions
 
HDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and FutureHDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and Future
 
HDF - Current status and Future Directions
HDF - Current status and Future Directions HDF - Current status and Future Directions
HDF - Current status and Future Directions
 
H5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only LibraryH5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only Library
 
MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10
 
HDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDFHDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDF
 
HDF5 <-> Zarr
HDF5 <-> ZarrHDF5 <-> Zarr
HDF5 <-> Zarr
 
HDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server FeaturesHDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server Features
 
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
 
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
 
HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?
 
HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020
 

Recently uploaded

FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 

Recently uploaded (20)

FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows

MATLAB, Big Data, and HDF Server

  • 1. MATLAB, Big Data, and HDF Server. Ellen Johnson, MathWorks. © 2016 The MathWorks, Inc.
  • 2. Overview: MATLAB capabilities and domain areas; Scientific data in MATLAB; HDF5 interface; NetCDF interface; Big Data in MATLAB; MATLAB data analytics workflows; RESTful web service access; Demo: Programmatically access HDF5 data served on HDF Server
  • 3. CUSTOMERS IN: Aerospace and defense; Automotive; Biotech and pharmaceutical; Communications; Education; Electronics and semiconductors; Energy production; Financial services; Industrial automation and machinery; Medical devices; Software; Internet
      DESIGNED FOR: Embedded system development; Engineering Education; Aircraft and missile guidance systems; Control system design; Communications system design; Earth Sciences; Engineering research; Robotics; Online trading systems; System optimization; Computational Biology
  • 4. Scientific Data in MATLAB
      Scientific data formats: HDF5, HDF4, HDF-EOS2; NetCDF (with OPeNDAP!); FITS, CDF, BIL, BIP, BSQ
      Image file formats: TIFF, JPEG, HDR, PNG, JPEG2000, and more
      Vector data file formats: ESRI Shapefiles, KML, GPS and more
      Raster data file formats: GeoTIFF, NITF, USGS and SDTS DEM, NIMA DTED, and more
      Web Map Service (WMS)
  • 5. HDF5 in MATLAB
      High Level Interface (h5read, h5write, h5disp, h5info):
        h5disp('example.h5','/g4/lat');
        data = h5read('example.h5','/g4/lat');
      Low Level Interface (wraps HDF5 C APIs):
        fid = H5F.open('example.h5');
        dset_id = H5D.open(fid,'/g4/lat');
        data = H5D.read(dset_id);
        H5D.close(dset_id);
        H5F.close(fid);
  • 6. NetCDF in MATLAB
      High Level Interface (ncdisp, ncread, ncwrite, ncinfo):
        url = 'http://oceanwatch.pifsc.noaa.gov/thredds/dodsC/goes-poes/2day';
        ncdisp(url);
        data = ncread(url,'sst');
      Low Level Interface (wraps netCDF C APIs):
        ncid = netcdf.open(url);
        varid = netcdf.inqVarID(ncid,'sst');
        data = netcdf.getVar(ncid,varid,'double');
        netcdf.close(ncid);
  • 7. Big Data in MATLAB
  • 8. Scale Data
      Memory and Data Access: 64-bit processors; Memory Mapped Variables; Disk Variables; Databases; Datastores
      Programming Constructs: Streaming; Block Processing; Parallel-for loops; GPU Arrays; SPMD and Distributed Arrays; MapReduce
      Platforms: Desktop (Multicore, GPU); Clusters; Cloud Computing (MDCS for EC2); Hadoop
  • 9. Hadoop with MATLAB. Production Hadoop: create applications or components that execute on Hadoop
  • 10. Access Big Data: datastore
      datastore for accessing large data sets:
        - Text or image files
        - Single file or collection of files
      Preview data structure and format
      Select data to import using column names
      Incrementally read subsets of the data
      Access data stored in HDFS
        airdata = datastore('*.csv');
        airdata.SelectedVariableNames = {'Distance', 'ArrDelay'};
        data = read(airdata);
  • 11. Analyze Big Data: mapreduce
      mapreduce uses datastore to process data in chunks:
        - Intermediate analysis results do not fit in memory
        - Processing multiple keys
        - Data resides in Hadoop
        ********************************
        *      MAPREDUCE PROGRESS      *
        ********************************
        Map   0% Reduce   0%
        Map  20% Reduce   0%
        Map  40% Reduce   0%
        Map  60% Reduce   0%
        Map  80% Reduce   0%
        Map 100% Reduce  25%
        Map 100% Reduce  50%
        Map 100% Reduce  75%
        Map 100% Reduce 100%
      Work on the desktop: local data exploration, analysis, and algorithm development
      Scale to Hadoop: interactive use with MATLAB Distributed Computing Server; deploy to production Hadoop instances using MATLAB Compiler
  • 12. Data Analytics with MATLAB (diagram labels: Symbolic Computing, Neural Networks, Optimization, Signal Processing, Image Processing, Control Systems, Financial Modeling, Apps, Language, Machine Learning, Statistics)
  • 14. Combining Big Data, RESTful Web Services, and MATLAB
      Big Data:
        - mapreduce and datastore functions
        - table, categorical, and datetime data types are powerful in conjunction with big data analysis
      RESTful web service access:
        - webread, webwrite, and weboptions
        - JSON objects represented as struct arrays
        - struct2table converts data into a table as a collection of heterogeneous data
      Data import into appropriate data types, data exploration, data visualization, and data analysis combine to support the MATLAB data analytics workflow
  • 15. webread Example: Read historical temperature data from the World Bank Climate Data API
        >> api = 'http://climatedataapi.worldbank.org/climateweb/rest/v1/';
        >> url = [api 'country/cru/tas/year/USA'];
        >> S = webread(url)
        S =
          112x1 struct array with fields:
            year
            data
        >> S(1)
        ans =
            year: 1901
            data: 6.6187
  • 16. Demo: Using MATLAB to programmatically access and analyze data hosted on HDF Server
      HDF Server: a RESTful API providing remote access to HDF5 data
      Responses are JSON formatted text
      webread with weboptions provide data access
      table and datetime data types enable data analysis
      Example: Coral Reef Temperature Anomaly Database (CoRTAD)
        - Version 3 CoRTAD products in HDF5 format
        - 1.8G dataset hosted on h5serv running on Amazon AWS
        thermStress = sortrows(thermStress,'ThermalStressAnomaly','descend');
        thermStress(1:10,:)
        ans =
            Latitude    Longitude    ThermalStressAnomaly
            ________    _________    ____________________
            -8.2839     137.53       52
            -2.0874     146.67       51
            -8.2399     137.49       50
            -8.2399     137.53       50
            -15.447     145.22       50
            -15.491     145.22       50
            -10.13      148.34       50
            -4.5924     135.99       49
  • 17. Questions?
      www.mathworks.com
      www.mathworks.com/matlabcentral
      Examples: Using the high-level HDF5 Functions to Import Data; Tackling Big Data with MATLAB; Performing Numerical Simulation of an Oil Spill; Reading Content from RESTful Web Service
      Thank you!
  • 18. References
      www.hdfgroup.org
      https://hdfgroup.org/wp/2015/04/hdf5-for-the-web-hdf-server/
      http://data.worldbank.org/developers/climate-data-api
      https://data.nasa.gov/data
      http://visibleearth.nasa.gov/
      http://www.nodc.noaa.gov/sog/cortad/
      http://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.nodc:0068999
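
Slide 5 names the high-level HDF5 functions but only shows reads from an existing example.h5. The sketch below is one possible round trip that needs no existing file: it creates a dataset, writes it, and reads it back whole and in part. The temporary file name, the sample latitude values, and the units attribute are assumptions made for illustration; only the function names and the '/g4/lat' path come from the slide.

```matlab
% Round-trip a small array through HDF5 with the high-level interface.
% File name, values, and attribute are illustrative, not from the slides.
fname = [tempname '.h5'];
lat = linspace(-90, 90, 19)';            % 19x1 column of sample latitudes

h5create(fname, '/g4/lat', size(lat));   % create a double dataset (group /g4 is created too)
h5write(fname, '/g4/lat', lat);          % write the whole array
h5writeatt(fname, '/g4/lat', 'units', 'degrees_north');

info  = h5info(fname, '/g4/lat');        % dataset metadata (datatype, dataspace, attributes)
whole = h5read(fname, '/g4/lat');        % read everything
part  = h5read(fname, '/g4/lat', [1 1], [5 1]);   % start and count: first five rows only

h5disp(fname, '/g4/lat');                % print a summary, similar to h5dump
delete(fname);
```

Passing start and count to h5read is what makes the high-level interface usable on datasets that do not fit in memory, since only the requested hyperslab is loaded.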

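Slides 14 to 16 describe JSON responses arriving as struct arrays and then moving into table and datetime types. The sketch below illustrates that path offline: jsondecode stands in for the conversion that webread performs on a JSON response, so no network access is needed. The JSON text is a made-up payload shaped like the slide 15 output (only the 1901 value appears on the slide), and the variable names Year, MeanTemp, and Date are assumptions.

```matlab
% Hypothetical JSON payload shaped like the struct array shown on slide 15.
% Only the 1901 record is taken from the slide; the others are invented.
txt = '[{"year":1901,"data":6.6187},{"year":1902,"data":6.4},{"year":1903,"data":6.1}]';

S = jsondecode(txt);                      % 3x1 struct array with fields year and data
T = struct2table(S);                      % one table row per struct element
T.Properties.VariableNames = {'Year', 'MeanTemp'};

T.Date = datetime(T.Year, 1, 1);          % datetime column for time-based analysis
T = sortrows(T, 'MeanTemp', 'descend');   % warmest years first
disp(T)
```

With a live service, the first two statements would typically collapse into a single webread call, optionally with a weboptions object to set the timeout or content type.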
Editor's Notes

  1. 3
  2. h5disp maps to h5dump; try, catch; no need to recompile your code to play with the lower-level interfaces; run code as you type it
  3. ncdisp maps to ncdump
  4. Big data means many different things to different users. MATLAB provides numerous capabilities for processing data that is too cumbersome for the desktop, as well as for supporting big data systems such as Hadoop. 64-bit processors along with memory-mapped and disk variables optimize processing on the desktop, while databases and our new datastore functionality allow for analyzing your data in segments. MATLAB also provides various programming constructs to address the wide variety of data characteristics: use System objects for stream processing, process images using block processing techniques, and process your data in parallel or on GPUs, using distributed arrays or the new MapReduce framework in MATLAB, to further increase the speed of analysis and the volume of data that can be analyzed. These capabilities let you analyze big data on your desktop and, if more processing power or workspace is needed, scale to a cluster. If your data happens to reside in Hadoop, we have some new features that allow MATLAB to interoperate with that platform.
  5. Datastore provides a straightforward way to access big data that consists of a single text or image file or a large collection of such files. Point the datastore to a folder, or use wildcards to specify all the files in a given directory. Preview a subset of the data for easy exploration. Identify columns to import using column names, and specify the format for each column of interest. Step through files a chunk at a time.
  6. MapReduce is a powerful programming technique for applying filtering, statistics, and other general analysis methods to big data. You can use mapreduce on your desktop machine for applications where the intermediate results of your analysis will not fit into memory, when the analysis is being done on many keys, or to develop algorithms for later use on data stored in the Hadoop Distributed File System (HDFS). You can execute MATLAB MapReduce-based algorithms within Hadoop MapReduce, using MATLAB Distributed Computing Server. You can package MapReduce-based algorithms for deployment to production Hadoop systems, using MATLAB Compiler™.
  7. Customer’s point of view, especially if talking to an IT/Enterprise Architect. The key thing to take away from this slide is that there are many other companies in this space, but most of them should be considered complementary to what we offer. Only a few are competitive: R, Python, SAS. In the data layer, we have the big data vendors, data warehouse vendors, … The story here is that we can work with customers to help them integrate with these. Similar for the presentation layer…
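
Notes 5 and 6 describe pointing a datastore at files, selecting columns by name, and handing the datastore to mapreduce. The sketch below shows one way those steps might fit together on the desktop. The flight records, the file name, and the helper functions maxDelayMapper and maxDelayReducer are hypothetical and written only for this illustration; mapreducer(0) is used to keep execution in the local MATLAB session. Local functions in a script require R2016b or later.

```matlab
% Write a small CSV with made-up flight records (values are illustrative).
fname = [tempname '.csv'];
T = table({'AA';'AA';'UA';'UA';'DL'}, [300;500;1200;800;450], [5;12;-3;20;7], ...
    'VariableNames', {'Carrier','Distance','ArrDelay'});
writetable(T, fname);

% Point a datastore at the file and import only the columns of interest.
ds = datastore(fname);
ds.SelectedVariableNames = {'Carrier','ArrDelay'};
preview(ds)

% Maximum arrival delay per carrier, processed one chunk at a time.
mapreducer(0);                                   % run serially in this session
result = mapreduce(ds, @maxDelayMapper, @maxDelayReducer);
out = readall(result);                           % table with Key and Value variables
delete(fname);

function maxDelayMapper(data, ~, intermKVStore)
    % data is a table holding one chunk; emit the per-carrier maximum for it.
    [carriers, ~, idx] = unique(data.Carrier);
    chunkMax = accumarray(idx, data.ArrDelay, [], @max);
    addmulti(intermKVStore, carriers, num2cell(chunkMax));
end

function maxDelayReducer(key, valueIter, outKVStore)
    % Combine the per-chunk maxima collected for one carrier.
    best = -Inf;
    while hasnext(valueIter)
        best = max(best, getnext(valueIter));
    end
    add(outKVStore, key, best);
end
```

Because the mapper only ever sees one chunk and the reducer only sees the values for one key, the same two functions could in principle be reused unchanged when the datastore points at a much larger collection of files.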