V6.0
Getting Started With HDF5
• Why have we brought in a new data format?
• What actually is HDF5?
• How do I create HDF5 files?
• How do I read in HDF5 files?
– Reading one file at a time
– Reading multiple files and selections
• Points to Note
• Future Developments
SEGY is great but…
• It is designed to be read sequentially from tape
– and our “index” file solution didn’t scale well to “big data”
– and our index file solution only allowed primary key access
• It only has 240 bytes of 32-bit integer headers defined
– and our extended trace headers didn’t scale well to “big data”
• Some processes require “n-key random access”
– “surface consistent” suite, PreSTM, 3DSRME etc.
• You need to read the whole file to access trace headers
– Some “database” systems offer more flexibility
• Parallel I/O doesn’t scale well on large clusters
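To make the sequential-access point concrete, here is a minimal sketch of where trace headers sit in a fixed-length SEG-Y file. It assumes the standard 3200-byte text header and 400-byte binary header (the sizes also quoted later in this deck), no extended textual headers, and 4-byte samples; the function names are illustrative only and are not part of Claritas.

```python
# Fixed-length SEG-Y layout (assumption: no extended textual headers).
TEXT_HEADER_BYTES = 3200
BINARY_HEADER_BYTES = 400
TRACE_HEADER_BYTES = 240


def trace_header_offset(trace_index, samples_per_trace, bytes_per_sample=4):
    """Byte offset of the header of trace `trace_index` (0-based)."""
    trace_bytes = TRACE_HEADER_BYTES + samples_per_trace * bytes_per_sample
    return TEXT_HEADER_BYTES + BINARY_HEADER_BYTES + trace_index * trace_bytes


def header_scan_span(n_traces, samples_per_trace, bytes_per_sample=4):
    """Return (header bytes actually wanted, file bytes spanned to reach them)."""
    wanted = n_traces * TRACE_HEADER_BYTES
    spanned = trace_header_offset(n_traces, samples_per_trace, bytes_per_sample)
    return wanted, spanned


wanted, spanned = header_scan_span(n_traces=1_000_000, samples_per_trace=1500)
print(f"header bytes wanted: {wanted:,}; file bytes spanned: {spanned:,}")
```

Because the headers are interleaved with the samples, collecting every header means stepping through the whole file, even though the headers themselves are a small fraction of it.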
So what is HDF5?
• Developed over the last 20 years
• Initially by the National Center for Supercomputing Applications http://www.ncsa.illinois.edu/
• Now developed by The HDF Group http://www.hdfgroup.org
• A suite of technologies, not just a file format
• General purpose library and file format for storing scientific data
• Fully supported set of command line tools, APIs and interfaces
• A pan-industry open standard
• Used for storage by both MATLAB and Scilab, can be read by Mathematica
• A self describing format
• No ambiguity about integer or floating point types or storage in trace bytes
• Names can be allocated to components, as you would in a database structure
• Built for “big data”
• Petabyte+ scale datasets running on tens of thousands of cores
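The "no ambiguity" point can be illustrated with a short sketch using only the Python standard library. The same four bytes decode to very different values depending on the type the reader assumes; a self-describing container records the type alongside the data. The `described` dictionary below is a hypothetical stand-in for that idea, not the HDF5 API.

```python
import struct

raw = bytes.fromhex("42C80000")  # four bytes as they might sit in a trace

# Without a stored type, the reader has to guess how to decode them:
as_ieee_float = struct.unpack(">f", raw)[0]  # big-endian IEEE 754 float
as_int32 = struct.unpack(">i", raw)[0]       # big-endian 32-bit integer
as_le_int32 = struct.unpack("<i", raw)[0]    # little-endian 32-bit integer
print(as_ieee_float, as_int32, as_le_int32)

# Hypothetical self-describing record: the type travels with the bytes.
described = {"name": "samples", "dtype": ">f4", "data": raw}
STRUCT_FORMATS = {">f4": ">f", ">i4": ">i", "<i4": "<i"}
value = struct.unpack(STRUCT_FORMATS[described["dtype"]], described["data"])[0]
print(described["name"], value)
```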
Our Implementation of HDF5
HDFView 2.9: a free, third-party tool, showing how any HDF5 application can open the new format
Data, processing history, 400-byte reel header, 3200-byte text header, history and trace headers from Claritas extended SEGY all present
Seismic samples displayed graphically – could also be displayed as a table
All trace headers – SEGY 240-byte and extended – opened in a spreadsheet; full mathematical operations
We have “encapsulated” the GLOBE Claritas SEGY in HDF5
The 400-byte binary reel header opened as a table, so that values can be edited or modified
Creating HDF5 Files : SEISWRITE
Specify a file name!
Optimisation controls; these have smart defaults set and can be modified for managing very large datasets where you know that non-sequential read-access will be needed, or partial read of trace samples will be required
Replaces current use of DISCWRITE, although this will continue to be available
New functionality development will focus on SEISWRITE and HDF5 format data
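The slide does not say what the optimisation controls are. One common HDF5 tuning knob for non-sequential or partial reads is the chunk shape of a dataset, so the following sketch assumes, purely for illustration, a 2-D (traces x samples) dataset stored in rectangular chunks and counts how many chunks a given read would touch. The function name and the shapes are hypothetical.

```python
def chunks_touched(trace_range, sample_range, chunk_shape):
    """Count chunks intersecting half-open ranges of traces and samples."""
    (t0, t1), (s0, s1) = trace_range, sample_range
    chunk_traces, chunk_samples = chunk_shape
    n_trace_chunks = (t1 - 1) // chunk_traces - t0 // chunk_traces + 1
    n_sample_chunks = (s1 - 1) // chunk_samples - s0 // chunk_samples + 1
    return n_trace_chunks * n_sample_chunks


# Hypothetical dataset: 10,000 traces of 2,000 samples each.
whole_trace = ((0, 1), (0, 2000))       # one trace, all samples
time_window = ((0, 10_000), (0, 250))   # first 250 samples of every trace

for shape in [(1, 2000), (100, 250)]:
    print(shape,
          chunks_touched(*whole_trace, shape),
          chunks_touched(*time_window, shape))
```

Under these assumptions, one-trace-per-chunk storage suits whole-trace reads, while blocked chunks cut the number of chunks touched by a partial-sample read from 10,000 to 100, at the cost of touching 8 chunks for a single whole trace.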
Reading HDF5 files : SEISREAD
With HDF5 format, you use SEISREAD in place of the DISCxxxxx Modules
You don’t need to worry about the order of data on disc, just how you want to read it
Simple Reading
File Name
Primary key order; default is all, ascending
Secondary key order; default is all, ascending
Tertiary key order; only when needed
You can read data in ANY order; original order doesn’t matter
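The idea of reading in any key order, regardless of how the traces were written, can be sketched as sorting an index of trace headers rather than the trace data itself. This is only an illustration of the concept with made-up header values; it is not how SEISREAD is implemented.

```python
def read_order(headers, keys):
    """Return trace indices in ascending order of the given header keys."""
    return sorted(range(len(headers)),
                  key=lambda i: tuple(headers[i][k] for k in keys))


# Made-up headers, listed in the order the traces sit on disc.
headers = [
    {"SHOTID": 2, "CHANNEL": 1, "CDP": 10},
    {"SHOTID": 1, "CHANNEL": 2, "CDP": 12},
    {"SHOTID": 1, "CHANNEL": 1, "CDP": 11},
    {"SHOTID": 2, "CHANNEL": 2, "CDP": 11},
]

by_shot = read_order(headers, ["SHOTID", "CHANNEL"])  # primary, secondary
by_cdp = read_order(headers, ["CDP", "SHOTID"])
print(by_shot, by_cdp)
```

With random access to traces, the same file can then be delivered in either order by fetching traces in the sequence given by the index.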
Selection and Repeats
Six repeat copies specified
Primary key SHOTID, with only SHOTID 900 selected; note tolerance
Secondary key CHANNEL, all selected, in ascending order (default)
Six copies of SHOTID 900 passed to the processing flow, with REPEAT set from 1 to 6
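The selection described above can be sketched as a filter followed by duplication. The sketch assumes that the tolerance means "accept values within plus or minus the tolerance of the requested value" and that each copy is a complete pass over the selected traces; both are assumptions, and the function is hypothetical rather than part of Claritas.

```python
def select_with_repeats(headers, key, value, tolerance, repeats, secondary):
    """Select traces whose `key` is within `tolerance` of `value`, then emit
    `repeats` copies, each tagged with REPEAT = 1..repeats."""
    selected = sorted(
        (h for h in headers if abs(h[key] - value) <= tolerance),
        key=lambda h: h[secondary],
    )
    return [{**h, "REPEAT": r} for r in range(1, repeats + 1) for h in selected]


# Made-up headers: three shots of three channels, channels out of order.
headers = [
    {"SHOTID": shot, "CHANNEL": chan}
    for shot in (899, 900, 901)
    for chan in (3, 1, 2)
]

copies = select_with_repeats(headers, "SHOTID", 900,
                             tolerance=0, repeats=6, secondary="CHANNEL")
print(len(copies), copies[0], copies[-1])
```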
More Complex Selections
Two copies of SHOTIDs from 100 to 900 with an increment of 100, all channels in ascending order, with REPEAT set to 1 and 2
More complex SHOTID selection using the same syntax as DISCREAD; note tolerance is set to 0
Sorting to CDP (DISCGATH)
Identical to simple reading
Specify CDP as primary key
Specify CDPTRACE as secondary key
Default is to read all data in ascending primary/secondary key order
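Sorting to CDP amounts to ordering by (CDP, CDPTRACE) and then delivering each run of equal CDP values as one gather. A minimal sketch with made-up header values, using only the standard library:

```python
from itertools import groupby
from operator import itemgetter


def cdp_gathers(headers):
    """Yield (cdp, [CDPTRACE values]) in ascending CDP, CDPTRACE order."""
    ordered = sorted(headers, key=itemgetter("CDP", "CDPTRACE"))
    for cdp, group in groupby(ordered, key=itemgetter("CDP")):
        yield cdp, [h["CDPTRACE"] for h in group]


# Made-up headers in shot order, as they might have been written.
headers = [
    {"SHOTID": 1, "CDP": 101, "CDPTRACE": 2},
    {"SHOTID": 1, "CDP": 100, "CDPTRACE": 1},
    {"SHOTID": 2, "CDP": 101, "CDPTRACE": 1},
    {"SHOTID": 2, "CDP": 102, "CDPTRACE": 1},
]

gathers = list(cdp_gathers(headers))
print(gathers)
```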
Reading Multiple Files
Seismic File List used in the same format as with DISCREAD, with selections
SETREPEAT parameter used as per DISCREAD to create panels; files are merged if this is “no”
Primary key defined here is used in the Seismic File List definition
This last file has a “native” ordering of CDP, CDPTRACE, but will be re-ordered to SHOT, CHANNEL on read, automatically
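The difference between merging files and creating panels can be sketched as follows. The sketch assumes that "merged" means all traces are delivered as one stream in key order, and that "panels" means each file is delivered separately with REPEAT set to its position in the list; both readings are assumptions, and the function and data are hypothetical.

```python
def read_file_list(files, primary, secondary, as_panels):
    """`files` is a list of files, each a list of trace-header dicts."""
    def key(h):
        return (h[primary], h[secondary])

    if as_panels:
        # One panel per file, each sorted on read and tagged with REPEAT.
        out = []
        for panel, traces in enumerate(files, start=1):
            out.extend({**h, "REPEAT": panel} for h in sorted(traces, key=key))
        return out
    # Merge: pool every file's traces and deliver them in key order.
    return sorted((h for traces in files for h in traces), key=key)


file_a = [{"SHOTID": 1, "CHANNEL": 1}, {"SHOTID": 1, "CHANNEL": 2},
          {"SHOTID": 3, "CHANNEL": 1}]
file_b = [{"SHOTID": 2, "CHANNEL": 2}, {"SHOTID": 2, "CHANNEL": 1}]  # other native order

merged = read_file_list([file_a, file_b], "SHOTID", "CHANNEL", as_panels=False)
panels = read_file_list([file_a, file_b], "SHOTID", "CHANNEL", as_panels=True)
print([h["SHOTID"] for h in merged], [h["REPEAT"] for h in panels])
```

In both modes the second file is re-ordered on read, which mirrors the point that the native ordering of each file no longer matters.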
Points to Note
• Can only specify a primary key in a Seismic File List
– Same as DISCWRITE, although the original data order no longer matters
• User needs to manage the merging of extended trace headers
– Use DELHDR prior to merging files; will be removed in future releases
• Files can be 10-15% larger than SEGY
• Compatible with Cluster File Systems (Gluster etc.)
• I/O above about 2Gbytes should be improved
Future development
• Improved PKEY/SKEY/TKEY selection handling
• Direct update of trace headers from applications
– Geometry, SV (FB picks) etc.
• Add HDF5 support in KPRET2D
– Only module where this is not available
• Add full parallel I/O to iMage suite
– Increase parallel scalability even further
• Algorithmic optimisation
– Re-write to take full advantage of random access

A quick start guide to using HDF5 files in GLOBE Claritas
