SlideShare a Scribd company logo
Eamonn Maguire, CERN
hepdata.net
HEPData Workshop
What is it?
HEP Scattering
experiments going
back to the 1950s
Each group of scientists will
analyse particular signals by
processing large numbers
of collisions.
The resulting analysis
will be published as
a paper.
But where does the
processed data go?
RE P P --> X
SQRT(S) IN GEV SIG IN MB
7000 95.35 ± 0.38 (stat) ± 1.25 (sys,experimental) ± 0.37
(sys,extrapolation)
What is it?
HEPDATA
Physics paper
Table description
RE P P --> X
SQRT(S) IN GEV SIG IN MB
7000 95.35 ± 0.38 (stat) ± 1.25 (sys,experimental) ± 0.37
(sys,extrapolation)
RE P P --> X
SQRT(S) IN GEV SIG IN MB
7000 95.35 ± 0.38 (stat) ± 1.25 (sys,experimental) ± 0.37
(sys,extrapolation)
Table 1
Table 2 (F1)
HEPData is the go to place for physicists to get access to the data
underlying plots and tables in a publication.
It also links to the scripts and ROOT files for instance used in the
analysis (for reproducibility).
What is it?
Data Providers
1. A Simplified Submission Process
2. A standard entry data format
3. Full review management system
4. Versioning
5. DOI minting
6. Sandbox
Data Consumers
1. Publication Driven Search
2. Data Driven Search
3. Semantic Publishing
4. Data Conversion
5. Access in Analysis Environments
What is new for you?
Whether you’re a data provider, or consumer,
the new HEPData has many functionalities
Data Providers
1. A Simplified Submission Process
2. A standard entry data format
3. Full review management system
4. Versioning
5. DOI minting
6. Sandbox
Data Providers
1. A Simplified Submission Process
2. A standard entry data format
3. Full review management system
4. Versioning
5. DOI minting
6. Sandbox
Submission Process
Data Providers
1. A Simplified Submission Process
2. A standard entry data format
3. Full review management system
4. Versioning
5. DOI minting
6. Sandbox
HEPData submission archive
YAML
Data Record
YAML
Data Record
ROOT PYTHON C++ ROOT
submission.yaml
data records
external data files & links
links the submission together by detailing the
data files to be loaded, their name and descrip-
tion, and their assocated analysis files and code.
YAML (or JSON) representation of the underlying
data files including value errors in a verbose format.
analysis files, code, links to code repositories, etc.
HEPdata submission archive
RE P P --> X
SQRT(S) IN GEV SIG IN MB
7000 95.35 ± 0.38 (stat) ± 1.25 (sys,experimental) ± 0.37
(sys,extrapolation)
HEPDATA
Table 1
{JSON}
Tables and plots
Processes YAML file, inserts records in to database
and links publication record with data and files.
Web Server
Table description
Plots rendered automatically
using a custom library built upon D3.js
Tables rendered from
JSON
DownloadScripts
Data Providers
1. A Simplified Submission Process
2. A standard entry data format
3. Full review management system
4. Versioning
5. DOI minting
6. Sandbox
Comprehensive Review System
Comprehensive Review System
Dashboard for Submission Management
Dashboard for Submission Management
Interactive Plotting Library
Data Providers
1. A Simplified Submission Process
2. A standard entry data format
3. Full review management system
4. Versioning
5. DOI minting
6. Sandbox
Versioning
Data Providers
1. A Simplified Submission Process
2. A standard entry data format
3. Full review management system
4. Versioning
5. DOI minting
6. Sandbox
DOIs
All HEPData records get
DOIs.
Each data table gets a
versioned DOI.
The whole HEPData
record is also given a DOI
to encompass the whole
collection.
Data Providers
1. A Simplified Submission Process
2. A standard entry data format
3. Full review management system
4. Versioning
5. DOI minting
6. Sandbox
Sandbox
Sandbox
Data Consumers
Get access to the data in many environments
1. Publication Driven Search
2. Data Driven Search
3. Semantic Publishing
4. Data Conversion
5. Access in Analysis Environments
Data Consumers
Get access to the data in many environments
1. Publication Driven Search
2. Data Driven Search
3. Semantic Publishing
4. Data Conversion
5. Access in Analysis Environments
The System - Demo
hepdata.net
Data Consumers
Get access to the data in many environments
1. Publication Driven Search
2. Data Driven Search
3. Semantic Publishing
4. Data Conversion
5. Access in Analysis Environments
HEPData
Coming soon!
Masters Project of Juan Luis Boya Garcia,
University of Salamanca, Spain.
You’ll be able to ask HEPData’s API for all data
matching some number of variables.
Retrieve all relevant data, then run your analysis
HEPData
Programmatically or through our interface
No need to manually go through every publication.
We do the hard work for you.
HEPData Search HEPData Submit Data Sign inSearch Data Query
Independent Variable
PT (GeV)
OR
Phrase
elastic scattering
Delete
10
20
30
Download as
5
15
25
Found 3,400 matching data points in 36 publications Download all asView publications
P (GeV) vs Dim 2 P (GeV) vs Dim 3
View publications Download asView publications
cmenergies
7000 GeV
3500 GeV
4.5 GeV
12
6
3.3 GeV
4
3
23 more
reactions
E+ E- --> HADRONS
P P --> X
P P --> P P
12
6
PBAR P --> X
4
3
45 more
observables
SIG
DSIG/DT
SLOPE
12
6
DSIG/DOMEGA
4
3
34 more
Data Query Engine
phrases
Exclusive
Inclusive
Single Differential
12
6
Strange production
4
3
34 more
AND
Independent Variable
COS(THETA)
Delete
AND
Phrase
cross section
DeleteDelete
Beta mode within the next month
The prototype
HEPData
A new type of search
Data driven instead of publication driven
Full data provenance captured
Every data point is recorded with its parent publication’s inspire
id and table number
HEPData
Fast data rendering
With WebGL, now capable of rendering 100s of 1000s of data
points in the browser.
API
Query HEPData directly from mathematica, ROOT, etc. and
process our results as you need
We’re interested in more use cases….
Please get in touch with us to get early access!
HEPData
Data Consumers
Get access to the data in many environments
1. Publication Driven Search
2. Data Driven Search
3. Semantic Publishing
4. Data Conversion
5. Access in Analysis Environments
Semantic Publishing
Every article is tagged with schema.org vocabulary.
Makes it possible for Google and other search engines
to understand your content.
https://hepdata.net/record/ins1397180
Google’s view
Google’s View
https://hepdata.net/search
Data Consumers
Get access to the data in many environments
1. Publication Driven Search
2. Data Driven Search
3. Semantic Publishing
4. Data Conversion
5. Access in Analysis Environments
Getting Data in, getting data out…
Converter
Convert from YAML to ROOT, YODA, CSV
Install via PIP, use as a web service, and contribute to
more conversions!
Interoperability with other tools
Conversion to many formats
Data Consumers
Get access to the data in many environments
1. Publication Driven Search
2. Data Driven Search
3. Semantic Publishing
4. Data Conversion
5. Access in Analysis Environments
The same can be said for use in ROOT or any analysis platform with
a file parser :)
e.g. use data directly in Mathematica
No need to leave the software
environment you like.
Example File here
Sustainability
HEPData is now hosted at CERN and deployed by the RCS-SIS-OA team.
Load Balancer
hepdata-lb1
Web nodes
hepdata-web1 hepdata-web2
Tasks
hepdata-task1hepdata-task2
Elastic Search Cluster
Client
hepdata-search-client
Master
hepdata-search-master
CERN DB on Demand
Data
hepdata-search-data2
Data
hepdata-search-data1
Task 2Task 1
Converter Node
hepdata-converter
Converter
Cache
Cache
hepdata-cache1
Data hosted on the new EOS file system.
Hosted on OpenStack, managed with Puppet
Everything on Github! http://www.github.com/hepdata
Acknowledgements
Eamonn Maguire
Jan Stypka
Salvatore Mele
HEPData @ CERN
Graeme Watt
Michael Whalley
Frank Kraus
HEPData @ Durham
Lukas Heinrich
HEPData @ NYU
Kyle Cramner
Alumni
Laura Rueda-Garcia
Michal Szoziak Summer Student
and all the Invenio team, and the inspire team including Javier Martin Montull,
Jan Age Lavik, and Samuel Kaplun for their support!
Juan Luis Boya Garcia
HEPData @ Salamanca
Thanks to Luca Maschetti and the EOS team for their superb assistance!
Any questions?

More Related Content

What's hot

Dr. Andreas Lattner- Setting up predictive services with Palladium
Dr. Andreas Lattner- Setting up predictive services with PalladiumDr. Andreas Lattner- Setting up predictive services with Palladium
Dr. Andreas Lattner- Setting up predictive services with Palladium
PyData
 
07.bootstrapping
07.bootstrapping07.bootstrapping
07.bootstrapping
Donny Maha Putra
 
Honey I Shrunk the Database
Honey I Shrunk the DatabaseHoney I Shrunk the Database
Honey I Shrunk the DatabaseVanessa Hurst
 
Unafraid of Change: Optimizing ETL, ML, and AI in Fast-Paced Environments wit...
Unafraid of Change: Optimizing ETL, ML, and AI in Fast-Paced Environments wit...Unafraid of Change: Optimizing ETL, ML, and AI in Fast-Paced Environments wit...
Unafraid of Change: Optimizing ETL, ML, and AI in Fast-Paced Environments wit...
Databricks
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
Russell Jurney
 
Tech Talk - JPA and Query Optimization - publish
Tech Talk  -  JPA and Query Optimization - publishTech Talk  -  JPA and Query Optimization - publish
Tech Talk - JPA and Query Optimization - publish
Gleydson Lima
 
Agile Data Science 2.0: Using Spark with MongoDB
Agile Data Science 2.0: Using Spark with MongoDBAgile Data Science 2.0: Using Spark with MongoDB
Agile Data Science 2.0: Using Spark with MongoDB
Russell Jurney
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
Russell Jurney
 
20190110 tfug fukuoka
20190110 tfug fukuoka20190110 tfug fukuoka
20190110 tfug fukuoka
KSuzukiii
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
Russell Jurney
 
Agile analytics applications on hadoop
Agile analytics applications on hadoopAgile analytics applications on hadoop
Agile analytics applications on hadoop
Russell Jurney
 
Agile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsAgile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics Applications
Russell Jurney
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystem
SlideCentral
 
Stacks
StacksStacks
Stacks
Acad
 
Integration of data ninja services with oracle spatial and graph
Integration of data ninja services with oracle spatial and graphIntegration of data ninja services with oracle spatial and graph
Integration of data ninja services with oracle spatial and graph
Data Ninja API
 
Mansi chowkkar programming_in_data_analytics
Mansi chowkkar programming_in_data_analyticsMansi chowkkar programming_in_data_analytics
Mansi chowkkar programming_in_data_analytics
MansiChowkkar
 
DF1 - R - Natekin - Improving Daily Analysis with data.table
DF1 - R - Natekin - Improving Daily Analysis with data.tableDF1 - R - Natekin - Improving Daily Analysis with data.table
DF1 - R - Natekin - Improving Daily Analysis with data.table
MoscowDataFest
 
Impetus White Paper- Handling Data Corruption in Elasticsearch
Impetus White Paper- Handling  Data Corruption  in ElasticsearchImpetus White Paper- Handling  Data Corruption  in Elasticsearch
Impetus White Paper- Handling Data Corruption in Elasticsearch
Impetus Technologies
 

What's hot (19)

Dr. Andreas Lattner- Setting up predictive services with Palladium
Dr. Andreas Lattner- Setting up predictive services with PalladiumDr. Andreas Lattner- Setting up predictive services with Palladium
Dr. Andreas Lattner- Setting up predictive services with Palladium
 
CS215 - Lec 9 indexing and reclaiming space in files
CS215 - Lec 9  indexing and reclaiming space in filesCS215 - Lec 9  indexing and reclaiming space in files
CS215 - Lec 9 indexing and reclaiming space in files
 
07.bootstrapping
07.bootstrapping07.bootstrapping
07.bootstrapping
 
Honey I Shrunk the Database
Honey I Shrunk the DatabaseHoney I Shrunk the Database
Honey I Shrunk the Database
 
Unafraid of Change: Optimizing ETL, ML, and AI in Fast-Paced Environments wit...
Unafraid of Change: Optimizing ETL, ML, and AI in Fast-Paced Environments wit...Unafraid of Change: Optimizing ETL, ML, and AI in Fast-Paced Environments wit...
Unafraid of Change: Optimizing ETL, ML, and AI in Fast-Paced Environments wit...
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
 
Tech Talk - JPA and Query Optimization - publish
Tech Talk  -  JPA and Query Optimization - publishTech Talk  -  JPA and Query Optimization - publish
Tech Talk - JPA and Query Optimization - publish
 
Agile Data Science 2.0: Using Spark with MongoDB
Agile Data Science 2.0: Using Spark with MongoDBAgile Data Science 2.0: Using Spark with MongoDB
Agile Data Science 2.0: Using Spark with MongoDB
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
 
20190110 tfug fukuoka
20190110 tfug fukuoka20190110 tfug fukuoka
20190110 tfug fukuoka
 
Agile Data Science 2.0
Agile Data Science 2.0Agile Data Science 2.0
Agile Data Science 2.0
 
Agile analytics applications on hadoop
Agile analytics applications on hadoopAgile analytics applications on hadoop
Agile analytics applications on hadoop
 
Agile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsAgile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics Applications
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystem
 
Stacks
StacksStacks
Stacks
 
Integration of data ninja services with oracle spatial and graph
Integration of data ninja services with oracle spatial and graphIntegration of data ninja services with oracle spatial and graph
Integration of data ninja services with oracle spatial and graph
 
Mansi chowkkar programming_in_data_analytics
Mansi chowkkar programming_in_data_analyticsMansi chowkkar programming_in_data_analytics
Mansi chowkkar programming_in_data_analytics
 
DF1 - R - Natekin - Improving Daily Analysis with data.table
DF1 - R - Natekin - Improving Daily Analysis with data.tableDF1 - R - Natekin - Improving Daily Analysis with data.table
DF1 - R - Natekin - Improving Daily Analysis with data.table
 
Impetus White Paper- Handling Data Corruption in Elasticsearch
Impetus White Paper- Handling  Data Corruption  in ElasticsearchImpetus White Paper- Handling  Data Corruption  in Elasticsearch
Impetus White Paper- Handling Data Corruption in Elasticsearch
 

Viewers also liked

Visual Compression of Workflow Visualizations with Automated Detection of Mac...
Visual Compression of Workflow Visualizations with Automated Detection of Mac...Visual Compression of Workflow Visualizations with Automated Detection of Mac...
Visual Compression of Workflow Visualizations with Automated Detection of Mac...
Eamonn Maguire
 
Taxonomy-Based Glyph Design
Taxonomy-Based Glyph DesignTaxonomy-Based Glyph Design
Taxonomy-Based Glyph Design
Eamonn Maguire
 
Visualization of Publication Impact
Visualization of Publication ImpactVisualization of Publication Impact
Visualization of Publication Impact
Eamonn Maguire
 
ISA tools presentation
ISA tools presentationISA tools presentation
ISA tools presentationEamonn Maguire
 
Web valley talk - usability, visualization and mobile app development
Web valley talk - usability, visualization and mobile app developmentWeb valley talk - usability, visualization and mobile app development
Web valley talk - usability, visualization and mobile app development
Eamonn Maguire
 
Clusterix at VDS 2016
Clusterix at VDS 2016Clusterix at VDS 2016
Clusterix at VDS 2016
Eamonn Maguire
 

Viewers also liked (6)

Visual Compression of Workflow Visualizations with Automated Detection of Mac...
Visual Compression of Workflow Visualizations with Automated Detection of Mac...Visual Compression of Workflow Visualizations with Automated Detection of Mac...
Visual Compression of Workflow Visualizations with Automated Detection of Mac...
 
Taxonomy-Based Glyph Design
Taxonomy-Based Glyph DesignTaxonomy-Based Glyph Design
Taxonomy-Based Glyph Design
 
Visualization of Publication Impact
Visualization of Publication ImpactVisualization of Publication Impact
Visualization of Publication Impact
 
ISA tools presentation
ISA tools presentationISA tools presentation
ISA tools presentation
 
Web valley talk - usability, visualization and mobile app development
Web valley talk - usability, visualization and mobile app developmentWeb valley talk - usability, visualization and mobile app development
Web valley talk - usability, visualization and mobile app development
 
Clusterix at VDS 2016
Clusterix at VDS 2016Clusterix at VDS 2016
Clusterix at VDS 2016
 

Similar to HEPData workshop talk

High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
DataWorks Summit
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
Revolution Analytics
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
Revolution Analytics
 
Hpdw 2015-v10-paper
Hpdw 2015-v10-paperHpdw 2015-v10-paper
Hpdw 2015-v10-paper
restassure
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks
Jim Dowling
 
Revolution Analytics
Revolution AnalyticsRevolution Analytics
Revolution Analyticstempledf
 
LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013Luis Daniel Ibáñez
 
Open Analytics Environment
Open Analytics EnvironmentOpen Analytics Environment
Open Analytics Environment
Ian Foster
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
Jürgen Ambrosi
 
Big Data Analytics: From SQL to Machine Learning and Graph Analysis
Big Data Analytics: From SQL to Machine Learning and Graph AnalysisBig Data Analytics: From SQL to Machine Learning and Graph Analysis
Big Data Analytics: From SQL to Machine Learning and Graph Analysis
Yuanyuan Tian
 
Integration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsIntegration Patterns for Big Data Applications
Integration Patterns for Big Data Applications
Michael Häusler
 
LD4KD 2015 - Demos and tools
LD4KD 2015 - Demos and toolsLD4KD 2015 - Demos and tools
LD4KD 2015 - Demos and tools
Vrije Universiteit Amsterdam
 
Bringing OpenClinica Data into SAS
Bringing OpenClinica Data into SASBringing OpenClinica Data into SAS
Bringing OpenClinica Data into SAS
Rick Watts
 
Introduction to R Short course Fall 2016
Introduction to R Short course Fall 2016Introduction to R Short course Fall 2016
Introduction to R Short course Fall 2016
Spencer Fox
 
Matlab, Big Data, and HDF Server
Matlab, Big Data, and HDF ServerMatlab, Big Data, and HDF Server
Matlab, Big Data, and HDF Server
The HDF-EOS Tools and Information Center
 
Unit 2 - Data Manipulation with R.pptx
Unit 2 - Data Manipulation with R.pptxUnit 2 - Data Manipulation with R.pptx
Unit 2 - Data Manipulation with R.pptx
Malla Reddy University
 
2017-01-08-scaling tribalknowledge
2017-01-08-scaling tribalknowledge2017-01-08-scaling tribalknowledge
2017-01-08-scaling tribalknowledge
Christopher Williams
 
Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)
Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)
Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)
Sudhir Mallem
 
Big Data: hype or necessity?
Big Data: hype or necessity?Big Data: hype or necessity?
Big Data: hype or necessity?
Bart Vandewoestyne
 
070416 Egu Vienna Husar
070416 Egu Vienna Husar070416 Egu Vienna Husar
070416 Egu Vienna Husar
Rudolf Husar
 

Similar to HEPData workshop talk (20)

High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
Hpdw 2015-v10-paper
Hpdw 2015-v10-paperHpdw 2015-v10-paper
Hpdw 2015-v10-paper
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks
 
Revolution Analytics
Revolution AnalyticsRevolution Analytics
Revolution Analytics
 
LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013
 
Open Analytics Environment
Open Analytics EnvironmentOpen Analytics Environment
Open Analytics Environment
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
 
Big Data Analytics: From SQL to Machine Learning and Graph Analysis
Big Data Analytics: From SQL to Machine Learning and Graph AnalysisBig Data Analytics: From SQL to Machine Learning and Graph Analysis
Big Data Analytics: From SQL to Machine Learning and Graph Analysis
 
Integration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsIntegration Patterns for Big Data Applications
Integration Patterns for Big Data Applications
 
LD4KD 2015 - Demos and tools
LD4KD 2015 - Demos and toolsLD4KD 2015 - Demos and tools
LD4KD 2015 - Demos and tools
 
Bringing OpenClinica Data into SAS
Bringing OpenClinica Data into SASBringing OpenClinica Data into SAS
Bringing OpenClinica Data into SAS
 
Introduction to R Short course Fall 2016
Introduction to R Short course Fall 2016Introduction to R Short course Fall 2016
Introduction to R Short course Fall 2016
 
Matlab, Big Data, and HDF Server
Matlab, Big Data, and HDF ServerMatlab, Big Data, and HDF Server
Matlab, Big Data, and HDF Server
 
Unit 2 - Data Manipulation with R.pptx
Unit 2 - Data Manipulation with R.pptxUnit 2 - Data Manipulation with R.pptx
Unit 2 - Data Manipulation with R.pptx
 
2017-01-08-scaling tribalknowledge
2017-01-08-scaling tribalknowledge2017-01-08-scaling tribalknowledge
2017-01-08-scaling tribalknowledge
 
Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)
Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)
Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)
 
Big Data: hype or necessity?
Big Data: hype or necessity?Big Data: hype or necessity?
Big Data: hype or necessity?
 
070416 Egu Vienna Husar
070416 Egu Vienna Husar070416 Egu Vienna Husar
070416 Egu Vienna Husar
 

Recently uploaded

JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 

Recently uploaded (20)

JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 

HEPData workshop talk

  • 2. What is it? HEP Scattering experiments going back to the 1950s Each group of scientists will analyse particular signals by processing large numbers of collisions. The resulting analysis will be published as a paper. But where does the processed data go? RE P P --> X SQRT(S) IN GEV SIG IN MB 7000 95.35 ± 0.38 (stat) ± 1.25 (sys,experimental) ± 0.37 (sys,extrapolation)
  • 3. What is it? HEPDATA Physics paper Table description RE P P --> X SQRT(S) IN GEV SIG IN MB 7000 95.35 ± 0.38 (stat) ± 1.25 (sys,experimental) ± 0.37 (sys,extrapolation) RE P P --> X SQRT(S) IN GEV SIG IN MB 7000 95.35 ± 0.38 (stat) ± 1.25 (sys,experimental) ± 0.37 (sys,extrapolation) Table 1 Table 2 (F1) HEPData is the go to place for physicists to get access to the data underlying plots and tables in a publication. It also links to the scripts and ROOT files for instance used in the analysis (for reproducibility).
  • 5. Data Providers 1. A Simplified Submission Process 2. A standard entry data format 3. Full review management system 4. Versioning 5. DOI minting 6. Sandbox Data Consumers 1. Publication Driven Search 2. Data Driven Search 3. Semantic Publishing 4. Data Conversion 5. Access in Analysis Environments What is new for you? Whether you’re a data provider, or consumer, the new HEPData has many functionalities
  • 6. Data Providers 1. A Simplified Submission Process 2. A standard entry data format 3. Full review management system 4. Versioning 5. DOI minting 6. Sandbox
  • 7. Data Providers 1. A Simplified Submission Process 2. A standard entry data format 3. Full review management system 4. Versioning 5. DOI minting 6. Sandbox
  • 9. Data Providers 1. A Simplified Submission Process 2. A standard entry data format 3. Full review management system 4. Versioning 5. DOI minting 6. Sandbox
  • 10. HEPData submission archive YAML Data Record YAML Data Record ROOT PYTHON C++ ROOT submission.yaml data records external data files & links links the submission together by detailing the data files to be loaded, their name and descrip- tion, and their assocated analysis files and code. YAML (or JSON) representation of the underlying data files including value errors in a verbose format. analysis files, code, links to code repositories, etc.
  • 11. HEPdata submission archive RE P P --> X SQRT(S) IN GEV SIG IN MB 7000 95.35 ± 0.38 (stat) ± 1.25 (sys,experimental) ± 0.37 (sys,extrapolation) HEPDATA Table 1 {JSON} Tables and plots Processes YAML file, inserts records in to database and links publication record with data and files. Web Server Table description Plots rendered automatically using a custom library built upon D3.js Tables rendered from JSON DownloadScripts
  • 12. Data Providers 1. A Simplified Submission Process 2. A standard entry data format 3. Full review management system 4. Versioning 5. DOI minting 6. Sandbox
  • 18. Data Providers 1. A Simplified Submission Process 2. A standard entry data format 3. Full review management system 4. Versioning 5. DOI minting 6. Sandbox
  • 20. Data Providers 1. A Simplified Submission Process 2. A standard entry data format 3. Full review management system 4. Versioning 5. DOI minting 6. Sandbox
  • 21. DOIs All HEPData records get DOIs. Each data table gets a versioned DOI. The whole HEPData record is also given a DOI to encompass the whole collection.
  • 22. Data Providers 1. A Simplified Submission Process 2. A standard entry data format 3. Full review management system 4. Versioning 5. DOI minting 6. Sandbox
  • 23.
  • 26. Data Consumers Get access to the data in many environments 1. Publication Driven Search 2. Data Driven Search 3. Semantic Publishing 4. Data Conversion 5. Access in Analysis Environments
  • 27. Data Consumers Get access to the data in many environments 1. Publication Driven Search 2. Data Driven Search 3. Semantic Publishing 4. Data Conversion 5. Access in Analysis Environments
  • 28.
  • 29. The System - Demo hepdata.net
  • 30. Data Consumers Get access to the data in many environments 1. Publication Driven Search 2. Data Driven Search 3. Semantic Publishing 4. Data Conversion 5. Access in Analysis Environments
  • 31. HEPData Coming soon! Masters Project of Juan Luis Boya Garcia, University of Salamanca, Spain.
  • 32. You’ll be able to ask HEPData’s API for all data matching some number of variables. Retrieve all relevant data, then run your analysis HEPData Programmatically or through our interface No need to manually go through every publication. We do the hard work for you.
  • 33. HEPData Search HEPData Submit Data Sign inSearch Data Query Independent Variable PT (GeV) OR Phrase elastic scattering Delete 10 20 30 Download as 5 15 25 Found 3,400 matching data points in 36 publications Download all asView publications P (GeV) vs Dim 2 P (GeV) vs Dim 3 View publications Download asView publications cmenergies 7000 GeV 3500 GeV 4.5 GeV 12 6 3.3 GeV 4 3 23 more reactions E+ E- --> HADRONS P P --> X P P --> P P 12 6 PBAR P --> X 4 3 45 more observables SIG DSIG/DT SLOPE 12 6 DSIG/DOMEGA 4 3 34 more Data Query Engine phrases Exclusive Inclusive Single Differential 12 6 Strange production 4 3 34 more AND Independent Variable COS(THETA) Delete AND Phrase cross section DeleteDelete
  • 34. Beta mode within the next month The prototype HEPData
  • 35.
  • 36. A new type of search Data driven instead of publication driven Full data provenance captured Every data point is recorded with its parent publication’s inspire id and table number HEPData Fast data rendering With WebGL, now capable of rendering 100s of 1000s of data points in the browser. API Query HEPData directly from mathematica, ROOT, etc. and process our results as you need
  • 37. We’re interested in more use cases…. Please get in touch with us to get early access! HEPData
  • 38. Data Consumers Get access to the data in many environments 1. Publication Driven Search 2. Data Driven Search 3. Semantic Publishing 4. Data Conversion 5. Access in Analysis Environments
  • 39. Semantic Publishing Every article is tagged with schema.org vocabulary. Makes it possible for Google and other search engines to understand your content. https://hepdata.net/record/ins1397180 Google’s view Google’s View https://hepdata.net/search
  • 40. Data Consumers Get access to the data in many environments 1. Publication Driven Search 2. Data Driven Search 3. Semantic Publishing 4. Data Conversion 5. Access in Analysis Environments
  • 41. Getting Data in, getting data out… Converter Convert from YAML to ROOT, YODA, CSV Install via PIP, use as a web service, and contribute to more conversions! Interoperability with other tools
  • 43. Data Consumers Get access to the data in many environments 1. Publication Driven Search 2. Data Driven Search 3. Semantic Publishing 4. Data Conversion 5. Access in Analysis Environments
  • 44. The same can be said for use in ROOT or any analysis platform with a file parser :) e.g. use data directly in Mathematica No need to leave the software environment you like.
  • 46. Sustainability HEPData is now hosted at CERN and deployed by the RCS-SIS-OA team.
  • 47. Load Balancer hepdata-lb1 Web nodes hepdata-web1 hepdata-web2 Tasks hepdata-task1hepdata-task2 Elastic Search Cluster Client hepdata-search-client Master hepdata-search-master CERN DB on Demand Data hepdata-search-data2 Data hepdata-search-data1 Task 2Task 1 Converter Node hepdata-converter Converter Cache Cache hepdata-cache1 Data hosted on the new EOS file system. Hosted on OpenStack, managed with Puppet
  • 48. Everything on Github! http://www.github.com/hepdata
  • 49. Acknowledgements Eamonn Maguire Jan Stypka Salvatore Mele HEPData @ CERN Graeme Watt Michael Whalley Frank Kraus HEPData @ Durham Lukas Heinrich HEPData @ NYU Kyle Cramner Alumni Laura Rueda-Garcia Michal Szoziak Summer Student and all the Invenio team, and the inspire team including Javier Martin Montull, Jan Age Lavik, and Samuel Kaplun for their support! Juan Luis Boya Garcia HEPData @ Salamanca Thanks to Luca Maschetti and the EOS team for their superb assistance!