Daniel Bustamante | Supervisor: Clemens Lange | July 08, 2019
Checking the CMS datasets
Non-Member State Summer Student Programme
BACKGROUND
CERN has made public the data collected by the CMS detector
First of all, you need to know that ROOT is the framework used to work with the collected data
The primary data of CMS is in AOD (Analysis Object Data) files
What does that mean?
They store reconstructed data containing the information needed for analysis
So, how can I read them?
ROOT is required to read the files and understand the reconstructed data
© CERN, 2014–2019
The CERN Open Data Portal contains datasets of real data recorded with the CMS detector
Each dataset holds information about different events and collections of physics objects
The information is stored in the aforementioned AOD files, which are conveniently listed in index files
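As a minimal sketch of how such an index file could be consumed, the Python snippet below assumes the index file is plain text with one root:// address per line. That layout is an assumption, not something stated on these slides, and the function name read_index is hypothetical.

```python
def read_index(text):
    """Return the root:// addresses listed in the text of an index file.

    Assumes one address per line; blank lines and surrounding
    whitespace are ignored.
    """
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("root://"):
            urls.append(line)
    return urls


# One entry taken from the example used later in this talk.
sample = (
    "root://eospublic.cern.ch//eos/opendata/cms/Run2010B/Electron/"
    "PATtuples/Electron_PAT_data_500files_1.root\n"
    "\n"
)
print(read_index(sample))
```

Each address returned this way is one file whose availability and size can then be checked.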
PROBLEM
That's the problem!
With such a large list of files, deletion, corruption or loss of data could occur
How do we make sure that is not happening now?
PROPOSED SOLUTION
GFAL (Grid File Access Library) version 2 provides useful command line tools…
gfal-ls is equivalent to the system ls command
It supports the root:// protocol
The -l option gives the long listing format (including size)
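A check like this can be scripted. The sketch below wraps gfal-ls in Python through the standard subprocess module. It assumes the GFAL2 command line tools are installed and on the PATH; the helper names gfal_ls_command and run_gfal_ls are hypothetical.

```python
import subprocess


def gfal_ls_command(url, long_listing=False):
    """Build the argument list for gfal-ls, optionally with -l."""
    cmd = ["gfal-ls"]
    if long_listing:
        cmd.append("-l")
    cmd.append(url)
    return cmd


def run_gfal_ls(url, long_listing=False):
    """Run gfal-ls and return (exit status, stdout, stderr).

    Requires the GFAL2 command line tools to be installed.
    """
    result = subprocess.run(
        gfal_ls_command(url, long_listing),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr
```

Building the argument list separately keeps the command easy to inspect without contacting the storage server.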
On the other hand…
The JSON (JavaScript Object Notation) version of the index files provides more organized and detailed information about the ROOT files of each dataset
… just replace .txt with .json in the index file name
There we can find the size that we expect each ROOT file to have if it has not been modified
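A sketch of that lookup in Python follows. It assumes the JSON index is a list of records with "uri" and "size" keys. The slides do not show the layout, so both key names, the sample record and the helper names are assumptions made for illustration.

```python
import json


def json_index_name(txt_name):
    """Swap a trailing .txt extension for .json."""
    if not txt_name.endswith(".txt"):
        raise ValueError("expected a .txt index file name")
    return txt_name[: -len(".txt")] + ".json"


def expected_sizes(json_text):
    """Map each file address to the size in bytes recorded in the index.

    Assumed layout: [{"uri": "...", "size": 123}, ...]
    """
    return {entry["uri"]: int(entry["size"]) for entry in json.loads(json_text)}


# Hypothetical index content, for illustration only.
sample = '[{"uri": "root://example.org//data/file_1.root", "size": 1024}]'
print(json_index_name("file_index.txt"))  # file_index.json
print(expected_sizes(sample))
```

The resulting dictionary gives, for every listed file, the size to compare against what the storage reports.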
A SMALL EXAMPLE
gfal-ls will tell us if the file is still available…
$ gfal-ls root://eospublic.cern.ch//eos/opendata/cms/Run2010B/Electron/PATtuples/Electron_PAT_data_500files_1.root
Possible results
root://eospublic.cern.ch//eos/opendata/cms/Run2010B/Electron/PATtuples/Electron_PAT_data_500files_1.root
It exists (expected result)
gfal-ls error: 2 (No such file or directory) - Failed to stat file (No such file or directory)
It has not been found (Houston, we have a problem)
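The two outcomes above can be told apart automatically. The Python sketch below classifies the text printed by gfal-ls using only the error message quoted in this example; the function name and the "unknown" fallback are assumptions, and other failure modes (for instance network errors) are not covered.

```python
NOT_FOUND = "No such file or directory"


def file_status(output):
    """Classify text printed by gfal-ls as 'missing', 'exists' or 'unknown'.

    Only the error message quoted in the example above is recognised.
    """
    if NOT_FOUND in output:
        return "missing"
    if output.strip():
        return "exists"
    return "unknown"


ok = (
    "root://eospublic.cern.ch//eos/opendata/cms/Run2010B/Electron/"
    "PATtuples/Electron_PAT_data_500files_1.root"
)
bad = (
    "gfal-ls error: 2 (No such file or directory) - "
    "Failed to stat file (No such file or directory)"
)
print(file_status(ok), file_status(bad))
```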
If it exists, we must check the size using gfal-ls -l, comparing it with the JSON file
$ gfal-ls -l root://eospublic.cern.ch//eos/opendata/cms/Run2010B/Electron/PATtuples/Electron_PAT_data_500files_1.root
Result
-r-------- 1 1399 125433 12117591860 Sep 4 2014 root://eospublic.cern.ch//eos/opendata/cms/Run2010B/Electron/PATtuples/Electron_PAT_data_500files_1.root
Do you remember the JSON file?
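The size comparison can be sketched in Python as follows. The snippet assumes the ls-like column order seen in the result above (permissions, links, user, group, size, date, name), so the size is the fifth field; the helper names are hypothetical, and the expected size would come from the JSON index.

```python
def listed_size(long_listing_line):
    """Extract the size in bytes from one line of gfal-ls -l output.

    Assumes the column order seen in the example above:
    permissions, links, user, group, size, date, name.
    """
    fields = long_listing_line.split()
    return int(fields[4])


def size_matches(long_listing_line, expected_size):
    """Return True when the listed size equals the expected size."""
    return listed_size(long_listing_line) == expected_size


line = (
    "-r-------- 1 1399 125433 12117591860 Sep 4 2014 "
    "root://eospublic.cern.ch//eos/opendata/cms/Run2010B/Electron/"
    "PATtuples/Electron_PAT_data_500files_1.root"
)
print(listed_size(line))  # 12117591860
```

A mismatch between the listed size and the size recorded in the index would flag the file for a closer look.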
RESULTS
files with broken link
index files with reading problems
