HEP and its data
                    What is the trouble?
                 A possible way forward


Salvatore Mele, CERN
Costs, Benefits and Motivations for Digital Preservation
Nice - November 28th 2008
High-Energy Physics (or Particle Physics)
HEP aims to understand how our Universe works:
— by discovering the most elementary constituents
of matter and energy
— by probing their interactions
— by exploring the basic nature of space and time

In other words, try to answer two eternal questions:
— “What is the world made of?”
— “What holds it together?”

Build the largest scientific instruments ever to reach
energy densities close to the Big Bang; write theories
  to predict and describe the observed phenomena
CERN: European Organization for
     Nuclear Research (since 1954)
•   The world's leading HEP laboratory, Geneva (CH)
•   2500 staff (mostly engineers, from Europe)
•   9000 users (mostly physicists, from 580 institutes in 85 countries)
•   3 Nobel prizes (Accelerators, Detectors, Discoveries)
•   Invented the web
•   Just switched on the 27-km (6bn€) LHC accelerator
•   Runs a 1-million-object Digital Library

CERN Convention (1953): ante-litteram Open Access mandate
“… the results of its experimental and theoretical work shall be
  published or otherwise made generally available”
The Large Hadron Collider

•Largest scientific instrument
 ever built, 27km of circumference
•The “coolest” place in the Universe: -271 ˚C
•10000 people involved in its
 design and construction
•Worldwide budget of 6bn€

•Will collide protons to reproduce
  conditions at the birth of the
  Universe...
...40 million times a second
The LHC experiments:
   about 100 million “sensors” each
   [think your 6MP digital camera...
...taking 40 million pictures a second]
[Photos: the ATLAS and CMS detectors, each compared to a five-storey building]
The LHC data
• 40 million events (pictures) per second
• Select (on the fly) the ~200 interesting
  events per second to write on tape
• “Reconstruct” data and convert for analysis:
  “physics data” [inventing the grid...]
    (x4 experiments x15 years)   Per event   Per year
    Raw data                       1.6 MB    3200 TB
    Reconstructed data             1.0 MB    2000 TB
    Physics data                   0.1 MB     200 TB
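The per-year volumes in the table follow from the trigger rate and the per-event sizes, assuming roughly 10^7 seconds of effective data taking per year (a common rule of thumb for accelerator experiments; the live-time figure is my assumption, not stated on the slide):

```python
# Back-of-envelope check of the per-year volumes in the table above.
EVENTS_PER_SECOND = 200   # events written to tape after the trigger
SECONDS_PER_YEAR = 1e7    # assumed effective live time per year

def yearly_volume_tb(mb_per_event: float) -> float:
    """Yearly data volume in TB for a given per-event size in MB."""
    total_mb = mb_per_event * EVENTS_PER_SECOND * SECONDS_PER_YEAR
    return total_mb / 1e6  # 1 TB = 1e6 MB

print(yearly_volume_tb(1.6))  # raw data:           ~3200 TB
print(yearly_volume_tb(1.0))  # reconstructed data: ~2000 TB
print(yearly_volume_tb(0.1))  # physics data:        ~200 TB
```

The assumed 10^7 s live time reproduces all three table entries, which suggests it is the convention behind the slide's numbers.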
Preservation, re-use and (Open)
Access to HEP data



Problem
            Opportunity
                          Challenge
Some other HEP facilities
       (recently stopped or about to stop)
No real long-term archival strategy...
                  ...against billions of invested funds!
Energy frontier: TEVATRON@FNAL, HERA@DESY, LEP@CERN
Precision frontier: BABAR@SLAC, BELLE@KEK, KLOE@LNF
Why shall we care?


•We cared about producing these data in the first instance
•Unique, extremely costly, non-reproducible
•Might need to go back to the past (it happened)

•A peculiar community (the web, arXiv, the grid...)
•A “worst-case scenario”: complex data, no
preservation... “If it works here, it will work in many
other places”
Preservation, re-use and (open)
 access continua (who and when)
• The same researchers who took the data, after the
  closure of the facility (~1 year, ~10 years)
• Researchers working at similar experiments at the
  same time (~1 day, week, month, year)
• Researchers of future experiments (~20 years)
• Theoretical physicists who may want to re-interpret
  the data (~1 month, ~1 year, ~10 years)
• Theoretical physicists who may want to test future
  ideas (~1 year, ~10 years, ~20 years)
Data preservation, circa 2004




                 140 pages of tables
What is the trouble with
 preserving HEP data?


Where to put them ?
Hardware migration ?
Software migration/emulation?
HEP, Open Access & Repositories
• HEP is decades ahead in thinking Open Access:
  – Mountains of paper preprints shipped around the
    world for 40 years (at author/institute expenses!)
  – Launched arXiv (1991), archetypal Open Archive
  – >90% HEP production self-archived in repositories
  – 100% HEP production indexed in SPIRES (community-
    run database, host of the first WWW server on US soil)
• OA is second nature: posting on arXiv before
  submitting to a journal is common practice
  – No mandate, no debate. Author-driven.
• HEP scholars have the tradition of arXiving
  their output (alas, only articles) somewhere
 Data repositories would be a natural concept
What is the trouble with
 preserving HEP data?


Where to put them ?
Hardware migration ?
Software migration/emulation?
Life-cycle of previous-generation
   CERN experiment L3 at LEP
1984   Begin of construction
1989   Start of data taking
2000   End of data taking
2002   End of in-silico experiments
2005   End of (most) data analysis
  Storage and migration of data
  at the CERN computing centre
1993   ~150,000 tapes: 9-track → 3480 (0.2 GB)
1997   ~250,000 tapes: 3480 → Redwood (20 GB)
2001   ~25,000 tapes: Redwood → 9940 (60 GB)
2004   ~5,000 tapes: 9940A → 9940B (200 GB)
2007   ~22,000 tapes: 9940B → T1000A (500 GB)
What is the trouble with
 preserving HEP data?


Where to put them ?
Hardware migration ?
Software migration/emulation?
Life-cycle of previous-generation
    CERN experiment L3 at LEP
1984    Begin of construction
1989    Start of data taking
2000    End of data taking
2002    End of in-silico experiments
2005    End of (most) data analysis
       Computing environment of
       the L3 experiment at LEP
1989-2001    VAX for data taking
1986-1994    IBM for data analysis
1992-1998    Apollo (HP) workstations
1996-2001    SGI mainframe
1997-2007    Linux boxes
What is the trouble with
 preserving HEP data?


Where to put them ?
Hardware migration ?
Software migration/emulation?

The HEP data !
Preserving HEP data?
[Figure: data volume as a stack height - a CD stack with 1 year of LHC data (~20 km), compared with a balloon's altitude (30 km), Concorde (15 km) and Mt. Blanc (4.8 km)]

• The HEP data model is highly complex.
  Data are traditionally not re-used as in
  Astronomy or Climate science.
• Raw data → calibrated data → skimmed
  data → high-level objects → physics
  analyses → results.
• All of the above duplicated for in-silico
  experiments, necessary to interpret the
  highly-complex data.
• Final results depend on the grey
  literature on calibration constants,
  human knowledge and algorithms
  needed for each pass... oral tradition!
• Years of training for a successful analysis
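The CD-stack comparison can be checked under simple assumptions (mine, not from the slide): CDs holding 0.7 GB at 1.2 mm thickness, counting only the raw data of the four experiments at the rates from the data slide:

```python
# Rough check of the "CD stack" scale analogy.
# Assumptions (mine): 0.7 GB per CD, 1.2 mm per CD,
# raw data only, all four LHC experiments.
RAW_TB_PER_YEAR = 3200          # per experiment, from the data slide
EXPERIMENTS = 4
CD_CAPACITY_GB = 0.7
CD_THICKNESS_MM = 1.2

total_gb = RAW_TB_PER_YEAR * EXPERIMENTS * 1000   # 1 TB = 1000 GB
n_cds = total_gb / CD_CAPACITY_GB
stack_km = n_cds * CD_THICKNESS_MM / 1e6          # mm -> km

print(f"{stack_km:.0f} km")   # ~22 km: same order as the ~20 km shown
```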
Need a “parallel way” to
 publish/preserve/re-use/Open Access
• In addition to experiment data models, elaborate a
  parallel format for (re-)usable high-level objects
   – In times of need (to combine data of “competing”
     experiments) this approach has worked
   – Embed the “oral” and “additional” knowledge
• A format understandable and thus re-usable by
  practitioners in other experiments and theorists
• Start from tables and work back towards primary data
• How much additional work? 1%, 5%, 10%?
Issues with the “parallel” way
• A small fraction of a big number gives a large number
• Activity in competition with research time
• 1000s of person-years for parallel data models need
  enormous (impossible?) academic incentives for
  realization      ...or additional (external) funds
• Need insider knowledge to produce parallel data
• Address issues of (open) access, credit, accountability,
  “careless measurements”, “careless discoveries”,
  reproducibility of results, depth of peer-reviewing
• A monolithic way of doing business needs rethinking
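"A small fraction of a big number gives a large number" can be made concrete; the total below is an illustrative assumption, not a figure from the talk:

```python
# Illustrative only: the total effort is an assumed figure.
# Even a few percent of overhead for producing the parallel
# data format translates into a substantial absolute cost.
total_person_years = 3000   # assumed lifetime effort of one experiment

for overhead in (0.01, 0.05, 0.10):   # the 1%, 5%, 10% asked on the slide
    cost = total_person_years * overhead
    print(f"{overhead:.0%} overhead -> {cost:.0f} person-years")
```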
Preservation, re-use and (Open)
   Access to HEP data... first steps!
•Outgrowing an institutionalized state of denial
•A difficult and costly way ahead
•An issue which starts surfacing on the agenda
  — Part of an integrated digital-library vision
  — CERN joined the Alliance for Permanent Access
  — CERN part of the FP7 PARSE.Insight project
•PARSE.Insight case study on HEP preservation:
  — HEP tradition: community-inspired and community-built
  (e-)infrastructures for research
  — Understand the desires and concerns of the community
  — Chart a way out for data preservation in the “worst-
  case scenario” of HEP, benefiting other communities
  — Debates and workshops planned in next months
HEP Community Survey - First findings
500+ scientists answered the survey in its first week: 50% from Europe, 25% from the US.
34% left their e-mail address to hear more; 17% want to be interviewed to say more!


[Pie chart of responses: 39.0%, 28.8%, 24.2%, 7.7%, 0.4%]
•92%: data preservation is important, very important or crucial
•40%: thinks important HEP data have been lost
•45%: access to old data would have improved my scientific
results
•74%: a trans-national infrastructure for long-term preservation
of all kinds of scientific data should be built
  Two dozen more questions about do's and don'ts in HEP preservation
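The three largest slices of the pie chart reproduce the 92% headline figure, assuming (as the text suggests) they correspond to the "crucial", "very important" and "important" answers:

```python
# Survey slices from the pie chart, largest to smallest (percent).
slices = [39.0, 28.8, 24.2, 7.7, 0.4]

# Assumption: the three largest slices are the crucial /
# very important / important responses quoted as 92%.
top_three = round(sum(slices[:3]), 1)
print(top_three)              # 92.0
print(round(sum(slices), 1))  # 100.1 (rounding in the original chart)
```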
Conclusions
• HEP spearheaded (Open) Access to Scientific
  Information: 50 years of preprints, 16 of repositories
  ... but data preservation is not yet on the radar
• Heterogeneous continua to preserve data for
• No insurmountable technical problems
• The issue is the data model itself
   – (Primary) data intelligible only to the producers
   – A monolithic culture needs a paradigm shift
   – Preservation implies massive person-power costs
• Need cultural consensus and financial support
     Preservation is appearing on the agenda...
         Powerful synergy between CERN and
 the Alliance for Permanent Access & PARSE.Insight
           Exciting times are ahead!

Shaman Project  HemmjeShaman Project  Hemmje
Shaman Project Hemmje
 
Scalable Services For Digital Preservation Ross King
Scalable Services For Digital Preservation Ross KingScalable Services For Digital Preservation Ross King
Scalable Services For Digital Preservation Ross King
 
Risks Benefits And Motivations Seamus Ross
Risks Benefits And Motivations Seamus RossRisks Benefits And Motivations Seamus Ross
Risks Benefits And Motivations Seamus Ross
 
Representation Information Steve Rankin
Representation Information Steve RankinRepresentation Information Steve Rankin
Representation Information Steve Rankin
 
Preservation Challenge Radioactive Waste Ian Upshall
Preservation Challenge Radioactive Waste Ian UpshallPreservation Challenge Radioactive Waste Ian Upshall
Preservation Challenge Radioactive Waste Ian Upshall
 
Platter Colin Rosenthal
Platter Colin RosenthalPlatter Colin Rosenthal
Platter Colin Rosenthal
 
Planets Testbed Brian Aitken
Planets Testbed Brian AitkenPlanets Testbed Brian Aitken
Planets Testbed Brian Aitken
 

Recently uploaded

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 

Recently uploaded (20)

Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 

Preservation And Reuse In High Energy Physics Salvatore Mele

  • 1. HEP and its data What is the trouble? A possible way forward Salvatore Mele Costs, Benefits and Motivations for Digital Preservation CERN Nice - November 28th 2008
  • 2. High-Energy Physics (or Particle Physics) HEP aims to understand how our Universe works: — by discovering the most elementary constituents of matter and energy — by probing their interactions — by exploring the basic nature of space and time In other words, try to answer two eternal questions: — “What is the world made of?” — “What holds it together?” Build the largest scientific instruments ever to reach energy densities close to the Big Bang; write theories to predict and describe the observed phenomena
  • 3. CERN: European Organization for Nuclear Research (since 1954) • The world’s leading HEP laboratory, Geneva (CH) • 2500 staff (mostly engineers, from Europe) • 9000 users (mostly physicists, from 580 institutes in 85 countries) • 3 Nobel prizes (Accelerators, Detectors, Discoveries) • Invented the web • Just switched on the 27-km (6bn€) LHC accelerator • Runs a 1-million-object Digital Library CERN Convention (1953): ante-litteram Open Access mandate “… the results of its experimental and theoretical work shall be published or otherwise made generally available”
  • 4. The Large Hadron Collider • Largest scientific instrument ever built, 27 km of circumference • The “coolest” place in the Universe: -271 ˚C • 10000 people involved in its design and construction • Worldwide budget of 6bn€ • Will collide protons to reproduce conditions at the birth of the Universe... ...40 million times a second
  • 5. The LHC experiments: about 100 million “sensors” each [think of your 6 MP digital camera... ...taking 40 million pictures a second] [Images: the ATLAS and CMS detectors, shown against a five-storey building]
  • 6. The LHC data • 40 million events (pictures) per second • Select (on the fly) the ~200 interesting events per second to write on tape • “Reconstruct” data and convert for analysis: “physics data” [inventing the grid...] (×4 experiments ×15 years) • Raw data: 1.6 MB per event, 3200 TB per year • Reconstructed data: 1.0 MB per event, 2000 TB per year • Physics data: 0.1 MB per event, 200 TB per year
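The per-year volumes follow from simple arithmetic on the per-event sizes. A back-of-the-envelope sketch, assuming roughly 10^7 live seconds of data taking per year (a common HEP rule of thumb, not stated on the slide):

```python
# Rough check of the slide's per-year volumes for one experiment.
# Assumption (not on the slide): ~1e7 live seconds of data taking per year.
LIVE_SECONDS_PER_YEAR = 1e7
EVENTS_PER_SECOND = 200  # events kept after the on-line selection

def tb_per_year(mb_per_event):
    """Annual volume in TB, using decimal units (1 TB = 1e6 MB)."""
    return mb_per_event * EVENTS_PER_SECOND * LIVE_SECONDS_PER_YEAR / 1e6

for name, size_mb in [("raw", 1.6), ("reconstructed", 1.0), ("physics", 0.1)]:
    print(f"{name}: {tb_per_year(size_mb):.0f} TB/year")
# raw: 3200, reconstructed: 2000, physics: 200 -- matching the slide
```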
  • 7. Preservation, re-use and (Open) Access to HEP data Problem Opportunity Challenge
  • 8. Some other HEP facilities (recently stopped or about to stop): HERA@DESY, TEVATRON@FNAL, BELLE@KEK, KLOE@LNF, LEP@CERN, BABAR@SLAC, spanning the energy and precision frontiers. No real long-term archival strategy... ...against billions in invested funds!
  • 9. Why shall we care? • We cared enough to produce these data in the first instance • Unique, extremely costly, non-reproducible • We might need to go back to old data (it has happened) • A peculiar community (the web, arXiv, the grid...) • A “worst-case scenario”: complex data, no preservation... “If it works here, it will work in many other places”
  • 10. Preservation, re-use and (open) access continua (who and when) • The same researchers who took the data, after the closure of the facility (~1 year, ~10 years) • Researchers working at similar experiments at the same time (~1 day, week, month, year) • Researchers of future experiments (~20 years) • Theoretical physicists who may want to re-interpret the data (~1 month, ~1 year, ~10 years) • Theoretical physicists who may want to test future ideas (~1 year, ~10 years, ~20 years)
  • 11. Data preservation, circa 2004: 140 pages of tables
  • 12. What is the trouble with preserving HEP data? Where to put them? Hardware migration? Software migration/emulation?
  • 13. What is the trouble with preserving HEP data? Where to put them? Hardware migration? Software migration/emulation?
  • 14. HEP, Open Access & Repositories • HEP is decades ahead in thinking Open Access: – Mountains of paper preprints shipped around the world for 40 years (at authors’/institutes’ expense!) – Launched arXiv (1991), archetypal Open Archive – >90% of HEP production self-archived in repositories – 100% of HEP production indexed in SPIRES (community-run database, first WWW server on US soil) • OA is second nature: posting on arXiv before submitting to a journal is common practice – No mandate, no debate. Author-driven. • HEP scholars have the tradition of arXiving their output (alas, only articles) somewhere. Data repositories would be a natural concept
  • 15. What is the trouble with preserving HEP data? Where to put them? Hardware migration? Software migration/emulation?
  • 16. Life-cycle of previous-generation CERN experiment L3 at LEP: 1984 start of construction; 1989 start of data taking; 2000 end of data taking; 2002 end of in-silico experiments; 2005 end of (most) data analysis. Storage and migration of data at the CERN computing centre: 1993, ~150,000 tapes, 9-track → 3480 (0.2 GB); 1997, ~250,000, 3480 → Redwood (20 GB); 2001, ~25,000, Redwood → 9940 (60 GB); 2004, ~5,000, 9940A → 9940B (200 GB); 2007, ~22,000, 9940B → T1000A (500 GB)
  • 17. What is the trouble with preserving HEP data? Where to put them? Hardware migration? Software migration/emulation?
  • 18. Life-cycle of previous-generation CERN experiment L3 at LEP: 1984 start of construction; 1989 start of data taking; 2000 end of data taking; 2002 end of in-silico experiments; 2005 end of (most) data analysis. Computing environment of the L3 experiment at LEP: VAX for data taking (1989–2001); IBM for data analysis (1986–1994); Apollo (HP) workstations (1992–1998); SGI mainframe (1996–2001); Linux boxes (1997–2007)
  • 19. Life-cycle of previous-generation CERN experiment L3 at LEP: 1984 start of construction; 1989 start of data taking; 2000 end of data taking; 2002 end of in-silico experiments; 2005 end of (most) data analysis. Computing environment of the L3 experiment at LEP: VAX for data taking (1989–2001); IBM for data analysis (1986–1994); Apollo (HP) workstations (1992–1998); SGI mainframe (1996–2001); Linux boxes (1997–2007)
  • 20. What is the trouble with preserving HEP data? Where to put them? Hardware migration? Software migration/emulation? The HEP data!
  • 21. Preserving HEP data? [Graphic: a stack of CDs holding one year of LHC data would be ~20 km tall, shown against a balloon at 30 km, Concorde at 15 km, Mont Blanc at 4.8 km] • The HEP data model is highly complex. Data are traditionally not re-used as in astronomy or climate science. • Raw data → calibrated data → skimmed data → high-level objects → physics analyses → results. • All of the above duplicated for in-silico experiments, necessary to interpret the highly complex data. • Final results depend on the grey literature on calibration constants, human knowledge and the algorithms needed for each pass... oral tradition! • Years of training for a successful analysis
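The reduction chain named on the slide can be caricatured in code. A purely illustrative toy (no real HEP software; every name, constant and number here is invented) showing why each stage bakes in knowledge that later re-use depends on:

```python
# Toy reduction chain: raw -> calibrated -> skimmed -> high-level result.
# Each stage embeds knowledge (constants, selection cuts) that is NOT
# stored with the data -- lose it, and the raw numbers become opaque.

RAW_HITS = [2.0, 6.0, 10.0]  # invented "raw data" for one event

def calibrate(hits, gain=1.1):
    # the gain comes from grey literature / calibration runs
    return [h * gain for h in hits]

def skim(hits, threshold=5.0):
    # the threshold encodes the analysts' selection knowledge
    return [h for h in hits if h > threshold]

def high_level(hits):
    # a single physics-level summary quantity
    return sum(hits)

result = high_level(skim(calibrate(RAW_HITS)))
print(round(result, 1))  # 17.6 -- meaningless without gain and threshold
```

Preserving only `RAW_HITS`, or only `result`, loses the chain in between; that intermediate knowledge is exactly what the slide calls “oral tradition”.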
  • 22. Need a “parallel way” to publish/preserve/re-use/Open Access • In addition to experiment data models, elaborate a parallel format for (re-)usable high-level objects – In times of need (to combine data of “competing” experiments) this approach has worked – Embed the “oral” and “additional” knowledge • A format understandable, and thus re-usable, by practitioners in other experiments and by theorists • Start from tables and work back towards primary data • How much additional work? 1%, 5%, 10%?
  • 23. Issues with the “parallel” way • A small fraction of a big number is still a large number • An activity in competition with research time • Thousands of person-years for parallel data models need enormous (impossible?) academic incentives for realization... or additional (external) funds • Insider knowledge is needed to produce parallel data • Address issues of (open) access, credit, accountability, “careless measurements”, “careless discoveries”, reproducibility of results, depth of peer review • A monolithic way of doing business needs rethinking
  • 24. Preservation, re-use and (Open) Access to HEP data... first steps! • Outgrowing an institutionalized state of denial • A difficult and costly way ahead • An issue which is starting to surface on the agenda — Part of an integrated digital-library vision — CERN joined the Alliance for Permanent Access — CERN part of the FP7 PARSE.Insight project • PARSE.Insight case study on HEP preservation: — HEP tradition: community-inspired and community-built (e-)infrastructures for research — Understand the desires and concerns of the community — Chart a way out for data preservation in the “worst-case scenario” of HEP, benefiting other communities — Debates and workshops planned in the coming months
  • 25. HEP Community Survey - First findings • 500+ scientists answered the survey in its first week: 50% from Europe, 25% from the US. 34% left their e-mail address to hear more; 17% want to be interviewed to say more! [Chart: responses on the importance of data preservation: 0.4%, 7.7%, 24.2%, 39.0%, 28.8%] • 92%: data preservation is important, very important or crucial • 40%: think important HEP data have been lost • 45%: access to old data would have improved my scientific results • 74%: a trans-national infrastructure for long-term preservation of all kinds of scientific data should be built • Two dozen more questions about do’s and don’ts in HEP preservation
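The headline 92% is consistent with the five response percentages shown with the slide, if the top three categories are read as “important”, “very important” and “crucial” (my reading of the chart, not stated explicitly on the slide):

```python
# Response distribution from the slide, lowest to highest importance.
ratings = [0.4, 7.7, 24.2, 39.0, 28.8]  # percent of respondents

# Presumed mapping: the last three are important/very important/crucial.
top_three = sum(ratings[2:])
print(round(top_three, 1))  # 92.0, matching the slide's headline figure
```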
  • 26. Conclusions • HEP spearheaded (Open) Access to Scientific Information: 50 years of preprints, 16 years of repositories ... but data preservation is not yet on the radar • Heterogeneous continua to preserve data for • No insurmountable technical problems • The issue is the data model itself – (Primary) data intelligible only to their producers – A monolithic culture needs a paradigm shift – Preservation implies massive person-power costs • Need cultural consensus and financial support • Preservation is appearing on the agenda... Powerful synergy between CERN and the Alliance for Permanent Access & PARSE.Insight. Exciting times are ahead!