PIC Deployment Scenarios
V. Acín, J. Casals, M. Delfino, J. Delgado
ARCHIVER Open Market Consultation event
London Stansted, May 23rd, 2019
About PIC (scientific support)
● Port d’Informació Científica (PIC, the Scientific Information Harbour) is
Spain’s largest scientific data centre for Particle Physics and Astrophysics
● CERN’s Large Hadron Collider (LHC):
One of the 12 first-level (Tier-1) data processing centres.
● Imaging Atmospheric Cherenkov Telescopes: Custodial data centre for the
MAGIC Telescopes and the first Large Scale Telescope prototype for the
next-generation Cherenkov Telescope Array (CTA, an ESFRI landmark).
● Observational Cosmology: One of the 9 Science Data Centres for ESA’s
Euclid mission, custodial data centre for huge simulations of the Universe
expansion and the Physics of the Accelerating Universe (PAU) survey.
● Innovative “Big Data” platform for massive analysis of big datasets.
● 20 people, 50% engineers and 50% Ph.D.s (Comp. Sci., Physics, Chemistry)
About PIC (technical)
● 8500 x86 cores (mostly bare-metal, scheduled through HTCondor)
● 11 PiB disk (dCache) + 25 PiB tape (Enstore) with active HSM
● Overprovisioned 10 Gbps LAN (moving to 100 Gbps next year)
● 2x10 Gbps WAN, optical paths to CERN and ORM
● Hadoop cluster for data analysis
○ 16 nodes, 2 TiB RAM, 192 TiB HDD
○ Prototyping with NVMe and NVDIMM
● GPU: proof of concept in training neural nets
● Heavily automated installation: Puppet, Icinga, Grafana, etc.
● Compact, highly energy efficient installation
PIC’s liquid immersion cooling installation
Bottom-up description of scenarios and workflows
● Actors:
○ Instrument (example used will be MAGIC Telescope in La Palma, Canary Islands, Spain)
○ Private Data Center (PIC near Barcelona, Spain)
○ Instrument Analysts (closed group of users)
○ External users (scientists not members of MAGIC, public access)
● Scenarios:
○ Large file safe-keeping
○ + Mixed-size file safe-keeping
○ + In-archive data processing
○ + Data distribution to Instrument Analysts
○ + Data utilization by External users
Large file safe-keeping workflow
Example: MAGIC Telescopes located at Observatorio del Roque de los Muchachos, La Palma, Canary Islands
Large file safe-keeping workflow
10:00 Daily data available → 18:00 Daily data safe off-telescope
500-1000 files @ 2 GB/file = 1-2 TB per day, Instrument → PIC over the RedIRIS 10 Gbps λ
The original service used the shared 1 Gbps general ORM-RedIRIS connection; a 10 Gbps λ was implemented to ensure compliance with the 8-hour window.
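A quick back-of-the-envelope check of why the shared 1 Gbps link was marginal (decimal units, worst-case 2 TB day; the utilisation figures below are derived, not from the slides):

```python
def required_gbps(volume_tb: float, window_hours: float) -> float:
    """Sustained rate needed to move volume_tb within the window."""
    return volume_tb * 1e12 * 8 / (window_hours * 3600) / 1e9

rate = required_gbps(2.0, 8.0)   # 1000 files x 2 GB in the 8-hour window
print(f"{rate:.2f} Gbps sustained")                    # ~0.56 Gbps
print(f"{rate / 1:.0%} of a shared 1 Gbps link")       # ~56%: no headroom for sharing or retries
print(f"{rate / 10:.1%} of the dedicated 10 Gbps λ")   # ~5.6%: comfortable margin
```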
Data characteristics:
● Immutable (read-only)
● Binary private format
● A single bit error in a file renders it useless
● Two metadata items: filename, checksum
Data stewardship:
● Year 1: data accumulates: 150k 2 GB files = 300 TB
● Years 1-6: data are bit-preserved
● At random time(s) in years 2-6: the full 300 TB is recalled to disk and reprocessed
Challenges:
● 365 days/year × 8-hour time window to complete data transfer, storage and verification
● A non-predictable full recall must be accomplished in 30 days or less (mitigation: advance notification)
● Cost expectation: compatible with telescope maintenance costs → < €30k per 300 TB stored for 5 years
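Because a single bit error renders a file useless, verification reduces to maintaining and re-checking the (filename, checksum) catalogue. A minimal sketch of that check, assuming SHA-256 as the checksum (the slides do not name an algorithm):

```python
import hashlib
from pathlib import Path

def file_checksum(path: Path, chunk: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so a 2 GB file never sits in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def verify(path: Path, catalogue: dict[str, str]) -> bool:
    """True only if the file matches its stored (filename, checksum) entry."""
    return catalogue.get(path.name) == file_checksum(path)
```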
Some motivations for moving to commercial service
● The Cherenkov Telescope Array uses OAIS as the basis for its archive
● Pressure to focus on Layer 3 and 4 services
● Cost evolution and hyper-scaling
● Limited physical space on campus
● Disaster recovery
● Uncertainty about the availability of tape equipment for an on-premises installation
But there is also a list of motivations NOT to move to a commercial service!
● Main item: distrust of commercial services
Commercial safe-keeping deployment scenario
● Ingest: daily 500-1000 2 GB files, Instrument → PIC over the RedIRIS 10 Gbps λ, then PIC → Commercial Provider over RedIRIS+Géant
● Retention: 300 TB kept for 5 years
● Recall: 2-week notice; full recall complete in 30 days, back over RedIRIS+Géant
● Scrubbing: every file re-read and checksummed once per year, without buyer intervention
● Heartbeat: random 1% sample recalled monthly, with 2-week notice
● Future: trust through OAIS/ISO?
● Interface: put/get with full error recovery + status check; CLI + scriptable + programmable; secure with one expert user; any reasonable AA method compatible with the interfacing requirements
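A minimal sketch of the buyer-side workflow this interface implies: put with verification and retry, plus the monthly heartbeat recall. The `client` object and its `put`/`get`/`checksum` calls are hypothetical stand-ins for whatever CLI or API a provider offers, and SHA-256 is an assumed checksum:

```python
import hashlib
import random
import time

def put_with_recovery(client, name: str, data: bytes, retries: int = 3) -> None:
    """Upload, then confirm the provider stored exactly what we sent."""
    local = hashlib.sha256(data).hexdigest()
    for attempt in range(1, retries + 1):
        try:
            client.put(name, data)                # hypothetical provider call
            if client.checksum(name) == local:    # status/verification check
                return
        except OSError:
            time.sleep(60 * attempt)              # crude backoff before retrying
    raise RuntimeError(f"{name}: not safely archived")

def monthly_heartbeat(client, catalogue: dict[str, str]) -> list[str]:
    """Recall a random 1% sample and report any checksum mismatches."""
    sample = random.sample(sorted(catalogue), max(1, len(catalogue) // 100))
    return [name for name in sample
            if hashlib.sha256(client.get(name)).hexdigest() != catalogue[name]]
```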
Mixed file safe-keeping workflow and scenario
● Ingest: daily 500-1000 2 GB files + 1500-3000 <200 MB files, over RedIRIS+Géant
● Retention: 300 TB kept for 5 years + < 100 TB added a posteriori
● Recall: 2-week notice; full recall complete in 30 days
● Scrubbing and heartbeat as in the previous scenario
● 50 TB of reprocessed output stored within 30 days
● Additional metadata tag: version
Challenges:
● put/get driven directly by the reprocessing workflow @PIC
● 150k input files give 450k output files to be stored
● Cost compatible with maintenance costs: < €40k per 300 TB stored for 5 years (v1) + €7.5k per reprocess
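The metadata model stays deliberately tiny. A sketch of what a catalogue entry could look like once the version tag is added; the record layout and example values are illustrative, only the three field names come from this scenario:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ArchiveEntry:
    """One catalogue record; the scenario names exactly these items."""
    filename: str
    checksum: str   # e.g. hex SHA-256 of the immutable file
    version: int    # distinguishes reprocessing passes of the same file

# Reprocessing stores outputs under an incremented version while the
# original (version 1) stays bit-preserved for the full 5 years.
v1 = ArchiveEntry("run00123.dat", "9f2c...", version=1)  # hypothetical values
v2 = ArchiveEntry("run00123.dat", "41aa...", version=2)
```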
+ In-archive processing workflow/scenario
● Ingest, retention, recall, scrubbing and heartbeat as in the mixed-file scenario
● New element: the commercial provider also offers in-archive processing
● 50 TB of reprocessed output stored within 30 days; additional metadata tag: version
Challenges:
● put/get driven directly by the reprocessing workflow @PIC
● 150k input files give 450k output files to be stored
● Cost compatible with maintenance costs: < €40k per 300 TB stored for 5 years (v1) + €7.5k per reprocess
● + competitive price for CPU with appropriate I/O
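The cost and recall targets reduce to simple unit rates; a quick check of the figures quoted in these scenarios (decimal units assumed):

```python
def eur_per_tb_year(total_eur: float, tb: float, years: float) -> float:
    """Flat storage rate implied by a total-cost target."""
    return total_eur / (tb * years)

print(eur_per_tb_year(30_000, 300, 5))   # large-file target: 20.0 EUR/TB/year
print(eur_per_tb_year(40_000, 300, 5))   # mixed-file v1 target: ~26.7 EUR/TB/year
# plus a flat 7,500 EUR per reprocessing campaign on top of v1 storage.

# Full recall of 300 TB in 30 days needs a sustained rate of only:
print(f"{300e12 * 8 / (30 * 86_400) / 1e9:.2f} Gbps")   # ~0.93 Gbps
```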
Data distribution to Instrument Analysts
● Retention as before: 300 TB kept for 5 years + < 100 TB a posteriori, plus metadata handling
● Scrubbing and heartbeat as before
● Metadata produced at origin; extensible metadata generated by experts
● Worldwide users authenticate through multiple AAIs, e.g. XXX AAI (AD@Azure) and MAGIC AAI (LDAP@PIC)
● Access pattern: metadata query → download subset of files
● Optional additional methods: mount + file system emulation, selective sync-and-share
Challenges:
● Interface to multiple, existing, external AA systems + create an ACL-type environment
● Data must be "online"; the "raw" data component could be excluded
● Provide an extensible metadata handling system and drive file access by metadata queries
● Cost compatible with maintenance costs: < €60k for 5 years of v1 service
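A sketch of the metadata-query-driven access pattern described above; `catalogue.query`, `client.download`, the `kind` field and the criteria values are all hypothetical, only the query-then-download-subset pattern comes from the slides:

```python
def fetch_subset(catalogue, client, dest: str, **criteria) -> int:
    """Select files by metadata, then download only that subset."""
    count = 0
    for entry in catalogue.query(**criteria):    # hypothetical metadata query
        if entry.kind == "raw":                  # raw data may be excluded from online access
            continue
        client.download(entry.filename, dest)    # authenticated through the user's AAI
        count += 1
    return count

# e.g. fetch_subset(cat, client, "/scratch/analysis", night="2019-05-22", version=2)
```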
What in-archive data analysis and presentation may look like
Extension to External Users
● From other scientific projects
○ Add additional AAI providers and use group management tools (COmanage or Grouper)
○ If there are too many AAI providers, still use group management tools and
■ Move to eduGAIN, or
■ Move to ORCID
● “Open” data
○ Open ≠ uncontrolled
○ Need to know who accessed data
■ Citation control
■ Statistical information to demonstrate value
● Most likely both will need in-archive analysis / viewing tools
Work in progress
One year of the MAGIC Telescopes was used as the example. Others...
● The Cherenkov Telescope Array will have two sites (Northern and Southern hemispheres) with 10 large telescopes and hundreds of smaller ones
● Studies of the expansion of the Universe with optical telescopes, which produce data in anything from one-week campaigns to 365 days/year
● Supercomputer simulation production can look a lot like an instrument that produces a lot of data in a short time
● High-volume applications such as the High-Luminosity LHC
● etc.
Helping to turn Information into Knowledge