PIC Deployment Scenarios
V. Acín, J. Casals, M. Delfino, J. Delgado
ARCHIVER Open Market Consultation event
London Stansted, May 23rd, 2019
About PIC (scientific support)
● Port d’Informació Científica (PIC, the Scientific Information Harbour) is
Spain’s largest scientific data centre for Particle Physics and Astrophysics
● CERN’s Large Hadron Collider (LHC):
One of the 12 first-level (Tier-1) data processing centres.
● Imaging Atmospheric Cherenkov Telescopes: Custodial data centre for the
MAGIC Telescopes and the first Large Scale Telescope prototype for the
next-generation Cherenkov Telescope Array (CTA, an ESFRI landmark).
● Observational Cosmology: One of the 9 Science Data Centres for ESA’s
Euclid mission, custodial data centre for huge simulations of the Universe
expansion and the Physics of the Accelerating Universe (PAU) survey.
● Innovative “Big Data” platform for massive analysis of big datasets.
● 20 people, 50% engineers and 50% Ph.D.s (Comp. Sci, Physics, Chemistry)
2
About PIC (technical)
● 8500 x86 cores (mostly bare-metal, scheduled through HTCondor)
● 11 PiB disk (dCache) + 25 PiB tape (Enstore) with active HSM
● Overprovisioned 10 Gbps LAN (moving to 100 Gbps next year)
● 2x10 Gbps WAN, optical paths to CERN and ORM
● Hadoop cluster for data analysis
○ 16 nodes, 2 TiB RAM, 192 TiB HDD
○ Prototyping with NVMe and NVDIMM
● GPU: proof of concept in training neural nets
● Heavily automated installation: Puppet, Icinga, Grafana, etc.
● Compact, highly energy-efficient installation
3
PIC’s liquid immersion cooling installation
4
Bottom-up description of scenarios and workflows
● Actors:
○ Instrument (example used will be the MAGIC Telescopes in La Palma, Canary Islands, Spain)
○ Private Data Center (PIC near Barcelona, Spain)
○ Instrument Analysts (closed group of users)
○ External users (scientists not members of MAGIC, public access)
● Scenarios:
○ Large file safe-keeping
○ + Mixed-size file safe-keeping
○ + In-archive data processing
○ + Data distribution to Instrument Analysts
○ + Data utilization by External users
5
Large file safe-keeping workflow
6
Example: MAGIC Telescopes, located at Observatorio del Roque de los Muchachos, La Palma, Canary Islands
Large file safe-keeping workflow
10:00 Daily data available
18:00 Daily data safe off-telescope
500-1000 files @ 2 GB/file = 1-2 TB
RedIRIS 10 Gbps λ
7
The original service used the shared 1 Gbps general ORM-RedIRIS connection; a dedicated 10 Gbps λ was implemented to ensure compliance with the 8-hour window.
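Back-of-the-envelope arithmetic behind the 8-hour window (a sketch only; the 80% usable-link-efficiency factor is an assumption, not a measured value):

```python
# Rough check of the daily transfer window for the worst-case daily volume.
DAILY_VOLUME_TB = 2.0   # worst case: 1000 files x 2 GB
WINDOW_HOURS = 8.0      # 10:00 data available -> 18:00 data safe off-telescope

def transfer_hours(link_gbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to move the daily volume over a link of link_gbps (efficiency is assumed)."""
    usable_gbps = link_gbps * efficiency
    return DAILY_VOLUME_TB * 8000 / usable_gbps / 3600   # TB -> Gb, then seconds -> hours

print(f"shared 1 Gbps     : {transfer_hours(1):4.1f} h")   # ~5.6 h, and only with no competing traffic
print(f"dedicated 10 Gbps : {transfer_hours(10):4.1f} h")  # ~0.6 h, leaves margin for storage + verification
```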
Large file safe-keeping workflow
10:00 Daily data available
18:00 Daily data safe off-telescope
500-1000 files @ 2 GB/file = 1-2 TB
RedIRIS 10 Gbps λ
Data characteristics:
Immutable (read-only)
Binary private format
Single bit error in a file renders it useless
Two metadata items: filename, checksum
8
Large file safe-keeping workflow
10:00 Daily data available
18:00 Daily data safe off-telescope
500-1000 files @ 2 GB/file = 1-2 TB
RedIRIS 10 Gbps λ
Data characteristics:
Immutable (read-only)
Binary private format
Single bit error in a file renders it useless
Two metadata items: filename, checksum
Data stewardship:
Year 1: Data accumulates: 150k 2 GB files = 300 TB
Years 1-6: Data are bit-preserved
Random time(s) in years 2-6:
Full 300 TB recalled to disk and reprocessed
9
Large file safe-keeping workflow
10:00 Daily data available
18:00 Daily data safe off-telescope
500-1000 files @ 2 GB/file = 1-2 TB
RedIRIS 10 Gbps λ
Data characteristics:
Immutable (read-only)
Binary private format
Single bit error in a file renders it useless
Two metadata items: filename, checksum
Data stewardship:
Year 1: Data accumulates: 150k 2 GB files = 300 TB
Years 1-6: Data are bit-preserved
Random time(s) in years 2-6:
Full 300 TB recalled to disk and reprocessed
Challenges:
365 days/year × 8-hour time window to complete data transfer, storage, and verification
Non-predictable full recall must be accomplished in 30 days or less. (Mitigation: Advance notification)
Cost expectation: compatible with telescope maintenance costs → < 30k euros per 300 TB stored 5 yrs
10
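A minimal sketch of the ingest-verification step implied by the challenge above, assuming the per-file checksum is a SHA-256 digest (the slides do not name the actual algorithm):

```python
# Verify a stored copy against the checksum recorded at the telescope.
# The deck specifies two metadata items per file: filename and checksum.
import hashlib
from pathlib import Path

def verify_file(path: Path, expected_checksum: str) -> bool:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024 * 1024), b""):  # 64 MiB chunks suit 2 GB files
            h.update(chunk)
    return h.hexdigest() == expected_checksum

# A single bit error renders a file useless, so any mismatch should trigger a re-transfer.
```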
Some motivations for moving to commercial service
13
● Cherenkov Telescope Array uses OAIS as the basis for their archive
● Pressure to focus on Layer 3 and 4 services
● Cost evolution and hyper-scaling
● Limited physical space on campus
● Disaster recovery
● Uncertainties on availability of tape equipment for on-premises installation
But there is also a list of motivations NOT to move to a commercial service!
● Main item: Distrust of commercial services
Commercial safe-keeping deployment scenario
Daily: 500-1000 2 GB files
RedIRIS 10 Gbps λ
Commercial
Provider
RedIRIS+Géant
2-week notice
Full recall complete in 30 days
Interface:
put/get with full error recovery + status check
CLI + scriptable + programmable
Secure with one expert user.
Any reasonable AA method compatible with interfacing requirements.
300 TB
kept for
5 years
14
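An illustrative shape for the put/get interface with full error recovery and status check; the client object, its method names and the status strings are hypothetical, standing in for whatever CLI/API the provider exposes:

```python
# Illustrative put with error recovery and scriptable status polling (hypothetical API).
import time

def put_with_recovery(client, local_path: str, checksum: str, max_retries: int = 5) -> str:
    """Upload one file, wait for server-side verification, retry on any failure."""
    for attempt in range(1, max_retries + 1):
        transfer_id = client.put(local_path, checksum=checksum)
        while client.status(transfer_id) == "IN_PROGRESS":
            time.sleep(30)                                   # scriptable status check
        if client.status(transfer_id) == "DONE_VERIFIED":
            return transfer_id                               # file is safe off-telescope
        print(f"{local_path}: attempt {attempt} failed, retrying")
    raise RuntimeError(f"{local_path}: not archived after {max_retries} attempts")
```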
Commercial safe-keeping deployment scenario
Daily: 500-1000 2 GB files
Commercial
Provider
2-week notice
Full recall complete in 30 days
Interface:
put/get with full error recovery + status check
CLI + scriptable + programmable
Secure with one expert user.
Any reasonable AA method compatible with interfacing requirements.
300 TB
kept for
5 years
15
RedIRIS+Géant
Commercial safe-keeping deployment scenario
Daily: 500-1000 2 GB files
RedIRIS 10 Gbps λ
Commercial
Provider
Scrubbing: every file re-read and checksummed once per year without buyer intervention
RedIRIS+Géant
RedIRIS+Géant
2-week notice
Full recall complete in 30 days
2-week notice
Heartbeat: random 1% sample recalled monthly
Future: Trust through OAIS/ISO?
Interface:
put/get with full error recovery + status check
CLI + scriptable + programmable
Secure with one expert user. Any reasonable AA
method compatible with interfacing requirements.
300 TB
kept for
5 years
16
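A buyer-side sketch of the monthly heartbeat shown above (scrubbing is provider-side and needs no buyer code); the catalogue format and the recall call are assumptions:

```python
# Monthly heartbeat: recall a random 1% sample and re-verify checksums on the buyer side.
# 'catalogue' (filename -> expected checksum) and 'client.recall_and_checksum'
# are assumptions about the buyer-side bookkeeping and provider interface.
import random

def monthly_heartbeat(catalogue, client, fraction=0.01):
    sample = random.sample(sorted(catalogue), max(1, int(len(catalogue) * fraction)))
    failed = [name for name in sample
              if client.recall_and_checksum(name) != catalogue[name]]
    return failed   # a non-empty list means escalating with the provider
```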
Mixed file safe-keeping workflow and scenario
Commercial Provider
Scrubbing
RedIRIS+Géant
Full recall complete in 30 days
Heartbeat
300 TB kept for 5 years
+ < 100 TB a posteriori
Daily: 500-1000 2 GB files
+ 1500-3000 <200 MB files
2-week notice
17
Mixed file safe-keeping workflow and scenario
Commercial Provider
Scrubbing
RedIRIS+Géant
Full recall complete in 30 days
Heartbeat
300 TB kept for 5 years
+ < 100 TB a posteriori
Daily: 500-1000 2 GB files
+ 1500-3000 <200 MB files
2-week notice
50 TB reprocessed output in 30 days
Additional metadata tag: version
18
Mixed file safe-keeping workflow and scenario
Commercial Provider
Scrubbing
RedIRIS+Géant
Full recall complete in 30 days
Heartbeat
300 TB kept for 5 years
+ < 100 TB a posteriori
Daily: 500-1000 2 GB files
+ 1500-3000 <200 MB files
2-week notice
50 TB reprocessed output in 30 days
Challenges:
put/get directly by reprocessing workflow @PIC
150k files input gives 450k files output to be stored
Cost compatible with maintenance costs:
< 40 k€ per 300 TB stored 5 yrs (v1) + 7.5 k€ per reprocess
Additional metadata tag: version
19
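A minimal sketch of the per-file metadata record once reprocessing is added: the original two items (filename, checksum) plus the version tag; the field names are illustrative only:

```python
# Per-file metadata record with the additional version tag (illustrative field names).
from dataclasses import dataclass

@dataclass(frozen=True)
class ArchiveRecord:
    filename: str
    checksum: str
    version: int   # 1 = original telescope data, 2+ = reprocessing passes

# Each reprocessing pass stores its output files under a new version,
# so the original version-1 files are never overwritten.
```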
+ In-archive processing workflow and scenario
Commercial Provider
Scrubbing
RedIRIS+Géant
Full recall complete in 30 days
Heartbeat
300 TB kept for 5 years
+ < 100 TB a posteriori
Daily: 500-1000 2 GB files
+ 1500-3000 <200 MB files
2-week notice
50 TB reprocessed output in 30 days
Additional metadata tag: version
Commercial provider
in-archive processing
20
+ In-archive processing workflow and scenario
Commercial Provider
Scrubbing
RedIRIS+Géant
Full recall complete in 30 days
Heartbeat
300 TB kept for 5 years
+ < 100 TB a posteriori
Daily: 500-1000 2 GB files
+ 1500-3000 <200 MB files
2-week notice
50 TB reprocessed output in 30 days
Challenges:
put/get directly by reprocessing workflow @PIC
150k files input gives 450k files output to be stored
Cost compatible with maintenance costs:
< 40 k€ per 300 TB stored 5 yrs (v1) + 7.5 k€ per reprocess
+ competitive price for CPU with appropriate I/O
Additional metadata tag: version
Commercial provider
in-archive processing
21
Data distribution to Instrument Analysts
Commercial Provider
Scrubbing
Heartbeat
300 TB kept for 5 years
+ < 100 TB a posteriori
+ metadata handling
Metadata produced in origin
Extensible
metadata
generated
by experts
22
Data distribution to Instrument Analysts
Commercial Provider
Scrubbing
Heartbeat
300 TB kept for 5 years
+ < 100 TB a posteriori
+ metadata handling
Metadata query
Download subset of files
XXX AAI
(AD@Azure)
MAGIC AAI
(ldap@PIC)
Optional additional methods:
mount+file system emulation
Selective sync-and-share
Metadata produced in origin
Worldwide
users
Extensible
metadata
generated
by experts
23
Data distribution to Instrument Analysts
Commercial Provider
Scrubbing
Heartbeat
300 TB kept for 5 years
+ < 100 TB a posteriori
+ metadata handling
Metadata query
Download subset of files
XXX AAI
(AD@Azure)
MAGIC AAI
(ldap@PIC)
Optional additional methods:
mount+file system emulation
Selective sync-and-share
Metadata produced in origin
Worldwide
users
Extensible
metadata
generated
by experts
24
Challenges:
Interface to multiple, existing, external AA systems + create ACL-type environment
Data must be “online” - “Raw” data component could be excluded
Provide extensible metadata handling system and drive file access by metadata queries
Cost compatible with maintenance costs: < 60 k€ for 5 years of v1 service
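A sketch of metadata-driven access for Instrument Analysts, assuming an HTTP metadata/query interface with bearer tokens issued by the AAI systems named above; the endpoint, query fields and token handling are placeholders, not a provider-defined interface:

```python
# Metadata-driven access: query the archive metadata, then download only the matching subset.
import requests

ARCHIVE_URL = "https://archive.example.org"   # placeholder endpoint

def download_subset(query: dict, token: str, dest_dir: str = ".") -> None:
    headers = {"Authorization": f"Bearer {token}"}   # token issued by the user's AAI
    matches = requests.get(f"{ARCHIVE_URL}/metadata", params=query,
                           headers=headers, timeout=60).json()
    for entry in matches:
        r = requests.get(f"{ARCHIVE_URL}/files/{entry['filename']}",
                         headers=headers, timeout=600)
        with open(f"{dest_dir}/{entry['filename']}", "wb") as f:
            f.write(r.content)

# e.g. all version-2 files from one night:
# download_subset({"version": 2, "night": "2019-05-23"}, token="...")
```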
What in-archive data analysis and presentation may look like
25
Extension to External Users
● From other scientific projects
○ Add additional AAI providers and use group management tools (COmanage or Grouper)
○ If too many AAI providers, still use group management tools and
■ Move to eduGAIN or
■ Move to ORCID
● “Open” data
○ Open ≠ uncontrolled
○ Need to know who accessed data
■ Citation control
■ Statistical information to demonstrate value
● Most likely both will need in-archive analysis / viewing tools
Work in progress
26
One year of the MAGIC Telescopes as an example. Others...
● Cherenkov Telescope Array will have two sites (Northern and Southern
hemispheres) with 10 large telescopes and hundreds of smaller ones
● Studies of the expansion of the Universe with optical telescopes, which produce data
in campaigns ranging from one week to 365 days/year
● Supercomputer simulation production can look a lot like an instrument that
produces a large volume of data in a short time
● High-volume applications such as the High-Luminosity LHC
● etc…
27
Bottom-up description of scenarios and workflows
● Actors:
○ Instrument (example used will be the MAGIC Telescopes in La Palma, Canary Islands, Spain)
○ Private Data Center (PIC near Barcelona, Spain)
○ Instrument Analysts (closed group of users)
○ External users (scientists not members of MAGIC, public access)
● Scenarios:
○ Large file safe-keeping
○ + Mixed-size file safe-keeping
○ + In-archive data processing
○ + Data distribution to Instrument Analysts
○ + Data utilization by External users
28
29
Helping to turn Information into Knowledge
