
  1. PIC Deployment Scenarios. V. Acín, J. Casals, M. Delfino, J. Delgado. ARCHIVER Open Market Consultation event, London Stansted, May 23rd 2019.
  2. About PIC (scientific support)
     ● Port d’Informació Científica (PIC, the Scientific Information Harbour) is Spain’s largest scientific data centre for Particle Physics and Astrophysics
     ● CERN’s Large Hadron Collider (LHC): one of the 12 first-level (Tier-1) data processing centres
     ● Imaging Atmospheric Cherenkov Telescopes: custodial data centre for the MAGIC Telescopes and for the first Large-Sized Telescope prototype of the next-generation Cherenkov Telescope Array (CTA, an ESFRI landmark)
     ● Observational Cosmology: one of the 9 Science Data Centres for ESA’s Euclid mission; custodial data centre for huge simulations of the expansion of the Universe and for the Physics of the Accelerating Universe (PAU) survey
     ● Innovative “Big Data” platform for massive analysis of big datasets
     ● 20 people, 50% engineers and 50% Ph.D.s (Computer Science, Physics, Chemistry)
  3. About PIC (technical)
     ● 8500 x86 cores (mostly bare-metal, scheduled through HTCondor)
     ● 11 PiB disk (dCache) + 25 PiB tape (Enstore) with active HSM
     ● Overprovisioned 10 Gbps LAN (moving to 100 Gbps next year)
     ● 2x10 Gbps WAN, optical paths to CERN and ORM
     ● Hadoop cluster for data analysis
       ○ 16 nodes, 2 TiB RAM, 192 TiB HDD
       ○ Prototyping with NVMe and NVDIMM
     ● GPU: proof of concept in training neural nets
     ● Heavily automated installation: Puppet, Icinga, Grafana, etc.
     ● Compact, highly energy-efficient installation
  4. PIC’s liquid immersion cooling installation (photo slide)
  5. Bottom-up description of scenarios and workflows
     ● Actors:
       ○ Instrument (the example used is the MAGIC Telescope in La Palma, Canary Islands, Spain)
       ○ Private data centre (PIC, near Barcelona, Spain)
       ○ Instrument Analysts (a closed group of users)
       ○ External users (scientists who are not members of MAGIC; public access)
     ● Scenarios:
       ○ Large file safe-keeping
       ○ + Mixed-size file safe-keeping
       ○ + In-archive data processing
       ○ + Data distribution to Instrument Analysts
       ○ + Data utilization by External users
  6. Large file safe-keeping workflow. Example: the MAGIC Telescopes, located at the Observatorio del Roque de los Muchachos, La Palma, Canary Islands.
  7-10. Large file safe-keeping workflow (incremental build over four slides):
     ● Timing: daily data available at 10:00; daily data safe off-telescope by 18:00. Volume: 500-1000 files @ 2 GB/file = 1-2 TB/day over the RedIRIS 10 Gbps λ. The original service used a shared 1 Gbps general-purpose connection; the dedicated ORM-RedIRIS 10 Gbps λ was implemented to ensure compliance with the 8-hour window.
     ● Data characteristics: immutable (read-only); binary, private format; a single bit error in a file renders it useless; two metadata items: filename and checksum.
     ● Data stewardship: in year 1, data accumulate to 150k 2 GB files = 300 TB; in years 1-6, data are bit-preserved; at random time(s) in years 2-6, the full 300 TB is recalled to disk and reprocessed.
     ● Challenges: 365 days/year, an 8-hour window to complete data transfer, storage and verification; the non-predictable full recall must be accomplished in 30 days or less (mitigation: advance notification). Cost expectation: compatible with telescope maintenance costs → < 30k euros per 300 TB stored for 5 years. (The arithmetic behind the two time windows is sketched below.)
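Both time windows are comfortable on paper; a quick back-of-the-envelope check using only the figures quoted on the slides:

```python
# Back-of-the-envelope check of the two time windows (figures from the slides).

def sustained_gbps(terabytes: float, hours: float) -> float:
    """Average rate needed to move `terabytes` in `hours` (decimal units)."""
    return terabytes * 1e12 * 8 / (hours * 3600) / 1e9

# Daily window: up to 2 TB between 10:00 and 18:00.
print(f"daily ingest: {sustained_gbps(2, 8):.2f} Gbps")           # ~0.56 Gbps

# Full recall: 300 TB in 30 days.
print(f"full recall:  {sustained_gbps(300, 30 * 24):.2f} Gbps")   # ~0.93 Gbps

# Both fit easily on a dedicated 10 Gbps lambda. On the original shared
# 1 Gbps link, ~0.56 Gbps sustained left little headroom for contention,
# retries and verification, hence the dedicated path.
```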
  11-13. Some motivations for moving to a commercial service:
     ● The Cherenkov Telescope Array uses OAIS as the basis for its archive
     ● Pressure to focus on Layer 3 and Layer 4 services
     ● Cost evolution and hyper-scaling
     ● Limited physical space on campus
     ● Disaster recovery
     ● Uncertainties about the availability of tape equipment for an on-premises installation
     But there is also a list of motivations NOT to move to a commercial service. The main item: distrust of commercial services.
  14-16. Commercial safe-keeping deployment scenario (incremental build over three slides):
     ● Daily: 500-1000 2 GB files arrive over the RedIRIS 10 Gbps λ and are shipped to the commercial provider via RedIRIS+Géant; 300 TB kept for 5 years.
     ● Interface: put/get with full error recovery plus a status check; CLI, scriptable and programmable. Secure with one expert user; any reasonable AA method compatible with the interfacing requirements.
     ● Full recall: 2-week notice, complete in 30 days.
     ● Scrubbing: every file re-read and checksummed once per year, without buyer intervention.
     ● Heartbeat: random 1% sample recalled monthly.
     ● Future: trust through OAIS/ISO?
     A buyer-side sketch of the put/get interface and the heartbeat check follows.
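A minimal sketch of what the buyer-side tooling could look like. The slides specify only the behaviour (put/get with full error recovery, a status check, and a monthly 1% heartbeat); the `ArchiveClient` methods and the choice of checksum algorithm below are assumptions, not any provider's actual API:

```python
import hashlib
import random
import time
from pathlib import Path

class ArchiveClient:
    """Stand-in for a provider's client. The slides require put/get with
    full error recovery and a status check, but name no concrete API."""
    def put(self, local_path: Path, remote_name: str, checksum: str) -> str:
        ...  # starts an upload, returns a transfer id
    def get(self, remote_name: str, dest_dir: Path) -> Path:
        ...  # recalls a file, returns the local path
    def status(self, transfer_id: str) -> str:
        ...  # e.g. "done", "failed", "in-progress"

def file_checksum(path: Path, algo: str = "sha256") -> str:
    # The slides say only "checksum"; sha256 is an assumption.
    h = hashlib.new(algo)
    with path.open("rb") as f:
        while chunk := f.read(1 << 20):
            h.update(chunk)
    return h.hexdigest()

def put_with_recovery(client: ArchiveClient, path: Path, name: str,
                      max_attempts: int = 5) -> None:
    """Retry until the provider confirms the file is safely stored."""
    checksum = file_checksum(path)
    for attempt in range(1, max_attempts + 1):
        tid = client.put(path, name, checksum)
        while (state := client.status(tid)) == "in-progress":
            time.sleep(30)
        if state == "done":
            return
        time.sleep(60 * attempt)  # back off before retrying
    raise RuntimeError(f"{name}: not stored after {max_attempts} attempts")

def monthly_heartbeat(client: ArchiveClient, catalogue: dict[str, str],
                      dest: Path, fraction: float = 0.01) -> list[str]:
    """Recall a random 1% sample and re-verify checksums (the 'heartbeat').
    `catalogue` maps remote name -> checksum recorded at ingest time.
    Returns the names that failed verification (should be empty)."""
    names = sorted(catalogue)
    sample = random.sample(names, max(1, int(len(names) * fraction)))
    return [n for n in sample
            if file_checksum(client.get(n, dest)) != catalogue[n]]
```

Since a single bit error renders a file useless and the only ingest metadata are filename and checksum, end-to-end verification on both the daily window and the heartbeat is essentially recompute-and-compare, as above.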
  17-19. Mixed-size file safe-keeping workflow and scenario (incremental build over three slides):
     ● Daily: 500-1000 2 GB files plus 1500-3000 files of < 200 MB each, via RedIRIS+Géant; 300 TB kept for 5 years, plus < 100 TB added a posteriori.
     ● 50 TB of reprocessed output stored within 30 days; additional metadata tag: version.
     ● Scrubbing, heartbeat and full recall (2-week notice, complete in 30 days) as in the previous scenario.
     ● Challenges: put/get performed directly by the reprocessing workflow at PIC; 150k input files give 450k output files to be stored. Cost compatible with maintenance costs: < 40k€ per 300 TB stored for 5 years (v1) + 7.5k€ per reprocess. (A minimal record type for the versioned metadata is sketched below.)
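With reprocessing in the picture, each stored object carries three metadata items instead of two. A minimal sketch of such a record; anything beyond the three fields named on the slides (filename, checksum, version) is an assumption:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: archive records, like the data, are immutable
class ArchiveRecord:
    filename: str  # metadata item 1 (from the slides)
    checksum: str  # metadata item 2 (from the slides)
    version: int   # the additional tag introduced for reprocessed output

# Each reprocessing pass stores its output under a new version, so the
# 150k version-1 inputs can coexist with the 450k derived output files.
```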
  20-21. + In-archive processing workflow/scenario (incremental build over two slides):
     ● Same flows as the mixed-size scenario, with the reprocessing now running as commercial-provider in-archive processing.
     ● Challenges: put/get directly by the reprocessing workflow @PIC; 150k input files give 450k output files to be stored; cost compatible with maintenance costs: < 40k€ per 300 TB stored for 5 years (v1) + 7.5k€ per reprocess, plus a competitive price for CPU with appropriate I/O. (A sketch of the in-archive loop follows.)
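The point of in-archive processing is that the 300 TB never has to cross the WAN: the computation moves to provider CPUs sitting next to the data, and only the roughly 50 TB of outputs are new writes. A purely hypothetical sketch; `open_remote` and `put_bytes` stand in for whatever compute-near-data interface a provider actually offers:

```python
from typing import Callable, Iterable, Tuple

def in_archive_reprocess(client, records: Iterable["ArchiveRecord"],
                         process: Callable[[bytes], Iterable[Tuple[str, bytes]]],
                         new_version: int) -> None:
    """Run `process` next to the data, inside the provider's archive.
    One input typically yields several outputs (150k in -> 450k out),
    all stored under the new version tag."""
    for record in records:
        raw = client.open_remote(record.filename)  # read inside the archive, no WAN transfer
        for out_name, out_bytes in process(raw):
            client.put_bytes(out_name, out_bytes, version=new_version)
```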
  22-24. Data distribution to Instrument Analysts (incremental build over three slides):
     ● 300 TB kept for 5 years + < 100 TB a posteriori + metadata handling. Metadata produced at origin; extensible metadata generated by experts.
     ● Worldwide users issue metadata queries and download the matching subset of files.
     ● AAI: XXX AAI (AD@Azure) and MAGIC AAI (LDAP@PIC). Optional additional methods: mount plus file-system emulation; selective sync-and-share.
     ● Challenges: interface to multiple existing external AA systems and create an ACL-type environment; data must be “online” (the “raw” data component could be excluded); provide an extensible metadata-handling system and drive file access by metadata queries. Cost compatible with maintenance costs: < 60 k€ for 5 years of v1 service. (A sketch of metadata-driven access follows.)
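The key inversion in this scenario is that analysts select data by metadata query rather than by knowing file paths. A sketch under assumed interfaces; `metadata_store.search`, the query shape and `client.get` are all hypothetical:

```python
from pathlib import Path

def download_subset(metadata_store, client, query: dict, dest: Path) -> list[Path]:
    """Metadata-query-driven access: resolve a query to matching records,
    then fetch only those files. The provider would enforce ACLs derived
    from the external AAI systems (e.g. AD@Azure, LDAP@PIC) on each get."""
    matched = metadata_store.search(query)  # e.g. {"night": "2019-05-22", "version": 2}
    return [client.get(record.filename, dest) for record in matched]
```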
  25. What in-archive data analysis and presentation may look like (image-only slide)
  26. Extension to External Users (work in progress)
     ● From other scientific projects:
       ○ Add additional AAI providers and use group-management tools (COmanage or Grouper)
       ○ If there are too many AAI providers, still use group-management tools and either move to eduGAIN or move to ORCID
     ● “Open” data:
       ○ Open ≠ uncontrolled
       ○ Need to know who accessed the data: citation control, and statistical information to demonstrate value
     ● Most likely, both will need in-archive analysis / viewing tools
  27. The 1-year MAGIC Telescope case is just one example. Others:
     ● The Cherenkov Telescope Array will have two sites (Northern and Southern hemispheres) with 10 large telescopes and 100s of smaller ones
     ● Studies of the expansion of the Universe with optical telescopes, which produce data in anything from one-week campaigns to 365 days/year operation
     ● Supercomputer simulation production can look a lot like an instrument producing a lot of data in a short time
     ● High-volume applications such as the High-Luminosity LHC
     ● etc.
  28. Bottom-up description of scenarios and workflows (recap of slide 5)
  29. Helping to turn Information into Knowledge
