Transcript of "Petabyte Scale Data Challenge"

  1. Petabyte Scale Data Challenge - Worldwide LHC Computing Grid (ASGC / Jason Shih, Computex, Jun 2nd, 2010)
  2. Outline
     - Objectives & Milestones
     - WLCG experiments and the ASGC Tier-1 Center
     - Petabyte Scale Challenge
     - Storage Management System: system architecture, configuration and performance
  3. Objectives
     - Building a sustainable research and collaboration infrastructure
     - Supporting research through e-Science, for data-intensive sciences and applications that require cross-disciplinary, distributed collaboration
  4. ASGC Milestones
     - Operational since the deployment of LCG0 in 2002
     - ASGC CA established in 2005 (IGTF accreditation in the same year)
     - Tier-1 Center responsibility started in 2005
     - The federated Taiwan Tier-2 center (Taiwan Analysis Facility, TAF) is also collocated at ASGC
     - Representative of the EGEE e-Science Asia Federation since joining EGEE in 2004
     - Providing Asia Pacific Regional Operation Center (APROC) services to the region-wide WLCG/EGEE production infrastructure since 2005
     - Initiated the Avian Flu Drug Discovery project in collaboration with EGEE in 2006
     - Start of the EUAsiaGrid project in April 2008
  5. LHC First Beam – Computing at the Petascale
     - ALICE: heavy ions, pp
     - ATLAS: general purpose, pp, heavy ions
     - CMS: general purpose, pp, heavy ions
     - LHCb: B-physics, CP violation
  6. Size of LHC Detector (figure: ATLAS and CMS detectors compared with CERN Building 40)
  7. Standard Cosmology
     - A good model from 0.01 sec after the Big Bang (energy, density, temperature), supported by considerable observational evidence
     - Elementary particle physics: from the Standard Model into the unknown, towards energies of 1 TeV and beyond: the Terascale
     - Towards quantum gravity: from the unknown into the unknown...
     - http://www.damtp.cam.ac.uk/user/gr/public/bb_history.html (UNESCO Information Preservation debate, April 2007, Jamie Shiers, CERN)
  8. WLCG Timeline: first beam on the LHC, Sep. 10, 2008; severe incident after 3 weeks of operation (3.5 TeV)
  9. ASGC - Introduction
     - Max CERN/T1-ASGC point-to-point inbound: 9.3 Gbps
     - Most reliable T1 (98.83%); very highly performing and most stable site in CCRC08
     - Asia Pacific Regional Operation Center
     - A worldwide Grid infrastructure: >250 sites, 48 countries, >68,000 CPUs, >25 PetaBytes, >10,000 users, >200 VOs, >150,000 jobs/day
     - Best Demo Award of EGEE'07: Grid Application Platform, Avian Flu Drug Discovery, Lightweight Problem Solving Framework, Large Hadron Collider (LHC)
  10. Collaborating e-Infrastructures: TWGRID, EUAsiaGrid; potential for linking ~80 countries. "Production" = reliable, sustainable, with commitments to quality of service
  11. WLCG Computing Model - The Tier Structure
      - Tier-0 (CERN): data recording, initial data reconstruction, data distribution
      - Tier-1 (11 countries): permanent storage, re-processing, analysis
      - Tier-2 (~130 centres): simulation, end-user analysis
  12. Enabling Grids for E-sciencE: Archeology, Astronomy, Astrophysics, Civil Protection, Comp. Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences, ... (EGEE-II INFSO-RI-031688, EGEE'07, Budapest, 1-5 October 2007)
  13. Why Petabyte? Challenges
      Why Petabyte?
      - Experiment computing models
      - Comparison with conventional data management
      Challenges
      - Performance: LAN and WAN activities; sufficient bandwidth to the CPU farm; eliminating the uplink bottleneck across switch tiers (see the sketch below); fast response to critical events
      - Fabric infrastructure & service level agreements
      - Scalability and manageability: robust DB engine (Oracle RAC); knowledge base and adequate administration (training)
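Why the uplink dominates: a minimal sketch of the switch-tier oversubscription arithmetic, with hypothetical node counts and link speeds (not ASGC's actual fabric):

```python
# Hypothetical rack: worker nodes with 1 GbE NICs behind a top-of-rack
# switch that has a single 10 GbE uplink. The oversubscription ratio is
# the aggregate node bandwidth divided by the uplink bandwidth.
def oversubscription(nodes: int, nic_gbps: float, uplink_gbps: float) -> float:
    return nodes * nic_gbps / uplink_gbps

for nodes in (16, 32, 48):
    ratio = oversubscription(nodes, nic_gbps=1.0, uplink_gbps=10.0)
    print(f"{nodes} nodes x 1 GbE over one 10 GbE uplink -> {ratio:.1f}:1")
```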
  14. Tier Model and Data Management Components
  15. WLCG Experiment Computing Model
  16. ATLAS T1 Data Flow (diagram: RAW, ESD and AODm streams from Tier-0 into the T1 tape and disk buffers and CPU farm, plus simulation and analysis flows to and from other Tier-1s and Tier-2s; e.g. RAW at 0.02 Hz and 1.6 GB/file gives ~1.7K files/day, 32 MB/s, 2.7 TB/day into the T1)
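The per-stream figures in the diagram follow from file size times file rate; a minimal sketch of that arithmetic for the RAW stream (decimal units assumed, input numbers taken from the slide):

```python
def stream_rate(file_size_gb: float, rate_hz: float):
    """Return (MB/s, TB/day, files/day) for a data stream."""
    mb_per_s = file_size_gb * 1000 * rate_hz
    tb_per_day = mb_per_s * 86400 / 1e6
    return mb_per_s, tb_per_day, rate_hz * 86400

# RAW into the T1 tape buffer: 1.6 GB/file at 0.02 Hz.
mbps, tb_day, files_day = stream_rate(1.6, 0.02)
print(f"RAW: {mbps:.0f} MB/s, {tb_day:.1f} TB/day, {files_day:.0f} files/day")
# -> RAW: 32 MB/s, 2.8 TB/day, 1728 files/day (the slide rounds to 2.7 TB/day, 1.7K files/day)
```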
  17. WLCG Tier-1 - Defined Minimum Levels of Services. Response time is defined as the maximum delay before taking action. Mean time to repair the service is also crucial, but it is covered indirectly through the required availability target.
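The relation between repair time and the availability target can be made explicit; a minimal sketch using the standard steady-state formula (the MTBF/MTTR values are hypothetical, not taken from the MoU):

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical service failing about once a month: to stay above a 99%
# availability target, repairs must complete within roughly 7 hours.
mtbf = 30 * 24
for mttr in (4, 7, 12, 24):
    print(f"MTTR {mttr:2d} h -> availability {availability(mtbf, mttr):.2%}")
```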
  18. WLCG MoU & ASGC Resource Level - Pledged Resources and Projection
      Year     | CPU (HEP2k6) | Disk (PB) | Tape (PB)
      End 2009 | 29.5K        | 2.6       | 2.4
      MoU 2009 | 20K          | 3.0       | 3.0
      MoU 2010 | 28K          | 3.5       | 3.5
      (chart: installed CPU (kSI2k), disk and tape (TB) vs. MoU pledges, 2005-2010)
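One way to read the table is as delivered-versus-pledged ratios; a minimal sketch using the rows above:

```python
# End-2009 installed resources vs. the 2009 MoU pledge (numbers from the table).
delivered   = {"CPU (HEP2k6)": 29.5e3, "Disk (PB)": 2.6, "Tape (PB)": 2.4}
pledge_2009 = {"CPU (HEP2k6)": 20.0e3, "Disk (PB)": 3.0, "Tape (PB)": 3.0}

for resource, have in delivered.items():
    print(f"{resource}: {have / pledge_2009[resource]:.0%} of the 2009 pledge")
# CPU is ~148% of the pledge, while disk (~87%) and tape (~80%) depend on
# the expansions described later in the talk.
```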
  19. Data Management System
      CASTOR V1 (CERN Advanced STORage)
      - Satisfactorily serving tens of thousands of requests/day over TBs of disk cache
      - Limitations: 1M files in cache; tape movement API not flexible
      CASTOR V2
      - DB-centric architecture
      - Scheduling feature
      - GSI and Kerberos
      - Resource management and resource handling
  20. CASTOR Configurations - Current Infrastructure
      Shared core services
      - Serving: Atlas and CMS
      - Services: Stager, NS, DLF, Repack, and LSF
      DB clusters
      - Two DB clusters (SRM and NS); 5 services (DBs) split across the two clusters; 5 Oracle instances
      Capacity
      - Total: 0.63 PB and 0.7 PB for CMS and Atlas respectively
      - Current usage: 63% and 44% for CMS and Atlas
  21. CASTOR Configurations (cont'd) - Disk Cache
      Disk pools & servers
      - Performance (IOPS): with 0.5 kB IO size, 76.4k read and 54k write; both decrease by roughly 9% when the IO size is increased to 4 kB
      - 80 disk servers (+6 to come online at the end of the 3rd week of Oct)
      - Total capacity: 1.67 PB (0.3 PB allocated dynamically); current usage: 0.79 PB (~58%)
      - 14 disk pools (8 for Atlas, 3 for CMS, and another three for bio, SAM, and dynamic allocation)
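For a feel of what those IOPS numbers mean in bandwidth terms, a minimal sketch converting IOPS at a given IO size into aggregate MB/s (the uniform 9% drop at 4 kB is the figure quoted on the slide):

```python
def iops_to_mb_s(iops: float, io_size_kb: float) -> float:
    """Aggregate throughput (MB/s, binary) implied by an IOPS figure at a given IO size."""
    return iops * io_size_kb / 1024

read_iops, write_iops = 76.4e3, 54.0e3
print(f"0.5 kB IOs: read {iops_to_mb_s(read_iops, 0.5):.0f} MB/s, "
      f"write {iops_to_mb_s(write_iops, 0.5):.0f} MB/s")
# ~9% fewer IOPS at 4 kB, but each IO moves 8x more data.
print(f"4 kB IOs:   read {iops_to_mb_s(read_iops * 0.91, 4):.0f} MB/s, "
      f"write {iops_to_mb_s(write_iops * 0.91, 4):.0f} MB/s")
```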
  22. Disk Pool Configuration - T1 MSS (CASTOR) (chart: installed capacity, free capacity and number of disk servers per disk pool)
  23. Distribution of Free Capacity - Per Disk Server vs. per Pool (chart: free capacity in TB per disk pool - Standby, dteamD0T0, cmsWANOUT, cmsPrdD1T0, cmsLTD0T1, biomedD1T0, atlasStage, atlasScratchDisk, atlasPrdD1T0, atlasPrdD0T1, atlasMCTAPE, atlasMCDISK, atlasHotDisk, atlasGROUPDISK)
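The per-pool view is simply an aggregation of the per-server free space; a minimal sketch of that roll-up (server names, pool assignments and free-space figures are hypothetical):

```python
from collections import defaultdict

# Hypothetical per-disk-server free space (TB), tagged with its disk pool.
servers = [
    ("castor-ds-001", "atlasMCDISK", 4.2),
    ("castor-ds-002", "atlasMCDISK", 1.1),
    ("castor-ds-010", "cmsPrdD1T0",  7.8),
    ("castor-ds-011", "cmsPrdD1T0",  0.4),
    ("castor-ds-020", "Standby",     13.0),
]

free_per_pool = defaultdict(float)
for _name, pool, free_tb in servers:
    free_per_pool[pool] += free_tb

for pool, free_tb in sorted(free_per_pool.items(), key=lambda kv: -kv[1]):
    print(f"{pool:<12} {free_tb:6.1f} TB free")
```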
  24. Storage Server Generation - Drive vs. Total Capacity (chart: number of RAID subsystems per storage generation (6, 18, 23 and 37 units) against total capacity per generation (238 TB, 235.5 TB, 683 TB and 741 TB))
  25. CASTOR Configurations (cont'd) - Core Service Overview
      Service Type | OS Level        | Release  | Remark
      Core         | SLC 4.7/x86-64  | 2.1.7-19 | Stager/NS/DLF
      SRM          | SLC 4.7/x86-64  | 2.7-18   | 3 head nodes
      Disk Svr.    | SLC 4.7/x86-64  | 2.1.7-19 | 80 in Q3 2k9 (20+ in Q4)
      Tape Svr.    | SLC 4.7/32 + 64 | 2.1.8-8  | x86-64 OS deployed
  26. CASTOR Configurations (cont'd) - CMS Disk Cache: Current Resource Level
      Space Token / Disk Pool | Capacity/Job Limit | Disk Servers | Tape Pool/Capacity
      cmsLTD0T1               | 278TB/488          | 9            | *
      cmsPrdD1T0              | 284TB/1560         | 13           |
      cmsWanOut               | 72TB/220           | 4            | *
      (* depends on tape family)
  27. CASTOR Configurations (cont'd) - Atlas Disk Cache: Current Resource Level
      Space Token      | Cap/Job Limit | Disk Servers | Tape Pool/Cap.
      atlasMCDISK      | 163TB/790     | 8            | -
      atlasMCTAPE      | 38TB/80       | 2            | atlasMCtp/39TB
      atlasPrdD1T0     | 278TB/810     | 15           | -
      atlasPrdD0T1     | 61TB/210      | 3            | atlasPrdtp/105TB
      atlasGROUPDISK   | 19TB/40       | 1            | -
      atlasScratchDisk | 28TB/80       | 1            | -
      atlasHotDisk     | 2TB/40        | 2            | -
      Total            | 950TB/1835    | 46           | -
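A convenient way to compare the Atlas space tokens is by job-slot and capacity density; a minimal sketch using the per-token rows above:

```python
# (capacity in TB, job limit, disk servers) per Atlas space token, from the table.
tokens = {
    "atlasMCDISK":      (163, 790, 8),
    "atlasMCTAPE":      (38,  80,  2),
    "atlasPrdD1T0":     (278, 810, 15),
    "atlasPrdD0T1":     (61,  210, 3),
    "atlasGROUPDISK":   (19,  40,  1),
    "atlasScratchDisk": (28,  80,  1),
    "atlasHotDisk":     (2,   40,  2),
}

for name, (cap_tb, job_limit, n_servers) in tokens.items():
    print(f"{name:<16} {job_limit / cap_tb:5.1f} jobs/TB, "
          f"{cap_tb / n_servers:5.1f} TB/server")
```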
  28. IDC Collocation: facility installation completed on Mar 27th; tape system delayed until after Apr 9th (realignment, RMA for faulty parts)
  29. Storage Farm: ~110 RAID subsystems deployed since 2003, supporting both the Tier-1 and Tier-2 storage fabric; DAS connections to frontend blade servers; frontend servers can be switched flexibly according to performance requirements; 4-8 Gb Fibre Channel connectivity
  30. CASTOR Configurations (cont'd) - Tape Pools
      Tape Pool      | Capacity (TB)/Usage | Drive Dedication | LTO3/4 Mixed
      atlasMCtp      | 8.98/40%            | N                | Y
      atlasPrdtp     | 101/65%             | N                | Y
      cmsCSA08cruzet | 15.6/46%            | N                | N
      cmsCSA08reco   | 5/0%                | N                | N
      cmsCSAtp       | 639/99%             | N                | Y
      cmsLTtp        | 34.4/44%            | N                | N
      dteamTest      | 3.5/1%              | N                | N
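Capacity and usage together give the remaining headroom per tape pool; a minimal sketch using the table above:

```python
# (capacity in TB, usage fraction) per tape pool, from the table.
tape_pools = {
    "atlasMCtp":      (8.98, 0.40),
    "atlasPrdtp":     (101,  0.65),
    "cmsCSA08cruzet": (15.6, 0.46),
    "cmsCSA08reco":   (5,    0.00),
    "cmsCSAtp":       (639,  0.99),
    "cmsLTtp":        (34.4, 0.44),
    "dteamTest":      (3.5,  0.01),
}

total_cap  = sum(cap for cap, _ in tape_pools.values())
total_used = sum(cap * used for cap, used in tape_pools.values())
for pool, (cap, used) in tape_pools.items():
    print(f"{pool:<14} {cap * (1 - used):6.1f} TB free")
print(f"overall: {total_used:.0f} of {total_cap:.0f} TB used "
      f"({total_used / total_cap:.0%})")
```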
  31. MSS Monitoring Services
      - Standard Nagios probes: NRPE + customized plugins; SMS to OSE/SM for all types of critical alarms
      - Availability metrics
      - Tape metrics (SLS): throughput, capacity & scheduler per VO and disk pool
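A customized NRPE plugin of the kind listed here is just a script that follows the Nagios exit-code convention (0 OK, 1 WARNING, 2 CRITICAL); a minimal sketch that alarms on low free space in a disk pool (the free/total values are stubbed, since the real probes would query CASTOR's own monitoring interfaces):

```python
#!/usr/bin/env python
"""Minimal Nagios/NRPE-style check: alarm on low free space in a disk pool."""
import sys

OK, WARNING, CRITICAL = 0, 1, 2

def check_pool(free_tb: float, total_tb: float, warn_pct: float = 15.0, crit_pct: float = 5.0):
    free_pct = 100.0 * free_tb / total_tb
    if free_pct < crit_pct:
        return CRITICAL, f"CRITICAL - only {free_pct:.1f}% free ({free_tb:.1f} TB)"
    if free_pct < warn_pct:
        return WARNING, f"WARNING - {free_pct:.1f}% free ({free_tb:.1f} TB)"
    return OK, f"OK - {free_pct:.1f}% free ({free_tb:.1f} TB)"

if __name__ == "__main__":
    # Stubbed values; a real plugin would query the pool it monitors.
    status, message = check_pool(free_tb=21.0, total_tb=278.0)
    print(message)    # Nagios shows the first line of stdout in the alarm
    sys.exit(status)  # the exit code drives the OK/WARNING/CRITICAL state
```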
  32. MSS Tape System - Expansion/Upgrade Planning
      Before the incident: 8 LTO3 + 4 LTO4 drives; 720 TB with LTO3 and 530 TB with LTO4
      May 2009: two LTO3 drives; MES of 6 LTO4 drives at the end of May; capacity 1.3 PB (old, LTO3/4 mixed) + 0.8 PB (LTO4)
      New S54 model introduced mid-2009: 2K slots with the tier model; requires an ALMS upgrade and the enhanced gripper
      MES Q3 2009: 18 LTO4 drives; HA implementation resumes in Q4
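The capacity figures follow from cartridge counts and native LTO capacities (400 GB for LTO-3, 800 GB for LTO-4, uncompressed); a minimal sketch with hypothetical cartridge counts:

```python
NATIVE_TB = {"LTO-3": 0.4, "LTO-4": 0.8}  # native (uncompressed) capacity per cartridge

def library_capacity(cartridges: dict) -> float:
    """Total native capacity in TB for a cartridge inventory keyed by LTO generation."""
    return sum(NATIVE_TB[gen] * count for gen, count in cartridges.items())

# Hypothetical counts: 720 TB of LTO-3 corresponds to 1800 cartridges,
# and 530 TB of LTO-4 to roughly 660 cartridges.
print(library_capacity({"LTO-3": 1800}))                 # 720.0
print(library_capacity({"LTO-4": 663}))                  # 530.4
print(library_capacity({"LTO-3": 1800, "LTO-4": 663}))   # 1250.4, i.e. ~1.3 PB mixed
```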
  33. Expansion Planning
      2008: 0.5 PB expansion of the tape system in Q2; meet the MoU target by mid-Nov; 1.3 MSI2k per rack based on recent E5450 processors
      2009 Q1: 150 SMP/QC blade servers; RAID subsystems considered with 2 TB per drive, 42 TB net capacity per chassis and 0.75 PB in total
      2009 Q3-4: 18 LTO4 drives (mid-Oct); 330 Xeon QC (SMP, Intel 5450) blade servers; 2nd-phase tape MES of 5 LTO4 drives + HA; 3rd-phase tape MES of 6 LTO4 drives; ETA of the 0.8 PB expansion delivery: mid-Nov
  34. Computing/Storage System Infrastructure (diagram: ASGC data centre layout - CASTOR2 disk farm, disk servers and tape servers, core services (CE, RB, DPM, PX, BDII) and worker-node blade centres (IBM HS20/HS21 and Quanta blades), with 2 x GE (LX) links to the 4F M160 router for the HK/JP Tier-2s, 2 x GE (LX) to the TaipeiGigaPoP 7609 for the TW Tier-2s, and 4 x GE (SX) to the ASGC distribution switch serving the Tier-1 servers)
  35. Throughput of WLCG Experiments: throughput is defined as job efficiency x the number of running jobs. The characteristics of the four LHC experiments show that inefficiency is largely due to poor coding.
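A minimal sketch of the throughput metric as defined on this slide; the efficiency and job-count values are hypothetical:

```python
def throughput(job_efficiency: float, running_jobs: int) -> float:
    """Throughput as defined on the slide: job efficiency x number of running jobs."""
    return job_efficiency * running_jobs

# Hypothetical snapshot: with the same number of running jobs, poorly
# coded (low-efficiency) workloads deliver far less effective throughput.
snapshot = {"experiment A": (0.85, 2000), "experiment B": (0.55, 2000)}
for exp, (eff, jobs) in snapshot.items():
    print(f"{exp}: {throughput(eff, jobs):.0f} effective job slots")
```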
  36. Reliability From Different Perspectives
  37. Summary
      - Deploy a highly scalable DM system and a performance-driven storage infrastructure: eliminate possible complexity of the SRM abstraction layer; resource utilization, provisioning and optimization
      - From PoC to production, the challenges remain: Data Challenge, Service Challenge, CCRC08, STEP09, etc.
      - Motivation appears clear for medical, climate and cosmological applications
      - Operation-wise: robust database setup; a knowledge base for fabric infrastructure operation; fast enough event processing and documentation
      - Consider going beyond the data-management use cases in WLCG: commonality with many other disciplines on the EGEE infrastructure; actively participate in e-Science collaboration within the region