Specify the main parameters characterizing the Model’s performance: throughputs, latencies
Determine classes of Computing Models feasible for LHC (matched to network capacity and data handling resources)
Develop “Baseline Models” in the “feasible” category
Verify resource requirement baselines: (computing, data handling, networks)
Define the Analysis Process
Define Regional Center Architectures
Provide Guidelines for the final Models
[Diagram: hierarchical computing model — CERN (n×10^7 MIPS, m PByte robot) linked at 622 Mbit/s (up to N×622 Mbit/s) to regional centres such as FNAL (4×10^7 MIPS, 110 TByte robot) and universities (n×10^6 MIPS, m TByte robot), each serving desktops; optional air freight for bulk data transport.]
Baseline architecture for regional centres, Technology tracking, Survey of computing model of current HENP experiments
Analysis Model WG
Evaluation of LHC data analysis model and use cases
Develop a simulation tool set for performance evaluation of the computing models
Evaluate the performance of ODBMS, network in the distributed environment
General Need for distributed data access and analysis:
Potential problems of a single centralized computing center include:
- scale of LHC experiments: difficulty of accumulating and managing all resources at one location
- geographic spread of LHC experiments: providing equivalent location independent access to data for physicists
- help desk, support and consulting in the same time zone
- cost of LHC experiments: optimizing use of resources located world wide
Motivations for Regional Centers
A distributed computing architecture based on regional centers offers:
A way of utilizing the expertise and resources residing in computing centers all over the world
Local consulting and support
Maximal intellectual contribution of physicists all over the world, without requiring their physical presence at CERN
Acknowledgement of possible limitations of network bandwidth
Freedom for people to choose how they analyze data, based on the availability or proximity of resources such as CPU, data, or network bandwidth
Future Experiment Survey
From the previous survey, we saw many sites contributed to Monte Carlo generation
This is now the norm
New experiments are trying to use the Regional Center concept
BaBar has Regional Centers at IN2P3 and RAL, and a smaller one in Rome
STAR has Regional Center at LBL/NERSC
CDF and D0 offsite institutions are paying more attention as the run gets closer.
Future Experiment Survey
Other observations/ requirements
In the last survey, we pointed out the following requirements for RC’s:
software development team
diverse body of users
good, clear documentation of all s/w and s/w tools
The following are requirements for the central site (i.e. CERN):
Central code repository easy to use and easily accessible for remote sites
be “sensitive” to remote sites in database handling, raw data handling and machine flavors
provide good, clear documentation of all s/w and s/w tools
The experiments in this survey achieving the most in distributed computing are following these guidelines
Tier1: National “Regional” Center
Tier2: Regional Center
Tier3: Institute Workgroup Server
Tier4: Individual Desktop
Total: 5 levels (with CERN itself as Tier 0)
[Diagram: CMS Offline Farm at CERN circa 2006 (lmr for MONARC study, April 1999) — ~1400 processor boxes in 160 clusters and 40 sub-farms on a 250 Gbps farm network; 5400 disks in 340 arrays and 100 tape drives on a storage network; DAQ input at 0.8 Gbps; LAN-SAN and LAN-WAN routers; ~12 Gbps* disk and 3 Gbps* tape traffic (*assumes all disk and tape traffic on the storage network; double these numbers if it all passes through the LAN-SAN router).]
Processor cluster (lmr for MONARC study, April 1999):
- Basic box: four 100 SI95 processors with a standard network connection (~2 Gbps); 15% of systems are configured as I/O servers (disk server, disk-tape mover, Objy AMS, ...) with an additional connection to the storage network
- Cluster: 9 basic boxes with a network switch (<10 Gbps)
- Sub-farm: 4 clusters with a second-level network switch (<50 Gbps); one sub-farm fits in one rack (36 boxes, 144 CPUs, 5 m²)
- Cluster and sub-farm sizing is adjusted to fit the capabilities of the network switch, racking, and power distribution components
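The sub-farm bookkeeping can be checked with trivial arithmetic; the inputs (4 CPUs per box, 9 boxes per cluster, 4 clusters per sub-farm) are taken straight from the slide:

```python
# Check the sub-farm bookkeeping quoted on the slide.
CPUS_PER_BOX = 4          # four 100 SI95 processors per basic box
SI95_PER_CPU = 100
BOXES_PER_CLUSTER = 9
CLUSTERS_PER_SUBFARM = 4

boxes = BOXES_PER_CLUSTER * CLUSTERS_PER_SUBFARM
cpus = boxes * CPUS_PER_BOX
si95 = cpus * SI95_PER_CPU
print(f"sub-farm: {boxes} boxes, {cpus} CPUs, {si95} SI95")
# -> sub-farm: 36 boxes, 144 CPUs, 14400 SI95 (matches the slide's 36 boxes, 144 CPUs)
```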
Regional Centers will
Provide all technical services and data services required to do the analysis
Maintain all (or a large fraction of) the processed analysis data; possibly only large subsets selected by physics channel. Maintain a fixed fraction of the fully reconstructed and raw data
Cache or mirror the calibration constants
Maintain excellent network connectivity to CERN and excellent connectivity to users in the region. Data transfer over the network is preferred for all transactions but transfer of very large datasets on removable data volumes is not ruled out.
Share/develop common maintenance, validation, and production software with CERN and the collaboration
Provide services to physicists in the region, contribute a fair share to post-reconstruction processing and data analysis, collaborate with other RCs and CERN on common projects, and provide services to members of other regions on a best effort basis to further the science of the experiment
Provide support services, training, documentation, trouble shooting to RC and remote users in the region
[Diagram: Regional Centre functional layout — data import/export; mass storage and disk servers; database servers; tapes; network links from CERN and from Tier 2 and simulation centres; physics software development; R&D systems and testbeds; info, code, web and telepresence servers; support services (training, consulting, help desk); production reconstruction (Raw/Sim → ESD; scheduled, predictable; experiment/physics groups); production analysis (ESD → AOD, AOD → DPD; scheduled; physics groups); individual analysis (AOD → DPD and plots; chaotic; physicists); connections to desktops, Tier 2 centres, local institutes and CERN.]
[Diagram: Regional Centre layout, annotated with data rates and storage capacities.]
Data input from CERN: raw data, 5% = 50 TB/yr; ESD data, 50% = 50 TB/yr; AOD data, all = 10 TB/yr; revised ESD = 20 TB/yr
Data input from Tier 2: revised ESD and AOD = 10 TB/yr
Data input from simulation centres: raw data = 100 TB/yr
Data output to CERN: AOD data = 8 TB/yr; recalculated ESD = 10 TB/yr; simulation ESD data = 10 TB/yr
Data output to Tier 2: revised ESD and AOD = 15 TB/yr
Data output to local institutes: ESD, AOD, DPD data = 20 TB/yr
Total storage: robotic mass storage, 300 TB. Raw data: 50 TB = 5×10^7 events (5% of 1 year). Raw (simulated) data: 100 TB = 10^8 events. ESD (reconstructed) data: 100 TB = 10^9 events (50% of 2 years). AOD (physics object) data: 20 TB = 2×10^9 events (100% of 2 years). Tag data: 2 TB (all). Calibration/conditions database: 10 TB (only the latest version of most data types kept here). Central disk cache: 100 TB (per user demand). CPU required for AMS database servers: ??×10^3 SI95 power
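The storage budget above can be cross-checked with a little arithmetic; the per-event sizes are derived from the stated totals, not given on the slide, and the script is an illustrative sketch rather than part of the original study:

```python
# Sketch: cross-check the Regional Centre storage budget.
# Totals and event counts come from the slide; kB/event is derived.
TB = 1e12  # bytes

stores = {
    # name: (total_bytes, n_events)
    "raw (5% of 1 yr)":   (50 * TB, 5e7),
    "raw simulated":      (100 * TB, 1e8),
    "ESD (50% of 2 yrs)": (100 * TB, 1e9),
    "AOD (100% of 2 yrs)": (20 * TB, 2e9),
}

for name, (total, n) in stores.items():
    print(f"{name}: {total / n / 1e3:.0f} kB/event")

# Add the stores with no event count given on the slide
# (tag 2 TB, calibration 10 TB, disk cache 100 TB).
total_tb = sum(total for total, _ in stores.values()) / TB + 2 + 10 + 100
print(f"total: {total_tb:.0f} TB "
      f"(slide quotes 300 TB robotic storage + 100 TB disk cache)")
```

The event stores alone sum to ~282 TB, consistent with the quoted 300 TB of robotic mass storage.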
[Diagram: Regional Centre layout, with the CPU requirements of the processing farms annotated.]
Farms of low-cost commodity computers, limited I/O rate, modest local disk cache:
- Reconstruction jobs: reprocessing of raw data, 10^8 events/year (10%); initial processing of simulated data, 10^8/year; 1000 SI95-sec/event ⇒ 10^4 SI95 capacity: 100 processing nodes of 100 SI95 power
- Event selection jobs: 10 physics groups × 10^8 events (10% samples) × 3 times/yr, based on ESD and the latest AOD data; 50 SI95-sec/event ⇒ 5000 SI95 power
- Physics object creation jobs: 10 physics groups × 10^7 events (1% samples) × 8 times/yr, based on selected event-sample ESD data; 200 SI95-sec/event ⇒ 5000 SI95 power
- Derived physics data creation jobs: 10 physics groups × 10^7 events × 20 times/yr, based on selected AOD samples, generating "canonical" derived physics data; 50 SI95-sec/event ⇒ 3000 SI95 power
- Total: 110 nodes of 100 SI95 power
- Desktop derived physics data creation jobs: 200 physicists × 10^7 events × 20 times/yr, based on selected AOD and DPD samples; 20 SI95-sec/event ⇒ 30,000 SI95 power: 300 nodes of 100 SI95 power
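The SI95 figures above follow from (events/year) × (SI95-sec/event) spread over the year. A sketch of that arithmetic — the ~3.15×10^7 usable seconds per year is an assumption of this sketch, which is why the slide's rounded numbers differ slightly:

```python
# Sketch: reproduce the farm-sizing arithmetic from the slide.
# Assumes ~3.15e7 usable seconds/year; the slide does not state
# the duty factor behind its rounded figures.
SECONDS_PER_YEAR = 3.15e7

def si95_needed(events_per_year, si95_sec_per_event):
    """Sustained SI95 power needed to keep up with the workload."""
    return events_per_year * si95_sec_per_event / SECONDS_PER_YEAR

jobs = {
    # name: (events/yr, SI95-sec/event, slide's quoted SI95)
    "reconstruction":  (2e8, 1000, 10000),
    "event selection": (10 * 1e8 * 3, 50, 5000),
    "physics objects": (10 * 1e7 * 8, 200, 5000),
    "DPD production":  (10 * 1e7 * 20, 50, 3000),
    "individual DPD":  (200 * 1e7 * 20, 20, 30000),
}

for name, (n, cost, quoted) in jobs.items():
    print(f"{name}: {si95_needed(n, cost):,.0f} SI95 (slide: {quoted:,})")
```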
MONARC Analysis Process Example
Model and Simulation parameters
A new set of parameters common to all simulating groups.
More realistic values, but still to be discussed and agreed on the basis of the experiments' information.
Reconstruction of 10^7 events, RAW → ESD → AOD → TAG, at CERN. This is the production running while data arrive from the DAQ (100 days of running, collecting a billion events per year).
Analysis by 5 working groups, each of 25 analyzers, on TAG data only (no requests to higher-level data samples). Each analyzer submits 4 sequential jobs on 10^6 events. Each analyzer's start time is a flat random choice in a range of 3000 seconds. Each analyzer's data sample of 10^6 events is a random choice from the complete TAG database of 10^7 events.
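The workload above is easy to sketch as a toy job generator. This is an illustrative stand-in, not the actual MONARC simulation tool (which models queues, servers and networks); the job CPU time is taken from the "paper estimate" below:

```python
# Toy sketch of the TAG analysis workload: 5 working groups x 25
# analyzers, each submitting 4 sequential jobs on 1e6 events sampled
# from a 1e7-event TAG database, start times flat in [0, 3000] s.
import random

random.seed(1)
N_TAG = 10**7      # events in the TAG database
SAMPLE = 10**6     # events read per job
JOB_CPU = 6000     # sec/job at CERN ("paper estimate")

jobs = []
for group in range(5):
    for analyzer in range(25):
        t = random.uniform(0, 3000)       # flat random start time
        for _ in range(4):                # 4 sequential jobs
            first_event = random.randrange(N_TAG - SAMPLE)
            jobs.append((group, analyzer, t, first_event))
            t += JOB_CPU                  # next job starts after this one

print(f"{len(jobs)} jobs, total CPU {len(jobs) * JOB_CPU / 3600:.0f} h")
```

So the full exercise is 500 jobs per site, each site's share amounting to roughly 830 CPU hours.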
Transfer (FTP) of the 10^7-event ESD, AOD and TAG samples from CERN to the RC
CERN Activities : Reconstruction, 5 WG Analysis, FTP transfer
RC Activities : 5 (uncorrelated) WG Analysis, receive FTP transfer
Job’s “paper estimate”:
Single analysis job: 1.67 CPU hours at CERN = 6000 sec (same at the RC)
Reconstruction at CERN of 1/500 of the sample, RAW → ESD: 3.89 CPU hours = 14000 sec
Reconstruction at CERN of 1/500 of the sample, ESD → AOD: 0.03 CPU hours = 100 sec
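These CPU-hour figures convert to seconds as quoted, after rounding; a trivial check, shown for completeness:

```python
# Check the slide's CPU-hours <-> seconds conversions.
cases = [
    ("single analysis job", 1.67, 6000),
    ("1/500 RAW->ESD reconstruction", 3.89, 14000),
    ("1/500 ESD->AOD reconstruction", 0.03, 100),
]
for name, hours, quoted_sec in cases:
    sec = hours * 3600
    print(f"{name}: {hours} h = {sec:.0f} s (slide: {quoted_sec} s)")
```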
Resources: LAN speeds ?!
In our models the DB servers are uncorrelated, so each activity uses a single server. The bottlenecks are the read and write speeds to and from the server. To use the CPU power at a reasonable level we need a read speed of at least 300 MB/s and a write speed of 100 MB/s (a milestone already met today).
We use 100 MB/s in the current simulations (10 Gbit/s switched LANs may be possible in 2005).
Processing node link speed is negligible in our simulations.
Of course the “real” implementation of the farms can be different, but the simulation results do not depend on it: they are based on usable resources.
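The sensitivity of CPU utilization to server bandwidth can be illustrated with a minimal saturation model; all the numbers in the example call below are illustrative, not values from the MONARC baseline:

```python
# Minimal sketch: CPU utilization of a farm fed by a single DB server.
# If jobs demand more aggregate read bandwidth than the server can
# deliver, the CPUs wait for data.

def cpu_utilization(n_jobs, mb_per_sec_per_job, server_mb_per_sec):
    """Fraction of time CPUs can compute when the server is the bottleneck."""
    demand = n_jobs * mb_per_sec_per_job
    return min(1.0, server_mb_per_sec / demand)

# e.g. 125 concurrent jobs each streaming 4 MB/s (illustrative numbers):
for server in (100, 300, 500):
    u = cpu_utilization(125, 4.0, server)
    print(f"server {server} MB/s -> CPU utilization {u:.0%}")
```

With 500 MB/s of aggregate demand, a 100 MB/s server keeps the CPUs only 20% busy, which is why the read-speed floor matters.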
See following slides
More realistic values for CERN and RC
Data link speeds at 100 MB/sec (all values), except:
Node_Link_Speed at 10 MB/sec
WAN Link speeds at 40 MB/sec
1000 Processing nodes each of 500 SI95
200 Processing nodes each of 500 SI95
CERN: 1000 processing nodes × 500 SI95 = 500 kSI95, about the CPU power of CERN Tier 0; disk space sized to the number of DBs
RC: 100 kSI95 processing power = 20% of CERN; disk space sized to the number of DBs
MONARC simulation tools are:
sophisticated enough to allow modeling of complex distributed analysis scenarios
simple enough to be used by non experts
Initial modeling runs are already showing interesting results
Future work will help identify bottlenecks and understand constraints on architectures
MONARC Phase 3
More Realistic Computing Model Development
Confrontation of Models with Realistic Prototypes;
At Every Stage: Assess Use Cases Based on Actual Simulation, Reconstruction and Physics Analyses;
Participate in the setup of the prototypes
We will further validate and develop MONARC simulation system using the results of these use cases (positive feedback)
Continue to Review Key Inputs to the Model
CPU Times at Various Phases
Data Rate to Storage
Tape Storage: Speed and I/O
Employ MONARC simulation and testbeds to study CM variations, and suggest strategy improvements
MONARC Phase 3
Reclustering, Restructuring; transport operations
Caching, migration (HMSM), etc.
QoS Mechanisms: Identify Which are important
Distributed System Resource Management and Query Estimators
(Queue management and Load Balancing)
Development of MONARC Simulation Visualization Tools for interactive Computing Model analysis
Relation to GRID
The GRID project is great!
Development of s/w tools needed for implementing realistic LHC Computing Models
farm management, WAN resource and data management, etc….
Help in getting funds for real life testbed systems (RC prototypes)
Complementarity of GRID and the MONARC hierarchical RC model
A hierarchy of RCs is a safe option. If GRID brings big advancements, less hierarchical models should also become possible
Timings well matched
MONARC Phase 3 is to last ~1 year: a bridge to the GRID project, which starts early in 2001
Afterwards, common work by the LHC experiments on developing the computing models will surely still be needed; in which project framework, and for how long, we will see then...