hepix2005-infn.ppt

Transcript

  • 1. INFN Tier1 Status Report – Spring HEPiX 2005 – Andrea Chierici, INFN CNAF
  • 2. Introduction
      - Location: INFN-CNAF, Bologna (Italy)
          - one of the main nodes of the GARR network
      - Computing facility for the INFN HENP community
          - participating in the LCG, EGEE and INFNGRID projects
      - Multi-experiment Tier1
          - LHC experiments
          - VIRGO
          - CDF
          - BABAR
          - AMS, MAGIC, ARGO, ...
      - Resources are assigned to the experiments on a yearly plan.
  • 3. Services
      - Computing servers (CPU farms)
      - Access to on-line data (disks)
      - Mass storage/tapes
      - Broad-band network access
      - System administration
      - Database administration
      - Experiment-specific library software
      - Coordination with the Tier0 and with other Tier1s and Tier2s
  • 4. Infrastructure
      - Hall in the basement (-2nd floor): ~1000 m² of total space
          - easily accessible by lorries from the road
          - not suitable for office use (remote control)
      - Electric power (a worked power-budget example follows below)
          - 220 V mono-phase for computers
              - 4 x 16 A PDUs needed for racks of 3.0 GHz Xeons
          - 380 V three-phase for other devices (tape libraries, air conditioning, etc.)
          - UPS: 800 kVA (~640 kW)
              - needs a separate room (conditioned and ventilated)
          - Electric generator: 1250 kVA (~1000 kW)
              - enough for up to 160 racks (~100 with 3.0 GHz Xeons)
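To make the power budget concrete, a quick back-of-the-envelope check; the 0.8 power factor is inferred from the slide's own kVA-to-kW figures, everything else uses the numbers quoted above:

```python
# Illustrative arithmetic only; the 0.8 power factor is inferred from the
# slide's own kVA -> kW figures (800 kVA ~ 640 kW, 1250 kVA ~ 1000 kW).
POWER_FACTOR = 0.8
GENERATOR_KVA = 1250

generator_kw = GENERATOR_KVA * POWER_FACTOR          # ~1000 kW
print(f"Generator budget: {generator_kw:.0f} kW")

# Peak capacity of one Xeon rack fed by 4 x 16 A PDUs at 220 V mono-phase:
pdu_kw = 4 * 16 * 220 / 1000                         # ~14.1 kW
print(f"Peak PDU capacity per Xeon rack: {pdu_kw:.1f} kW")

# The slide quotes ~100 Xeon racks on the generator, i.e. an implied
# average draw well below the PDU peak provisioning:
print(f"Implied average draw: {generator_kw / 100:.1f} kW per rack")
```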
  • 5. HW Resources
      - CPU:
          - 320 old dual-processor boxes, 0.8-2.4 GHz
          - 350 new dual-processor boxes, 3 GHz (+70 servers, +55 BABAR, +48 CDF, +30 LHCb)
              - ~1300 kSI2k in total
      - Disk:
          - FC, IDE, SCSI, NAS
      - Tapes:
          - STK L180: 18 TB
          - STK 5500:
              - 6 LTO-2 drives with 2000 tapes → 400 TB
              - 2 9940B drives with 800 tapes → 160 TB
      - Networking:
          - 30 rack switches → 46 FE UTP + 2 GE FO each
          - 2 core switches → 96 GE FO + 120 GE FO + 4 x 10 GE
          - 2 x 1 Gbps links to the WAN
  • 6. Networking
      - The GARR-G backbone (2.5 Gbps fibre optic) will be upgraded to dark fibre (Q3 2005)
      - INFN Tier1 access is now 1 Gbps (+1 Gbps dedicated to the Service Challenge) and will move to 10 Gbps soon (September 2005)
          - the GigaPoP is co-located with INFN-CNAF
      - International connectivity via GÉANT: 10 Gbps access in Milan already in place
  • 7. CNAF Network Setup (diagram): the GARR Italian research network provides WAN access; core devices are an ER16, a Black Diamond and an SSR 8860, with a Summit 400 carrying the dedicated 1 Gb/s Service Challenge link and 10 Gb/s uplinks; a 1 Gb/s production link and a 1 Gb/s back door serve CNAF internal services; farm and rack switches (FarmSW1-4, FarmSWG1/G3, LHCBSW1, ServSW2, FarmSW12, Catalyst 3550s, SW-03/04/05/06/08-xx, HP BABAR, FarmSWtest) hang off the core.
  • 8. Tier1 LAN
      - Each CPU rack (36 WNs) is equipped with an FE switch with 2 x Gb uplinks to the core switch
      - Disk servers are connected to the core switch via GE
      - An upgrade to Gb rack switches is foreseen
          - the 10 Gb core switch is already installed
  • 9. LAN model layout (diagram): farm racks, the NAS boxes, and the SAN with the STK tape library all attach to the core network.
  • 10. Networking resources
      - 30 switches (14 of them 10 Gb-ready)
      - 3 core switches/routers (SSR8600, ER16, BD)
          - the SSR8600 is also the WAN access router, with firewalling functions
          - a new Black Diamond 10808 is already installed, with 120 GE and 12 x 10GE ports (it can scale up to 480 GE or 48 x 10GE)
          - a new access router (with 4 x 10GE and 4 x GE interfaces) will replace the SSR8600 for WAN access (the ER16 and the Black Diamond will aggregate all the Tier1 resources)
      - 3 L2/L3 switches (with 48 x GE and 2 x 10GE) to be used during the Service Challenge
  • 11. Farming
      - Team composition
          - 2 FTE for the general-purpose farm
          - 1 FTE for the hosted farms (CDF and BABAR)
          - ~3 FTE is clearly not enough to manage ~800 WNs → more people needed
      - Tasks
          - Installation & management of Tier1 WNs and servers
              - using Quattor (still some legacy lcfgng nodes around)
                  - deployment & configuration of the OS & LCG middleware
              - HW maintenance management
          - Management of the batch scheduler (LSF, Torque)
  • 12. Access to the batch system (diagram): "legacy" non-Grid access goes from UIs directly to LSF; Grid access goes from UIs through the Grid to the CE, which submits to LSF; LSF dispatches to the WNs (WN1 ... WNn), which access the SE.
  • 13. Farming: where we were last year
      - Computing resources not fully used
          - the batch system (Torque+Maui) proved not to scale
          - the present policy assignment does not allow optimal use of resources
      - Interoperability issues
          - full experiment integration still to be achieved
          - dealing with 3 different farms is difficult
  • 14. Farming: evolution (1)
      - Migration of the whole farm to SLC 3.0.4 (the CERN version!) almost complete
          - Quattor deployment successful (more than 500 WNs)
          - standard configuration of WNs for all experiments
          - to be completed in a couple of weeks
      - Migration from Torque+Maui to LSF (v6.0)
          - the LSF farm is running successfully
          - the process will be completed together with the OS migration
          - fair-share model for resource access
          - progressive inclusion of the BABAR & CDF WNs into the general farm
  • 15. Farming: evolution (2)
      - LCG upgrade to the 2.4.0 release
          - installation via Quattor (project-[email_address])
              - dropped lcfgng
              - packaging differs from YAIM's
              - deployed the upgrade to 500 nodes in one day
          - still some problems with VOMS integration to be investigated
          - 1 legacy LCG 2.2.0 CE still running
      - Access to resources centrally managed with Kerberos (authentication) and LDAP (authorization); see the lookup sketch below
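A minimal sketch of what an authorization lookup against such a directory might look like, using the python-ldap bindings. The server URI, the posixGroup schema, and the uid are illustrative assumptions, not the actual CNAF configuration; the base DN mirrors the tree on the next slide:

```python
import ldap  # python-ldap bindings

# Hypothetical server URI; the base DN follows the tree on slide 16
# (c=it, o=infn, ou=cnaf). Neither is the actual CNAF setup.
LDAP_URI = "ldap://ldap.example.cnaf.infn.it"
BASE_DN = "ou=cnaf,o=infn,c=it"

def user_groups(uid):
    """Return the group CNs a user belongs to (assumed posixGroup schema)."""
    conn = ldap.initialize(LDAP_URI)
    conn.simple_bind_s()  # anonymous bind; Kerberos handles authentication
    results = conn.search_s(
        BASE_DN, ldap.SCOPE_SUBTREE,
        filterstr=f"(&(objectClass=posixGroup)(memberUid={uid}))",
        attrlist=["cn"],
    )
    return [attrs["cn"][0].decode() for _, attrs in results]

print(user_groups("achierici"))  # hypothetical uid
```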
  • 16. Authorization with LDAP (diagram): an LDAP tree rooted at c=it, o=infn, ou=cnaf with branches ou=cr (containing ou=grid, ou=ui, ou=wn), ou=public, ou=private, ou=local and ou=afs (AFS cell infn.it); entries cover worker nodes with pool accounts (CR.WorkerNode), user interfaces (CR.UserInterface), generic CNAF users under o=cnaf, INFN AFS users, and external access via a bastion host; groups such as localUser and gridUser carry the authorizations.
  • 17. Some numbers (LSF output)

    QUEUE_NAME   PRIO STATUS        MAX  JL/U JL/P JL/H NJOBS  PEND  RUN SUSP
    alice          40 Open:Active     -     -    -    -    50     0   50    0
    cdf            40 Open:Active     -     -    -    -   840   152  688    0
    dteam          40 Open:Active     -     -    -    -     0     0    0    0
    atlas          40 Open:Active     -     -    -    -    26    12   14    0
    cms            40 Open:Active     -     -    -    -     0     0    0    0
    lhcb           40 Open:Active     -     -    -    -     7     7    0    0
    babar_test     40 Open:Active   300   300    -    -   302     2  300    0
    babar          40 Open:Active    20    20    -    -  2073  2060   13    0
    virgo          40 Open:Active     -     -    -    -     0     0    0    0
    argo           40 Open:Active     -     -    -    -     0     0    0    0
    magic          40 Open:Active     -     -    -    -     0     0    0    0
    ams            40 Open:Active     -     -    -    -   136     0  136    0
    infngrid       40 Open:Active     -     -    -    -     0     0    0    0
    guest          40 Open:Active     -     -    -    -     0     0    0    0
    test           40 Open:Active     -     -    -    -     0     0    0    0

    1200 jobs running! (A quick parsing cross-check is sketched below.)
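The running-job total can be cross-checked straight from the same output; a small sketch, assuming LSF's bqueues is on the PATH and prints the default column layout shown above:

```python
import subprocess

# Parse the default `bqueues` listing (as shown above) and sum running jobs.
out = subprocess.run(["bqueues"], capture_output=True, text=True, check=True)
total_run = 0
for line in out.stdout.splitlines()[1:]:          # skip the header row
    fields = line.split()
    if len(fields) >= 11:
        queue, run = fields[0], int(fields[9])    # RUN is the 10th column
        total_run += run
        print(f"{queue:12s} {run:5d} running")
print(f"total        {total_run:5d} running")
```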
  • 18. Storage & Database
      - Team composition
          - ~1.5 FTE for general storage
          - ~1 FTE for CASTOR
          - ~1 FTE for DBs
      - Tasks
          - Disk (SAN, NAS): HW/SW installation and maintenance, remote (grid SE) and local (rfiod/NFS/GPFS) access services, clustered/parallel filesystem tests, participation in the SC
              - 2 SAN systems (~225 TB)
              - 4 NAS systems (~60 TB)
          - CASTOR HSM system: HW/SW installation and maintenance, GridFTP and SRM access services
              - STK library with 6 LTO-2 and 2 9940B drives (+4 to install)
                  - 1200 LTO-2 (200 GB) tapes
                  - 680 9940B (200 GB) tapes
          - DBs (Oracle for CASTOR & RLS tests, Tier1 "global" hardware DB)
  • 19. Storage setup
      - Physical access to the main storage (FAStT900) via SAN
          - Level-1 disk servers connected via FC
              - usually also in a GPFS cluster
                  - ease of administration
                  - load balancing and redundancy
                  - Lustre under evaluation
          - Level-2 disk servers can be connected to the storage via GPFS only
              - decouples the LCG and FC dependencies on the OS
      - WNs are not members of the GPFS cluster (no scalability to a large number of WNs)
          - storage is available to the WNs via rfio, xrootd (BABAR) or NFS (a few cases, see next slide)
          - NFS is used mainly to share experiment software with the WNs
              - it is not suitable for data access
  • 20. NFS stress test
      - Setup (diagram): the LSF job scheduler drives WNs acting as NFS clients against a NAS NFS server (Fibre Channel back-end)
          - tests: 1) Connectathon (NFS/RPC test), 2) iozone (NFS I/O test); a minimal probe in this spirit is sketched below
      - Parameters
          - client kernel, server kernel
          - rsize, wsize, protocol
          - test execution time, disk write I/O, I/O wait, server threads
      - Results
          - problems with the 2.4.21-27.0.2 kernel
          - kernel 2.6 performs better than 2.4 (even with the sdr patch)
          - the UDP protocol performs better than TCP
          - the best rsize is 32768
          - the best wsize is 8192
          - the NFS protocol may scale beyond 200 clients without aggregate performance degradation
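A stripped-down sequential-write probe in the spirit of those iozone runs might look like this; the mount point is a placeholder, and the block size matches the best wsize quoted above:

```python
import os
import time

MOUNT = "/nfs/testvol"     # placeholder path, not the real NAS mount
FILE_SIZE_MB = 1024
BLOCK = 8192               # matches the best wsize found above

path = os.path.join(MOUNT, "stress.dat")
buf = b"\0" * BLOCK
start = time.time()
with open(path, "wb") as f:
    for _ in range((FILE_SIZE_MB * 1024 * 1024) // BLOCK):
        f.write(buf)
    f.flush()
    os.fsync(f.fileno())   # make sure the data actually reached the server
elapsed = time.time() - start
print(f"wrote {FILE_SIZE_MB} MB in {elapsed:.1f} s "
      f"({FILE_SIZE_MB / elapsed:.1f} MB/s)")
os.remove(path)
```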
  • 21. Disk storage tests
      - Data processing for the LHC experiments at Tier1 facilities requires access to petabytes of data from thousands of nodes simultaneously, at high rates
      - No traditional storage system can handle these requirements
      - We are in the process of defining the required hardware and software components of the system
          - it is emerging that a Storage Area Network with a parallel file system on top can do the job
      (V. Vagnoni, LHCb Bologna)
  • 22. Hardware testbed
      - Disk storage
          - 3 controllers of an IBM DS4500
          - each controller serves 2 RAID5 arrays of 4 TB each (17 x 250 GB disks + 1 hot spare)
          - each RAID5 array is further subdivided into two LUNs of 2 TB each
          - 12 LUNs and 24 TB of disk space in total (102 x 250 GB disks + 8 hot spares)
      - File system servers
          - 6 IBM xSeries 346, dual Xeon, 2 GB RAM, Gigabit NIC
          - a QLogic Fibre Channel PCI card on each server, connected to the DS4500 via a Brocade switch
          - 6 Gb/s of available bandwidth to/from the clients
      - Clients
          - 36 dual Xeons, 2 GB RAM, Gigabit NIC
      (V. Vagnoni, LHCb Bologna)
  • 23. Parallel file systems
      - We evaluated the two mainstream products on the market
          - GPFS (version 2.3) by IBM
          - Lustre (version 1.4.1) by CFS
      - Both come with advanced management and configuration features, a SAN-oriented failover mechanism and data recovery
          - but they can be used on top of standard disks and arrays as well
      - Using GPFS and Lustre, the 12 DS4500 LUNs were aggregated by the servers into one 24 TB file system and mounted by the clients over the Gigabit network
          - both file systems are 100% POSIX-compliant on the client side
          - the file systems appear to the clients as ordinary local mount points
      (V. Vagnoni, LHCb Bologna)
  • 24. Performance (1)
      - A home-made benchmarking tool oriented to HEP applications was written
          - it allows simultaneous sequential reads/writes from an arbitrary number of clients and processes per client (a minimal sketch of the idea follows below)
      - Plot: raw Ethernet throughput vs. time (20 x 1 GB files read simultaneously)
      (V. Vagnoni, LHCb Bologna)
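The tool itself is not shown in the slides; a minimal single-host sketch of the same idea, with hypothetical file paths, is N processes each sequentially reading a different 1 GB file and reporting the aggregate throughput:

```python
import multiprocessing
import time

# Hypothetical paths; the real tool also coordinates many client hosts.
FILES = [f"/gpfs/test/file{i:02d}" for i in range(20)]
CHUNK = 1024 * 1024

def read_file(path):
    """Sequentially read one file, returning the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            total += len(chunk)
    return total

if __name__ == "__main__":
    start = time.time()
    with multiprocessing.Pool(len(FILES)) as pool:
        nbytes = sum(pool.map(read_file, FILES))
    elapsed = time.time() - start
    print(f"{nbytes * 8 / 1e9 / elapsed:.2f} Gb/s aggregate "
          f"over {len(FILES)} simultaneous readers")
```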
  • 25. Performance (2)
      - Plot: results of reads/writes of different 1 GB files; net throughput (Gb/s) vs. number of simultaneous reads/writes
      (V. Vagnoni, LHCb Bologna)
  • 26. CASTOR issues
      - At present an STK library with 6 LTO-2 and 2 9940B drives
          - 2000 x 200 GB LTO-2 tapes → 400 TB
          - 800 x 200 GB 9940B tapes → 160 TB (free)
          - tender for an upgrade with 2500 x 200 GB tapes (500 TB)
      - In general CASTOR performance (as with other HSM software) improves with clever pre-staging of files (ideally ~90%); a toy illustration follows below
      - LTO-2 drives are not usable in a real production environment with the present CASTOR release
          - hangs on locate/fskip occur every 50-100 non-sequential read operations, and checksum errors or unterminated tapes (marked RDONLY) every 50-100 GB of data written (STK assistance is also needed)
          - usable only with a mean file size of 20 MB or more
          - good reliability for optimized (sequential or pre-staged) operations
          - fixes with CASTOR v2 (Q2 2005)?
      - CERN and PIC NEVER reported HW problems with the 9940B drives during last year's data challenges
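The pre-staging point can be illustrated with a toy recall scheduler: sorting requests by tape and on-tape file position turns the random access that triggers the locate/fskip hangs into one forward pass per tape. The request tuples below are invented for illustration:

```python
# Invented (tape, file-sequence-number) recall requests.
requests = [
    ("LTO2_0421", 87), ("LTO2_0017", 3), ("LTO2_0421", 12),
    ("LTO2_0017", 41), ("LTO2_0421", 13),
]

# Served in arrival order, each tape would be mounted and repositioned
# repeatedly; sorted by (tape, fseq), each tape is read front to back once.
for tape, fseq in sorted(requests):
    print(f"recall tape={tape} fseq={fseq}")
```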
  • 27. Service challenge (1)
      - WAN
          - dedicated 1 Gb/s link connected via GARR+GÉANT
          - 10 Gb/s link available in September '05
      - LAN
          - an Extreme Summit 400 (48 x GE + 2 x 10GE) dedicated to the Service Challenge
      (Diagram: 11 servers with internal HDs behind the Summit 400, a 1 Gb link to CERN via the GARR Italian research network, 10 Gb in September 2005.)
  • 28. Service challenge (2)
      - 11 Sun Fire V20 dual Opterons (2.2 GHz)
          - 2 x 73 GB U320 SCSI HDs
          - 2 x Gbit Ethernet interfaces
          - 2 x PCI-X slots
      - OS: SLC 3.0.4 (x86_64 arch), kernel 2.4.21-27
          - bonnie++/IOzone tests on the local disks → ~60 MB/s read and write
          - Netperf/Iperf tests on the LAN → ~950 Mb/s (a socket-level probe in the same spirit is sketched below)
      - Globus (GridFTP) v2.4.3 installed on all cluster nodes
      - CASTOR SRM v1.2.12, CASTOR stager v1.7.1.5
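For illustration, a bare-bones TCP throughput probe in the spirit of Netperf/Iperf; the host and port are placeholders, and receiver() runs on one node while sender() runs on another:

```python
import socket
import time

HOST, PORT, SECONDS = "sc-node01", 5001, 10   # hypothetical peer and port

def receiver():
    """Accept one connection and report the achieved throughput."""
    srv = socket.socket()
    srv.bind(("", PORT))
    srv.listen(1)
    conn, _ = srv.accept()
    total, start = 0, time.time()
    while data := conn.recv(1 << 16):
        total += len(data)
    elapsed = time.time() - start
    print(f"{total * 8 / elapsed / 1e6:.0f} Mb/s")  # slide quotes ~950 Mb/s

def sender():
    """Stream zeros to the receiver for SECONDS seconds."""
    s = socket.create_connection((HOST, PORT))
    buf = b"\0" * (1 << 16)
    deadline = time.time() + SECONDS
    while time.time() < deadline:
        s.sendall(buf)
    s.close()
```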
  • 29. SC2: transfer cluster
      - 10 machines used as GridFTP/SRM servers, 1 as the CASTOR stager/SRM repository
          - internal disks used (70 GB x 10 = 700 GB)
          - for SC3, CASTOR tape servers with IBM LTO-2 or STK 9940B drives will also be used
      - Load balancing implemented by assigning the IP addresses of all 10 servers to one CNAME, served round-robin (see the resolution check sketched below)
      - SC2 goal reached: 100 MB/s disk-to-disk sustained for 2 weeks
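That round-robin alias can be verified from any client by resolving the CNAME; a quick sketch, with a placeholder hostname:

```python
import socket

ALIAS = "sc-lb.example.cnaf.infn.it"   # placeholder for the actual CNAME

# Collect the distinct A records behind the alias.
addrs = sorted({info[4][0]
                for info in socket.getaddrinfo(ALIAS, None, socket.AF_INET)})
print(f"{ALIAS} resolves to {len(addrs)} servers:")
for addr in addrs:
    print(" ", addr)
# Successive lookups rotate the record order, spreading new GridFTP
# sessions across the 10 transfer servers.
```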
  • 30. Sample of production: CMS (1)
      - CMS activity at the T1: grand summary
          - Data transfer
              - >1 TB per day T0 → T1 in 2004, via PhEDEx
          - Local MC production
              - >10 Mevts for >40 physics datasets
          - Grid activities
              - official production on LCG
              - analysis on DSTs via Grid tools
  • 31. Sample of production: CMS (2)
  • 32. Sample of production: CMS (3)
  • 33. Summary & Conclusions
      - During 2004 the INFN Tier1 was deeply involved in LHC activities
          - some experiments (e.g. BABAR, CDF) are already in the data-taking phase
      - The main issue is the HR shortage
          - ~10 FTE are required to nearly double the Tier1 staff (for both HW and SW maintenance)
