Soluzioni HW per il Tier 1 al CNAF
Transcript

  • 1. Soluzioni HW per il Tier 1 al CNAF
      • Luca dell’Agnello
      • Stefano Zani
      • (INFN – CNAF, Italy)
      • III CCR Workshop
      • May 24-27 2004
  • 2. Tier1
    • INFN computing facility for the HEP community
      • Ended its prototype phase last year; now fully operational
      • Location: INFN-CNAF, Bologna (Italy)
        • One of the main nodes on the GARR network
      • Personnel: ~10 FTEs
        • ~3 FTEs dedicated to experiments
    • Multi-experiment
      • LHC experiments (ALICE, ATLAS, CMS, LHCb), Virgo, CDF, BaBar, AMS, MAGIC, ...
      • Resources dynamically assigned to experiments according to their needs
    • 50% of the Italian resources for LCG
      • Participation in the experiments' data challenges
      • Integrated with the Italian Grid
      • Resources also accessible in the traditional (non-grid) way
  • 3. Logistics
    • Recently moved to a new location (last January)
      • Hall in the basement (2nd underground floor)
      • ~1000 m² of total space, hosting:
        • Computing nodes
        • Storage devices
        • Electric power system (UPS)
        • Cooling and air conditioning system
        • GARR GPoP
      • Easily accessible by lorries from the road
      • Not suitable for office use (remote control needed)
  • 4. Electric Power
    • Electric power generator
      • 1250 kVA (~1000 kW)
      • Sufficient for up to 160 racks
    • Uninterruptible Power Supply (UPS)
      • Located in a separate room (conditioned and ventilated)
      • 800 kVA (~640 kW)
    • 380 V three-phase distributed to all racks via busbar trunking ("blindo")
      • Rack power controls output 3 independent 220 V lines for the computers
      • Rack power controls sustain loads of up to 16 A or 32 A
        • 32 A power controls needed for racks with 36 dual-processor Xeons
      • 3 APC power distribution modules per rack (24 outlets each)
        • Fully programmable (allows gradual server power-on)
        • Remotely manageable via web
    • 380 V three-phase also for other devices (tape libraries, air conditioning, etc.)
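A rough calculation shows why the densest racks need 32 A power controls rather than 16 A ones. The per-node wattage below is an illustrative assumption, not a figure from the slides; the slides' own kVA-to-kW conversions (1250 kVA ~ 1000 kW, 800 kVA ~ 640 kW) correspond to a power factor of 0.8.

```python
# Back-of-the-envelope rack power budget (assumed figures, for illustration)
NODES_PER_RACK = 36     # densest WN racks, as in the slides
WATTS_PER_NODE = 300    # assumption for a 1U dual-Xeon server of that era
LINE_VOLTAGE = 220      # V, each of the 3 independent lines per rack
LINES_PER_RACK = 3

rack_watts = NODES_PER_RACK * WATTS_PER_NODE            # total rack load in W
amps_per_line = rack_watts / LINES_PER_RACK / LINE_VOLTAGE

print(f"rack load: {rack_watts} W, {amps_per_line:.1f} A per 220 V line")
# Over 16 A per line even with load spread evenly, so 16 A controls would be
# marginal for 36-node racks; hence the 32 A controls.
```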
  • 5. Cooling & Air Conditioning
    • RLS (Airwell) chiller units on the roof
      • ~700 kW
      • Water cooling
      • A booster pump is needed (20 m from T1 to the roof)
      • Noise insulation
    • 1 air conditioning unit (uses 20% of the RLS cooling power and controls humidity)
    • 12 local cooling systems (Hiross) in the computing room
  • 6. WN typical Rack Composition
    • Power Controls (3U)
    • 1 network switch (1-2U)
      • 48 FE copper interfaces
      • 2 GE fiber uplinks
    • 34-36 1U WNs
      • Connected to the network switch via FE
      • Connected to the KVM system
  • 7. Remote console control
    • Paragon UTM8 (Raritan)
      • 8 analog (UTP/fiber) output connections
      • Supports up to 32 daisy chains of 40 nodes each (UKVMSPD modules needed)
      • Cost: 6 kEuro + 125 Euro/server (UKVMSPD module)
      • IP-Reach (expansion to support IP transport) evaluated but not used
    • Autoview 2000R (Avocent)
      • 1 analog + 2 digital (IP transport) output connections
      • Supports connections to up to 16 nodes
        • Optional expansion to 16x8 nodes
      • Compatible with Paragon ("gateway" to IP)
    • Evaluating Cyclades AlterPath KVM over serial line (cheaper)
  • 8. Networking (1)
    • Main network infrastructure based on optical fibres (~20 km)
      • To ease adoption of new (high-performance) transmission technologies
      • To ensure better electrical insulation over long distances
      • Local (rack-wide) links use UTP (copper) cables
    • LAN has a "classical" star topology
      • GE core switch (Enterasys ER16)
      • New core switch to be shipped next July
        • 120 Gbit Ethernet fiber ports (scalable up to 480 ports)
        • 12 x 10 Gbit Ethernet ports (scalable up to 48 ports)
      • Farm uplinks to the core switch via GE trunks (channels)
      • Disk servers directly connected to the GE switch (mainly via fibre)
  • 9. Networking (2)
    • WNs connected via FE to the rack switch (1 switch per rack)
      • Not a single brand of switch (as for the WNs)
        • 3 Extreme Summit, 48 FE + 2 GE ports
        • 3 Cisco 3550, 48 FE + 2 GE ports
        • 8 Enterasys, 48 FE + 2 GE ports
        • 7 Summit 400, 48 GE copper + 2 GE ports (2x10Gb ready)
      • Homogeneous characteristics
        • 48 copper Ethernet ports
        • Support for the main standards (e.g. 802.1q)
        • 2 Gigabit uplinks (optical fibres) to the core switch
    • CNAF interconnected to the GARR-G backbone at 1 Gbps
      • Giga-PoP co-located
      • 2 x 1 Gbps test links to CERN and Karlsruhe
  • 10. T1 Network Configuration (diagram, S.Zani: SSR8600 core switch connecting the farm switches FarmSW1-12, FarmSWG1-2, LHCBSW1, Catalyst3550, Babar SW; NAS1-4; disk servers with FC links to the DELL, AXUS, STK and Infortrend SAN devices; the STK library; internal services on the 1st floor; uplink to the GARR Bo 12K GPoP)
  • 11. L2 Configuration
    • Each experiment has its own VLAN
    • Solution adopted for complete granularity
      • Port-based VLANs
      • VLAN identifiers propagated across switches (802.1q)
      • Avoids recabling (or physically moving) machines to change the farm topology
    • Level-2 isolation of farms
    • Possibility to define multi-tag (trunk) ports (for servers)
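On the Cisco 3550 rack switches listed earlier, such a port-based VLAN setup could look like the following sketch. The VLAN number, names and interface choices are illustrative assumptions, not taken from the slides:

```
! Hypothetical IOS-style configuration for one rack switch
vlan 110
 name cms-farm
!
! Access port: a WN assigned to the CMS farm
interface FastEthernet0/1
 switchport mode access
 switchport access vlan 110
!
! 802.1q trunk uplink to the core switch: carries all experiment VLANs
interface GigabitEthernet0/1
 switchport trunk encapsulation dot1q
 switchport mode trunk
```

With this scheme, moving a node to another experiment's farm only requires changing the access VLAN of its port, with no recabling.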
  • 12. Power Switches
    • 2 models used at Tier1:
      • "Old": APC MasterSwitch Control Unit AP9224, controlling 3 x 8-outlet 9222 PDUs from 1 Ethernet port
      • "New": APC PDU Control Unit AP7951, controlling 24 outlets from 1 Ethernet port
    • "Zero" rack units (vertical mount)
    • Access to the configuration/control menu via serial/telnet/web/SNMP
    • 1 dedicated machine running the APC Infrastruxure Manager software (in progress)
    • See also: http://www.cnaf.infn.it/cnafdoc/CD0044.doc
  • 13. Remote Power Distribution Unit (screenshot of the APC Infrastruxure Manager software showing the status of all Tier1 PDUs)
  • 14. Computing units
    • ~400 1U rack-mountable Intel dual-processor servers
      • 800 MHz – 2.4 GHz
      • ~240 WNs (~480 CPUs) available for LCG
    • To be shipped June 2004:
      • 32 1U dual-processor Pentium 2.4 GHz servers
      • 350 1U dual-processor Pentium IV 3.06 GHz servers
        • 2 x 120 GB HDs
        • 4 GB RAM
        • 2159 Euro each
    • Tendering:
      • HPC farm with MPI
        • Servers interconnected via Infiniband
      • Opteron farm (near future)
        • To allow experiments to test their software on the AMD architecture
  • 15. Storage Resources
    • ~50 TB of raw disk space online
      • NAS
        • NAS1+NAS4 (3Ware, low cost): 4.2 TB total
        • NAS2+NAS3 (Procom): 13.2 TB total
      • SAN
        • Dell PowerVault 660F: 7 TB total
        • Axus (Brownie): 2 TB total
        • STK BladeStore: 9 TB total
        • Infortrend ES A16F-R: 12 TB total
        • IBM FAStT 900 (in a few weeks): 150 TB total
    • See also:
    • http://www.lnf.infn.it/sis/preprint/pdf/INFN-TC-03-19.pdf
  • 16. Storage resources (client side: WAN or Tier1 LAN)
    • PROCOM NAS2 (nas2.cnaf.infn.it): 8100 GB – VIRGO, ATLAS
    • PROCOM NAS3 (nas3.cnaf.infn.it): 4700 GB – ALICE, ATLAS
    • IDE NAS1, NAS4 (nas4.cnaf.infn.it): 1800+2000 GB – CDF, LHCb
    • AXUS Brownie: ~2200 GB, 2 FC interfaces
    • DELL PowerVault: 7100 GB, 2 FC interfaces, fail-over support
    • Gadzoox Slingshot FC switch, 18 ports
    • RAIDTEC: 1800 GB, 2 SCSI interfaces – CASTOR server + staging
    • STK180 with 100 LTO tapes (10 TB native)
    • STK L5500 robot (max 5000 tapes), 6 LTO-2 drives
    • STK BladeStore: ~10000 GB, 4 FC interfaces
    • Infortrend ES A16F-R: 12 TB
    • Fileservers: diskserv-cms-1 (CMS), fcds2 (aliases diskserv-ams-1, diskserv-atlas-1)
  • 17. Storage management and access (1)
    • Tier1 storage resources accessible both as classical storage and via grid
    • Non-grid disk storage accessible via NFS
    • Generic WNs also have an AFS client
    • NFS volumes mounted via autofs, configured via LDAP
      • A single configuration repository eases maintenance
      • In progress: integration of the LDAP configuration with the Tier1 db data
    • Scalability issues with NFS
      • Experienced stalled mount points
      • Recent NFS versions export synchronously by default: we had to revert to async exports and use reduced rsize and wsize to avoid a huge number of retransmissions
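The async export and reduced block sizes mentioned above would typically be applied as in the following fragments. All paths, hostnames and networks here are illustrative assumptions, not the actual Tier1 configuration:

```
# Hypothetical server-side export (/etc/exports):
# "async" restores asynchronous writes, reverting the newer sync default
/storage/cms  131.154.0.0/16(rw,async,no_root_squash)

# Hypothetical client-side autofs map entry with reduced rsize/wsize
# (the slides' maps are generated from LDAP rather than kept in flat files):
# /etc/auto.storage
cms  -rw,hard,intr,rsize=8192,wsize=8192  nas2.cnaf.infn.it:/storage/cms
```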
  • 18. Storage management and access (2)
    • Part of the disk storage is used as a front-end to CASTOR
      • Balance between disk and CASTOR set according to the experiments' needs
    • 1 stager for each experiment (installation in progress)
    • CASTOR accessible both directly and via grid
      • A CASTOR SE is available
    • The ALICE Data Challenge used the CASTOR architecture
      • Feedback given to the CASTOR team
      • File restaging needs optimization
  • 19. Tier1 Database
    • Resource database and management interface
      • Postgres database as back end
      • Web interface (Apache + mod_ssl + PHP)
      • Hw characteristics of the servers
      • Sw configuration of the servers
      • Allocation of the servers
    • Possible direct access to the db for some applications
      • Monitoring system
      • Nagios
    • Interface to configure the switches and interoperate with the installation system
      • VLAN tags
      • DNS
      • DHCP
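A minimal sketch of what such a resource schema might look like, tying hardware characteristics and experiment allocation (including the VLAN tag used to configure the switches) together. All table and column names are assumptions for illustration; the actual Tier1 schema is not shown in the slides:

```sql
-- Hypothetical, simplified resource schema (Postgres syntax)
CREATE TABLE server (
    id        SERIAL PRIMARY KEY,
    hostname  TEXT NOT NULL UNIQUE,   -- a WN or disk server
    cpu_mhz   INTEGER,                -- hw characteristics
    ram_mb    INTEGER,
    rack      TEXT
);

CREATE TABLE allocation (
    server_id   INTEGER REFERENCES server(id),
    experiment  TEXT NOT NULL,        -- e.g. 'cms', 'virgo'
    vlan_tag    INTEGER               -- pushed to the switch configuration
);
```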
  • 20. Installation issues
    • Centralized installation system
      • LCFG (EDG WP4)
      • Integration with the central Tier1 db
      • Moving a node from one farm to another implies just a change of IP address (not of name)
      • A single DHCP server for all VLANs
      • Support for DDNS (cr.cnaf.infn.it)
    • Investigating Quattor for future needs
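A single DHCP server serving several VLANs typically needs one subnet declaration per VLAN, with fixed per-host addresses so that a farm move changes only the IP, not the name. A hypothetical ISC dhcpd.conf fragment (all addresses, MAC and hostnames are illustrative assumptions):

```
# Hypothetical dhcpd.conf: one subnet per experiment VLAN, served by a
# single daemon (with DHCP relay on the router for non-local VLANs)
ddns-update-style interim;                   # DDNS updates (cr.cnaf.infn.it)

subnet 192.168.110.0 netmask 255.255.255.0 { # CMS VLAN
    option routers 192.168.110.1;
}

host wn-cms-001 {
    hardware ethernet 00:11:22:33:44:55;
    fixed-address 192.168.110.10;            # farm move: change only this
}
```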
  • 21. Our Desired Solution for Resource Access
    • SHARED RESOURCES among all experiments
      • Priorities and reservations managed by the scheduler
    • Most of the Tier1 computing machines installed as LCG Worker Nodes, with light modifications to support more VOs
    • Application software not installed directly on the WNs but accessed from outside (NFS, AFS, ...)
    • One or more resource managers handling all the WNs in a centralized way
    • A standard way to access storage for each application