The CMS Online Cluster:
Setup, Operation and Maintenance
of an Evolving Cluster
J.A. Coarasa
CERN, Geneva, Switzerland
for the CMS TriDAS group.

ISGC 2012, 26 February - 2 March 2012,
Academia Sinica, Taipei, Taiwan
ISGC 2012, 26/2-2/3 2012, Academia Sinica, Taipei, Taiwan. The CMS Online Cluster
Outline
•  Introduction
•  On/Off-line Computing Model
•  The Compact Muon Solenoid (CMS) Data Acquisition system (DAQ)
•  The CMS Online Cluster
– IT Infrastructure
•  Computing
•  Networking
•  Services
– Operation
Introduction
– The CMS Data Acquisition system (DAQ)
On/Off-line computing
Collision rate ~1 GHz
Level-1 (~3 µs). Massively parallel processing
–  Particle identification (high pT, e, µ, jets, missing ET)
–  Hardwired custom systems (ASIC, FPGA). Synchronous, clock driven
First Level rate 100 kHz
Readout: Data to Surface and Event Builder
–  2 Tb/s optical data links and 2 Tb/s switch networks
High Level Triggers (~ s)
–  Physics process identification
–  Clusters of PCs. Asynchronous, event driven
Rate on tape 100 Hz
Distributed computing GRID (Tiers 0-4)
–  Analysis, production and archive
The CMS Data Acquisition system (DAQ). The challenge
Large data volumes (~100 Gbytes/s data flow, 20 TB/day)
–  After the Level 1 Trigger, ~100 Gbytes/s (rate ~O(100) kHz) reach the event building (2 stages, ~1600 computers).
–  The HLT filter cluster selects 1 out of 1000. Max. rate to tape: ~O(100) Hz
⇒  The storage manager (stores and forwards) can sustain 2 GB/s of traffic.
⇒  Up to ~300 Mbytes/s sustained forwarded to the CERN T0 (>20 TB/day).
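The figures above can be cross-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check: it assumes the ~1 Mbyte average event size quoted in the design-parameter backup slide, and the function names are illustrative, not part of any CMS software.

```python
# Back-of-the-envelope check of the DAQ rates quoted on the slide.
# EVENT_SIZE_BYTES is an assumption (the ~1 Mbyte/event average).
EVENT_SIZE_BYTES = 1e6           # ~1 Mbyte per event
LEVEL1_RATE_HZ = 100e3           # ~O(100) kHz after the Level 1 Trigger
HLT_ACCEPT_FRACTION = 1 / 1000   # the HLT selects 1 out of 1000
T0_FORWARD_BYTES_PER_S = 300e6   # ~300 Mbytes/s forwarded to the CERN T0
SECONDS_PER_DAY = 86_400


def event_builder_throughput(rate_hz, event_size_bytes):
    """Bytes per second reaching the event building stage."""
    return rate_hz * event_size_bytes


def rate_to_tape(rate_hz, accept_fraction):
    """Event rate left after the HLT selection."""
    return rate_hz * accept_fraction


def daily_volume(bytes_per_s):
    """Bytes accumulated over one day at a sustained rate."""
    return bytes_per_s * SECONDS_PER_DAY


print(event_builder_throughput(LEVEL1_RATE_HZ, EVENT_SIZE_BYTES) / 1e9, "GB/s")
print(rate_to_tape(LEVEL1_RATE_HZ, HLT_ACCEPT_FRACTION), "Hz")
print(daily_volume(T0_FORWARD_BYTES_PER_S) / 1e12, "TB/day")
```

At a sustained 300 Mbytes/s the daily volume comes out at roughly 26 TB, consistent with the ">20 TB/day" on the slide.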
[Figure: CMS DAQ architecture. Labels: Detector Front-end, Level 1 Trigger, Readout Systems, Event Manager, Builder Networks, Builder and Filter Systems, Run Control, Computing Services. Rates: 40 MHz, 100 kHz, 100 Hz.]
The CMS Online Cluster
– IT Infrastructure
IT Infrastructure. The purpose and qualities
The IT infrastructure (computing, networking and services) of the CMS Online Cluster takes care of the CMS data acquisition and experiment control.
•  Autonomous (i.e. independent from all other networks), providing uninterrupted 24/7 operation at two physical locations ~200 m apart, with two control rooms;
•  Redundant services design:
–  Losing 1 min of data wastes accelerator time (worth ~O(1000) CHF/min).
–  Our system serves online data taking. If data is not taken, it is gone!
•  Scalable services design to accommodate expansions;
•  Fast configuration turnaround to cope with the evolving nature of DAQ applications and the large scale of the cluster;
•  Serving the needs of a community of more than 900 users.
Computing. Variety of roles
More than 2700 computers, mostly running Scientific Linux CERN 5:
–  640 (2-core) for the 1st stage of event building, equipped with 2 Myrinet and 3 independent 1 Gbit Ethernet lines for data networking (1280 cores);
–  1008 (720 (8-core) + 288 (12-core, allowing HT)) as high level trigger computers, with 2 Gbit Ethernet lines for data networking (9216 cores);
–  16 (2-core) with access to 300 TBytes of FC storage, 4 Gbit Ethernet lines for data networking and 2 additional ones for networking to Tier 0;
–  More than 400 used by the subdetectors;
–  90 running Windows for Detector Control Systems;
–  12 computers as an ORACLE RAC;
–  12 computers as CMS control computers;
–  50 computers as desktop computers in the control rooms;
–  200 computers for commissioning, integration and testing;
–  15 computers as infrastructure and access servers;
–  250 active spare computers;
⇒ Many different roles!
[Slide annotation: High-bandwidth networking]
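The core counts quoted for the event builder and high level trigger pools follow directly from the machine counts. A minimal tally, using only the numbers on the slide (the data layout and function names are illustrative):

```python
# (number of machines, cores per machine) for the pools whose core
# counts are quoted on the slide.
BUILDER_POOL = [(640, 2)]
HLT_POOL = [(720, 8), (288, 12)]


def total_machines(pool):
    """Number of computers in a pool."""
    return sum(count for count, _cores in pool)


def total_cores(pool):
    """Number of physical cores in a pool."""
    return sum(count * cores for count, cores in pool)


for name, pool in (("1st stage builders", BUILDER_POOL), ("HLT", HLT_POOL)):
    print(name, total_machines(pool), "machines,", total_cores(pool), "cores")
```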
Networking. Overall Picture
CMS Networks:
–  Private Networks:
•  Service Network (~3000 1 Gbit ports);
•  Data Network (~4000 1 Gbit ports)
–  Source routing on computers
–  VLANs on switches
•  Central Data Recording (CDR). Network to Tier 0.
•  Private networks for Oracle RAC
•  Private networks for subdetectors
–  Public CERN Network
[Figure: CMS networks overview. Labels: CMS Sites, Computer gateways, Firewall, Internet, CERN Network, CMS Networks, Service Network, Data Networks, CDR Network, Storage Manager, Readout, HLT, Control…]
The Network Attached Storage
•  Our important data is hosted in a clustered Network Attached Storage:
–  User home directories;
–  CMS data: calibration, data quality monitoring…;
–  Admin data.
•  2 NetApp filer heads in failover configuration (in two racks);
•  6 storage drawers (2 of them mirrored) with internal dual-parity RAID;
•  Snapshot feature active (saves us from going to backup);
•  Deduplication active (saves ~25-55%);
•  Tested throughput > 380 MBytes/s.
[Figure: NAS layout. Labels: heads, shelves, routers, 4Gbit, 2x10Gbit redundant.]
IT Structural Services. Redundancy and Load balancing. The Concept
•  Pattern followed:
–  1 master + N slaves/replicas (now N=3 for most of the services), hosted in different racks;
⇒ Easy scalability.
⇒ Needs replication for all services.
–  Services working under a DNS alias where possible.
⇒ Allows moving the service.
⇒ No service outage.
–  Load balancing of the primary server for clients:
•  DNS Round Robin;
•  explicit client configuration, segregating computers into groups.
[Figure: 1 master and N slaves/replicas; explicit client configuration segregating computers into groups.]
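The two load-balancing options can be illustrated with a small sketch. Everything here is hypothetical: the server names are made up, and the hash-based grouping is just one possible way to segregate clients into groups; the slides do not say how the groups are actually defined.

```python
import itertools
import zlib

# Hypothetical server names: 1 master + 3 replicas.
SERVERS = ["srv-master", "srv-replica1", "srv-replica2", "srv-replica3"]


def round_robin(servers):
    """Rotate through the servers, similar in spirit to DNS round robin."""
    return itertools.cycle(servers)


def primary_for(hostname, servers):
    """Explicit segregation: give each client a fixed primary server.

    A CRC32 of the host name picks the group, so the assignment is
    stable across runs (an assumption of this sketch, not the CMS method).
    """
    return servers[zlib.crc32(hostname.encode()) % len(servers)]


def server_order(hostname, servers):
    """Primary server first, then the remaining servers as fallbacks."""
    start = servers.index(primary_for(hostname, servers))
    return servers[start:] + servers[:start]


rr = round_robin(SERVERS)
print([next(rr) for _ in range(5)])
print(server_order("node-0001", SERVERS))
```

With explicit segregation every client always talks to the same primary, which makes the load split predictable, while still keeping the other replicas as fallbacks.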
IT Structural Services. Replication and Load balancing. Basic Services.
•  Critical data local to the servers (hosted in RAID 1 protected servers);
•  Manual load balancing by segregating clients into groups.

Service              Replication through    Load balancing through
DNS                  named                  Round Robin
DHCP                 in-house scripts       No. First who answers
Kerberos             in-house scripts       explicit segregation
LDAP                 slurpd                 explicit segregation
NTP                  -                      explicit segregation
syslog               -                      No. Single server, used for stress test purposes
Nagios monitoring    -                      explicit segregation
CMS Management and Configuration Infrastructure: The tools
•  IPMI (Intelligent Platform Management Interface) is used to manage the computers remotely:
–  reboot, console access,…;
•  PXE and anaconda kickstart through http are used as the bootstrap installation method;
•  Quattor (QUattor is an Administration ToolkiT for Optimizing Resources) is used as the configuration management system;
⇒ All Linux computers are configured through it or through rpms distributed with it (even the Quattor servers themselves): BIOS, all networking parameters…
CMS Management and Configuration Infrastructure: Quattor implementation
•  Based on Quattor 1.3 (cdb 2.0.4, swrep 2.1.38, PANC 6.0.8)
•  Manages ~2500 installed computers of ~90 types
⇒ One computer fully reinstalled in 9-30 min
⇒ A big (~Gbyte) cluster-wide change in less than 25 min; a small change in only a few minutes.
•  Uses in-house:
–  restricted format in templates: hierarchical + other conventions;
–  areas to define subdetector software and its versioning;
⇒ Allowed easy in-house developments:
•  Template summarizer / inventory maker
•  Dropbox for rpms
•  Template updater
Monitoring infrastructure
•  We monitor 2500 hosts, 25000 services
–  The usual things (ping, ssh, disk usage…)
–  In house:
•  Electronics modules working?
•  Myrinet links working?
•  Quattor spma finished properly?
•  …
•  Based on Nagios. 1+3 servers with manually split configuration.
•  Performance:
–  1 minute latency for the ping test;
–  More than 5 minutes latency for the rest of the services.
⇒ Needed improving. The new version of the monitoring is about to move to production.
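For readers unfamiliar with Nagios checks, the sketch below shows the general shape of a "usual things" check such as disk usage. It follows the standard Nagios plugin convention (exit status 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, plus one line of output); the thresholds and the message format are assumptions of this example, not the CMS configuration.

```python
import shutil

# Standard Nagios plugin states.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to a state (thresholds are illustrative)."""
    if used_percent >= crit:
        return CRITICAL
    if used_percent >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (state, one-line message) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    used_percent = 100.0 * usage.used / usage.total
    state = classify(used_percent, warn, crit)
    return state, f"DISK {LABELS[state]} - {path} {used_percent:.1f}% used"


state, message = check_disk("/")
print(state, message)
```

A real plugin would print only the message and pass the state to `sys.exit()`, so that the monitoring server can read it as the exit status.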
New Monitoring infrastructure
•  Addresses issues:
–  Easy configuration
–  Scalability
•  Based on icinga (1 server + N workers)
–  mod_gearman: allows splitting functionality (Master/Worker).
–  check_multi: groups checks, can cascade.
–  PNP4nagios: provides performance data.
–  rrdcached: caches files and improves performance.
•  We monitor more services, grouped in fewer checks, and get historical performance data
•  Performance:
–  1 minute latency, including performance data.
•  Adding a system for fine-grained, self-enrolled notifications
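The idea of grouping many services into fewer checks can be sketched as an aggregation of child results into one parent result. This is only an illustration of the concept: it does not reproduce the check_multi implementation, and the severity ordering used to pick the worst state is an assumption of this example.

```python
# Standard Nagios plugin states.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed severity ranking used to pick the "worst" child state.
SEVERITY = {OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3}


def combine(results):
    """Collapse child check results into one (state, summary) pair.

    `results` maps a child check name to its (state, message) tuple.
    """
    if not results:
        return UNKNOWN, "no child checks"
    worst = max((state for state, _msg in results.values()),
                key=SEVERITY.__getitem__)
    not_ok = sorted(name for name, (state, _msg) in results.items()
                    if state != OK)
    summary = f"{len(results)} checks, {len(not_ok)} not OK"
    if not_ok:
        summary += ": " + ", ".join(not_ok)
    return worst, summary


children = {
    "ping": (OK, "rta 0.2 ms"),
    "ssh": (WARNING, "slow banner"),
    "disk": (CRITICAL, "97% used"),
}
print(combine(children))
```

One grouped check per host means one scheduling slot and one result to transport, which is where the latency gain in this kind of setup comes from.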
Databases
•  Save all bookkeeping data:
–  configuration of the detectors;
–  conditions for the experimental data taken…
•  Oracle 11.2.0.3 (just migrated, still in compatibility mode with 10.2.0.5)
–  hosting two databases @ CMS totalling ~30 Tbytes (more, including replicated/synced ones @ CERN);
–  hardware:
•  6 blades + standby nodes;
•  NetApp filer as a backend for storage;
•  Completely redundant at hardware level.
Hardware Tracking system
Our Hardware Tracking system (HTS) allows us to know everything about the systems (software/hardware)
It is based on:
•  OCS (www.ocsinventory-ng.org)
–  Inventory system
–  Counts everything related to a machine
•  GLPI (www.glpi-project.org)
–  Tracking system
–  Works in relation with OCS
–  Set of tools to maintain the machines' history
The CMS Online Cluster
– Operation
Constraints and implications
•  Manpower (~6 FTE):
⇒ Needs effort on implementing scripts and automation:
•  Computers monitor their temperature and shut down when it is too high;
•  Scripts to check and fix service problems.
⇒ Allows faster recovery after an unexpected power cut
⇒ Act only as second-level support in 24/7 operation.
⇒ Try to interact with users through a ticketing system (savannah)
•  Cluster 5 years old and still growing, adding resources as needed
–  Some hardware out of warranty
–  Some hardware with extended warranty
•  Computers connected to the electronics need swift replacement (1 h) if broken.
⇒ Need for standby spares on location.
⇒ Fast turnaround in reinstallation or reconfiguration.
•  Other computers are dealt with through fault-tolerant software and reconfiguration. It is enough to replace them the next day.
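As an illustration of the temperature automation mentioned above, here is a minimal watchdog sketch. The temperature limit, the "several consecutive readings" rule and the injected `read_temperature` / `shut_down` callables are all assumptions of this example; the slides do not describe how the CMS scripts are implemented.

```python
import time
from collections import deque


def should_shut_down(readings_c, limit_c, consecutive):
    """True when the last `consecutive` readings all exceed `limit_c`."""
    if len(readings_c) < consecutive:
        return False
    return all(t > limit_c for t in readings_c[-consecutive:])


def watch(read_temperature, shut_down, limit_c=45.0, consecutive=3,
          interval_s=60.0, max_polls=None, sleep=time.sleep):
    """Poll a sensor and call `shut_down` once it stays too hot.

    `read_temperature` returns degrees Celsius; `shut_down` performs the
    action (on a real host it might run `shutdown -h now`).
    Returns True if the shutdown action was triggered.
    """
    recent = deque(maxlen=consecutive)
    polls = 0
    while max_polls is None or polls < max_polls:
        recent.append(read_temperature())
        polls += 1
        if should_shut_down(list(recent), limit_c, consecutive):
            shut_down()
            return True
        sleep(interval_s)
    return False


# Dry run with canned readings and no real sleeping or shutdown.
readings = iter([40.0, 50.0, 51.0, 52.0])
actions = []
triggered = watch(lambda: next(readings), lambda: actions.append("shutdown"),
                  max_polls=4, sleep=lambda _s: None)
print(triggered, actions)
```

Requiring several consecutive readings above the limit avoids powering off a machine because of a single sensor glitch.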
Operation. Emphasis on continuous running
•  Learning from previous failures
•  Preventing service failures before they happen
–  Redundancy where possible
•  Added full redundancy to the new database
•  Added redundancy to the control system (DCS)
•  Virtualization explored to add redundancy to other services
–  Preventive campaigns carried out
•  Battery replacement on old computers
•  Replacement of aging SAS controllers (still under warranty) (HTS priceless)
•  Clean up/reinstallation of the whole cluster at the beginning of 2011
•  Selective clean up at the beginning of 2012
•  Identifying failures before they affect the running:
–  Carried out tests on the redundancy of the services
–  Improved the monitoring system
–  Scripts reporting problems before they affect operation (SAS controller)
•  Testing before deployment
–  2 testing and integration areas, and 2 more being built
Operation. Large Scale Effect on Failures
Failure happens! For the last year of operation (2011):
•  Unscheduled power cuts: ~1 every 2 months
–  By far the worst to recover from. They also affect the detector electronics.
•  Unscheduled cooling cuts: ~1 every few months
•  Network switch failures: ~1 every 3 months
–  Database failure
–  Failure of the redundancy of kerberos and ldap (in the process of being solved)
–  Affected the storage manager (still running under reduced bandwidth)
•  Computer failures: ~3 every week
–  SAS aging problem responsible for most of them until the exchange
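To put the quoted frequencies on a common scale, they can be converted to approximate yearly counts. The sketch below uses only the rates on the slide and the ~2700 computers quoted earlier; treating the rates as constant over the year is a simplifying assumption.

```python
MONTHS_PER_YEAR = 12
WEEKS_PER_YEAR = 52
CLUSTER_SIZE = 2700  # computers in the cluster


def per_year(events, every_n_periods, periods_per_year):
    """Convert 'events every N periods' into events per year."""
    return events * periods_per_year / every_n_periods


power_cuts = per_year(1, 2, MONTHS_PER_YEAR)        # ~1 every 2 months
switch_failures = per_year(1, 3, MONTHS_PER_YEAR)   # ~1 every 3 months
computer_failures = per_year(3, 1, WEEKS_PER_YEAR)  # ~3 every week

print("power cuts/year:", power_cuts)
print("switch failures/year:", switch_failures)
print("computer failures/year:", computer_failures)
print("fraction of cluster failing per year:",
      round(computer_failures / CLUSTER_SIZE, 3))
```

About 3 computer failures a week is roughly 6% of the machines per year, which is why standby spares and fast reinstallation matter at this scale.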
Summary
•  A scalable cluster of 2700 computers, growing to 3000 in the next months,
•  autonomous,
•  with service redundancy,
•  capable of more than 100 GB/s data throughput,
•  has been working successfully since 2007.
Thank you. Questions?
Backup slides
CMS Networks Configuration
•  Service Network (.cms)
–  Service IPs
•  DHCP (fixed for central servers)
–  IPMI IPs
•  Fixed at boot time from the DNS name (in-house development that fixes IPMI and BIOS parameters)
•  Data and Central Data Recording Networks
–  IPs and routing rules
•  Fixed at boot time from DNS (A and TXT records) (in-house development: auto_ifconfig)
CMS networks
[Figure: CMS data networks. Labels: Detector Frontend Readout, Myrinet Switches, Readout Unit, Force 10 switches, Builder/Filter Unit, Storage Manager, HP switches, Tier 0. Link labels: 2x1Gbit, 3x1Gbit, 4x1Gbit, 4x3Gbit, 2x10Gbit redundant. Multiplicities: x 80, x 8, x (90+36), x 16.]
Installation/Configuration Services. Replication and Load balancing
•  Data is read from the NAS once and kept in the memory cache of the computers.
⇒ The NAS is not loaded even at peak installation time, when the network of the installation servers is saturated.

Service                                 Replication through                        Load balancing through
PXE boot/TFTP                           No replication. NFS mount from the NAS.    Through DHCP
Kickstart installation server           (same)                                     DNS Round Robin
yum repository installation server      (same)                                     DNS Round Robin
Quattor repository installation server  (same)                                     DNS Round Robin
Quattor cdb configuration server        (same)                                     No.
CMS IT Security
•  We are behind the CERN firewall, and services are not reachable from the internet unless explicitly allowed.
•  The CMS Networks are private networks: not even reachable from the CERN network.
•  There is a Reverse Proxy, which requires authentication, to access some internal information (selected http traffic).
•  All users have to log in to gateway computers subject to tighter security (ssh traffic to some computers).
•  Traffic from internal computers to the outside world goes through proxy servers.
The Compact Muon Solenoid experiment (CMS). Design Parameters.

Detector     Channels    Ev. Data (kB)
Pixel        60000000    50
Tracker      10000000    650
Preshower    145000      50
ECAL         85000       100
HCAL         14000       50
Muon DT      200000      10
Muon RPC     200000      5
Muon CSC     400000      90

⇒  >50 000 000 channels, ~630 data origins
⇒  Average of 1 Mbyte/event (bigger for HI (heavy ion): ~20 Mbytes/event)
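The per-detector numbers can be summed to check the totals quoted under the table. The values below are copied from the table; the dictionary layout and function names are just for illustration.

```python
# detector -> (channels, event data in kB), as listed in the table
DETECTORS = {
    "Pixel": (60_000_000, 50),
    "Tracker": (10_000_000, 650),
    "Preshower": (145_000, 50),
    "ECAL": (85_000, 100),
    "HCAL": (14_000, 50),
    "Muon DT": (200_000, 10),
    "Muon RPC": (200_000, 5),
    "Muon CSC": (400_000, 90),
}


def total_channels(table):
    """Sum of readout channels over all detectors."""
    return sum(channels for channels, _kb in table.values())


def total_event_kb(table):
    """Sum of per-event data contributions in kB."""
    return sum(kb for _channels, kb in table.values())


print(total_channels(DETECTORS), "channels")
print(total_event_kb(DETECTORS), "kB per event")
```

The event data contributions add up to 1005 kB, matching the "average of 1 Mbyte/event" on the slide.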
pp collisions in CMS
[Figure: pp collision event display. Source: FM, TIPP2011]
Pb-Pb collisions in CMS
[Figure: Pb-Pb collision event display. Source: FM, TIPP2011]
Networking. Service Network Redundancy
•  Consists of networking in two physical locations:
–  Each switch in each rack is connected to 2 routers;
–  The 2 routers are connected in a redundant failover configuration with 10 Gbit lines to the routers in the other location.
[Figure: service network redundancy. Labels: SCX5, USC, NAS, 2x10Gbit redundant, 1Gbit.]

Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 

Recently uploaded (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 

The CMS Online Cluster: Setup, Operation and Maintenance of an Evolving Cluster

  • 1. The CMS Online Cluster: Setup, Operation and Maintenance of an Evolving Cluster. J.A. Coarasa, CERN, Geneva, Switzerland, for the CMS TriDAS group. ISGC 2012, 26 February - 2 March 2012, Academia Sinica, Taipei, Taiwan
  • 2. Outline
      •  Introduction
      •  On/Off-line Computing Model
      •  The Compact Muon Solenoid (CMS) Data Acquisition system (DAQ)
      •  The CMS Online Cluster
          –  IT Infrastructure
              •  Computing
              •  Networking
              •  Services
          –  Operation
  • 3. Introduction: The CMS Data Acquisition system (DAQ)
  • 4. On/Off-line computing
      •  Collision rate: ~1 GHz
      •  Level-1 (~3 µs). Massively parallel processing. Particle identification (high pT, e, µ, jets, missing ET). Hardwired custom systems (ASIC, FPGA). Synchronous, clock driven. First Level output: 100 kHz
      •  Readout: data to surface and event builder. 2 Tb/s optical data links and 2 Tb/s switch networks
      •  High Level Triggers (~ s). Physics process identification. Clusters of PCs. Asynchronous, event driven. Rate on tape: 100 Hz
      •  Distributed computing GRID (Tiers 0-4). Analysis, production and archive
  • 5. The CMS Data Acquisition system (DAQ). The challenge
      Large data volumes (~100 Gbytes/s data flow, 20 TB/day)
      –  After the Level 1 Trigger, ~100 Gbytes/s (rate ~O(100) kHz) reach the event building (2 stages, ~1600 computers).
      –  The HLT filter cluster selects 1 event out of 1000. Max. rate to tape: ~O(100) Hz
      ⇒  The storage manager (stores and forwards) can sustain 2 GB/s of traffic.
      ⇒  Up to ~300 Mbytes/s sustained are forwarded to the CERN T0 (>20 TB/day).
      Diagram labels: Detector Front-end; Level 1 Trigger; Readout Systems; Event Manager; Builder Networks; Builder and Filter Systems; Computing Services; Run Control. Rates: 40 MHz, 100 kHz, 100 Hz.
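The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check: it assumes the ~1 Mbyte average event size quoted on the design-parameters backup slide, and the constant and function names are illustrative.

```python
# Back-of-the-envelope check of the DAQ rates quoted on the slide.
# Assumption: ~1 Mbyte average event size (design-parameters backup slide).

L1_RATE_HZ = 100e3        # Level 1 accept rate, ~O(100) kHz
EVENT_SIZE_BYTES = 1e6    # ~1 Mbyte per pp event
HLT_SELECTION = 1000      # HLT keeps 1 event out of 1000
T0_RATE_BYTES_S = 300e6   # ~300 Mbytes/s sustained to the CERN T0
SECONDS_PER_DAY = 86_400


def event_builder_throughput(rate_hz, event_size_bytes):
    """Bytes per second entering the event builder."""
    return rate_hz * event_size_bytes


def rate_to_tape(l1_rate_hz, selection):
    """Event rate after the HLT keeps 1 event out of `selection`."""
    return l1_rate_hz / selection


def daily_volume(bytes_per_second):
    """Bytes accumulated in one day at a sustained rate."""
    return bytes_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    eb = event_builder_throughput(L1_RATE_HZ, EVENT_SIZE_BYTES)
    print(f"event builder input: {eb / 1e9:.0f} GB/s")
    print(f"rate to tape: {rate_to_tape(L1_RATE_HZ, HLT_SELECTION):.0f} Hz")
    print(f"volume to T0: {daily_volume(T0_RATE_BYTES_S) / 1e12:.1f} TB/day")
```

With these inputs the event builder sees 100 GB/s, the HLT output is 100 Hz, and 300 Mbytes/s sustained corresponds to about 26 TB/day, consistent with the ">20 TB/day" on the slide.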
  • 6. The CMS Online Cluster: IT Infrastructure
  • 7. IT Infrastructure. The purpose and qualities
      The IT infrastructure (computing, networking and services) of the CMS Online Cluster takes care of the CMS data acquisition and experiment control.
      •  Autonomous (i.e. independent from all other networks), providing uninterrupted 24/7 operation at two physical locations ~200 m apart, with two control rooms;
      •  Redundant services design:
          –  Losing 1 min of data wastes accelerator time (worth ~O(1000) CHF/min).
          –  Our system serves online data taking. If not taken, data is gone!
      •  Scalable services design to accommodate expansions;
      •  Fast configuration turnaround to cope with the evolving nature of the DAQ applications and the large scale of the cluster;
      •  Serving the needs of a community of more than 900 users.
  • 8. Computing. Variety of roles
      More than 2700 computers, mostly running Scientific Linux CERN 5:
      High-bandwidth networking:
      –  640 (2-core) for 1st-stage event building, equipped with 2 Myrinet and 3 independent 1 Gbit Ethernet lines for data networking (1280 cores);
      –  1008 (720 (8-core) + 288 (12-core, allowing HT)) as high level trigger computers with 2 Gbit Ethernet lines for data networking (9216 cores);
      –  16 (2-core) with access to 300 TBytes of FC storage, 4 Gbit Ethernet lines for data networking and 2 additional ones for networking to Tier 0;
  • 9. Computing. Variety of roles
      More than 2700 computers, mostly running Scientific Linux CERN 5:
      High-bandwidth networking:
      –  640 (2-core) for 1st-stage event building, equipped with 2 Myrinet and 3 independent 1 Gbit Ethernet lines for data networking (1280 cores);
      –  1008 (720 (8-core) + 288 (12-core, allowing HT)) as high level trigger computers with 2 Gbit Ethernet lines for data networking (9216 cores);
      –  16 (2-core) with access to 300 TBytes of FC storage, 4 Gbit Ethernet lines for data networking and 2 additional ones for networking to Tier 0;
      Other roles:
      –  More than 400 used by the subdetectors;
      –  90 running Windows for Detector Control Systems;
      –  12 computers as an ORACLE RAC;
      –  12 computers as CMS control computers;
      –  50 computers as desktop computers in the control rooms;
      –  200 computers for commissioning, integration and testing;
      –  15 computers as infrastructure and access servers;
      –  250 active spare computers;
      ⇒ Many different roles!
  • 10. Networking. Overall Picture
      CMS Networks:
      –  Private Networks:
          •  Service Network (~3000 1 Gbit ports);
          •  Data Network (~4000 1 Gbit ports)
              –  Source routing on computers
              –  VLANs on switches
          •  Central Data Recording (CDR). Network to Tier 0.
          •  Private networks for Oracle RAC
          •  Private networks for subdetectors
      –  Public CERN Network
      Diagram labels: CMS Sites; CMS Networks; computer gateways; Readout, HLT; Control…; Firewall; Internet; Service Network; Data Networks; CDR Network; CERN Network; Storage Manager.
  • 11. The Network Attached Storage
      •  Our important data is hosted in a clustered Network Attached Storage:
          –  User home directories;
          –  CMS data: calibration, data quality monitoring…;
          –  Admin data.
      •  2 NetApp filer heads in failover configuration (in two racks);
      •  6 storage drawers (2 of them mirrored) with internal dual-parity RAID;
      •  Snapshot feature active (saves us from going to backup);
      •  Deduplication active (saves ~25-55%);
      •  Tested throughput > 380 MBytes/s.
      Diagram labels: NAS heads and shelves; routers; 4 Gbit; 2x10 Gbit redundant.
  • 12. IT Structural Services. Redundancy and Load balancing. The Concept
      •  Pattern followed:
          –  1 master + N slaves/replicas (now N=3 for most of the services) hosted in different racks;
              ⇒ Easy scalability.
              ⇒ Needs replication for all services.
          –  Services working under a DNS alias where possible.
              ⇒ Allows the service to be moved.
              ⇒ No service outage.
          –  Load balancing of the primary server for the clients:
              •  DNS Round Robin;
              •  explicit client configuration, segregating computers into groups.
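The two load-balancing options named on this slide can be sketched as follows. This is a minimal illustration only, not the cluster's actual tooling: the server and rack names are hypothetical, and grouping by rack is an assumption about how "groups of computers" might be formed.

```python
from itertools import cycle


def dns_round_robin(servers):
    """Hand out servers in rotation, as a round-robin DNS alias would."""
    return cycle(servers)


def segregate_by_group(groups, servers):
    """Explicit segregation: every group of computers gets a fixed primary
    server, with the remaining servers kept as ordered fallbacks."""
    plan = {}
    for i, group in enumerate(sorted(groups)):
        k = i % len(servers)
        plan[group] = servers[k:] + servers[:k]
    return plan


if __name__ == "__main__":
    # Hypothetical names: 1 master + 3 replicas, hosted in different racks.
    servers = ["srv-master", "srv-replica1", "srv-replica2", "srv-replica3"]
    groups = {f"rack-{n:02d}": [] for n in range(1, 6)}

    rr = dns_round_robin(servers)
    print([next(rr) for _ in range(5)])

    for group, order in segregate_by_group(groups, servers).items():
        print(group, "->", order)
```

Round robin spreads clients statistically, while the explicit plan is deterministic: a given group always talks to the same primary first, which makes the load on each server predictable.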
  • 13. IT Structural Services. Replication and Load balancing. Basic Services
      •  Critical data is local to the servers (hosted in RAID 1 protected servers);
      •  Manual load balancing by segregating clients into groups.

      Service              Replication through    Load balancing through
      DNS                  named                  Round Robin
      DHCP                 in-house scripts       No. First who answers
      Kerberos             in-house scripts       explicit segregation
      LDAP                 slurpd                 explicit segregation
      NTP                  -                      explicit segregation
      syslog               -                      No. Single server, used for stress test purposes
      Nagios monitoring    -                      explicit segregation
  • 14. CMS Management and Configuration Infrastructure: The tools
      •  IPMI (Intelligent Platform Management Interface) is used to manage the computers remotely:
          –  reboot, console access, …;
      •  PXE and anaconda kickstart through HTTP are used as the bootstrap installation method;
      •  Quattor (QUattor is an Administration ToolkiT for Optimizing Resources) is used as the configuration management system;
          ⇒ All Linux computers are configured through it or through rpms distributed with it (even the Quattor servers themselves): BIOS, all networking parameters…
  • 15. CMS Management and Configuration Infrastructure: Quattor implementation
      •  Based on Quattor 1.3 (cdb 2.0.4, swrep 2.1.38, PANC 6.0.8)
      •  Manages ~2500 installed computers in ~90 types
          ⇒ One fully reinstalled computer in 9-30 min
          ⇒ A big (~Gbyte) cluster-wide change in less than 25 min; a small change takes only a few minutes.
      •  Uses in-house:
          –  a restricted format in templates: hierarchical + other conventions;
          –  areas to define subdetector software and versioning in them;
          ⇒ This allowed easy in-house developments:
              •  Template summarizer / inventory maker
              •  Dropbox for rpms
              •  Template updater
  • 16. Monitoring infrastructure
      •  We monitor 2500 hosts, 25000 services
          –  The usual things (ping, ssh, disk usage…)
          –  In house:
              •  Electronics modules working?
              •  Myrinet links working?
              •  Quattor spma finished properly?
              •  …
      •  Based on Nagios. 1+3 servers with manually split configuration.
      •  Performance:
          –  ~1 minute latency for the ping test;
          –  more than 5 minutes for the rest of the services.
      ⇒ Needed improving. The new version of the monitoring is about to move to production.
  • 17. New Monitoring infrastructure
      •  Addresses these issues:
          –  Easy configuration
          –  Scalability
      •  Based on icinga (1 server + N workers)
          –  mod_gearman: allows functionality to be split (Master/Worker).
          –  check_multi: groups checks, can cascade.
          –  PNP4nagios: provides performance data.
          –  rrdcached: caches files and improves performance.
      •  We monitor more services, grouped in fewer checks, and get historical performance data
      •  Performance:
          –  ~1 minute latency, including performance data.
      •  Adding a system for fine-grained, self-enrolled notifications
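Grouping several checks into one result, as this slide describes, can be illustrated with a small sketch. It follows the standard Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), but the worst-state ordering and the output format are simplifying assumptions, not check_multi's actual implementation.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Assumed severity ordering for aggregation, least to most severe.
SEVERITY = [OK, UNKNOWN, WARNING, CRITICAL]


def threshold_state(value, warn, crit):
    """Map a measured value (e.g. disk usage fraction) to a plugin state."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def grouped_check(results):
    """Collapse many child check states into one state and one output line.

    `results` maps a check name to its state code.
    """
    if not results:
        return UNKNOWN, "UNKNOWN - no checks configured"
    worst = max(results.values(), key=SEVERITY.index)
    failing = [f"{name}={NAMES[state]}"
               for name, state in sorted(results.items()) if state != OK]
    summary = ", ".join(failing) if failing else f"{len(results)} checks passed"
    return worst, f"{NAMES[worst]} - {summary}"


if __name__ == "__main__":
    state, line = grouped_check({
        "ping": OK,
        "ssh": CRITICAL,
        "disk": threshold_state(0.85, warn=0.80, crit=0.90),
    })
    print(line)
    raise SystemExit(state)  # the exit code carries the state to the scheduler
```

One grouped result per host replaces several scheduled checks, which is one way fewer checks can cover more services.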
  • 18. Databases
      •  Save all bookkeeping data:
          –  configuration of the detectors;
          –  conditions for the experimental data taken…
      •  Oracle 11.2.0.3 (just migrated, still in compatibility mode with 10.2.0.5)
          –  hosting two databases @ CMS totalling ~30 Tbytes (more, including replicated/synced ones @ CERN);
          –  hardware:
              •  6 blades + standby nodes;
              •  NetApp filer as a backend for storage;
              •  Completely redundant at hardware level.
  • 19. Hardware Tracking system
      Our Hardware Tracking System (HTS) allows us to know everything about the systems (software/hardware).
      It is based on:
      •  OCS (www.ocsinventory-ng.org)
          –  Inventory system
          –  Counts everything related to a machine
      •  GLPI (www.glpi-project.org)
          –  Tracking system
          –  Works in relation with OCS
          –  Set of tools to maintain the machines' history
  • 20. The CMS Online Cluster: Operation
  • 21. Constraints and implications
      •  Manpower (~6 FTE):
          ⇒ Needs effort on implementing scripts and automation:
              •  Computers monitor their temperature and shut down when it is too high;
              •  Scripts check and fix service problems.
          ⇒ Allows faster recovery after an unexpected power cut
          ⇒ Act only as second-level support in 24/7 operation.
          ⇒ Try to interact with users through a ticketing system (savannah)
      •  Cluster 5 years old and still growing, adding resources as needed
          –  Some hardware out of warranty
          –  Some hardware with extended warranty
      •  Computers connected to the electronics need swift replacement (1 h) if broken.
          ⇒ Need for standby spares on location.
          ⇒ Fast turnaround in reinstallation or reconfiguration.
      •  Other computers are handled with fault-tolerant software and reconfiguration. It is enough to replace them the next day.
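The temperature-triggered shutdown mentioned on this slide could look roughly like the sketch below. The 45 °C limit, the requirement of three consecutive readings and the function names are all assumptions made for illustration; the slides do not describe how the actual scripts decide.

```python
import subprocess


def should_shut_down(readings_c, limit_c=45.0, consecutive=3):
    """True when the last `consecutive` readings all exceed the limit.

    Requiring several readings in a row avoids reacting to a single
    spurious sensor value. Both defaults are hypothetical.
    """
    recent = readings_c[-consecutive:]
    return len(recent) == consecutive and all(t > limit_c for t in recent)


def act_on_temperature(readings_c, dry_run=True, **limits):
    """Return the action taken; only power off when dry_run is False."""
    if not should_shut_down(readings_c, **limits):
        return "ok"
    if not dry_run:
        # Standard Linux halt command; requires root privileges.
        subprocess.run(["shutdown", "-h", "now"], check=True)
    return "shutdown"


if __name__ == "__main__":
    # Hypothetical inlet temperatures in degrees Celsius, oldest first.
    print(act_on_temperature([38.0, 41.5, 46.2, 47.0, 48.1]))
```

Keeping the decision in a pure function makes it easy to test without touching a sensor or powering off a machine.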
  • 22. Operation. Emphasis on continued running
      •  Learning from previous failures
      •  Preventing service failures before they happen
          –  Redundancy where possible
              •  Added full redundancy to the new database
              •  Added redundancy to the control system (DCS)
              •  Virtualization explored to add redundancy to other services
          –  Preventive campaigns carried out
              •  Battery replacement on old computers
              •  Replacement of the aging SAS controllers (still under warranty); the HTS was priceless here
              •  Clean up/reinstallation of the whole cluster at the beginning of 2011
              •  Selective clean up at the beginning of 2012
      •  Identifying failures before they affect the running:
          –  Carried out tests on the redundancy of the services
          –  Improved the monitoring system
          –  Scripts report problems before they affect operation (SAS controller)
      •  Testing before deployment
          –  2 testing and integration areas, and 2 more being built
  • 23. Operation. Large Scale Effect on Failures
      Failures happen! For the last year of operation (2011):
      •  Unscheduled power cuts: ~1 every 2 months
          –  By far the worst to recover from. They also affect the detector electronics.
      •  Unscheduled cooling cuts: ~1 every few months
      •  Network switch failures: ~1 every 3 months
          –  Database failure
          –  Failure of the redundancy of Kerberos and LDAP (being solved)
          –  Affected the storage manager (still running under reduced bandwidth)
      •  Computer failures: ~3 every week
          –  The SAS aging problem was responsible for most of them until the exchange
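To put the computer failure rate in perspective, the quoted ~3 failures per week can be converted into yearly figures. This is only rough arithmetic: it assumes the rate stays constant over a year and uses the ~2700 computers quoted elsewhere in the talk as the cluster size.

```python
CLUSTER_SIZE = 2700        # computers, from the computing slides
FAILURES_PER_WEEK = 3      # computer failures observed in 2011
WEEKS_PER_YEAR = 52


def failures_per_year(per_week):
    """Expected number of failed computers in a year at a constant rate."""
    return per_week * WEEKS_PER_YEAR


def annual_failure_fraction(per_week, cluster_size):
    """Fraction of the cluster expected to fail within a year."""
    return failures_per_year(per_week) / cluster_size


def days_between_failures(per_week):
    """Average spacing between failures anywhere in the cluster."""
    return 7 / per_week


if __name__ == "__main__":
    print(f"failures per year: {failures_per_year(FAILURES_PER_WEEK)}")
    frac = annual_failure_fraction(FAILURES_PER_WEEK, CLUSTER_SIZE)
    print(f"fraction of cluster per year: {frac:.1%}")
    print(f"days between failures: {days_between_failures(FAILURES_PER_WEEK):.1f}")
```

Under these assumptions about 156 computers fail per year, roughly 6% of the cluster, or one failure every 2 to 3 days, which is below the 250 active spares listed among the roles.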
  • 24. Summary
      A scalable cluster of 2700 computers, growing to 3000 in the next months,
      •  autonomous,
      •  with service redundancy,
      •  capable of more than 100 GB/s data throughput,
      has been working successfully since 2007.
  • 25. Thank you. Questions?
  • 26. Backup slides
  • 27. CMS Networks Configuration
      •  Service Network (.cms)
          –  Service IPs
              •  DHCP (fixed for central servers)
          –  IPMI IPs
              •  Fixed at boot time from the DNS name (in-house development that sets IPMI and BIOS parameters)
      •  Data and Central Data Recording Networks
          –  IPs and routing rules
              •  Fixed at boot time from DNS (A and TXT records) (in-house development: auto_ifconfig)
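The idea of deriving an interface's address and routing rule from DNS records can be sketched as below. This is not auto_ifconfig itself: the TXT record layout (`prefix=...;route=... via ...`), the addresses and the function name are hypothetical, and the sketch only builds the standard iproute2 commands as strings instead of querying DNS or applying them.

```python
import ipaddress


def build_commands(iface, a_record, txt_record):
    """Turn an A record and a TXT record into iproute2 command strings.

    Hypothetical TXT layout: "prefix=<len>;route=<network> via <gateway>".
    """
    fields = dict(item.split("=", 1) for item in txt_record.split(";") if item)
    prefix = int(fields.get("prefix", 24))
    addr = ipaddress.ip_interface(f"{a_record}/{prefix}")
    commands = [f"ip addr add {addr.with_prefixlen} dev {iface}"]

    if "route" in fields:
        network_text, _, gateway_text = fields["route"].partition(" via ")
        network = ipaddress.ip_network(network_text)
        gateway = ipaddress.ip_address(gateway_text)
        # A gateway must be directly reachable on the interface's subnet.
        if gateway not in addr.network:
            raise ValueError(f"gateway {gateway} is not on {addr.network}")
        commands.append(f"ip route add {network} via {gateway} dev {iface}")
    return commands


if __name__ == "__main__":
    # Hypothetical records for a data-network interface.
    for cmd in build_commands("eth2", "10.176.1.20",
                              "prefix=16;route=10.180.0.0/16 via 10.176.0.1"):
        print(cmd)
```

Keeping addresses and routes in DNS means a replaced or reinstalled computer picks up its data-network configuration from its name alone, with no per-host files to edit.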
  • 28. CMS networks (diagram). Labels: Frontend Detector Readout; Myrinet Switches; Readout Unit (x 80 x 8); Force 10 (x 8); Builder Filter Unit (x (90+36) x 8); Storage Manager (x 16); HP switch (redundant); Tier 0. Link speeds shown: 2x1Gbit, 3x1Gbit, 4x1Gbit, 4x3Gbit, 2x10Gbit.
  • 29. Installation/Configuration Services. Replication and Load balancing
      •  Data is read from the NAS once and kept in the memory cache of the computers.
          ⇒ The NAS is not loaded even at peak installation time, when the installation servers' network is saturated.
      Service: replication through; load balancing through
      –  PXE boot/TFTP: no replication, NFS mount from the NAS; through DHCP
      –  Kickstart installation server: DNS Round Robin
      –  yum repository installation server: DNS Round Robin
      –  Quattor repository installation server: DNS Round Robin
      –  Quattor cdb configuration server: No.
  • 30. CMS IT Security
      •  We are behind the CERN firewall, and services are not reachable from the internet unless explicitly allowed.
      •  The CMS Networks are private networks: not even reachable from the CERN network.
      •  A Reverse Proxy, which requires authentication, gives access to some internal information (selected HTTP traffic).
      •  All users have to log in to gateway computers subject to tighter security (ssh traffic to some computers).
      •  Traffic from internal computers to the outside world goes through proxy servers.
  • 31. The Compact Muon Solenoid experiment (CMS). Design Parameters

      Detector      Channels    Ev. Data (kB)
      Pixel         60000000    50
      Tracker       10000000    650
      Preshower     145000      50
      ECAL          85000       100
      HCAL          14000       50
      Muon DT       200000      10
      Muon RPC      200000      5
      Muon CSC      400000      90

      ⇒  >50 000 000 channels, ~630 data origins
      ⇒  Average of 1 Mbyte/event (bigger for HI: ~20 Mbytes/event)
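The per-detector numbers in this table can be summed to check the quoted totals. The sketch below simply transcribes the table; the dictionary and function names are illustrative.

```python
# (channels, event data in kB) per detector, transcribed from the table.
DETECTORS = {
    "Pixel": (60_000_000, 50),
    "Tracker": (10_000_000, 650),
    "Preshower": (145_000, 50),
    "ECAL": (85_000, 100),
    "HCAL": (14_000, 50),
    "Muon DT": (200_000, 10),
    "Muon RPC": (200_000, 5),
    "Muon CSC": (400_000, 90),
}


def total_channels(detectors):
    """Sum of readout channels over all detectors."""
    return sum(channels for channels, _ in detectors.values())


def event_size_kb(detectors):
    """Sum of the per-detector event data contributions, in kB."""
    return sum(size_kb for _, size_kb in detectors.values())


if __name__ == "__main__":
    print(f"channels: {total_channels(DETECTORS):,}")
    print(f"event size: {event_size_kb(DETECTORS)} kB")
```

The contributions add up to 1005 kB, matching the average of ~1 Mbyte/event, and the channel count of about 71 million is consistent with the ">50 000 000 channels" on the slide.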
  • 32. pp collisions in CMS (figure credit: FM, TIPP2011)
  • 33. Pb-Pb collisions in CMS (figure credit: FM, TIPP2011)
  • 34. Networking. Service Network Redundancy
      •  Consists of networking in two physical locations:
          –  Each switch in each rack is connected to 2 routers;
          –  The 2 routers are connected in a redundant failover configuration, with 10 Gbit lines, to the routers in the other location.
      Diagram labels: SCX5; USC; NAS; 1 Gbit rack uplinks; 2x10 Gbit redundant.