Tim Bell
tim.bell@cern.ch
@noggin143
19/09/2013 CERN Infrastructure Evolution 2
CERN clouds and culture at GigaOm London 2013
  • Over 1,600 magnets were lowered down shafts and cooled to -271 °C to become superconducting. There are two beam pipes, held at a vacuum with pressure 10 times lower than on the Moon.
  • These collisions produce data, lots of it: over 100 PB currently, on 45,000 tapes, with data rates of up to 35 PB/year that are expected to increase significantly in the next run in 2015. The data must be kept for at least 20 years, so we’re expecting exabytes.
  • Recording and analysing the data takes a lot of computing power. The CERN computer centre was built in the 1970s for mainframes and Crays. Now running at 3.5 MW of power, it houses 11,000 servers but is at the limit of its cooling and electrical power. It is also a tourist attraction, with over 80,000 visitors last year! As you can see, racks are only partially filled because of the limits on cooling.
  • We asked our 20 member states to make us an offer for server hosting using public procurement. We received 27 proposals, and the Wigner centre in Budapest, Hungary, was chosen. This allows us to envisage sufficient computing and online storage for the run from 2015.
  • While it was great news to be allocated the budget for a new data centre, there was bad news associated with it. No additional budget for staff would be made available, so we needed to find a way for the IT department to manage twice the number of servers with the same personnel. The current toolset would not scale to the additional data centre. The tools needed significant maintenance effort: IPv6, new Linux versions and so on were using up valuable engineering resources. Users were asking for faster response times to resource requests and more dynamic configurations.
  • So we looked around at how others were solving these problems and found we were not special. While CERN has a research culture, we need to understand that not all our services are pioneering. It is not always necessary to start from a blank sheet of paper; instead we can build on the work of others rather than lead. The invention of the World Wide Web at CERN reflected a need which was original, but not all of our work is new. Companies such as Yahoo, Rackspace, Zynga, eBay and PayPal are facing scalability and management issues far beyond ours. Thus, we need to try not to innovate but to follow.
  • We adopted a Google toolchain approach. The majority of home-written software was replaced by open source projects. Commercial tools which were already working well, such as JIRA and Active Directory, were retained. The approach was to select a tool, prototype, fail early and then refine requirements (following the ‘we are not special’ approach). Key technologies were Puppet for configuration management and OpenStack for the private cloud.
  • So we assembled a team made up of experienced service managers and new students. By freezing developments on legacy projects, we were able to make resources available, but only as long as we could rapidly implement new functions. Many of the staff had to do their ‘day’ jobs as well as work on the new implementations. There were several effects. Newcomers often had experience of the tools from university. People learnt very rapidly by following mailing lists, going to conferences and interacting with the community. Contributions included governance, use cases and testing in addition to standard development contributions. Short-term staff saw major improvements in their post-CERN job prospects, as they left with very relevant skills.
  • The agile approach is a major cultural change, and it is an ongoing process. To illustrate this, I show some extreme examples from Tolkien of characteristics to watch out for. Luckily, we never had characters like this at CERN. ‘Don’t be hasty, let’s go slowly’: transformations such as this cannot be done in a reasonable time by incremental change. Move away from silos: from top to bottom, application to hardware, managed by a single team, to a layered model with shared budget and resources. Knowledge management responsibilities change: the guru who wrote the tool and trains others on how to use it is replaced by the outside community in which people participate. Everything can appear to be research if you start with a blank piece of paper. The server or application manager of ‘precious’ applications that need special handling and care has to be understood: some cases are inevitable, but many reflect non-technical aspects of the application or server management and may justify changes of process.
  • There are already 3 independent clouds. Federation is now being studied, with Rackspace inside CERN openlab and with Helix Nebula, as discussed later.
  • The Worldwide LHC Computing Grid is used to record and analyse this data. The grid currently runs over 2 million jobs/day, and less than 10% of the work is done at CERN. There is an agreed set of protocols for running jobs, data distribution and accounting between all the sites, which co-operate in order to support the physicists across the globe.
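
The "we’re expecting exabytes" remark in the notes can be checked with a rough projection. The sketch below uses only the figures quoted above (over 100 PB stored, up to 35 PB/year, at least 20 years of retention) and assumes, purely for illustration, a constant yearly rate and decimal units. Since the notes say the rate is expected to increase, the result is best read as a lower bound. The function name is hypothetical.

```python
# Rough archive projection from the figures quoted in the notes.
# Assumption (not from the talk): the yearly rate stays constant.

PB_PER_EB = 1000  # decimal units: 1 EB = 1000 PB


def projected_archive_pb(current_pb, rate_pb_per_year, years):
    """Return the archive size in PB after `years` at a constant rate."""
    return current_pb + rate_pb_per_year * years


# 100 PB today, 35 PB/year, kept for 20 years
total_pb = projected_archive_pb(current_pb=100, rate_pb_per_year=35, years=20)
print(f"{total_pb} PB = {total_pb / PB_PER_EB} EB")
```

Even at a flat 35 PB/year the archive approaches an exabyte within the retention period, so any increase in the data rate pushes it beyond that.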

    1. Tim Bell, tim.bell@cern.ch, @noggin143. 19/09/2013, CERN Infrastructure Evolution
    2. CERN was founded 1954: 12 European States, “Science for Peace”. Today: 20 Member States.
        • Member States: Austria, Belgium, Bulgaria, the Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Italy, the Netherlands, Norway, Poland, Portugal, Slovakia, Spain, Sweden, Switzerland and the United Kingdom
        • Candidate for Accession: Romania
        • Associate Members in Pre-Stage to Membership: Israel, Serbia
        • Applicant States for Membership or Associate Membership: Brazil, Cyprus (awaiting ratification), Pakistan, Russia, Slovenia, Turkey, Ukraine
        • Observers to Council: India, Japan, Russia, Turkey, United States of America; European Commission and UNESCO
        • ~ 2,300 staff; ~ 1,000 other paid personnel; > 11,000 users; Budget (2013) ~1,000 MCHF
    3. What are the Origins of Mass?
    4. Matter/Anti Matter Symmetric?
    5. Where is 95% of the Universe?
    9. Collisions
    10. A Big Data Challenge
    11. The CERN Meyrin Data Centre
    12. New Data Centre in Budapest
    13. Good News, Bad News
    14. We’re Not Special
    15. Bamboo; Koji, Mock; AIMS/PXE; Foreman; Yum repo; Pulp; Puppet-DB; mcollective, yum; JIRA; Lemon / Hadoop / LogStash / Kibana; OpenStack Nova; Hardware database; Puppet; Active Directory / LDAP
    16. The Agile Experience
    17. Cultural Barriers
    18. Status
        • Toolchain implemented in 18 months with enhancements and bug fixes submitted back to the community
        • Now in production in 3 OpenStack clouds (over 50,000 cores in total) in Geneva and Budapest managed by Puppet
        • Target is more than 300,000 cores by 2015 and 90% compute resources in the private cloud
    19. Summary
        • Constraints on resources have led to major technology transformations at CERN
        • Open source community participation helps drive cultural change and motivates staff
        • CERN benefits and contributes back through code and outreach
    20. Questions?
    22. Service Models
        • Pets are given names like pussinboots.cern.ch
        • They are unique, lovingly hand raised and cared for
        • When they get ill, you nurse them back to health
        • Cattle are given numbers like vm0042.cern.ch
        • They are almost identical to other cattle
        • When they get ill, you get another one
    24. http://www.eucalyptus.com/blog/2013/04/02/cy13-q1-community-analysis-%E2%80%94-openstack-vs-opennebula-vs-eucalyptus-vs-clo
    26. Worldwide LHC Computing Grid tiers
        • Tier-0 (CERN): data recording, initial data reconstruction, data distribution
        • Tier-1 (11 centres): permanent storage, re-processing, analysis
        • Tier-2 (~200 centres): simulation, end-user analysis
        • Data is recorded at CERN and Tier-1s and analysed in the Worldwide LHC Computing Grid
        • In a normal day, the grid provides 100,000 CPU days executing over 2 million jobs
    28. Microsoft Active Directory; CERN DB on Demand; CERN Network Database; Account mgmt system; Horizon; Keystone; Network; Compute; Glance; Scheduler; Cinder; Nova; Block Storage Provider
    29. Training for Newcomers: Buy the book rather than guru mentoring
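
The pets-versus-cattle service model from slide 22 can be illustrated with a small sketch. Everything here is hypothetical and for illustration only: the `Server` class, the `is_cattle` naming test (numbered `vmNNNN` hosts count as cattle) and the `handle_failure` helper are assumptions, not CERN tooling. The sketch only shows the difference in policy: a pet is repaired in place, while a cattle server is discarded and replaced by a new numbered one.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True


def is_cattle(server):
    """Assumed convention: cattle have numbered host names such as vm0042."""
    host = server.name.split(".")[0]
    return host.startswith("vm") and host[2:].isdigit()


def handle_failure(server, next_number):
    """Apply the service model to a failed server and return the server in use."""
    if is_cattle(server):
        # Cattle: do not repair, just get another one with the next number.
        return Server(name=f"vm{next_number:04d}.cern.ch")
    # Pet: nurse the same, unique machine back to health.
    server.healthy = True
    return server


pet = Server("pussinboots.cern.ch", healthy=False)
cow = Server("vm0042.cern.ch", healthy=False)
print(handle_failure(pet, 43).name)  # the same pet, repaired
print(handle_failure(cow, 43).name)  # a replacement with a new number
```

In this sketch the replacement number is passed in by the caller; a real deployment would obtain it from whatever system allocates host names.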
