Puppet Camp CERN Geneva


  1. A Puppet Infrastructure at CERN
     Steve Traylen, CERN IT Department, steve.traylen@cern.ch
     Puppet Camp, Geneva, CH. 11 July 2012
  2. Outline
     • CERN and Computing for High Energy Physics
     • Today's CERN IT Deployment
       – Why and what's changing
     • Adoption of Puppet, Foreman, …
       – Progress, integration
       – Difficulties
       – Future
  3. CERN
     • Conseil Européen pour la Recherche Nucléaire
       – aka the European Laboratory for Particle Physics
       – Facilities for fundamental research
     • Between Geneva and the Jura mountains, straddling the Swiss-French border
     • Founded in 1954
  4. The Large Hadron Collider
     • Accelerator colliding protons with protons at 14 TeV collision energy
       – By far the world's most powerful accelerator
     • Tunnel of 27 km circumference, 4 m diameter, 50–150 m below ground
     • Detectors at four collision points
  5. The LHC Computing Challenge
     • Data volume: 15 petabytes of new data each year
     • Global compute power:
       – 250k CPU cores
       – 100 PB of disk storage
     • Worldwide analysis and funding:
       – A distributed computing infrastructure provides the production and analysis environments for the LHC experiments
       – Managed and operated by a worldwide collaboration between the experiments and the participating computer centres
       – Distributed for funding and sociological reasons
  6. Motivation to Change Tools
     • The CERN data centre is reaching its limits:
       – IT staff numbers remain fixed
       – More computing capacity is needed
     • Inefficiencies exist but the root causes cannot be easily identified:
       – Tools are becoming increasingly brittle and difficult to adapt
         • e.g. porting the tools to IPv6 would need a development project
       – Some core components cannot be scaled up
  7. Second CERN Data Centre
     • Wigner Institute in Budapest, Hungary
     • Hands-off facility, hardware support only
     • Deploying 2012 to 2014
  8. Infrastructure Tools Evolution
     • We had to develop our own toolset in 2002:
       – The "Extremely Large Fabric Management System", http://cern.ch/ELFms
       – Included Quattor for configuration
     • Nowadays:
       – CERN compute capacity is no longer leading edge
       – Many open source options are available for fabric management
       – We need to scale to meet the upcoming capacity increase
     • If a requirement is not met by an open source tool, we should question the need:
       – If we really are the first to need it, contribute it back to the open source tool
  9. Infrastructure as a Service
     • Goals:
       – Improve repair processes with virtualisation
       – More efficient use of our hardware
       – Better tracking of usage
       – Enable remote management of the new data centre
       – Support potential new use cases, e.g. cloud
       – A sustainable support model
     • At scale for 2015:
       – 15,000 servers
       – 90% of hardware virtualized
       – 300,000 VMs needed
     • Plan: OpenStack adoption
  10. Chose Puppet for Configuration
     • The tool space has exploded in the last few years:
       – In configuration management and ops
       – Large, shared "tool forges", and lots of experience
     • Puppet and Chef are the clear leaders for the "core" tool
     • Many large-scale enterprises use Puppet:
       – Its declarative approach fits better with what we are used to in Quattor
       – Large installations: a friendly, broad-based community, plus commercial support and training
       – You can buy books on it
       – You can employ people who know Puppet better than you do
  11. Deployed System
  12. Starting with Puppet
     • Puppet was and is trivial to set up:
       – Anyone can do it in a day
     • Configuring something with Puppet is easy
     • What's hard:
       – Deciding module scope and how modules interact with one another
         • Three modules editing grub.conf, or one? (see the sketch below)
       – We started in early 2012 with very little plan in the area of module organization
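     One possible answer to the grub.conf question, as a minimal sketch: a
     single hypothetical 'grub' module owns the file, and other modules pass
     it settings rather than editing the file themselves (module name,
     parameter and template are illustrative, not CERN's code):

       # One module owns /etc/grub.conf; everyone else goes through it.
       class grub ($kernel_args = '') {
         file { '/etc/grub.conf':
           owner   => 'root',
           group   => 'root',
           mode    => '0600',
           content => template('grub/grub.conf.erb'),  # renders $kernel_args
         }
       }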
  13. Downloading Puppet Modules
     • Expectation at the start – all done for us:
       – ssh, iptables, sysctl, apache, mysql all done
       – example42 or similar can do everything
     • Reality – modules are often not quite right:
       – Too simple
         • e.g. I want my sshd_config to be different in two places
       – Too much abstraction
         • I want to use Puppet, not some abstraction of hundreds of variables covering every possible case
           – e.g. puppet with(out) passenger; I only want one
       – Parameterized classes and Foreman don't really work together (sketch below)
         • The resulting modules are not shareable – ENC globals vs. class parameters
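     To illustrate the ENC-globals-vs-parameters point, a sketch of the two
     styles with a hypothetical 'ssh' module (not CERN's actual code):

       # Style 1: parameterized class - data is passed in explicitly.
       # Foreman, acting as an ENC in this era, cannot supply these
       # parameters, only top-scope globals.
       class ssh ($permit_root = 'no') {
         file { '/etc/ssh/sshd_config':
           content => template('ssh/sshd_config.erb'),  # reads $permit_root
           notify  => Service['sshd'],
         }
         service { 'sshd':
           ensure => running,
           enable => true,
         }
       }

       # Style 2: the class reads a top-scope global set by the ENC.
       # Foreman can do this, but the module now silently depends on that
       # global, which makes it much harder to share with other sites.
       class ssh::encstyle {
         $permit_root = $::ssh_permit_root
         # ... same resources as above ...
       }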
  14. Sharing and Fixing Modules
     • Not as easy as it should be:
       – Our modules are littered with CERNisms
         • NTP servers, subnets, authorization systems, …
         • Adaptations to work with Foreman
         • All of us learning Puppet and doing things quickly (badly)
     • Hiera is being used now (sketch below):
       – Provides the code vs. data separation we had with Quattor
       – Dozens of ways to set up and (ab)use Hiera
       – Little experience with this anywhere yet
       – Hiera should make modules more shareable across sites
         • Looking forward to it becoming the normal, standard thing that modules use, so that everyone benefits
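     A minimal sketch of the code/data separation Hiera gives (hierarchy,
     key names and paths are hypothetical; as the slide says, there are
     dozens of ways to set this up):

       # Site-neutral module code: no CERN NTP servers baked in.
       class ntp {
         $servers = hiera('ntp_servers')   # data is looked up, not hard-coded
         file { '/etc/ntp.conf':
           content => template('ntp/ntp.conf.erb'),  # renders $servers
         }
       }

       # /etc/puppet/hiera.yaml - the site-specific part lives in data files:
       #   :backends: [yaml]
       #   :hierarchy:
       #     - "%{hostgroup}"
       #     - common
       #   :yaml:
       #     :datadir: /etc/puppet/hieradata
       #
       # /etc/puppet/hieradata/common.yaml:
       #   ntp_servers: [ntp1.example.ch, ntp2.example.ch]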
  15. Sharing Modules With All
     • A big aim is to share our modules as much as possible with everyone, but in particular:
       – CERN IT is not the only Puppet deployment at CERN
         • The ATLAS Point 1 farm at CERN runs Puppet
       – ATLAS analysis in the cloud has used Puppet
       – International HEP labs use, or are switching to, Puppet
       – Puppet was the "winner" at the recent CHEP fabric session
         • Presentations from CERN, BNL, PIC, ATLAS
     • We will share here, but it's early days:
       – http://github.com/cernops
  16. Organizing Modules On Disk
     • Started with all modules in one directory in git:
       – Obviously wrong; great confusion for newcomers
     • Current situation – two directories in git:
       – modules – reusable items, e.g. firewall, apache, sysctl, …
       – manifests – top-level services, e.g. batch machine, public login machine
     • Future plans (sketch below):
       – Split modules into local and downloaded
         • Modules like puppetlabs-firewall are currently mixed in with our own junk
         • Will allow us to track and contribute to upstream better
       – In line with Puppet's upcoming vendor path
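     A sketch of the planned local/downloaded split using the module path
     (all paths and the 'cernbatch' module are hypothetical):

       # /etc/puppet/puppet.conf:
       #   modulepath = /etc/puppet/modules/local:/etc/puppet/modules/upstream
       #
       # Upstream checkouts (e.g. puppetlabs-firewall) stay pristine under
       # 'upstream' and can track their git remotes; CERN code lives under
       # 'local'. A top-level manifest then composes the two freely:
       node /^batch.*\.cern\.ch$/ {
         include firewall    # from upstream
         include cernbatch   # local, CERN-specific
       }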
  17. Configuration Complexity
     • 150 clusters ranging from 1 to 3000 hosts
     • We have many configurations of service:
       – Puppet handles this diversity well
     • We have many administrators (>= 300):
       – These admins change, and are on different continents
       – It is less obvious what to do about this with Puppet
  18. Trust Amongst SysAdmins
     • [Diagram] All teams share one git repository:
       – Rely on code review, git branches and environments
     • Each sysadmin team (A, B, …) runs its own puppet master(s) for its own nodes:
       – A hiera-gpg key for each team (sketch below)
       – Host ACLs on the puppet masters
     • The full implications of this lack of trust between admins are unclear:
       – Interested to hear what others have done
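     A sketch of the per-team hiera-gpg idea (the hiera-gpg backend decrypts
     GPG-encrypted YAML files; paths and hierarchy are hypothetical):

       # /etc/puppet/hiera.yaml on Team A's puppet master:
       :backends:
         - yaml
         - gpg
       :hierarchy:
         - "%{hostgroup}"
         - common
       :yaml:
         :datadir: /etc/puppet/hieradata
       :gpg:
         :key_dir: /etc/puppet/gpgkeys/team_a   # holds Team A's keyring only
       # Secrets are committed as *.gpg files encrypted to each team's key,
       # so sharing one git repository does not expose Team A's secrets to
       # Team B's master.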
  19. Change Control, Dev Cycle
     • A core team maintains the OS and basics:
       – Hardware monitoring, NTP configuration, accounts, …
     • Specialized teams maintain services on top:
       – They are ultimately responsible for service stability
       – We don't want NTP configured 150 different ways
     • Requirements:
       – Some services will follow core updates
       – Some services will choose when to take core updates
       – Parts of services may follow the latest updates
       – The LHC has physical shutdowns for doing timely updates
  20. Change Control, Dev Cycle (continued)
     • Puppet environments map to git branches (sketch below):
       – Nodes sit in production, testing and devel branches
       – Big new configurations are tested in feature branches
         • A few nodes live in these feature branches
       – Some services live isolated in their own branch
         • Risk of divergence
     • Current process:
       – A blind weekly devel -> production merge
     • Next process:
       – Use Atlassian's Crucible and FishEye products to code review the Puppet configuration
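     A sketch of the environment-to-branch mapping in the puppet.conf style
     of this era, where each environment directory is a checkout of the
     matching git branch (paths hypothetical):

       # /etc/puppet/puppet.conf on the master:
       [production]
         modulepath = /etc/puppet/environments/production/modules
         manifest   = /etc/puppet/environments/production/manifests/site.pp

       [devel]
         modulepath = /etc/puppet/environments/devel/modules
         manifest   = /etc/puppet/environments/devel/manifests/site.pp

       # A node opts into a branch with:  puppet agent --environment devel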
  21. Crucible Reviewing Manifests
     • Atlassian themselves use Puppet and do this:
       – http://blogs.atlassian.com/2011/09/puppet_change_management_for_devops/
  22. Hardware Provisioning
     • Up to now a homegrown tool has been in use:
       – It has strong similarities to Puppet Labs' new Razor
         • Razor is being followed and tracked for the moment
       – The final step of the tool adds the host to Foreman
     • We are using Foreman and are happy with it:
       – Kickstart templating is great (sketch below)
       – Organising hosts into hostgroups is great
       – We will now invest time to integrate Foreman with CERN services:
         • The CERN network database, our master for switches, DNS, …
         • AIMS, our Kerberos-managed TFTP server
         • The CERN CA – we have our own CA, used by other services too
           – We will use this for Puppet as well
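     For flavour, Foreman provisioning templates are ERB with host
     attributes in scope; a minimal hypothetical kickstart fragment in that
     style (not CERN's template):

       install
       url --url <%= @mediapath %>
       network --hostname <%= @host.name %>
       rootpw --iscrypted <%= root_pass %>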
  23. Virtual Machine Provisioning
     • Existing Microsoft Hyper-V infrastructure:
       – 3000 virtual machines, of which 70 are Puppet managed
       – VMs pre-seeded into a Foreman hostgroup
       – VMs being kickstarted onto Puppet and Foreman
     • Puppet-managed OpenStack Nova:
       – Today aiming at 200 hypervisors with up to 4000 Puppet-managed VMs
       – Machine images created with Oz
       – Machines NOT pre-seeded in Foreman or Puppet
         • They register at boot time
       – amiconfig and cloud-init for contextualizing (sketch below)
         • Pass the puppet server and Foreman hostgroup to the image
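     A sketch of the boot-time contextualization idea using cloud-init
     user-data (server name hypothetical; passing the Foreman hostgroup
     would need similar custom user-data handled by the image):

       #cloud-config
       # Hand the VM its puppet master at boot; the node then registers
       # itself instead of being pre-seeded in Foreman or Puppet.
       puppet:
         conf:
           agent:
             server: "puppet.example.cern.ch"
             certname: "%i.%f"   # instance-id and fqdn substitution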
  24. Next Steps till End of Year
     • Migrate to PuppetDB:
       – (300,000 nodes => 300 GB RAM)
     • Look at Puppet Dashboard
     • Use MCollective for something:
       – Necessary as the node count increases
       – Currently set up but not being used much
     • Check Foreman's integration with OpenStack
     • Migrate more services from Quattor to Puppet
     • Decide a scheme for secure blob delivery (sketch below):
       – hiera-gpg or an ACL'ed puppet fileserver
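     A sketch of the ACL'ed fileserver option (mount name and paths
     hypothetical): fileserver.conf can restrict a mount by hostname and
     substitute the client's certname into the path, so each node only
     sees its own blobs:

       # /etc/puppet/fileserver.conf:
       [secrets]
         path /etc/puppet/secure/%H   # %H expands to the client's certname
         allow *.cern.ch

       # Referenced from a manifest as:
       #   file { '/etc/grid-security/hostkey.pem':
       #     source => 'puppet:///secrets/hostkey.pem',
       #     mode   => '0400',
       #   }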
  25. Conclusions
     • Migrating to Puppet:
       – The largest change in our deployment for 5 years
     • It has all been fairly painless; the difficulties:
       – Sometimes forced to integrate with existing systems
       – Doing things wrong the first time
         • Lack of in-house experience
     • 300,000 VMs in 2015?
       – Puppet is easy to scale; more hardware can be added
       – We expect to dedicate up to 100 cores to Puppet
     • It's a joy to work with an active community
