Puppet Camp CERN Geneva

Transcript

  • 1. A Puppet Infrastructure at CERN
    Steve Traylen, CERN IT Department, steve.traylen@cern.ch
    Puppet Camp, Geneva, CH. 11 July 2012
  • 2. Outline
    – CERN and computing for high-energy physics
    – Today's CERN IT deployment: why and what is changing
    – Adoption of Puppet, Foreman, …: progress, integration, difficulties, future
  • 3. CERN
    – Conseil Européen pour la Recherche Nucléaire, also known as the European Laboratory for Particle Physics
    – Facilities for fundamental research
    – Located between Geneva and the Jura mountains, straddling the Swiss-French border
    – Founded in 1954
  • 4. The Large Hadron Collider
    – Accelerator for protons against protons at 14 TeV collision energy; by far the world's most powerful accelerator
    – Tunnel of 27 km circumference, 4 m diameter, 50 to 150 m below ground
    – Detectors at four collision points
  • 5. The LHC Computing Challenge
    – Data volume: 15 petabytes of new data each year
    – Global compute power: 250k CPU cores and 100 PB of disk storage
    – Worldwide analysis and funding: a distributed computing infrastructure provides the production and analysis environments for the LHC experiments, is managed and operated by a worldwide collaboration between the experiments and the participating computer centres, and is distributed for funding and sociological reasons
  • 6. Motivation to Change Tools
    – The CERN data centre is reaching its limits: IT staff numbers remain fixed while more computing capacity is needed
    – Inefficiencies exist but their root causes cannot be easily identified
    – Tools are becoming increasingly brittle and difficult to adapt; e.g. porting the tools to IPv6 would need a development project
    – Some core components cannot be scaled up
  • 7. Second CERN Data Centre
    – Wigner Institute in Budapest, Hungary
    – A hands-off facility, with hardware support only
    – Being deployed from 2012 to 2014
  • 8. Infrastructure Tools Evolution
    – We had to develop our own toolset in 2002: the "Extremely Large Fabric Management System" (http://cern.ch/ELFms), which included Quattor for configuration
    – Nowadays CERN compute capacity is no longer leading edge, many open-source options exist for fabric management, and we need to scale to meet the upcoming capacity increase
    – If a requirement is not covered by an open-source tool we should question the need; if we really are the first to need it, we contribute it back to the open-source tool
  • 9. Infrastructure as a Service
    – Goals: improve repair processes with virtualisation, use our hardware more efficiently, track usage better, enable remote management of the new data centre, support potential new use cases (e.g. cloud), and provide a sustainable support model
    – At scale for 2015: 15,000 servers, 90% of hardware virtualized, 300,000 VMs needed
    – Plan: adopt OpenStack
  • 10. Chose Puppet for Configuration
    – The tool space has exploded in the last few years, in configuration management and operations: large, shared 'tool forges' and lots of experience
    – Puppet and Chef are the clear leaders for the 'core' tool
    – Many large-scale enterprises use Puppet, and its declarative approach fits better with what we are used to in Quattor
    – Large installations mean a friendly, wide community plus commercial support and training
    – You can buy books on it, and you can employ people who know Puppet better than you do
  • 11. Deployed System
  • 12. Starting with Puppet
    – Puppet was and is trivial to set up: anyone can do it in a day
    – Configuring something with Puppet is easy
    – What's hard: deciding module scope and how modules interact with one another, e.g. three modules editing grub.conf or one (see the sketch after this slide)
    – We started in early 2012 with very little plan in the area of module organization
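    A minimal sketch of the "one module owns grub.conf" idea mentioned above, assuming a hypothetical 'grub' class; the class name, parameter and template are illustrative, not CERN's code.

      # Hypothetical module: one class is the sole owner of /etc/grub.conf, so
      # other modules request kernel arguments through it instead of editing
      # the file themselves.
      class grub (
        $kernel_args = 'quiet'
      ) {
        file { '/etc/grub.conf':
          ensure  => file,
          owner   => 'root',
          group   => 'root',
          mode    => '0600',
          content => template('grub/grub.conf.erb'),  # template reads $kernel_args
        }
      }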
  • 13. Downloading Puppet Modules
    – Expectation at the start: all done for us; ssh, iptables, sysctl, apache and mysql already written, and example42 or similar can do everything
    – Reality: modules are often not quite right
    – Too simple, e.g. I want my sshd_config to be different in two places (see the sketch after this slide)
    – Too much abstraction: I want to use Puppet, not an abstraction with hundreds of variables covering every possible case, e.g. Puppet with and without Passenger when I only want one
    – Parameterized classes and Foreman don't really work together, so the resulting modules are not shareable (ENC globals vs parameters)
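    A hedged illustration of the "sshd_config different in two places" point: a parameterized class that two host types instantiate with different settings. The 'ssh' class, its parameters and the node names are assumptions for the example, not the example42 or CERN module.

      # Illustrative parameterized class; the template renders the parameters
      # into sshd_config.
      class ssh (
        $permit_root_login = 'no',
        $allow_groups      = 'admins'
      ) {
        file { '/etc/ssh/sshd_config':
          content => template('ssh/sshd_config.erb'),
          notify  => Service['sshd'],
        }
        service { 'sshd':
          ensure => running,
          enable => true,
        }
      }

      # Public login machines and batch workers want different settings.
      node 'login.example.cern.ch' { class { 'ssh': allow_groups => 'admins users' } }
      node 'batch.example.cern.ch' { class { 'ssh': } }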
  • 14. Sharing and Fixing Modules
    – Not as easy as it should be: our modules are littered with CERNisms (NTP servers, subnets, authorization systems, ...), with adaptations to work with Foreman, and with the results of all of us learning Puppet and doing things quickly (badly)
    – Hiera is being used now: it provides the code vs data separation we had with Quattor (see the sketch after this slide)
    – There are dozens of ways to set up and (ab)use Hiera, and little experience with it anywhere yet
    – Hiera should make modules more shareable across sites; we look forward to it becoming the normal, standard thing that modules use, so that everyone benefits
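    A small Hiera sketch of the code vs data separation: the hierarchy, the data file and the lookup are assumed examples (the hostgroup level and NTP server names are placeholders), not CERN's actual layout.

      # /etc/puppet/hiera.yaml -- assumed hierarchy
      ---
      :backends:
        - yaml
      :yaml:
        :datadir: /etc/puppet/hieradata
      :hierarchy:
        - "%{::hostgroup}"
        - common

      # /etc/puppet/hieradata/common.yaml -- site data lives here, not in the module
      ---
      ntp::servers:
        - ntp1.example.cern.ch
        - ntp2.example.cern.ch

      # In the module the value is looked up instead of hard-coded:
      class ntp {
        $servers = hiera('ntp::servers')
        file { '/etc/ntp.conf':
          content => template('ntp/ntp.conf.erb'),
        }
      }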
  • 15. Sharing Modules with All
    – A big aim is to share our modules as much as possible with everyone, but in particular: CERN IT is not the only Puppet deployment at CERN; the ATLAS Point 1 farm at CERN runs Puppet, and ATLAS analysis in the cloud has used Puppet
    – International HEP labs use or are switching to Puppet; Puppet was the "winner" at the recent CHEP fabric session, with presentations from CERN, BNL, PIC and ATLAS
    – We will share here, but it is early days: http://github.com/cernops
  • 16. Organizing Modules on Disk
    – We started with all modules in one directory in git: obviously wrong, and a great source of confusion for newcomers
    – Current situation, two directories in git: modules for reusable items (e.g. firewall, apache, sysctl, ...) and manifests for top-level services (e.g. batch machine, public login machine)
    – Future plans: split modules into local and downloaded, since modules like puppetlabs-firewall are currently mixed with our own junk; this will let us track and contribute to upstream better, and is in line with Puppet's upcoming vendor path (see the sketch after this slide)
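    A possible puppet.conf layout for the local/downloaded split described above; a sketch under assumed paths, not the actual CERN configuration.

      # /etc/puppet/puppet.conf
      [master]
      modulepath = /etc/puppet/modules/local:/etc/puppet/modules/upstream

      # /etc/puppet/modules/upstream/firewall -- tracked copy of puppetlabs-firewall
      # /etc/puppet/modules/local/lxbatch     -- hypothetical site-specific module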
  • 17. Configuration Complexity
    – 150 clusters, ranging from 1 to 3,000 hosts
    – We have many configurations of service; Puppet handles this diversity well
    – We have many administrators (>= 300); these admins change and are on different continents, and it is less obvious what to do about that with Puppet
  • 18. Trust Amongst SysAdmins
    – All teams share one git repository and rely on code review, git branches and Puppet environments
    – Teams use their own Puppet masters: master(s) for SysAdmin Team A serve Team A's nodes, master(s) for SysAdmin Team B serve Team B's nodes
    – A hiera-gpg key per team and host ACLs on the Puppet masters keep the teams separated (see the sketch after this slide)
    – The full implications of this lack of trust between admins are unclear; we are interested to hear what others have done
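    One way the per-team hiera-gpg arrangement could look; the options below are those of the hiera-gpg plugin, but the paths and the idea that each team's master holds only its own private key are assumptions, not the slides' exact setup.

      # /etc/puppet/hiera.yaml on Team A's puppet master
      ---
      :backends:
        - yaml
        - gpg
      :gpg:
        :datadir: /etc/puppet/hieradata/secure   # encrypted *.gpg data files
        :key_dir: /etc/puppet/gpg                # this master holds only Team A's key
      :hierarchy:
        - "%{::hostgroup}"
        - common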
  • 19. Change Control, Dev Cycle
    – A core team maintains the OS and basics: hardware monitoring, NTP configuration, accounts, ...
    – Specialized teams maintain services on top; they are ultimately responsible for service stability, and we don't want NTP configured 150 different ways
    – Requirements: some services will follow core updates, some services will choose when to take core updates, parts of services may follow the latest updates, and the LHC has physical shutdowns for doing timely updates
  • 20. Change Control, Dev Cycle
    – Puppet environments map to git branches: nodes sit in production, testing and devel branches; big new configurations are tested in feature branches with a few nodes in them; some services live isolated in their own branch, at the risk of divergence (see the sketch after this slide)
    – Current process: a blind weekly devel -> production merge
    – Next process: use Atlassian's Crucible and Fisheye products to code-review the Puppet configuration
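    A sketch of the environment-to-branch mapping as puppet.conf sections (the 2012-era, pre-directory-environments style); the paths and branch names are illustrative.

      # /etc/puppet/puppet.conf on the master
      [production]
      modulepath = /etc/puppet/environments/production/modules
      manifest   = /etc/puppet/environments/production/manifests/site.pp

      [devel]
      modulepath = /etc/puppet/environments/devel/modules
      manifest   = /etc/puppet/environments/devel/manifests/site.pp

      # Each environments/<name> directory is a checkout of the matching git
      # branch; a node tests a change with: puppet agent -t --environment devel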
  • 21. Crucible Reviewing Manifests
    – Atlassian themselves use Puppet and review it this way: http://blogs.atlassian.com/2011/09/puppet_change_management_for_devops/
  • 22. Hardware Provisioning
    – Up to now a home-grown tool has been in use; it has strong similarities to Puppet Labs' new Razor, which we are following and tracking for the moment, and its final step adds the host to Foreman
    – We are using Foreman and are happy with it: kickstart templating is great, and organising hosts into hostgroups is great (see the sketch after this slide)
    – We will now invest time to integrate Foreman with CERN services: the CERN network database (our master for switches, DNS, ...), the AIMS Kerberos-managed TFTP server, and the CERN CA; we have our own CA, used by other services as well, which we will also use for Puppet
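    A tiny, assumed fragment of a Foreman kickstart provisioning template (ERB), just to illustrate why the templating is convenient: host-specific values come from Foreman rather than from per-host files. The exact variable and helper names depend on the Foreman version and are not taken from the slides.

      # Fragment of a kickstart template rendered by Foreman for each host
      network --bootproto static --hostname <%= @host.name %> --ip <%= @host.ip %>
      rootpw --iscrypted <%= root_pass %>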
  • 23. Virtual Machine Provisioning
    – Existing Microsoft Hyper-V infrastructure: 3,000 virtual machines, of which 70 are Puppet-managed; VMs are pre-seeded into a Foreman hostgroup and kickstarted onto Puppet and Foreman
    – Puppet-managed OpenStack Nova: today we are aiming at 200 hypervisors with up to 4,000 Puppet-managed VMs
    – Machine images are created with Oz; machines are NOT pre-seeded in Foreman or Puppet but register at boot time; amiconfig and cloud-init handle contextualisation, passing the Puppet server and Foreman hostgroup to the image (see the sketch after this slide)
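    A hedged cloud-init user-data sketch for pointing a freshly booted VM at a Puppet master; cloud-init's 'puppet' module takes a conf/agent section like this, but the hostname is a placeholder and the hostgroup hand-off shown in runcmd is only one assumed convention, not CERN's actual contextualisation.

      #cloud-config
      puppet:
        conf:
          agent:
            server: "puppetmaster.example.cern.ch"
            certname: "%i.%f"    # instance id and fqdn, expanded by cloud-init
      runcmd:
        # record the intended Foreman hostgroup so registration tooling can read it
        - echo "cloud_batch" > /etc/foreman_hostgroup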
  • 24. Next Steps till End of Year
    – Migrate to PuppetDB (300,000 nodes => 300 GB RAM)
    – Look at Puppet Dashboard
    – Use MCollective for something: it becomes necessary as the node count increases, and it is currently set up but not particularly used
    – Check Foreman's integration with OpenStack
    – Migrate more services from Quattor to Puppet
    – Decide a scheme for secure blob delivery: hiera-gpg or an ACL'ed Puppet file server (see the sketches after this slide)
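    Sketches, under assumed hostnames and paths, of two of the items above: switching the master's storeconfigs to PuppetDB, and an ACL'ed file-server mount as the alternative to hiera-gpg for secure blobs.

      # /etc/puppet/puppet.conf
      [master]
      storeconfigs = true
      storeconfigs_backend = puppetdb

      # /etc/puppet/puppetdb.conf
      [main]
      server = puppetdb.example.cern.ch
      port = 8081

      # /etc/puppet/fileserver.conf -- a mount only one team's nodes may read
      [secure_teama]
      path /srv/puppet/secure/teama
      allow *.teama.example.cern.ch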
  • 25. Conclusions
    – Migrating to Puppet is the largest change in our deployment for 5 years
    – It has all been fairly painless; the difficulties are sometimes being forced to integrate with existing systems, and doing things wrong the first time through lack of in-house experience
    – 300,000 VMs in 2015? Puppet is easy to scale and more hardware can be added; we expect to dedicate up to 100 cores to Puppet
    – It is a joy to work with an active community
