Puppet at Google
Gordon Rowell
Puppet Camp Sydney 2013
firstname.lastname@example.org

Non-Goals
Not here to talk about:
● Hiring practices
● Release schedules
● Puppet configs
● Monitoring
● Compliance
● Auditing
● ...
See also Jason Wright's talk from PuppetConf 2011

Background
Puppet at Google is offered as an infrastructure service
● Run by a Site Reliability Engineering (SRE) team
● Customers are OS teams
● Does not manage Google's customer-facing infrastructure (search, Gmail, etc.)!
● Manages internal laptops, desktops and servers

How Many Nodes?
Clients:
  ● "Lots" of Mac desktops and laptops
  ● "Lots" of Ubuntu desktops, laptops and servers
  ● "Some" others
Servers:
  ● "Tens" of puppet config servers
  ● "Units" of puppet CAs
  ● Deployed in five globally distributed VIPs
  ● Clients use Anycast to find closest "server"

Scaling is fun
● We don't deploy "a server"
  ○ Servers break, power fails
  ○ Clients/DNS need to be reconfigured
● We don't deploy "a cluster"
  ○ Networks break, servers break, power fails
  ○ Clients/DNS need to be reconfigured
● We deploy redundant clusters
  ○ Attempt to send clients to nearest serving cluster
  ○ Anycast means unified client configuration

Load balancing is fun
Do you have enough capacity?
  ● How many backends do you need?
  ● What happens if half of your backends lose power?
  ● What about when half are already out for repairs?
How do you send clients to the right cluster?
  ● Client configuration
  ● DNS round-robin (simple global load balancing)
  ● DNS views (give best answer for client IP)
  ● Anycast (portable IP, routed to "nearest" cluster)
  ● Consider: DNS views plus Anycast

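The capacity questions on this slide can be made concrete with a little arithmetic. The following is a minimal sketch, not anything from the talk: the function names, the requests-per-second model, and all numbers are hypothetical and only illustrate how "half lose power while half are already out for repairs" eats into headroom.

```python
import math


def backends_needed(peak_rps: float, per_backend_rps: float, spare: int = 0) -> int:
    """Backends required to serve peak_rps, plus `spare` extras for failures.

    Assumes (hypothetically) that load is measured in requests per second
    and that every backend has the same capacity.
    """
    return math.ceil(peak_rps / per_backend_rps) + spare


def has_capacity(total: int, in_repair: int, power_loss_fraction: float,
                 peak_rps: float, per_backend_rps: float) -> bool:
    """True if the backends left after repairs and a power loss can serve peak."""
    available = total - in_repair
    # Round the loss up: losing "half" of 5 backends means losing 3.
    surviving = available - math.ceil(available * power_loss_fraction)
    return surviving * per_backend_rps >= peak_rps
```

With these made-up numbers, 20 backends of 100 rps each survive losing half their power at a 1000 rps peak, but not if 10 of them were already out for repairs when the power failed.
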
Anycast is fun
● Anycast is "coarse-grained" load balancing
  ○ It normally sends traffic to closest serving cluster
● Networks break
  ○ Physical issues
  ○ Routing issues
  ○ Configuration issues
  ○ VIP load balancer bugs
● All clients could be sent to the same cluster
  ○ Be ready for that
  ○ Can a single cluster handle worldwide traffic?
  ○ What do you do if it can't?

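One way to "be ready for that" is to check, ahead of time, which clusters could not absorb the whole world's traffic if Anycast routed everything to them. This is a hedged sketch: the cluster names, region names, and the single-number capacity model are all hypothetical.

```python
def clusters_unable_to_absorb_all(capacity_by_cluster: dict, load_by_region: dict) -> list:
    """Return the clusters whose capacity is below the total worldwide load.

    Both dicts map a (hypothetical) name to a number in the same unit,
    for example puppet runs per minute.
    """
    total_load = sum(load_by_region.values())
    return sorted(name for name, capacity in capacity_by_cluster.items()
                  if capacity < total_load)
```

An empty result means any single cluster can carry worldwide traffic; a non-empty one lists the clusters that need more backends or a load-shedding plan.
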
Puppet problems: Thundering herds
● "Lots" + "lots" + "some" == "thundering herds"
● What if they all want to do a puppet run?
● What about every hour?
● What about every five minutes?
● Masterless puppet is being considered

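A common mitigation for thundering herds, not one the slides claim is in use, is to give each client a stable offset within the run interval so that runs spread out instead of arriving together. The sketch below derives that offset from a hash of the hostname; the function names and the hostnames are hypothetical. Puppet's own agent has a `splay` setting with a similar intent, though it is not implemented this way.

```python
import hashlib
from collections import Counter


def splay_offset(hostname: str, interval_s: int) -> int:
    """Stable per-host delay in [0, interval_s), derived from the hostname."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % interval_s


def peak_runs_per_bucket(hostnames, interval_s: int, bucket_s: int = 60) -> int:
    """Largest number of runs that start within any one bucket of the interval."""
    buckets = Counter(splay_offset(h, interval_s) // bucket_s for h in hostnames)
    return max(buckets.values())
```

Without an offset, every client in an hourly schedule hits the servers in the same minute; with it, the same clients are spread across roughly sixty one-minute buckets. Because the offset depends only on the hostname, a client keeps its slot across restarts.
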
Puppet problems: Release tracks
● OS releases have unstable, testing, stable branches
  ○ Maintained by OS platform teams
● Addons also have unstable, testing, stable branches
  ○ Maintained by service owners
● Using different tracks for OS and addons is hard
  ○ However, that's common, e.g. when testing a new addon release
  ○ Puppet's global namespace is part of the problem

Puppet problems: Namespaces
● Lots of developers moving fast == conflicts
● Conflicts mean surprises
● Qualify everything
● Testing with rspec-puppet helps to catch issues early