Puppet at Google

Gordon Rowell
Puppet Camp Sydney 2013
gordonr@google.com

Non-Goals

Not here to talk about:

● Hiring practices
● Release schedules
● Puppet configs
● Monitoring
● Compliance
● Auditing
● ...

See also Jason Wright's talk from PuppetConf 2011

Background

Puppet at Google is offered as an infrastructure service

● Run by a Site Reliability Engineering (SRE) team
● Customers are OS teams
● Does not manage Google's customer-facing infrastructure (search, Gmail, etc.)!
● Manages internal laptops, desktops and servers

How Many Nodes?

Clients:
● "Lots" of Mac desktops and laptops
● "Lots" of Ubuntu desktops, laptops and servers
● "Some" others

Servers:
● "Tens" of puppet config servers
● "Units" of puppet CAs
● Deployed in five globally distributed VIPs
● Clients use Anycast to find the closest "server"

Scaling is fun

● We don't deploy "a server"
○ Servers break, power fails
○ Clients/DNS need to be reconfigured

● We don't deploy "a cluster"
○ Networks break, servers break, power fails
○ Clients/DNS need to be reconfigured

● We deploy redundant clusters
○ Attempt to send clients to the nearest serving cluster
○ Anycast means unified client configuration

Load balancing is fun

Do you have enough capacity?
● How many backends do you need?
● What happens if half of your backends lose power?
● What about when half are already out for repairs?
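
The capacity questions above reduce to simple arithmetic. Here is a minimal Python sketch. It assumes that every backend serves a fixed, hypothetical number of requests per second and that a given fraction of backends is unavailable at the same time:

```python
import math

def backends_needed(peak_rps: float, rps_per_backend: float,
                    fraction_unavailable: float) -> int:
    """Backends to deploy so that the survivors still cover peak load.

    fraction_unavailable is the share of backends assumed to be down at
    once (e.g. 0.5 for "half lost power"). All inputs are hypothetical.
    """
    if not 0 <= fraction_unavailable < 1:
        raise ValueError("fraction_unavailable must be in [0, 1)")
    # Backends that must still be serving to carry the peak.
    serving = math.ceil(peak_rps / rps_per_backend)
    # Scale up so that the surviving fraction is at least `serving`.
    return math.ceil(serving / (1 - fraction_unavailable))
```

With these invented figures, backends_needed(900, 100, 0.5) is 18. Now suppose half are already out for repairs and half of the remainder then lose power. Only a quarter survive, so backends_needed(900, 100, 0.75) is 36.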

How do you send clients to the right cluster?
● Client configuration
● DNS round-robin (simple global load balancing)
● DNS views (give best answer for client IP)
● Anycast (portable IP, routed to "nearest" cluster)
● Consider: DNS views plus Anycast
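
The toy Python resolver below contrasts two of these options. Round-robin hands out cluster addresses in turn, regardless of who asks. A view-based answer depends on the client's source IP. The prefixes, addresses and function names are hypothetical and use RFC 5737 documentation ranges. Real DNS servers configure views in their own configuration language.

```python
import ipaddress
import itertools

# Hypothetical cluster VIPs (RFC 5737 documentation addresses).
CLUSTERS = ["203.0.113.10", "203.0.113.20", "203.0.113.30"]

# Round-robin: rotate through the clusters, ignoring where the client is.
_rotation = itertools.cycle(CLUSTERS)

def resolve_round_robin() -> str:
    return next(_rotation)

# Views: hypothetical client prefixes mapped to the cluster nearest them.
VIEWS = [
    (ipaddress.ip_network("192.0.2.0/24"), "203.0.113.10"),
    (ipaddress.ip_network("198.51.100.0/24"), "203.0.113.20"),
]
DEFAULT_ANSWER = "203.0.113.30"  # used when no view matches

def resolve_with_views(client_ip: str) -> str:
    """Return the answer of the first view whose prefix contains client_ip."""
    addr = ipaddress.ip_address(client_ip)
    for network, answer in VIEWS:
        if addr in network:
            return answer
    return DEFAULT_ANSWER
```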

Anycast is fun

● Anycast is "coarse-grained" load balancing
○ It normally sends traffic to the closest serving cluster

● Networks break
○ Physical issues
○ Routing issues
○ Configuration issues
○ VIP load balancer bugs

● All clients could be sent to the same cluster
○ Be ready for that
○ Can a single cluster handle worldwide traffic?
○ What do you do if you can't?
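
A toy model can check whether a single cluster could handle worldwide traffic. This Python sketch makes two assumptions. First, each client region has a fixed order of preferred clusters (real anycast follows BGP routing, not a static list). Second, a broken cluster or route simply drops out of the serving set. The cluster names, capacities and loads are invented.

```python
# Hypothetical figures: five clusters, echoing the five VIPs above.
CAPACITY = {"a": 500, "b": 500, "c": 500, "d": 500, "e": 500}  # rps each

# Assumed route preference per client region, nearest cluster first.
PREFERENCE = {
    "apac": ["a", "b", "e", "d", "c"],
    "emea": ["c", "d", "a", "b", "e"],
    "amer": ["d", "e", "c", "a", "b"],
}
LOAD = {"apac": 300, "emea": 350, "amer": 450}  # rps per region

def place(load: dict, preference: dict, serving: set) -> dict:
    """Send each region's load to its nearest cluster that is still serving."""
    placed = {cluster: 0 for cluster in serving}
    for region, rps in load.items():
        target = next((c for c in preference[region] if c in serving), None)
        if target is None:
            raise RuntimeError(f"no serving cluster reachable for {region}")
        placed[target] += rps
    return placed

def overloaded(placed: dict, capacity: dict) -> list:
    """Clusters receiving more traffic than they can serve."""
    return sorted(c for c, rps in placed.items() if rps > capacity[c])
```

With all five clusters serving, no cluster is overloaded. A routing problem might leave only cluster "c" reachable. In that case, place(LOAD, PREFERENCE, {"c"}) sends it all 1100 rps, and overloaded() reports ["c"]. That cluster would need more than twice its capacity.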

Puppet problems: Thundering herds

● "Lots" + "lots" + "some" == "thundering herds"

● What if they all want to do a puppet run?

● What about every hour?

● What about every five minutes?

● Masterless puppet is being considered
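
The usual remedy for a thundering herd is to spread out run start times. Puppet agent's splay setting adds a random delay for this purpose. For reproducible results, the Python sketch below uses a stable hash of the hostname instead. This is similar in spirit to Puppet's fqdn_rand(), but it is not that function's algorithm. Host names and counts are hypothetical.

```python
import hashlib
from collections import Counter

def avg_runs_per_second(clients: int, interval_s: int) -> float:
    """Average load if runs are evenly spread over the interval."""
    return clients / interval_s

def offset_seconds(hostname: str, interval_s: int) -> int:
    """Stable per-host start offset in [0, interval_s) from a hash of the name."""
    digest = hashlib.sha256(hostname.encode()).digest()
    return int.from_bytes(digest[:8], "big") % interval_s

def peak_runs_per_minute(hosts, interval_s: int, spread: bool) -> int:
    """Most runs that start within any single minute of the interval."""
    starts = Counter()
    for host in hosts:
        t = offset_seconds(host, interval_s) if spread else 0  # 0 = all at once
        starts[t // 60] += 1
    return max(starts.values())
```

Take a hypothetical 100,000 clients. Hourly runs average about 27.8 runs per second, and five-minute runs about 333. Without spreading, peak_runs_per_minute() returns the full host count, because every client starts in the same minute.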

Puppet problems: Release tracks

● OS releases have unstable, testing, stable branches
○ Maintained by OS platform teams

● Addons also have unstable, testing, stable branches
○ Maintained by service owners

● Using different tracks for OS and addons is hard
○ Yet it's common, e.g. when testing a new addon release
○ Puppet's global namespace is part of the problem
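
A counting exercise shows why mixing tracks is hard. Suppose that every combination of OS track and addon track has to be published as its own Puppet environment. This is an assumption for illustration, not a description of Google's setup. The number of environments then grows multiplicatively with each addon. The addon names and naming scheme below are hypothetical.

```python
from itertools import product

TRACKS = ("unstable", "testing", "stable")

def environment_name(os_track: str, addon_tracks: dict) -> str:
    """Hypothetical naming scheme: one environment per track combination."""
    parts = [f"os_{os_track}"]
    parts += [f"{addon}_{track}" for addon, track in sorted(addon_tracks.items())]
    return "__".join(parts)

def all_environments(addons: list) -> list:
    """Every environment needed if the OS and each addon pick tracks independently."""
    names = []
    for os_track, *addon_choice in product(TRACKS, repeat=1 + len(addons)):
        names.append(environment_name(os_track, dict(zip(addons, addon_choice))))
    return names
```

With two addons there are already 3 ** 3 = 27 environments. Ten addons would push the count past 170,000.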

Puppet problems: Namespaces

● Lots of developers moving fast == conflicts

● Conflicts mean surprises

● Qualify everything

● Testing with rspec-puppet helps to catch issues early
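
The small Python sketch below illustrates the "qualify everything" advice. It detects names declared by more than one team. It then shows that prefixing each name with its owner removes the clash, in the style of Puppet's team::name namespaces. The team and class names are hypothetical.

```python
from collections import defaultdict

def find_conflicts(definitions: list) -> dict:
    """Map each name declared by more than one team to the sorted list of those teams.

    `definitions` is a list of (team, name) pairs, e.g. the class names
    each team's module declares.
    """
    owners = defaultdict(set)
    for team, name in definitions:
        owners[name].add(team)
    return {name: sorted(teams) for name, teams in owners.items() if len(teams) > 1}

def qualify(definitions: list) -> list:
    """Prefix every name with its owning team, Puppet-style (team::name)."""
    return [(team, f"{team}::{name}") for team, name in definitions]
```

Take the sample definitions [("macos", "ssh"), ("ubuntu", "ssh"), ("vpn", "client")]. For these, find_conflicts() reports {"ssh": ["macos", "ubuntu"]}. Once the names are qualified, it reports nothing.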

Questions?

Gordon Rowell
gordonr@google.com
