A Tale of Two Systems
David Newman
@darthhexx
Howdy! We're the people behind
WordPress.com, WooCommerce,
Jetpack, and a bunch of other
products for WordPress.
We are passionate about
making the web a better place.
We don't build software for free
– we build it for freedom.
Communication is oxygen
A TALE OF TWO SYSTEMS
WordPress.com
WordPress multisite with 100+ million sites.
VIP Go
Container-based VIP WordPress hosting.
WordPress.com
Simplicity scales.
WordPress.com
• The App
• Database
• Static content
• Caching
• Distributing workload
WordPress.com
• Networking
• Protecting
• Continuous Integration
• Stats and analysis
The App
PHP
• Free Open Source Software.
• Most widely adopted server-side
programming language.
• Notable performance increase
in PHP7.
WordPress
• Free Open Source Software.
• 27% of top 10 million sites.
• Passionately developed and
supported.
• Add/remove/apply filters
and actions, offers
extensibility.
• We customise, add features
and improve performance.
• HyperDB
• Object Cache
• CSS/JS concatenation
• 2FA
• many, many more…
WordPress.com
Databases
A billion+ tables
HyperDB
• Slave lag and failed host
detection and mitigation.
• Distributed reads / writes.
• Replication and partitioning
support.
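The lag and failed-host handling above can be pictured with a minimal sketch. This is not HyperDB's code (HyperDB is a PHP drop-in); the `Replica` type, the `max_lag_seconds` threshold, and `pick_read_host` are hypothetical names used only for illustration.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Replica:
    host: str
    # None means the host did not answer its health check (assumed convention).
    lag_seconds: Optional[float]


def pick_read_host(replicas: Sequence[Replica], primary: str,
                   max_lag_seconds: float = 5.0,
                   rng: Optional[random.Random] = None) -> str:
    """Return a healthy, sufficiently fresh replica, or the primary as a fallback."""
    rng = rng or random.Random()
    usable = [r for r in replicas
              if r.lag_seconds is not None and r.lag_seconds <= max_lag_seconds]
    if not usable:
        # Every replica is down or too far behind: read from the primary instead.
        return primary
    return rng.choice(usable).host
```

Writes would always go to the primary; only reads are spread across the replicas that pass the lag check.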
Partitioning
• Global tables for sites, blogs,
users, usermeta, etc.
• Sharded blog specific data
(posts, comments, options,
terms, etc.)
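A rough sketch of the global versus sharded split, under stated assumptions: the modulo shard mapping, the `shard_count` default, and the `wp_<blog_id>_` table naming are illustrative choices, not a description of the production layout.

```python
from typing import Optional, Tuple

GLOBAL_TABLES = frozenset({"sites", "blogs", "users", "usermeta"})
BLOG_TABLES = frozenset({"posts", "comments", "options", "terms"})


def route_table(table: str, blog_id: Optional[int] = None,
                shard_count: int = 4) -> Tuple[str, str]:
    """Map a logical table to a (dataset, physical table name) pair."""
    if table in GLOBAL_TABLES:
        return "global", f"wp_{table}"
    if table in BLOG_TABLES:
        if blog_id is None:
            raise ValueError(f"{table!r} is blog specific and needs a blog_id")
        # Assumption: blogs are spread over shards by a simple modulo.
        return f"shard{blog_id % shard_count}", f"wp_{blog_id}_{table}"
    raise KeyError(f"unknown table: {table!r}")
```

A modulo keeps the example short; a lookup table from blog to shard would allow moving individual blogs between shards.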
Databases
• Query comments containing
the URL in order to track down
errant code.
• Dedicated replicated read-only
DBs for backups.
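Tagging queries with the originating URL could look like the sketch below. The `tag_query` helper is hypothetical; the only technique taken from the slide is appending the URL as a SQL comment so it shows up in slow-query logs and process lists.

```python
def tag_query(sql: str, url: str) -> str:
    """Append the originating URL to a query as a SQL comment."""
    # Encode "*" so the URL can never contain "*/" and close the comment early.
    safe_url = url.replace("*", "%2A")
    # Drop line breaks so the comment stays on a single line.
    safe_url = safe_url.replace("\r", "").replace("\n", "")
    return f"{sql.rstrip().rstrip(';')} /* {safe_url} */"
```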
Databases
• “Index all the things” and
EXPLAIN everything :)
• Default to MyISAM because of
InnoDB’s memory requirements;
use InnoDB only when the
workload warrants it.
Static Content
Life Without Static Content
Images
• Distributed fault tolerant file
system with MogileFS.
• Clusters in multiple
Datacenters.
• Replicated to S3 as immutable
backups.
Images
• Do not store intermediate
sizes.
• Resize images on the fly.
• Responsive design with srcset.
Images
• Perform image optimisations
with pngquant, optipng, and
jpegoptim.
• Output progressive JPEGs.
Images
• On-the-fly WebP conversions
based on HTTP Accept header.
• Disable advanced compression
routines under load.
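The two decisions on this slide, WebP by `Accept` header and skipping expensive compression under load, can be sketched as one small function. The name `choose_image_output`, the `load` value, and the `max_load` threshold are assumptions for illustration; q-values in the header are deliberately ignored.

```python
from typing import Tuple


def choose_image_output(accept_header: str, source_format: str,
                        load: float, max_load: float = 0.8) -> Tuple[str, bool]:
    """Return (output format, whether to run the expensive optimisers)."""
    accepted = {part.split(";")[0].strip().lower()
                for part in accept_header.split(",")}
    # Simplification: q-values are ignored, so "image/webp;q=0" still counts.
    output_format = "webp" if "image/webp" in accepted else source_format
    # Under load, serve the image without the advanced compression pass.
    run_optimisers = load < max_load
    return output_format, run_optimisers
```

Because the response now depends on the request's `Accept` header, any cache in front of this would need to vary on that header as well.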
CSS/JS
• Dedicated static content
production servers.
• Minified at commit.
• Concatenated on the fly in
production.
Caching
Cache Levels
• Server-side
• Datacenter
• PoP / Edge
PHP OpCache
• Compiled PHP scripts cached.
• Linked into our deploy system
to auto-compile / delete as
necessary.
PHP APCu
• Great for “persistent statics”.
• Querying load in order to
switch off non-critical image
optimisations.
• File system IO processes for
consistently hashed services.
Datacenter Cache
• Distribution requires shared
caches.
• Object Cache shards cache
groups.
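One common way to shard cache groups and keys over a pool of cache servers is a consistent hash ring. The sketch below is an assumption about the general technique, not the Object Cache implementation; `HashRing`, `points_per_server`, and the server names are hypothetical.

```python
import bisect
import hashlib
from typing import List, Sequence, Tuple


def _hash(value: str) -> int:
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with several virtual points per server."""

    def __init__(self, servers: Sequence[str], points_per_server: int = 100) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        ring: List[Tuple[int, str]] = []
        for server in servers:
            for i in range(points_per_server):
                ring.append((_hash(f"{server}#{i}"), server))
        ring.sort()
        self._points = [point for point, _ in ring]
        self._servers = [server for _, server in ring]

    def server_for(self, group: str, key: str) -> str:
        """Return the server owning the first ring point after the key's hash."""
        idx = bisect.bisect_right(self._points, _hash(f"{group}:{key}"))
        return self._servers[idx % len(self._servers)]
```

Removing a server only remaps the keys that server owned; every other key stays on its current server.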
Memcached
• Drop-in Object Cache
replacement:
wordpress.org/extend/plugins/memcached/
• Internal systems for sync’ing
DC caches.
• Multi-DC replication challenge
with DB vs Cache replication
timing.
• Sync’ing cache invalidations
with DB replication, e.g.
Facebook’s mcrouter
Batcache
• Uses Memcached to store and
serve rendered pages:
wordpress.org/plugins/batcache/
• 40x reduction in page
generation time.
NGiNX DC Cache
• Caching of some assets at the
DC level, e.g. feeds, sitemaps,
some images, etc.
PoP / Edge Caching
• Every “corner” of the world.
• Only cache after more than X
page queries in Y time period.
• OpenResty for advanced
features.
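The "more than X page queries in Y time period" rule can be sketched as a sliding-window counter. `HitThreshold`, `min_hits`, and `window` are hypothetical names; the production edge runs on NGiNX and OpenResty, so this only illustrates the logic.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class HitThreshold:
    """Report a URL as cacheable once it exceeds `min_hits` requests in `window` seconds."""

    def __init__(self, min_hits: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_hits = min_hits
        self.window = window
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def should_cache(self, url: str) -> bool:
        """Record one request for `url` and say whether it is now hot enough to cache."""
        now = self._clock()
        hits = self._hits[url]
        hits.append(now)
        # Forget requests that fell out of the time window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        return len(hits) > self.min_hits
```

The threshold keeps rarely requested pages from taking up edge cache space that popular pages could use.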
PoP/Edge Invalidations
• Performed using mangle, an
internally developed tool.
• Employs Anycast, using
encrypted UDP, and a gossip
protocol.
Distribute
Use the right tool for the job at hand.
Async Jobs System
• WordPress-based jobs system.
• Time insensitive processes
performed asynchronously.
• wp-cron jobs, do_pings, etc.
• “Index all the things”.
• Used for performant full text
searches on content.
• Centralised application log
storage.
• Logs shipped via Logcourier.
• Kibana for dashboards and
alerts.
Networking
• HA via Anycast.
• Custom server ToR setup using
VRRP.
• Production traffic routing.
Unicast
Broadcast
Multicast
Anycast
• BGP advertises IP address
subnets between networks.
• Local preference metric, which
overrides the network hop
metric (AS Path), adds peering
complications.
Anycast
Anycast Benefits
• Distributed caches.
• Low network latency to closest
PoP.
• Easier to perform maintenance.
• DDoS mitigation.
170Gbps DDoS Absorption
Custom ToR Setup
Not this ToR :)
• Anycast between ToR and
cores.
• VRRP using the host as the
bridge between the 2 ToR
switches.
Custom ToR Setup
• VRRP is active/passive virtual
address that can exist on one
of 2 devices.
• VRRP master used as the host
gateway.
Custom ToR Setup
• Multichassis Link Aggregation
(MLAG) to form 1 LACP port
channel.
• Provides active/active on
Layer 2.
Custom ToR Setup
[Diagram: Host, Switch A, Switch B, and DC, with links labelled VRRP, MLAG / LACP, and BGP]
• IP address allocation is
efficient.
• Sub-second failover.
• Easier switch maintenance.
• L3 straight to the server, L2
domain is the server itself.
Custom ToR Setup
• Route certain endpoints and
content types to specialised
backends, e.g. API, wp-admin,
statics, etc.
• Global traffic balancing using
NGiNX split_clients.
Production Traffic Routing
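Percentage-based traffic balancing in the style of `split_clients` can be sketched as follows. NGiNX hashes the input with MurmurHash2; CRC32 is used here purely as a stand-in, and the backend names and the `split_client` helper are hypothetical.

```python
import zlib
from typing import Sequence, Tuple


def split_client(key: str, buckets: Sequence[Tuple[float, str]], default: str) -> str:
    """Deterministically map `key` to a backend according to percentage weights."""
    if sum(pct for pct, _ in buckets) > 100:
        raise ValueError("percentages must not exceed 100")
    # Stand-in hash: NGiNX uses MurmurHash2, CRC32 is only for illustration.
    position = zlib.crc32(key.encode("utf-8")) / 2**32 * 100  # value in [0, 100)
    upper = 0.0
    for pct, backend in buckets:
        upper += pct
        if position < upper:
            return backend
    # Keys beyond the listed percentages fall through to the default backend.
    return default
```

Because the mapping depends only on the key, the same client keeps landing on the same backend until the percentages change.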
Protecting
If you build it, they will come.
• Anycast absorbs DDoS by
design.
• SSL everywhere using Let’s
Encrypt.

Protecting
• IPSET auto-blocking
mechanisms.
• OpenResty for SSL blocking
algorithms in NGiNX.

Protecting
Comment SPAM
Akismet
Filters for spam comments on
millions of sites in real time.
CI
• Each engineer has one.
• Production test/staging sites.
• Huge advantage for production
debugging.
• Integrated deploy tests and
validations.
Sandbox
• Parallel tests using Agents.
• Supports PHPUnit tests.
• Local sandboxed test data.
TeamCity
Stats and Analysis
• Tracks for user events.
• Sqoop’d into Hadoop and
accessed via Hue2.
• Cloudera, Impala, Zookeeper,
etc.
VIP Go
Outline
• Shared infrastructure; isolated
client platform.
• No restrictions set on plugins.
• Seamless deploy to Dev, Test,
Staging and Production.
• Integrated review process.
chroot > solution < VM
Containers
Docker or pure LXC?
Helicopter View
• Container-based hosting
platform.
• Internal Docker Registry.
• Custom container orchestration
via Host Action queues.
• API driven.
Similarities
• Static content
• Protecting
• Distributing workload
Distinctions
• The App Containers
• Networking
• Caching
• Continuous Integration
• Auto-scaling
App Containers
• Web and DB containers use
Docker as one would LXC, with
`/sbin/init`.
• Memcached Docker-style.
• AUFS only.
• Pin to Docker version.
App Containers
• DB containers are InnoDB only.
• Percona XtraBackup tools.
• Web Containers have Jetpack
Premium installed.
• New Relic real time PHP
analysis.
Networking
• Avoid using the NAT userspace
docker-proxy where possible.
• Use “host-only” wherever
possible.
• Dynamic service port assignment
via container orchestration.
Networking
• Dynamic service port firewall in
the containers.
• Set ephemeral port range in
net.ipv4.ip_local_port_range.
• Watch the conntrack table!
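One reason to set the ephemeral range explicitly is so that dynamically assigned service ports never collide with outgoing connections. A minimal sketch, assuming a hypothetical service port pool of 20000 to 29999; the parser accepts the two whitespace-separated integers that `net.ipv4.ip_local_port_range` holds.

```python
from typing import Iterable, Set, Tuple


def parse_port_range(text: str) -> Tuple[int, int]:
    """Parse the two whitespace-separated integers of ip_local_port_range."""
    low, high = (int(part) for part in text.split())
    return low, high


def allocate_service_port(ephemeral: Tuple[int, int], in_use: Iterable[int],
                          pool: Tuple[int, int] = (20000, 29999)) -> int:
    """Pick the lowest free port in `pool` that lies outside the ephemeral range."""
    taken: Set[int] = set(in_use)
    low, high = ephemeral
    for port in range(pool[0], pool[1] + 1):
        if low <= port <= high or port in taken:
            continue
        return port
    raise RuntimeError("no free service port in the pool")
```

On a host, the input text would typically be read from `/proc/sys/net/ipv4/ip_local_port_range`.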
Caching
• Varnish caches all assets at the
Edge.
• Be wary of `Hit-for-pass`.
• Websockets need special
configs and attention.
Caching
• Varnish and TLS, the short
story.
• Leap second, design for failure.
Continuous Integration
• Git-centric (GitHub / GitLab).
• Clients select their own CI / Test
toolchain.
• Review Queue integration.
Continuous Integration
• Dev / Staging / Production tied
to repo branches.
• Host Action controlled code
deploys triggered by
WebHooks.
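A webhook-triggered deploy tied to branches could be sketched like this. The branch names, the environment mapping, and the shape of the queued action are hypothetical; the `ref` and `after` fields are those found in GitHub and GitLab push payloads.

```python
import json
from typing import Optional

# Hypothetical mapping; the slides only say environments are tied to branches.
BRANCH_ENVIRONMENTS = {"develop": "dev", "staging": "staging", "master": "production"}


def deploy_action_for_push(payload_json: str) -> Optional[dict]:
    """Turn a push webhook payload into a deploy action, or None if nothing to do."""
    payload = json.loads(payload_json)
    ref = payload.get("ref", "")
    prefix = "refs/heads/"
    if not ref.startswith(prefix):
        return None  # tags and other refs do not trigger a deploy
    environment = BRANCH_ENVIRONMENTS.get(ref[len(prefix):])
    if environment is None:
        return None  # branch is not tied to an environment
    return {"action": "deploy", "environment": environment,
            "commit": payload.get("after")}
```

The returned dictionary stands in for an entry on a Host Action queue, which the hosts would then pick up and execute.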
Stats and Auto-scaling
Auto-scaling
• InfluxDB timeseries stats.
• API auto-allocates containers to
hosts using signals from host
stats.
Auto-scaling
• Auto-scaling service
interrogates InfluxDB stats and
scales reactively.
• Predictive resource allocation
using Machine Learning.
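Reactive scaling can be sketched as a proportional rule over recent utilisation samples. In this sketch the samples are passed in directly rather than queried from InfluxDB, and `desired_containers`, `target`, `min_count`, and `max_count` are hypothetical names and defaults.

```python
import math
from typing import Sequence


def desired_containers(current: int, cpu_samples: Sequence[float],
                       target: float = 0.5, min_count: int = 2,
                       max_count: int = 20) -> int:
    """Scale the container count so mean CPU utilisation approaches `target`."""
    if not cpu_samples:
        return current  # no data: leave the allocation alone
    mean = sum(cpu_samples) / len(cpu_samples)
    # Round first so floating-point noise does not push ceil() up by one.
    wanted = math.ceil(round(current * mean / target, 6))
    return max(min_count, min(max_count, wanted))
```

A predictive allocator would replace the recent mean with a forecast, but the clamping to a minimum and maximum would stay the same.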
Summary
• Shard DBs or Clients :)
• Cache at the Edge.
• Anycast whenever you can.
• Employ simple active defences.
Summary
• Design for failure.
• Sync invalidations with data
streams.
• Employ AI for resource
planning.
Simplicity scales :)
Q & A
Thanks
David Newman
@darthhexx
