How do you massively scale an API as your business grows?
Ciaran worked for a company that grew from a small B2C app written in PHP/MySQL into a B2B advertising company doing 15 billion API calls per month. Along the way there were big technology changes and decisions, some that helped and some that hindered; 3-day outages; several AWS outages; massive traffic spikes, especially in the early days when customers like WordPress.com signed up; and many, many improvements, refactors and even successes.
For the last year the company has had a highly modern engineering team, using many of the latest technologies and methodologies. The team codes in PHP and Python, with C, Go, CoffeeScript and more; stores data in MongoDB, Redis, memcache, Solr and MySQL; has a full CI environment; deploys using IRC (and Hubot); and manages everything with Puppet. The service has had 100% (not 99.999%) uptime for over 12 months now.
Good afternoon. I'm going to share my experiences: my journey from reboots to Redis, and it still works. Not why you should do things, but what happens when you get BIG.
So who am I? Ciaran, freelance CTO, PHP UK, Ireland. I'll talk slowly, but if you miss something, just ask "What?"
I'll talk about architecture and take you through some of my blueprints for success.
The numbers: 14 billion API requests, 4 billion page views, 145 million unique users, 1 million websites monetized and 26 thousand customers. But it didn't start like that.
It started as Skimbit.com, rebranded skimit.com: a social decision-making and bookmarking tool launched in 2007. You created a page for a topic and bookmarked items (bridesmaids' shoes, say), and friends and family commented and rated them. Then one day something BIG happened!
We got TechCrunched! The website didn't go down. So…
We partied! You see, at that time…
We were hosted at FlexiScale. Anyone? No? A small, local cloud. SIMPLE, with the CEO's number on speed dial: everything we needed. Our app was…
Very straightforward, app and architecture alike. It looked like this. A few of you still look like this? Or run node.js on solar-powered Raspberry Pis? It was ours and we were happy with it. BUT one day this happened:
A three-day outage, to the hour. It wasn't our fault. Well, really it was.
The cloud floated away! Human error deleted the main storage. Restoring meant a remount, but there wasn't enough hardware, so it had to be shipped from Germany. We didn't have a backup plan. So…
We panicked! The CEO's number: no answer.
Our app is simple, we thought; just deploy it somewhere else. Not that easy. But…
Our DNS servers were OUR server. Our email records were on OUR server. Even customer-uploaded images were on it. One giant single point of failure. This is how NOT to do it.
DNS was the hardest: a 24-hour TTL on the NS records, so resolvers could keep pointing at the dead server for up to a day after we moved.
Email came right after. We were already using Google Apps, with a 30-minute TTL on the MX records, so once DNS was back, email followed quickly.
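Why did DNS take a day when email took half an hour? A caching resolver may keep answering from its cache until a record's TTL runs out, so the worst case for clients still seeing the old answer is roughly one full TTL. A minimal Python sketch of that rule, assuming a single caching resolver that cached each record just before the move; the hostnames and the date are hypothetical, and details such as the parent zone's own NS TTL are ignored:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CachedRecord:
    """A record as held by a caching resolver (hypothetical model)."""
    value: str
    cached_at: datetime
    ttl: timedelta

    def expires_at(self) -> datetime:
        # Worst case: the old answer can be served until cached_at + TTL.
        return self.cached_at + self.ttl


def resolve(cached: CachedRecord, authoritative_value: str, now: datetime) -> str:
    """Answer from cache while the TTL is still running, else from the authoritative zone."""
    if now < cached.expires_at():
        return cached.value
    return authoritative_value


moved_at = datetime(2000, 1, 1, 9, 0)  # hypothetical time of the move
ns = CachedRecord("ns1.old-host.example", moved_at, timedelta(hours=24))
mx = CachedRecord("mail.old-host.example", moved_at, timedelta(minutes=30))

one_hour_later = moved_at + timedelta(hours=1)
print(resolve(ns, "ns1.new-host.example", one_hour_later))   # ns1.old-host.example
print(resolve(mx, "mail.new-host.example", one_hour_later))  # mail.new-host.example
```

One hour after the move, the 24-hour NS record is still stale while the 30-minute MX record has already expired, which matches the order in which DNS and email came back.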
Finally, we backed up the images to S3.
Shortly after this, in October 2008, we started a new business.
A new business means a new website: November 2008.
New hosting company: The Planet. Real dedicated servers, no cloud to float away. Backward? No, sensible. New app, new architecture.
New app, new ARCHITECTURE: half a dozen servers. Next slide!
Two API boxes, for redundancy. Mostly reads; writes are stored locally.
A master-slave DB with MySQL replication, again two boxes for redundancy. Writes are separated out and batched to the master.
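The write path above (writes stored locally on each API box, then batched to the MySQL master, with reads served from the slave) can be sketched as a small buffering writer. This is a minimal sketch under assumptions: `BatchingWriter`, the batch size and the row shape are hypothetical, and the `flush_to_master` callback stands in for whatever actually talks to the master (for example one multi-row insert through a DB-API cursor's `executemany`).

```python
from typing import Callable, List, Tuple

Row = Tuple[str, int]  # hypothetical row shape: (key, value)


class BatchingWriter:
    """Buffer writes locally and send them to the master in batches (hypothetical sketch)."""

    def __init__(self, flush_to_master: Callable[[List[Row]], None], batch_size: int = 100) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._flush_to_master = flush_to_master
        self._batch_size = batch_size
        self._buffer: List[Row] = []

    def write(self, row: Row) -> None:
        # Writes land in the local buffer first; only every batch_size-th write
        # triggers a round trip to the master.
        self._buffer.append(row)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        # Hand the whole buffer to the master in one call, then start a fresh buffer.
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._flush_to_master(batch)

    def pending(self) -> int:
        return len(self._buffer)


master_batches: List[List[Row]] = []  # stands in for the MySQL master
writer = BatchingWriter(master_batches.append, batch_size=3)
for i in range(7):
    writer.write((f"click:{i}", 1))
writer.flush()  # e.g. on a timer or at shutdown
print([len(b) for b in master_batches])  # [3, 3, 1]
```

The trade-off is durability: anything still in the in-memory buffer is lost if the box dies before a flush, so a real setup would need a flush timer or a local on-disk buffer.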
Finally, the client app: reports, the website, etc. This is its last mention.
Simple. Unbreakable.
Taking off: growth over the next few years, with some interesting events coming up.
One year in, we signed WordPress.
Three years in, Pinterest took off! But let's come back to where we started.
The architecture was this, and it could never break! But then came WordPress, and with it our first challenge.
We stopped recording. Important lesson: only record what you need. (Later we did record again.) Moving on, we started to grow.
International traffic was growing: 40% USA, 20% UK, 10% France and 30% others, such as Germany and Canada. So, moving on up! Growth means a new website.
We started to improve: a new website in early 2010.
New product: Skimwords. It analyses content and creates links. Time for something new.
The great thing: new products go on new servers! Eventually everything moved to AWS. This is where I start talking about the right way.
Second challenge: growing international traffic.
We created API boxen: Apache, PHP and Memcached, the single minimum unit to do the work.
We scaled the API out horizontally and added an ELB, backed by MySQL. Skimwords ran on MongoDB.
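Each API box pairs PHP with Memcached in front of MySQL. The usual pattern for that pairing is cache-aside: check the cache, fall back to the database on a miss, then populate the cache with a TTL. A minimal Python sketch, assuming an in-process dictionary as a stand-in for Memcached and a hypothetical `load_from_mysql` lookup; the talk does not say which caching pattern or TTLs were actually used.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for Memcached: key -> (value, expiry time)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._items[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._items[key] = (value, self._clock() + ttl)


def cache_aside(cache: TTLCache, key: str, load_from_db: Callable[[str], Any], ttl: float = 60.0) -> Any:
    """Return the cached value, or load it from the database and cache it.

    None is treated as a miss, so a database result of None is never cached.
    """
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        cache.set(key, value, ttl)
    return value


now = [0.0]  # fake clock so the example is deterministic
cache = TTLCache(clock=lambda: now[0])
db_calls = []


def load_from_mysql(key: str) -> Dict[str, str]:  # hypothetical DB lookup
    db_calls.append(key)
    return {"id": key}


cache_aside(cache, "site:42", load_from_mysql)  # miss: goes to the database
cache_aside(cache, "site:42", load_from_mysql)  # hit: served from the cache
now[0] = 61.0                                   # past the 60-second TTL
cache_aside(cache, "site:42", load_from_mysql)  # expired: database again
print(len(db_calls))  # 2
```

If each box behind the ELB keeps its own cache, different boxes can briefly serve different versions of a value until their TTLs expire; the TTL bounds how stale any one box can get.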
Then globally: Amazon regions and availability zones, MySQL master-slave, MongoDB master-slave (Mongo 1.6), and UltraDNS as the global load balancer (GLB), with failover and short TTLs.
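A DNS-based global load balancer answers each query with a region suited to that client and stops handing out a region when its health checks fail. The selection rule can be sketched in Python, assuming a simple "lowest latency among healthy regions" policy; the latency figures are hypothetical, the region IDs are just AWS-style names for Europe, US East and US West, and UltraDNS's actual policy options are not modelled here.

```python
from typing import Dict, Optional


def pick_region(latency_ms: Dict[str, float], healthy: Dict[str, bool]) -> Optional[str]:
    """Pick the lowest-latency region that is passing health checks (hypothetical policy)."""
    candidates = [region for region in latency_ms if healthy.get(region, False)]
    if not candidates:
        return None  # nothing healthy: a real setup needs a fallback policy here
    return min(candidates, key=lambda region: latency_ms[region])


# Hypothetical latencies as seen from a client in Europe.
latency = {"eu-west-1": 20.0, "us-east-1": 90.0, "us-west-1": 160.0}
health = {"eu-west-1": True, "us-east-1": True, "us-west-1": True}
print(pick_region(latency, health))  # eu-west-1

health["eu-west-1"] = False  # Europe fails its health check
print(pick_region(latency, health))  # us-east-1
```

Failover is only as fast as clients re-resolve: a client that cached the Europe answer keeps using it until that answer's TTL expires, which is why the GLB records need short TTLs.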
It worked. Latency dropped hugely.
World graphs (blue = Europe, green = US East, red = US West).
That's where we were at Christmas 2010. And this was coming: Pinterest, growing fast over one year. So we started to improve. What happens when things change, when something new arrives? So what do you think the first step was?
First step: a new website, early 2011. And then…
We bought Atma. Third challenge: integrating them into our stack, in Python. New services were also written in Python, because of our new Python developers.
Fourth challenge. We did some research and held a bake-off. Apache worker: the PHP Mongo drivers at the time were not thread-safe.
We picked Python. Why not Scala? Many reasons, including team skills.
The new Jay box: same principles, new architecture. Jay = JSON. Nginx, PHP/Apache, Tornado/Python, and a Flume agent for logs and events.
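A common way to feed a log agent such as Flume is to have the application append one JSON object per line to a file that the agent tails. A minimal Python sketch of the application side, assuming a hypothetical event schema (`ts`, `type` plus free-form fields); the talk does not describe the real event format or how the agent was configured.

```python
import io
import json
import time
from typing import IO, Any, Dict, Optional


def write_event(stream: IO[str], event_type: str, fields: Dict[str, Any], ts: Optional[float] = None) -> None:
    """Append one event as a single line of compact JSON (hypothetical schema)."""
    record = dict(fields)
    record["type"] = event_type
    record["ts"] = time.time() if ts is None else ts
    # One object per line: json.dumps escapes any newlines inside string values.
    stream.write(json.dumps(record, separators=(",", ":"), sort_keys=True) + "\n")


log = io.StringIO()  # stands in for the file the agent would tail
write_event(log, "click", {"site": "example.org", "link_id": 7}, ts=1300000000.0)
print(log.getvalue(), end="")
# {"link_id":7,"site":"example.org","ts":1300000000.0,"type":"click"}
```

Keeping each event on exactly one line lets a line-oriented agent treat every line as one event without parsing the JSON itself.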
Again, the same principles: make a cluster of Jays behind an ELB, with MySQL, Memcached, a Flume collector, and a C Tree Filter plus Redis.
Again, international clusters balanced with UltraDNS, MySQL master-slave, and the Flume master shipping to S3 to run Hadoop jobs, with HBase for real-time analytics.
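The batch side (logs landed in S3, then Hadoop jobs) follows the map/reduce shape: a map step turns each event into a key/count pair, and a reduce step sums the counts per key. A single-process Python sketch of that shape, assuming JSON-lines events with hypothetical `site` and `ts` fields and counting events per site per hour; it does not use Hadoop's APIs and is not one of the jobs the team actually ran.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Key = Tuple[str, int]  # (site, hour bucket since the Unix epoch)


def map_events(lines: Iterable[str]) -> Iterator[Tuple[Key, int]]:
    """Map: emit ((site, hour), 1) for every event line."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        event = json.loads(line)
        hour = int(event["ts"] // 3600)
        yield (event["site"], hour), 1


def reduce_counts(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Reduce: sum the counts for each key."""
    totals: Dict[Key, int] = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


events = [
    '{"site":"a.example","ts":0}',
    '{"site":"a.example","ts":1800}',
    '{"site":"a.example","ts":3600}',
    '{"site":"b.example","ts":60}',
]
print(reduce_counts(map_events(events)))
# {('a.example', 0): 2, ('a.example', 1): 1, ('b.example', 0): 1}
```

On a cluster the same two functions would run in parallel over many log files, with the framework grouping pairs by key between the map and reduce steps.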
Other tools: Puppet, Jenkins CI, RPMs, Hubot, Fastly and ApiAxle.
Monitoring: local Cacti, Icinga (aka Nagios), Lerenzo, Pingdom and PagerDuty.
NOC: always watching. Finally, a reminder.
Questions!
My details. I hope you got something from this; we learned a lot, and I hope you can apply it to your business. I'm here for the rest of the day. I hope you all get a lot out of the rest of the week.