Who are we?
• Premium photo & video sharing.
• Everyone pays!
• Bootstrapped in ’02.
• $10M+ as of ’07.
• Profitable.
• Top 250 website.
• 35M+ people / month

The challenge
• Premium! aka “more” + “better” + “faster”.
• Unlimited storage.
• Unlimited bandwidth.
• Huge photos (100Mpix!). Billions of them.
• Huge videos (1080p, high bitrates)
• Lots of photos per page.
• Super fast.

Architecture (early 2006)
• Multiple datacenters
• Self-managed
• Self-installed hardware
• Tons of spinning disks
• Tons of custom servers
• Tons of distracting work
• We’re not a datacenter company

The phone call (early 2006)
• *ring* “Hi, this is Amazon, we’d like to sell you storage.”
• “Say what? Amazon? Storage?”
• “Yeah, how does $0.50/GB/mo sound?”
• ... quick napkin math ... “Sorry, we do $0.20/GB today”
• “Oh, really? Thanks for the feedback.” *click*
• ... days pass ...
• *ring* “Hi, Amazon again. How about $0.15/GB/mo?”
• Sold.

It begins (April 2006)
• Started simple. Storage - and lots of it.
• Slow at first. “Isn’t Amazon a bookseller?”
• First bill a huge $1147.41 in April. ;)
• Redundant backup to begin with.
• Soon, primary with on-site as backup.
• Finally, 100% photos & videos in S3.
• “Wow, this thing is for real!”

Show me the money (early 2007)
• Guesstimate: ~$500K saved first year
• Actual:
  • Growth: 64M photos -> 140M photos
  • Stored 200TB at S3
  • Disks would have cost: $40K/mo -> $100K/mo
  • $922K projected spend, $230K actual
  • $692K in cold hard savings
• Taxes! $295K ‘saved’ in cash flow.
• Reselling disks - recouping sunk costs.
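
The savings figure on this slide is simple subtraction of the two totals it quotes. A minimal sketch that reproduces it from the slide's own numbers; the variable names are illustrative, and the percentage and growth factor are derived here rather than stated in the talk:

```python
# Figures quoted on the slide, in US dollars.
projected_spend = 922_000  # "$922K projected spend"
actual_spend = 230_000     # "$230K actual"

savings = projected_spend - actual_spend
savings_pct = round(100 * savings / projected_spend)

# Photo growth over the same period, in millions of photos.
photos_start, photos_end = 64, 140
growth_factor = photos_end / photos_start

print(f"Savings: ${savings:,} ({savings_pct}% of projected spend)")
print(f"Photo growth: {growth_factor:.2f}x")
```

The result matches the slide's $692K, which works out to roughly three quarters of the projected spend while the photo count more than doubled.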

The revelation (early 2007)
• Yes, S3 is cheap.
• Yes, it’s durable and available.
• Yes, it’s fast.
• But most important: Weight off our shoulders!
• No more hard disk replacements!
• No more midnight datacenter fiascos!
• We can focus on photo sharing!

But wait, there’s more... (2007)
• Amazon does books ... storage... and compute?!
• Hey, we have lots of compute!
• Web servers, background proc, rendering, etc
• Buying, installing, maintaining servers
• Often idle.
• Let’s try rendering first.

SkyNet Lives! (2007)
• First EC2 service: ‘RubberBand’
• Handles all background photo processing
• Automated, near zero human interaction
• Tried to take over the world
• Launched ~1920 cores in a single API call
• Amazon spun them up as requested
• Renamed to ‘SkyNet’ :)

SkyNet success (2007-2012+)
• Rendering load peaky
• User-driven based on # of photos shot recently
• Only roughly predictable
  • Sundays heavy - but how heavy?
  • Big spike - but will it last?
• Elastic scaling maximizes throughput, minimizes cost
• Instrument and automate
• No humans!
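
The talk does not describe SkyNet's actual scaling rule. As a hedged illustration of "instrument and automate", here is a minimal sketch of one common approach: size the worker pool so the current queue drains within a target time. The function name, parameters, and limits are all assumptions made for this example.

```python
import math


def desired_workers(queue_depth, jobs_per_worker_per_min, drain_minutes,
                    min_workers=1, max_workers=500):
    """Return the worker count needed to drain the queue within drain_minutes.

    Hypothetical helper: the result is clamped to [min_workers, max_workers]
    so a quiet period never scales to zero and a spike never runs away.
    """
    if jobs_per_worker_per_min <= 0 or drain_minutes <= 0:
        raise ValueError("rate and drain time must be positive")
    capacity_per_worker = jobs_per_worker_per_min * drain_minutes
    needed = math.ceil(queue_depth / capacity_per_worker)
    return max(min_workers, min(max_workers, needed))


# Example: 10,000 queued renders, 20 renders per worker per minute,
# and a goal of clearing the backlog in 10 minutes.
print(desired_workers(10_000, 20, 10))
```

Run on a timer against a measured queue depth, a rule like this answers "how heavy?" and "will it last?" by reacting to the load rather than predicting it.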

Leverage for... new products? (late 2007)
• Customers begging us for video
• Not just any video: Hi-Def & high bitrate
• Potentially huge $$ capital expense (lots of servers)
• Totally unknown customer adoption
• Upside? Who knows?!

Leverage for new products! (late 2007)
• Use EC2! No capital expense!
• If usage takes off, just scale it up!
• If usage falls off a cliff, just turn it off!
• Worked like a charm
• Minimal investment to get it into customers’ hands
• Took off (whew!)

New products part two, electric boogaloo (mid 2008)
• Customers begging for archival storage
• RAW photos, original video footage, etc
• Breaks our business model
• Potentially costly to implement
• Again, unknown customer adoption

New products part two, electric boogaloo (mid 2008)
• DevPay to the rescue!
• S3 + Amazon Payments mashup
• We called it ‘SmugVault’
• Store anything you like, pay as you go
• Amazon bills customer directly
• Terabytes of backup storage
• Happy customers

Mo money (2009)
• Amazon does payments, too?
• Sure, why not. We’ll try it.
• SmugMug subscriptions via Amazon Payments
• Immediate 7% increase in total signups

EC2 steamroller begins (2009)
• Important new EC2-related services arrive
• Auto-Scaling
• Elastic Load Balancing
• Monitoring
• Able to migrate lots more services to AWS

EC2 steamroller: Photos & Videos (2009)
• SmugMug’s security & privacy layer complex
• Doesn’t map to S3’s
• Needs a proxy layer to intercept & validate requests
• From client straight to AWS
• Bypasses our datacenters
• Auto-Scaling + ELB + EC2 + S3 = Win
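
The slide does not say how the proxy layer validates requests, and SmugMug's real privacy rules are certainly richer than this. Purely as an illustration of the "intercept & validate" step, the sketch below assumes each photo URL carries an expiry time and an HMAC signature that the proxy checks before fetching from storage. The function names, the message layout, and the secret are hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, path: str, expires: int) -> str:
    """Return a hex HMAC-SHA256 signature over the path and expiry time."""
    message = f"{path}\n{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def is_allowed(secret: bytes, path: str, expires: int,
               signature: str, now: int) -> bool:
    """Validate a request at the proxy before it is passed on to storage."""
    if now > expires:
        return False  # link has expired
    expected = sign(secret, path, expires)
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(expected, signature)


# Example with a made-up secret and path.
secret = b"example-secret"
sig = sign(secret, "/photos/1234.jpg", expires=1_000)
print(is_allowed(secret, "/photos/1234.jpg", 1_000, sig, now=999))
```

Because a check like this is stateless, it can run on any instance behind a load balancer, which fits the Auto-Scaling + ELB + EC2 setup the slide describes.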

EC2 steamroller: Realtime rendering (2010)
• Lots of different devices & screens out there
• SmugMug’s pre-rendered sizes don’t always fit
• Allow realtime dynamic photo resizing server-side
• Any resolution they wish
• Must be lightning fast
• Unpredictable load
• More ELB + Auto-Scaling + EC2 + S3
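
The resizing code itself is not shown in the talk. One small piece of any such service is choosing output dimensions for a requested size, so here is a minimal sketch under two assumptions of this example: the aspect ratio is preserved, and photos are never upscaled. `fit_within` is a hypothetical helper name.

```python
def fit_within(width, height, max_width, max_height):
    """Return (w, h) scaled to fit the box, keeping the aspect ratio.

    Assumption for this sketch: images smaller than the box are left
    at their original size rather than enlarged.
    """
    if min(width, height, max_width, max_height) <= 0:
        raise ValueError("all dimensions must be positive")
    if width <= max_width and height <= max_height:
        return width, height
    # Integer cross-multiplication compares the two scale factors exactly.
    if width * max_height >= height * max_width:
        # Width is the binding constraint.
        return max_width, max(1, round(height * max_width / width))
    # Height is the binding constraint.
    return max(1, round(width * max_height / height)), max_height


# Example: a 6000x4000 original requested for a 1920x1080 screen.
print(fit_within(6000, 4000, 1920, 1080))
```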

EC2 steamroller: Uploads (2011)
• No more proxy uploads to our servers
• Uploads go straight to ELB->EC2->S3
• Can’t use Auto-Scaling, terminates too fast
• User-generated, unpredictable load
• ELB + EC2 + S3

EC2 steamroller (today)
• Vast majority of CPU usage in EC2
• 100% photo & video requests served from AWS
• 4 out of 5 customer facing web clusters 100% in AWS
• 5th one “any day now” - full testing currently underway
• Final stage required advanced AWS functionality
  • DynamoDB
  • EC2 instances w/SSD (hi1.4xlarge)
• 100% AWS within reach

EC2 evolution: hi1.4xlarge (2012)
• Finally.
• Extremely high-scale I/O DB-class systems.
• Final missing link to let us migrate 100% to AWS.
  • (We’re already 100% SSD in our datacenters)
• 2TB of SSD storage
• 120,000 random read IOPS
• 10,000 - 85,000 random write IOPS
• omg.

Alien technology: DynamoDB (2012)
• Finally.
• “S3 for databases”
• Bottomless low-latency datastore.
• Key-value. (aka NoSQL)
• Bulk of our data headed to DynamoDB
• omg.

Alien technology: CloudSearch (2012)
• Billions of documents to search
• Millions of new & changed docs per day
• Many dozens of different facets
• Old system basically duct tape + SSDs
• CloudSearch blazingly fast even with crazy queries.
• omg.

Handling Failure (always & forever)
• Everything breaks. Even in your own datacenters.
• Especially in your own datacenters.
• Plan for it.
• With AWS, ‘breaking’ is clearly defined
• Regions, Zones, Services, Instances, etc.
• Mix & match for your needs
• Multi-AZ is currently our sweet spot.
• Minimal impact during various ‘Amazonpocalypse’ events
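
The talk gives no sizing rule for its Multi-AZ setup. As a hedged illustration of "plan for it", the sketch below shows the usual capacity arithmetic: spread instances evenly across zones so that the surviving zones can still carry the full load when one zone fails. The function name and the default of tolerating one zone failure are assumptions of this example.

```python
import math


def instances_per_zone(required, zones, tolerated_zone_failures=1):
    """Return instances to run in each zone so the survivors cover the load.

    required: instances needed to serve peak load with every zone healthy.
    """
    surviving = zones - tolerated_zone_failures
    if required < 0 or surviving <= 0:
        raise ValueError("need at least one surviving zone")
    return math.ceil(required / surviving)


# Example: 10 instances of capacity needed, spread over 3 zones.
per_zone = instances_per_zone(10, zones=3)
print(per_zone, "per zone,", per_zone * 3, "total")
```

With three zones the overhead for surviving a zone outage is 50% extra capacity; with only two zones it is 100%, which is one reason to spread across more zones where they are available.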