3. Launched June 18, 2005
875,000 active sellers
33.5MM items for sale
$65.9MM in sales, in May
1.4B page views, in May
102 engineers
32 releases, last Friday
Monday, June 18, 2012
4. LAMP
any questions?
8BitLit, http://www.etsy.com/listing/90066890/
8. Architectural Principles
* Don't bet against the future.
* Our customers are humans.
* Simplicity always wins, in the end.
* Favor global over local optimization.
* Ambiguity kills momentum.
* Make failure cheap.
* Technical debt is an inevitable by-product
of shipping code.
* Optimize for change.
10. Complex systems and change
1. Distributed systems are inherently complex.
2. The outcome of change in complex systems is hard to
predict.
3. The outcomes of small, frequent, measurable changes
are easier to predict and easier to recover from, and they
promote learning.
Ckrickett, http://www.etsy.com/listing/90611466
12. Continuous deployment: Small,
frequent changes to production
Ckrickett, http://www.etsy.com/listing/90611466
13. Continuous Deployment:
No branching.
“All existing revision control systems were
built by people who build installed
software”
- Paul Hammond,
Always Ship Trunk, Velocity 2010
Thursday, March 17, 2011
14. Continuous Deployment:
feature flags
if ($cfg['awesome_new_search']) {
# new hotness
$rsp = do_solr();
} else {
# boring old stuff
$rsp = do_grep();
}
15. Continuous Deployment:
Ramp-ups
(on top of feature flags)
1. Launch to staff only
2. Launch to 1% of all users
3. Launch to members of a beta group
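A minimal sketch of how a ramp-up check could sit on top of a feature flag. The function name `feature_enabled` and the config keys (`enabled`, `staff`, `groups`, `percent`) are assumptions made for illustration, not Etsy's actual config format. Hashing the flag name together with the user id keeps each user in the same bucket from one request to the next.

```php
<?php
// Hypothetical ramp-up check; the config shape is an assumption.
function feature_enabled(string $name, array $flag, array $user): bool
{
    if (empty($flag['enabled'])) {
        return false;                      // flag is off for everyone
    }
    // Staff-only launch.
    if (!empty($flag['staff']) && !empty($user['is_staff'])) {
        return true;
    }
    // Members of a beta group.
    foreach ($flag['groups'] ?? [] as $group) {
        if (in_array($group, $user['groups'] ?? [], true)) {
            return true;
        }
    }
    // Percentage of all users: map (flag, user) to a stable bucket 0..99.
    $bucket = (crc32($name . ':' . $user['id']) & 0x7fffffff) % 100;
    return $bucket < ($flag['percent'] ?? 0);
}
```

Raising `percent` from 1 to 100 only ever adds users, because each user's bucket never changes.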
16. Continuous Deployment:
any engineer can launch a feature to
1% of users
26. inbound request
[Architecture diagram; recoverable labels, grouped approximately]
* Edge: CDNs (diversified at the DNS level), Internet providers (diversified at borders), Etsy network appliances
* Entry points: etsy.com/, api.etsy.com, /atlas, bcn.etsy.com, etsystatic.com/ (photos, Squid, imstor)
* Web tier: apache, PHP application
* Services: memcache, async http, StatsD, sqlite, gearman, search (Thrift, Jetty, Solr master, Solr slaves), mail out (SMTP), PCI, X-Yarnblaster (via jsonp, no privileged access)
* Storage: sharded MySQL (dbindex, dbshards, dbaux, dbdata), NFS, HBase (datasets etc)
* Analytics: logs, logrotate, HDFS; on AWS: EMR, S3, JRuby/Cascading
* Also labeled: MySQL server/OS hardware
27. CDNs: Put a slider on it
Just works via weighted DNS
28. Apache
* Well known
* PHP is native
* apache_note
* fast start time
* cheap in place replacement
* .htaccess
* Challenge: memory usage
29. Apache: apache_note
apache_note('etsy_uaid', $id);
* introspection through the life cycle
* Additive! insanely useful!
37. Etsy: Concurrency
* no native concurrency in PHP
* asynchronous HTTP calls
* Gearman
38. Etsy: Async HTTP calls
* curl_multi_exec
* non-blocking, per-request timeouts
* used for optional aspects of a page
* curl against http://localhost to avoid
network overhead
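A minimal sketch of the pattern above, assuming PHP's curl extension is loaded. The function name `fetch_optional`, the 200 ms default, and the idea of returning null for a failed fragment are illustrative assumptions, not the code behind these slides. Each handle carries its own timeout, and a fragment that fails or times out simply comes back as null so the page can render without it.

```php
<?php
// Fetch optional page fragments in parallel; failures become null.
function fetch_optional(array $urls, int $timeoutMs = 200): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $name => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);    // return body as a string
        curl_setopt($ch, CURLOPT_TIMEOUT_MS, $timeoutMs);  // per-request timeout
        curl_setopt($ch, CURLOPT_NOSIGNAL, true);          // needed for sub-second timeouts on some builds
        curl_multi_add_handle($multi, $ch);
        $handles[$name] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi, 0.05);               // wait for socket activity
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $name => $ch) {
        // A timed-out or refused request reports HTTP code 0.
        $ok = curl_getinfo($ch, CURLINFO_HTTP_CODE) === 200;
        $results[$name] = $ok ? curl_multi_getcontent($ch) : null;
        curl_multi_remove_handle($multi, $ch);
    }
    return $results;
}
```

A caller might pass something like `['sidebar' => 'http://localhost/fragment/sidebar']` (a hypothetical URL) and skip the sidebar when the value is null.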
39. Etsy: Gearman
* language agnostic job server
* don’t use an MQ when you want a job
server
* 150 job types
* persistent jobs flushed to MySQL, read
from memory
* non-persistent jobs just stored in memory
* NP queue is wicked fast.
40. Etsy: Gearman
* scaling CPU of cron jobs
* denormalizing data
* pushing to 3rd party services
43. Etsy: Challenges
* Apache memory usage
* liveness when talking to services: no
concurrency, blocking by default
Enforce liveness with a judicious
application of force
44. Etsy: judicious application of force
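# /proc/self/statm lists sizes in pages: total, resident, shared, ...
list($v, $res, $shar) = explode(' ', @file_get_contents('/proc/self/statm'));
$mine = $res - $shar;  # resident pages not shared with other processes
if ($mine > $cfg['sizelimit']) {
    $pid = getmypid();
    @exec("kill -USR1 $pid");  # signal our own process
}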
45. Etsy: judicious application of force
Bowhunter
* Find long running PHP processes
* Try to avoid those mid-post
open(APACHE, "/usr/bin/curl -s http://localhost/server-status|") || die "$!";
46. Etsy: judicious application of force
Query_killer
* Same idea, long running queries
* MySQL "SHOW PROCESSLIST;"
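A minimal sketch of the selection step, assuming the rows have already been fetched from `SHOW PROCESSLIST` as associative arrays with MySQL's standard `Id`, `Command` and `Time` columns. The function name and the threshold are hypothetical, and the real Query_killer may apply more rules than this.

```php
<?php
// Turn SHOW PROCESSLIST rows into KILL QUERY statements for long runners.
function queries_to_kill(array $processlist, int $maxSeconds): array
{
    $kills = [];
    foreach ($processlist as $row) {
        // Only running statements count; sleeping connections are left alone.
        if ($row['Command'] === 'Query' && (int) $row['Time'] > $maxSeconds) {
            $kills[] = 'KILL QUERY ' . (int) $row['Id'];
        }
    }
    return $kills;
}
```

`KILL QUERY` stops the statement but keeps the connection open; plain `KILL` would drop the connection as well.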
47. Memcache
* Caching, obviously
* Cache invalidation is hard
* Write buffering
* multi_get
* rate limits
48. Memcache
* atomic INCR is awesome
* slice your time windows to reduce risk of
cache eviction
* we’ve been unlucky, lots of segfaults :(
* multi_get slows down the more boxes in the
pool
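One way to read "atomic INCR" plus "slice your time windows" is a rate limiter that counts into several short slices instead of one long window, so an evicted key loses only a fraction of the count. The sketch below is an assumption about that idea, not the code from the talk. `FakeCache` is a hypothetical in-memory stand-in that mimics only the three Memcached methods used here (`add`, `increment`, `getMulti`).

```php
<?php
// Hypothetical stand-in for the Memcached methods this sketch relies on.
class FakeCache
{
    private array $data = [];

    public function add(string $key, $value, int $ttl = 0): bool
    {
        if (isset($this->data[$key])) {
            return false;                       // like Memcached::add, never overwrites
        }
        $this->data[$key] = $value;             // the stand-in ignores $ttl
        return true;
    }

    public function increment(string $key, int $by = 1)
    {
        if (!isset($this->data[$key])) {
            return false;
        }
        return $this->data[$key] += $by;
    }

    public function getMulti(array $keys): array
    {
        return array_intersect_key($this->data, array_flip($keys));
    }
}

// Allow at most $limit requests per ($sliceSecs * $slices) seconds.
function allow_request($cache, string $who, int $limit, int $now,
                       int $sliceSecs = 10, int $slices = 6): bool
{
    $current = intdiv($now, $sliceSecs);
    $key = "rl:$who:$current";
    $cache->add($key, 0, $sliceSecs * $slices);  // create the slice counter if missing
    $cache->increment($key);                     // atomic on a real memcached

    // Sum the current slice and the previous ones in a single multi-get.
    $keys = [];
    for ($i = 0; $i < $slices; $i++) {
        $keys[] = "rl:$who:" . ($current - $i);
    }
    return array_sum($cache->getMulti($keys)) <= $limit;
}
```

If one slice key is evicted, the limiter undercounts by that slice only rather than resetting the whole window.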
49. MySQL: By the numbers
* 25K+ queries/sec avg
* 3TB InnoDB buffer pool
* 15TB + data stored
* 50 servers
* 99.99% queries under 1ms
50. MySQL: a NotMuchSQL server
* no joins
* no foreign keys
* no transactions or locks
* no sub-selects
* store data like you want to read it.
* also: no auto_increment
51. MySQL: a NotMuchSQL server
“Normalization is for sissies.”
- Cal Henderson, Flickr
52. MySQL: scale horizontally
* objects sharded by key
* lookups maintained in dbindex (MySQL is a
FAST key-value store)
* avoid key hashing, range partitions, and
partitioning functions
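A minimal sketch of lookup-based sharding as opposed to key hashing. The `$index` array stands in for a dbindex lookup table, and the random assignment policy for new objects is an assumption made for illustration. The linked shard-architecture slides describe the real design.

```php
<?php
// $index stands in for the dbindex lookup table: user id => shard number.
function shard_for_user(array &$index, int $userId, array $shards): int
{
    if (!isset($index[$userId])) {
        // First sighting: pick a shard and record the choice.
        // Random placement is an assumed policy, not the documented one.
        $index[$userId] = $shards[array_rand($shards)];
    }
    return $index[$userId];
}
```

Because the mapping is stored rather than computed, moving an object to another shard is an update to one lookup row, and adding shards does not force rehashing of existing objects.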
more: http://www.slideshare.net/jgoulah/the-etsy-shard-architecture-starts-with-s-and-ends-with-hard
53. MySQL: Master-Master
* objects hashed to a side, avoid split brain
* allows in place schema upgrades without
slave promotion
* simplified capacity planning
more: http://codeascraft.etsy.com/2012/04/20/two-sides-for-salvation/
55. MySQL: Deletes are expensive
* update objects to state='deleted'
* use partitions
* truncatenator - on ext3, hard link file, move,
delete slowly.
56. Anatomy of a feature: Shop Stats
57. Anatomy of a feature: Shop Stats
“Never get into a land war in Asia, and never
build an analytics tool on top of MySQL.”
58. Anatomy of a feature: Shop Stats
* buffer writes in Memcache using
predictable keys
* flush to MySQL tables periodically via cron
* bake old data into all possible date ranges,
and archive to S3
* truncate tables
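A minimal sketch of the "predictable keys" idea, with a plain PHP array standing in for Memcache. The key format, the hourly granularity and all function names are assumptions. The point is that the cron job can rebuild every key from a shop id and an hour, so it never needs to scan the cache.

```php
<?php
// Key is derivable from (shop, hour), so the flusher can rebuild it later.
function stats_key(int $shopId, int $ts): string
{
    return 'shopstats:' . $shopId . ':' . gmdate('YmdH', $ts);
}

// Buffer one page view; $cache is an array standing in for Memcache.
function record_view(array &$cache, int $shopId, int $ts): void
{
    $key = stats_key($shopId, $ts);
    $cache[$key] = ($cache[$key] ?? 0) + 1;
}

// Cron side: collect one hour's counters as rows ready to insert into MySQL.
function flush_hour(array &$cache, array $shopIds, int $hourTs): array
{
    $rows = [];
    foreach ($shopIds as $shopId) {
        $key = stats_key($shopId, $hourTs);
        if (isset($cache[$key])) {
            $rows[] = [
                'shop_id' => $shopId,
                'hour'    => gmdate('Y-m-d H:00:00', $hourTs),
                'views'   => $cache[$key],
            ];
            unset($cache[$key]);   // counter has been handed off
        }
    }
    return $rows;
}
```

With a real Memcache the increment would be an atomic INCR rather than a read-modify-write on an array.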
60. bcn.etsy.com: beaconed event stream
* Server-side and javascript event stream
* At least one per page view
* Apache serving static assets
* Aggregated on HDFS via logrotate
* Archived on S3
* Analyzed via JRuby/Cascading on Hadoop
* Doesn’t use: Flume, Scribe, etc
62. Search
[Diagram: search indexing and query flow; labels grouped approximately]
* Search Master: incremental index, every 7 minutes, avoid even numbered cron times
* BitTorrent to distribute indexes to Search Slave01 ... SlaveNN; 100% of all indexes on each slave
* Web01 ... WebNN reach the slaves over Thrift, with server affinity to improve cache hit ratio; search just returns ids
* hydrate IDs via multi-get from databases and memcache, ignore a few failures
* denormalized listing store: pull via cron, push via gearman; transition from MySQL to HBase, not user facing
63. Search
* Solr trunk
* Custom ranking via crunched datasets
* BitSet fields for personalized search
* Scaling the JVM
* 32% of visits, 40% of sales
* Also powers categories, unshardable
queries
* Next time, just use HTTP
* Up next: custom codecs
* Avoiding sharding
64. Search
* JVM slow start
* Search deployinator does rolling restart
* HotSpot and GC cause unpredictable
throughput
* Overfetch - ask multiple servers, go with 1st
response
* Index size is important. Don’t store too
much.
65. Photos
* 400 million photos
* Uploaded locally, then
streamed to S3
* GraphicsMagick FTW
* Working set is tiny, served
out of Squid
* 2% read failure rate during
full S3 outage.
* 0% write failure rate
during full S3 outage.
JonathanOtis, http://www.etsy.com/listing/96361102/
66. Technology no longer part of the stack
* Python Twisted
* PostgreSQL and stored procedures
* Scala and MongoDB
* Clojure and Tokyo Tyrant
* Rails
* ActiveMQ
* RabbitMQ
* a "Routes" framework
* building RPMs
* Lighttpd
67. Takeaways
1. A few simple, boring, well known
components
2. Extensive instrumentation
3. Rapid iteration and feedback loops
4. Human-centric
5. A few tweaks on the classics for scale
6. Technology supports business goals
68. Questions?
More info:
http://codeascraft.etsy.com
http://slideshare.net/etsy
http://github.com/etsy
http://www.etsy.com/jobs
kellan@etsy.com