Optimizing for change: Taking risks safely & e-commerce
Kellan Elliott-McCrea (@kellan), CTO, Etsy
  • or, what to do when people tell you lean startup techniques don’t work in an ecommerce setting. feels like talking about lean startup at next context is preaching to the choir :)
  • Who here knows Etsy? Bought something? Seller? Awesome. We’re a marketplace of artists and craftspeople.
  • some quick info about Etsy. launched 7 years ago. nearly 900k sellers, selling 33 million items. we sold over $500 million last year, a
  • lean startup techniques are about rapid cycles of hypothesis, change and learning. and change is usually viewed as a source of risk. how does QA work? how do you avoid making mistakes? how can you prove the software is correct?
  • does that stuff really work when there’s money involved? we’re on track to do a billion dollars in sales this year. it’s not huge, it’s not small, and it’s definitely real money.
  • this approach of deploying the site frequently is called continuous deployment. and counterintuitively it IS a risk mitigation technique.
  • 20 lines of code i wrote 10 minutes ago are much easier to diagnose and fix than 50,000 lines of diff in a weekly release of code i wrote two weeks ago.
  • everything is optimized for something.
  • failure is inevitable, make it cheap. marines, 4 minutes. feeling i want from my software.
  • how continuous deployment works
  • step 1. take all your build scripts, and makefiles, and rsync shell scripts, and wrap them in a simple web page with a button that deploys your application. it doesn’t matter how often you push it. having a button that ANY ENGINEER can push at any time is the first step.
  • step 2. source control systems were built by people who built software that shipped on floppy drives. they made sense for them. they don’t make sense for you. make your application aware of its history using feature flags.
  • new ideas come from everywhere, but improvements can be hard to find. spread a wide net. and make your engineers more deeply invested in your product.
  • it’s great we’re running experiments, and we’re changing things all the time. how do we learn from it? measure everything!
  • who has the time to measure everything? if you make it easy, they will do it. at a ridiculous rate. we monitor 340k metrics a second.
  • starting to collect metrics is as simple as dropping a line in your code. this is using the PHP bindings, but there are bindings for every language. it sends over UDP. functionally free.
  • and here’s an example of what you get out of that call.
  • just like it’s not done if there aren’t tests, or the stories aren’t complete. metrics are part of the deliverables, and they’re core to making continuous deployment effective and safe.
  • Don’t hide metrics behind passwords. Our core metrics are available to anyone who walks into our office, but at least everyone on staff should be able to learn from them and spot issues. Good data begets good data begets good decisions. Transparency is hard work. But it’s worth it.
  • some tools we use for metrics collection. Graphite and Ganglia are open source; Logster and StatsD we open sourced.
  • getting started with metrics collection? choose StatsD; you’re in good company.
  • getting started is easy. you don’t need 340k metrics to start. you just need your core 5. at Etsy our core 5 are sign ups, logins, checkout, new listings, and posts in our bugs forum, because if the bugs forum is blowing up, we’ve probably broken something. start there.
  • so who watches 340k graphs?
  • computers! change is happening too rapidly in a distributed system for a human to detect all the issues (though humans are surprisingly good at it).
  • exponentially smoothed historical averages. alert when a metric leaves the confidence bounds.
  • CI is great for testing. but it’s also great for a lot of other automated analysis of your code that a computer can do. computers watching for people making dumb mistakes, and for risky codepaths: changes to crypto code, sessions, anything which interacts with files or shared memory.
  • sanitize input before it hits your application. force developers to override safe choices. monitor people making those choices. people shouldn’t have to think by default in a rapidly changing environment. remove ambiguity from the system.
  • security shouldn’t be blocking changes, it should be following behind your wave of changes as closely as possible, looking for anomalies. automate this. alert on it. 500s are a good sign that someone is probing and they’ve found a weakness.
  • actively respond to attacks. fix bugs while they’re being researched. they’ll post about it on Reddit and win you credibility with the independent security researcher community.
  • actively respond to attacks. fix bugs while they’re being researched. be polite. be responsive. give them a way to contact you and a policy around security disclosure. celebrate them, add them to a webpage, send them schwag.
  • just take smart risks that optimize for your ability to change and learn.
  • and when in doubt, make sure you have a great company culture. none of this works without the support and buy-in of your peers.
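The feature-flag idea from step 2 in the notes, together with the "1% launches" and "admin only launches" the slides mention, can be sketched roughly as follows. This is a minimal illustration in Python rather than the PHP used at Etsy, and it is not Etsy's implementation: the `FLAGS` layout, the `percent` and `admin_only` keys, and the helpers `in_rollout`, `is_enabled` and `search` are all hypothetical names chosen for the example.

```python
import zlib

# Hypothetical flag configuration; the keys are assumptions for this sketch.
FLAGS = {
    "awesome_new_search": {"enabled": True, "percent": 1, "admin_only": False},
}


def in_rollout(flag_name, user_id, percent):
    """Place a user in a stable bucket from 0 to 99 and compare to percent."""
    # Hashing flag name plus user id keeps a user's bucket stable per flag,
    # so the same user sees the same branch on every request.
    bucket = zlib.crc32(f"{flag_name}:{user_id}".encode("utf-8")) % 100
    return bucket < percent


def is_enabled(flag_name, user_id, is_admin=False, flags=FLAGS):
    """Decide whether the new code path is on for this user."""
    cfg = flags.get(flag_name)
    if cfg is None or not cfg.get("enabled", False):
        return False  # unknown or switched-off flags take the old path
    if is_admin:
        return True  # admins always get an enabled flag
    if cfg.get("admin_only", False):
        return False
    return in_rollout(flag_name, user_id, cfg.get("percent", 0))


def search(query, user_id, is_admin=False):
    """Branch in code: both implementations ship, the flag picks one."""
    if is_enabled("awesome_new_search", user_id, is_admin):
        return f"solr:{query}"  # new hotness
    return f"grep:{query}"  # boring old stuff
```

Raising `percent` from 1 toward 100 widens the launch without a new deploy, and setting `enabled` to `False` switches everyone back to the old path, which is what keeps recovery cheap.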

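The notes describe metrics collection as one line of code that sends over UDP. As a rough sketch of what such a binding does under the hood, the Python below builds a StatsD timing datagram (`name:value|ms`) and fires it at a StatsD daemon. The helper names `format_timing` and `send_timing` are hypothetical, and the host and port are assumptions (8125 is StatsD's usual default).

```python
import socket
import time


def format_timing(name, msec):
    """Build a StatsD timing datagram of the form '<name>:<value>|ms'."""
    return f"{name}:{int(round(msec))}|ms".encode("ascii")


def send_timing(name, msec, host="127.0.0.1", port=8125):
    """Send one timing sample over UDP, fire and forget."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(format_timing(name, msec), (host, port))
    except OSError:
        pass  # a metrics failure must never break the request itself
    finally:
        sock.close()


if __name__ == "__main__":
    start = time.perf_counter()
    sum(range(100_000))  # stand-in for rendering a page
    elapsed_msec = (time.perf_counter() - start) * 1000.0
    send_timing("page.render", elapsed_msec)
```

Because UDP needs no connection and no reply, the call costs the application almost nothing even when no daemon is listening, which is the sense in which the notes call it "functionally free".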
    1. Optimizing for change: Taking risks safely & e-commerce. Kellan Elliott-McCrea, @kellan, CTO, Etsy
    2. Launched June 18, 2005 in Brooklyn. 875,000 monthly active sellers. 33.5MM items for sale. $525MM in sales in 2011. 1.43B page views, in Aug. 102 engineers. 74 releases, yesterday.
    3. Take more risks. Build better software. Have more fun.
    4. “Sure that works when you’re building social software but what about a real business with $$$ involved?” - everybody always
    5. Continuous Deployment: small changes, pushed frequently
    6. you can’t avoid making mistakes. you can avoid making BIG mistakes.
    7. What are you optimizing for? MTTR, MTBF
    8. MTTR, MTBF
    9. 4 core techniques: 1. Put a Button On It 2. Branch in Code 3. Trunk is Always Deployable 4. Dark/Incremental Launches
    10. Put a Button On It.
    11. Branch in code: use feature flags. if ($cfg['awesome_new_search']) { # new hotness $rsp = do_solr(); } else { # boring old stuff $rsp = do_grep(); }
    12. Branch in code: use feature flags. for free you get: 1% launches, admin only launches, dark launches, split tests
    13. any engineer can launch an experiment to 57 experiments live right
    14. Metrics driven: measure everything! feedback loops!
    15. Engineers love to make it ridiculously easy
    16. Metrics driven: StatsD::timing("page.render", $msec);
    17. Metrics driven
    18. Metrics aren’t optional: a feature isn’t done without metrics
    19. Make metrics visible: remove the passwords
    20. Some tools: Graphite, Ganglia, Logster*, StatsD*, event beacons, log files, EMR, Vertica, Splunk
    21. Getting started? Use StatsD @ Instagram, Pinterest, Github, Mozilla, LAN.com, Zynga, Kickstarter, LivingSocial and 70+ other companies
    22. Step 1: your 5 core @ Etsy: sign ups, logins, checkout, new listings, posts in the bugs forums
    23. Who watches the graphs?
    24. Automate your analysis: USE COMPUTERS!
    25. Automate your analysis: holtWintersConfidence(Upper|Lower)
    26. Automate your analysis: continuous integration: unit tests, coding standards, static analysis, risky codepaths
    27. Make effective security easy by default. Make insecure patterns “grep-able”.
    28. Actively monitor for attacks. Spikes in 500s and failed logins are your first clue.
    29. “I discovered the vuln late Friday afternoon and wasnt quite ready to email it to them. Saturday morning, I confirmed the hole was still there and fixed a few bugs with my demo. I had my girlfriend test it from her house. It didnt work for her. I tested again and it had stopped working for me. Sure enough, it was now properly sanitized and had the correct JSON MIME type. The following Monday I received a response thanking me for reporting it, and telling me I was right.”
    30. Treat independent security researchers with respect.
    31. “Culture eats strategy for breakfast”* (*possibly
    32. Thank you!
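The "automate your analysis" slides point at Holt-Winters confidence bands, and the notes summarize the idea as exponentially smoothed historical averages with an alert when a metric leaves the bounds. The Python sketch below shows that idea in a deliberately simplified form: it tracks an exponentially weighted mean and deviation only, with no trend or seasonal terms, so it is not Holt-Winters itself. The function name `find_anomalies` and the parameters `alpha`, `width`, `warmup` and `floor` are assumptions made for the example.

```python
def find_anomalies(values, alpha=0.3, width=4.0, warmup=5, floor=1.0):
    """Return indexes of points that fall outside the smoothed band.

    The band is mean +/- max(width * deviation, floor), where mean and
    deviation are exponentially weighted averages of past points.
    """
    if not values:
        return []
    mean = float(values[0])
    dev = 0.0
    anomalies = []
    for i, x in enumerate(values[1:], start=1):
        err = abs(x - mean)
        band = max(width * dev, floor)
        if i >= warmup and err > band:
            # Flag the point and skip the update so that one outlier
            # does not drag the baseline toward itself.
            anomalies.append(i)
            continue
        mean = alpha * x + (1 - alpha) * mean
        dev = alpha * err + (1 - alpha) * dev
    return anomalies


if __name__ == "__main__":
    # Made-up logins-per-minute samples with one sudden drop.
    logins = [100, 102, 98, 101, 99, 100, 102, 98, 100, 20, 101]
    for index in find_anomalies(logins):
        print(f"alert: logins left the band at sample {index}")
```

A production setup would more likely lean on the graphing system's own Holt-Winters functions, which also model daily and weekly seasonality; the sketch only shows why a computer, not a person, can watch every graph.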
