
GeeCON Microservices 2015: scaling micro-services at Gilt

An evolution of the talk I gave at CraftConf earlier this year, talking about software architecture and micro-services at Gilt. Some new additions include ownership, service discovery and service anatomy.


  1. 1. scaling μ-services at Gilt ade@gilt.com Sopot, Poland 11th September 2015 Adrian Trenaman, SVP Engineering, Gilt, @adrian_trenaman @gilttech
  2. 2. why was I late today? and… were micro-services to blame?
  3. 3. Diagram: svc-localised-string, mongodb, login-reg, mosaic, product listing, product search. A localisation file was loaded with an incorrect character encoding. The driver spun on CPU, consuming CPU credits. The service starved and fell over. Core parts of the site were broken.
  4. 4. so… … how did I really feel about micro-services yesterday?
  5. 5. gilt: luxury designer brands at discounted prices
  6. 6. we shoot the product in our studios
  7. 7. we receive, store, pick, pack and ship...
  8. 8. we sell every day at noon
  9. 9. stampede...
  10. 10. this is what the stampede really looks like...
  11. 11. rails to riches: 2007 - ruby-on-rails monolith
  12. 12. 2011: java, loosely-typed, monolithic services ● Monolithic Java App; huge bottleneck for innovation. ● Hidden linkages; buried business logic. ● Lots of duplicated code :( ● Teams focused on business lines. ● Large loosely-typed JSON/HTTP services.
  13. 13. enter: µ-services. “How can we arrange our teams around strategic initiatives? How can we make it fast and easy to get a change to production?”
  14. 14. 2015: micro-services
  15. 15. driving forces behind gilt’s emergent architecture ● team autonomy ● voluntary adoption (tools, techniques, processes) ● kpi or goal-driven initiatives ● failing fast and openly ● open and honest, even when it’s difficult
  16. 16. service growth over time: point of inflexion === scala.
  17. 17. what are all these services doing?
  18. 18. anatomy of a gilt service
  19. 19. anatomy of a gilt service - typical choices: gilt-service-framework, log4j, cloudwatch, Cave, ..., java, javascript or ...
  20. 20. lines of code per service
  21. 21. # source files per service
  22. 22. service discovery: straightforward. Diagram: zookeeper, Brocade Traffic Manager (aka Zeus, Stingray, SteelApp, ...)
  23. 23. from bare-metal... PHX IAD
  24. 24. … to vapour.
  25. 25. single tenant deployment: one AMI per service instance
  26. 26. reproducible, immutable deployments: docker
  27. 27. service discovery: new services use ELB. Diagram: zookeeper, Amazon ELB.
  28. 28. # running AMIs per service
  29. 29. lift’n’shift + elastic teams. Diagram: Existing Data Centre; dual 10Gb direct connect line, 2ms latency.
  30. 30. AWS instance sizing
  31. 31. evolution of architecture and tech organisation
  32. 32. We (heart) μ-services ● Lessen dependencies between teams: faster code-to-prod. ● Lots of initiatives in parallel. ● Your favourite <tech/language/framework> here. ● Graceful degradation of service. ● Disposable Code: easy to innovate, easy to fail and move on.
  33. 33. We (heart) cloud ● Do devops in a meaningful way. ● Low barrier of entry for new tech (dynamoDB, Kinesis, ...). ● Isolation. ● Cost visibility. ● Security tools (IAM). ● Well documented. ● Resilience is easy. ● Hybrid is easy. ● Performance is great.
  34. 34. seven μ-service challenges (& some solutions): no one ever said this was gonna be easy
  35. 35. 1. staging vs test-in-prod: We find it hard to maintain staging environments across multiple teams with lots of services. ● We think TiP is the way to go: invest in automation, use dark canaries in prod. ● However, some teams have found TiP counter-productive, and use minimal staging environments.
  36. 36. 2. ownership: Who ‘owns’ that service? What happens if that person decides to work on something else? We have chosen for teams and departments to own and maintain their services. No throwing this stuff over the fence.
  37. 37. bottom-up ownership, RACI-style: 1. Software is owned by departments, tracked in ‘genome project’. Directors assign services to teams. 2. Teams are responsible for building & running their services; directors are accountable for their overall estate.
  38. 38. ‘ownership donut’ informs tech strategy 3. Ownership is classified: active, passive, at-risk. ‘done’ === 0% ‘at risk’
  39. 39. 3. deployment: Services need somewhere to live. We’ve open-sourced tooling over docker and AWS to give: elasticity + fast provisioning + service isolation + fast rollback + repeatable, immutable deployment. https://github.com/gilt/ionroller
  40. 40. 4. lightweight APIs: We’ve settled on REST-style APIs, using http://apidoc.me. Separate interface from implementation; “an AVRO for REST” (Mike Bryzek, Gilt Founder). We strongly recommend zero-dependency strongly-typed clients.
  41. 41. 5. audit + alerting: How do we stay compliant while giving engineers full autonomy in prod? Really smart alerting: http://cavellc.github.io, e.g. orders[shipTo: US].count.5m == 0
  42. 42. 6. io explosion: Each service call begets more service calls, some of which are redundant... => unintended complexity and performance problems. Looking to lambda architecture for critical-path APIs: precompute, real-time updates, O(1) lookup.
  43. 43. 7. reporting: Many services => many databases => data is decentralized. Solution: real-time event queues to a data-lake.
  44. 44. so… how did I really feel about yesterday’s outage? great.
  45. 45. Diagram: svc-localised-string, mongodb, login-reg, mosaic, product listing, product search. A localisation file was loaded with an incorrect character encoding. The driver spun on CPU, consuming CPU credits. The service was small: it was re-written in about an hour, deployed and fixed the site. We knew exactly where the problem was. We focussed and rapidly deployed tentative incremental fixes. Once we fixed that problem, all of our problems were fixed. Try that in a monolith :)
  46. 46. scaling μ-services at Gilt ade@gilt.com Sopot, Poland 11th September 2015 Adrian Trenaman, SVP Engineering, Gilt, @adrian_trenaman @gilttech
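
Slide 38 classifies ownership as active, passive or at-risk, and defines ‘done’ as 0% at risk. A minimal sketch of that arithmetic follows. The service names and the dictionary shape are hypothetical; the slides do not show how the ‘genome project’ stores this data.

```python
from collections import Counter

# Hypothetical snapshot: service name -> ownership class from slide 38.
OWNERSHIP = {
    "svc-a": "active",
    "svc-b": "passive",
    "svc-c": "at-risk",
    "svc-d": "active",
}


def at_risk_percentage(ownership):
    """Share of services classified 'at-risk', as a percentage."""
    counts = Counter(ownership.values())
    total = sum(counts.values())
    return 100.0 * counts["at-risk"] / total if total else 0.0


def is_done(ownership):
    """'done' === 0% 'at risk', per the slide."""
    return at_risk_percentage(ownership) == 0.0
```

With the sample data one service in four is at risk, so the estate is not yet ‘done’.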
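
Slide 41 gives the alert rule orders[shipTo: US].count.5m == 0. One way to read it: fire when no US-bound order has been seen in the last five minutes. The sketch below evaluates that reading over an in-memory list. It is not Cave's API; OrderEvent, count_in_window and should_alert are names made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class OrderEvent:
    # Hypothetical event shape; the real metric schema is not in the slides.
    placed_at: datetime
    ship_to: str


def count_in_window(events, ship_to, now, window=timedelta(minutes=5)):
    """Count events for one destination inside the trailing window."""
    start = now - window
    return sum(
        1 for e in events
        if e.ship_to == ship_to and start < e.placed_at <= now
    )


def should_alert(events, now):
    """Mirror of the slide's rule: orders[shipTo: US].count.5m == 0."""
    return count_in_window(events, "US", now) == 0


now = datetime(2015, 9, 11, 12, 0, 0)
events = [
    OrderEvent(now - timedelta(minutes=2), "US"),  # inside the window
    OrderEvent(now - timedelta(minutes=9), "US"),  # too old to count
    OrderEvent(now - timedelta(minutes=1), "CA"),  # wrong destination
]
```

Dropping the first event leaves no recent US order, which is the condition the rule is meant to catch.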
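
Slide 42 summarises the lambda-architecture idea as: precompute, real-time updates, O(1) lookup. A rough sketch of that shape, assuming a product view keyed by id: a batch step builds the view, updates patch it in place, and reads are a single dictionary lookup (O(1) on average). The ProductView class and its fields are illustrative only, not Gilt's implementation.

```python
class ProductView:
    """Precomputed read model: one ready-to-serve record per product id."""

    def __init__(self, batch_rows):
        # Precompute: build the whole view up front from a batch of rows.
        self._view = {row["id"]: dict(row) for row in batch_rows}

    def apply_update(self, product_id, **changes):
        # Real-time update: patch one record instead of recomputing the view.
        record = self._view.setdefault(product_id, {"id": product_id})
        record.update(changes)

    def lookup(self, product_id):
        # Average-case O(1): one dict access, no fan-out to other services.
        return self._view.get(product_id)


view = ProductView([{"id": "p1", "price": 100, "stock": 3}])
view.apply_update("p1", stock=2)   # an existing product changes
view.apply_update("p2", price=40)  # a product first seen in the update stream
```

The point of the design is that a critical-path read touches one precomputed record rather than triggering a chain of service calls.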
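
Slide 43 proposes real-time event queues feeding a data-lake as the answer to reporting across many databases. The sketch below shows the flow in miniature, with an in-process queue standing in for the event queue and a list of JSON lines standing in for the lake. Both stand-ins, the event fields and the service names are assumptions; the slides do not name the technology used.

```python
import json
import queue


def publish(q, service, event_type, payload):
    """A service emits an event instead of exposing its database for reports."""
    q.put({"service": service, "type": event_type, "payload": payload})


def drain_to_lake(q, lake):
    """Move every queued event into the append-only lake; return the count."""
    written = 0
    while True:
        try:
            event = q.get_nowait()
        except queue.Empty:
            return written
        lake.append(json.dumps(event, sort_keys=True))
        written += 1


events = queue.Queue()
lake = []  # stand-in for the data-lake: one JSON document per line

publish(events, "svc-orders", "order-placed", {"order_id": 1})
publish(events, "svc-inventory", "stock-changed", {"sku": "abc", "delta": -1})
written = drain_to_lake(events, lake)
```

Reports then read from the one lake rather than from each service's own database.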
