InterCon 2016 - SLA vs. Agility: using microservices and cloud monitoring

Miguel Gubitosi, Project Leader at Mercadolibre.com, talks about SLA vs. Agility: using microservices and cloud monitoring at InterCon 2016.
Learn more at http://intercon2016.imasters.com.br/

Published in: Technology


  1. October 2016. First 90. SLA vs. Agile: Microservices and cloud monitoring
  2. Why this talk?
  3. This is our vision: Building the foundation to Build a 3B Company by FY20. Agenda: 1. “Old World”: MercadoLivre’s original architecture. 2. “Ground Zero”: shifting to microservices on the cloud. 3. Monitoring the cloud. 4. Alarms: when things go south. 5. “Fury”: streamlining DevOps at MercadoLivre.
  4. In numbers: +400 deploys/day on +650 apps; +1,000 developers in 8 development centers; +10 programming languages
  5. In numbers: +25,000,000 requests per minute; +22,000 VMs in 7 data centers; +700 DBs in 4 different engines
  6. Old World
  7. Old world architecture (diagram labels: User, ml.jar, Huge DB)
  8. Old world properties ● Monolithic ● Highly coupled code ● Unified SVN repository ● Single DB ● Simple infrastructure with little overhead ● Single QA team ● Closed system
  9. Deployments as ML grew: Anyone, at any time
  10. Deployments as ML grew: Anyone, at any time → Some people, at any time
  11. Deployments as ML grew: Anyone, at any time → Some people, at any time → Some people, once a week
  12. Deployments as ML grew: Anyone, at any time → Some people, at any time → Some people, once a week → Only by all experts together, at 3 AM, on Thursdays not covered by any “freeze”
  13. Ground Zero
  14. Shifting to microservices (diagram labels: Frontend, CRM, Mobile apps, 3rd party devs, API)
  15. Ground zero properties ● Multiple technologies and frameworks (dev’s choice) ● Completely decoupled code in multiple Github repositories ● One DB for each app, multiple engines ● Complex infrastructure with possibly high overhead ● QA, testing and continuous integration are done by each team ● Independent deployments, environments and policies ● Open platform
  16. “With great power comes great responsibility.” Stan Lee
  17. Developer responsibilities ● Developer gets ownership of entire dev cycle ● Massive empowerment of dev team -> OWNERSHIP ● Manage resources: VMs (create & scale computing pools for dev/test/prod); DBs and services (choose the support systems required and create them); Networking (create rules and load balancers to route traffic to the application) ● Develop: Code (choose your technology and keep your Github repository); Test (create tests, regressions or CI as needed); Deploy (write all routines for automatically deploying your app on any VM) ● Ensure quality: Define uptime (define what “up” means for your own app, in health.sh); Measure (create metrics to analyze performance and downtime); React (react to critical events that affect your app)
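The slide says each team defines what “up” means for its own app in health.sh but does not show the script. The following is a minimal Python sketch of that idea, not MercadoLivre's actual implementation: the check names and the `evaluate_health` helper are hypothetical, and a real script would probe real dependencies.

```python
from typing import Callable, Dict, Tuple


def evaluate_health(checks: Dict[str, Callable[[], bool]]) -> Tuple[bool, Dict[str, bool]]:
    """Run every named check; the app counts as "up" only if all of them pass.

    A check that raises is treated as failed, so one broken probe
    cannot crash the health evaluation itself.
    """
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return all(results.values()), results


if __name__ == "__main__":
    # Hypothetical checks; real ones might run a trivial DB query or ping a cache.
    up, details = evaluate_health({
        "database": lambda: True,
        "cache": lambda: True,
    })
    print("UP" if up else "DOWN", details)
```

Keeping the per-check results alongside the overall verdict lets an alarm say which dependency failed, not only that the app is down.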
  18. DevTools in ML (tools used by the developer) ● Melicloud API: create apps; manage pools (test/prod); manage VMs & load balancers; build & deploy; create queues; create DBaaS or KVSaaS; create caches ● Github repo: code app; write test & deploy strategy; write uptime definitions ● Nginx: write rules to route traffic to your pools ● eventRouting & OpsGenie: write rules to manage alarms; define alarm escalation policies & schedules; manage contact channels
  19. Microservices in ML
  20. Mobile apps (diagram: three teams, each with its own Repo, Module, Test app and CI; one Main app with automated build & store deployment)
  21. Monitoring mobile apps (diagram labels: Main app, three Modules, three Teams, Crash reporting)
  22. Monitoring the cloud
  23. New Relic ● Default monitoring in VMs golden image ● No configuration necessary (initially) ● HTTP errors: Unhandled errors (unexpected issues appear in production); Unsupported params (see if other devs/clients misuse your entry params); Other services (detect down services affecting you) ● Stack traces: Fast debugging (see what’s going on in production); Unified pool data (all instances’ traces in the same place) ● Performance metrics: Transaction traces (see what’s taking so long); Recognize deviations (graphs to see if traffic or response time vary with respect to another period); Apdex Score
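The Apdex score mentioned on the New Relic slide follows the standard Apdex definition: with a target threshold T, responses taking at most T are “satisfied”, those between T and 4T are “tolerating”, and the score is

\[ \text{Apdex}_T = \frac{\text{satisfied} + \text{tolerating}/2}{\text{total samples}} \]

A minimal Python sketch of that formula follows; the function name and the sample values are illustrative and are not taken from the talk.

```python
from typing import Sequence


def apdex(response_times: Sequence[float], threshold: float) -> float:
    """Compute the Apdex score for response times in the same unit as threshold.

    satisfied:  t <= T
    tolerating: T < t <= 4T
    frustrated: t > 4T (contributes nothing to the numerator)
    """
    if not response_times:
        raise ValueError("at least one sample is required")
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)


if __name__ == "__main__":
    # Two satisfied, one tolerating, one frustrated sample with T = 0.5 s.
    print(apdex([0.1, 0.2, 0.6, 3.0], threshold=0.5))
```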
  24. Datadog ● Easy to use for different frameworks ● Good for business specific metrics ● Custom metrics: Complex metrics (you can send multiple parameters for events); Online filtering (graphs filtered with different dimensions) ● Infra monitoring: Full info (more data than NR on disk, memory, network); Scalable (handles well aggregating information from many different VMs) ● Real time analysis: Fast response (almost no latency); Dashboards (customizable dashboards to show what’s more relevant for each app); Alarms (flexible alarms based on custom metrics)
  25. Log collection ● Logs are collected by an agent on all VMs ● They are sent to Elasticsearch ● Access is via a Kibana frontend ● Developers can use a special syntax to create queryable dimensions for all logged events ● All instances’ logs are in the same place ● Requests can be traced through multiple applications/APIs (request_id)
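The slide does not show the “special syntax” used to create queryable dimensions, so the sketch below substitutes plain JSON key/value pairs as a stand-in. It illustrates the two ideas from the slide: attaching dimensions to each logged event, and carrying one request_id across services. The `format_event` and `outgoing_headers` helpers and the `X-Request-Id` header name are assumptions for illustration only.

```python
import json
import uuid
from typing import Dict, Optional


def format_event(message: str, request_id: Optional[str] = None, **dimensions: object) -> str:
    """Render one log line as JSON so each key can become a searchable field."""
    record = {"message": message, "request_id": request_id or uuid.uuid4().hex}
    record.update(dimensions)
    return json.dumps(record, sort_keys=True)


def outgoing_headers(request_id: str) -> Dict[str, str]:
    """Headers to forward on calls to other services so traces share one id.

    The header name is hypothetical; the talk only names the request_id concept.
    """
    return {"X-Request-Id": request_id}


if __name__ == "__main__":
    rid = uuid.uuid4().hex
    print(format_event("item fetched", request_id=rid, app="items-api", status=200))
    print(outgoing_headers(rid))
```

Filtering on request_id in the log frontend would then return every line written for one request, across all applications it touched.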
  26. Alarms
  27. Unified handling of events (diagram labels: health.sh, Code triggered alarms, eventRouting)
  28. Event routing ● Rules added by each team ● Check alarm origin, type and importance ● Check “quiet hours” ● Assign escalation policy and forward to OpsGenie
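The routing steps on this slide (match origin, type and importance, respect quiet hours, assign an escalation policy) can be sketched as a small rule evaluator. This is a hedged illustration, not the eventRouting code: the `Event` and `Rule` shapes, the numeric importance scale and the choice to suppress alarms during quiet hours are all assumptions, and forwarding to OpsGenie is left out.

```python
from dataclasses import dataclass
from datetime import time
from typing import List, Optional


@dataclass
class Event:
    origin: str        # e.g. "health.sh" or an application name
    kind: str          # alarm type
    importance: int    # higher means more urgent (assumed scale)
    at: time           # local time the event was raised


@dataclass
class Rule:
    origin: str
    kind: str
    min_importance: int
    escalation_policy: str
    quiet_start: Optional[time] = None
    quiet_end: Optional[time] = None


def in_quiet_hours(rule: Rule, at: time) -> bool:
    """True if `at` falls inside the rule's quiet window (which may wrap midnight)."""
    if rule.quiet_start is None or rule.quiet_end is None:
        return False
    if rule.quiet_start <= rule.quiet_end:
        return rule.quiet_start <= at < rule.quiet_end
    return at >= rule.quiet_start or at < rule.quiet_end


def route(event: Event, rules: List[Rule]) -> Optional[str]:
    """Return the escalation policy of the first matching rule, or None."""
    for rule in rules:
        if rule.origin != event.origin or rule.kind != event.kind:
            continue
        if event.importance < rule.min_importance:
            continue
        if in_quiet_hours(rule, event.at):
            return None  # assumed behaviour: suppress during quiet hours
        return rule.escalation_policy
    return None


if __name__ == "__main__":
    rules = [Rule("health.sh", "down", 3, "items-oncall", time(22, 0), time(6, 0))]
    print(route(Event("health.sh", "down", 5, time(12, 0)), rules))
```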
  29. OpsGenie ● Manage teams to deal with escalation policies ● Set “on call” schedules (with substitutes & manager escalation) ● Everyone manages their own contact methods (SMS, mail, phone call, app)
  30. Fury
  31. Evolution: Old world → Ground zero → Fury
  32. Fury: DevOps to NoOps ● Still microservices ● Fully service oriented ● Easier dev cycle and learning curve ● Pre-assembled flavors for popular frameworks ● Fewer bash scripts, more UI-based configuration ● Auto-scaling & auto-healing ● Docker based (smaller dev/prod environment gap) ● Designed to run on AWS ● Continuous integration already included
  33. Fury dashboard
  34. Dev Cycle in Fury: create app ● Creates repository ● Creates Jenkins CI server ● Creates network infra
  35. Dev Cycle in Fury: create scope ● Creates load balancer (ELB) ● Creates auto scaling group (ASG) for scope instances ● Creates instances ● Initializes logs & metrics services ● Downloads containers to instances ● Starts traffic
  36. Dev Cycle in Fury: deploy ● Creates ASG for new version ● Creates instances for new ASG ● Initializes logs & metrics services ● Downloads containers to instances ● Progressive traffic switch ● If candidate is OK, destroys previous infrastructure
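The last two deploy steps (progressive traffic switch, then destroying the previous infrastructure only if the candidate is OK) can be sketched as a loop over traffic percentages. This is a simplified illustration under stated assumptions, not Fury's implementation: the step values, the `candidate_is_healthy` probe and the `set_candidate_weight` callback are hypothetical stand-ins for whatever the load balancer and monitoring actually expose.

```python
from typing import Callable, Iterable, List


def progressive_switch(
    steps: Iterable[int],
    candidate_is_healthy: Callable[[], bool],
    set_candidate_weight: Callable[[int], None],
) -> bool:
    """Shift traffic to the candidate step by step.

    After each step the candidate is probed; on the first failure all
    traffic goes back to the previous version and False is returned.
    True means the caller may destroy the previous infrastructure.
    """
    for percent in steps:
        set_candidate_weight(percent)
        if not candidate_is_healthy():
            set_candidate_weight(0)  # roll back: previous version takes all traffic
            return False
    return True


if __name__ == "__main__":
    applied: List[int] = []
    ok = progressive_switch([10, 50, 100], lambda: True, applied.append)
    print(ok, applied)
```

Keeping the previous ASG alive until the function returns True is what makes the rollback a weight change rather than a redeploy.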
  37. ?
  38. Thank you!
