
Tokyo SRE Meetup - Building Reliable Services - A Journey from servers to services

TD Presents: Reliability x Large Scale talks for infrastructure and Site Reliability Engineering

Talk: Building Reliable Services - A journey from servers to services
Speaker: Chris Maxwell

Event: https://techplay.jp/event/657905
Location: Tokyo, Japan
Date: March 15, 2018

[TD Presents] Companies running "reliability × large scale" services speak! A technical case-study meetup on operating services stably and scalably ~Infrastructure/SRE edition~

Published in: Technology

  1. 1. BUILDING RELIABLE SERVICES
  2. 2. TREASURE DATA BUILDING RELIABLE SERVICES The journey from servers to services Chris Maxwell Site Reliability Manager
  3. 3. Treasure Data Services
  4. 4. WHY? Building Reliable Services • Reliability is an emergent property • You cannot buy reliability • You can invest in communication, tools, and processes that increase reliability
  5. 5. Product Sales Marketing Analytics DAILY WORKLOAD 1+ Million Events / Sec 400,000+ Queries / Day 15+ Trillion Rows / Day 173+ Million Rows / Sec
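
The per-second row figure on slide 5 follows from the daily total. A quick check, using only the numbers on the slide (the queries-per-second value is derived here and does not appear in the deck):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

rows_per_day = 15e12        # "15+ Trillion Rows / Day" from the slide
queries_per_day = 400_000   # "400,000+ Queries / Day" from the slide

# Convert the daily totals to average per-second rates.
rows_per_sec = rows_per_day / SECONDS_PER_DAY
queries_per_sec = queries_per_day / SECONDS_PER_DAY

print(f"{rows_per_sec / 1e6:.1f} million rows/sec")
print(f"{queries_per_sec:.1f} queries/sec")
```

The row rate comes out just above 173 million per second, which matches the "173+ Million Rows / Sec" on the slide. These are averages over a day, so peak rates are presumably higher.
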
  6. 6. MANY DEPLOYMENTS 8+ Environments Varying capabilities and scale per environment 50+ Services Not a microservices architecture… 275+ Deployments Production clusters from 3 to 200+ instances
  7. 7. RUNTIME CONVERGENCE Cookbooks Downloaded Configuration Management Server Pattern Code Downloaded Configuration Management of releases Runtime Failures Dependencies and Releases use same process Dependencies Downloaded 3rd Party dependencies are everywhere
  8. 8. OUR HERO Infrastructure Engineer Systems Engineer who owns the resources underlying services. Automation, Cloud, Networks, Security Groups, DNS, Production Support services Site Reliability Engineer Software Engineer and Systems Engineer that improves services with automation and system-wide tools and best practices
  9. 9. INCREASE VELOCITY Faster than Weekly Deployments • Releases through Configuration Management • Infrastructure team gatekeeping More Sites • We need more sites by end of the year • 50+ services per site
  10. 10. COMPLEX PLATFORM Where to Start? • Job Control • Query and Compute • Storage • Segmentation Many Differences • Ruby • Java • Hadoop • Presto • Scala Many teams • Backend • Query • API • Integrations • Frontend • Infrastructure Growth and Change • New features every week • Product evolution
  11. 11. SERVICE DELIVERY IS HARD Hero Refuses Politely… Teams continue using existing practices Foundation is Dirty Work Thankless tasks Change exposes implicit usage Measure Reliability Improves existing processes Starts measuring features
  12. 12. WISDOM FROM OUTSIDE Simple First “Everything should be made as simple as possible, but not simpler.” — Paraphrase of Albert Einstein
  13. 13. ON EXPERTS AND ADVICE You’re the expert given your specific context and needs
  14. 14. MENTOR RETURNS The number of “chunks” of context a human engineer can retain is the “magical number seven (7), plus or minus two” — George Miller
  15. 15. FIRST CHANGES Standard Deployment Targets For our environment, we need: • Site - data residency • Cloud - vendor / implementation • Region - resource location • Service - internal service name • Stage - delivery stages • Cluster - deployment target
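
The six fields on slide 15 can be read as a single record that identifies where a service runs. A minimal sketch, assuming a Python dataclass; the class name, the `/`-joined path format, and the example values are all hypothetical and not taken from the talk:

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DeploymentTarget:
    """One deployment target, using the six fields listed on the slide."""
    site: str      # data residency
    cloud: str     # vendor / implementation
    region: str    # resource location
    service: str   # internal service name
    stage: str     # delivery stage
    cluster: str   # deployment target

    def __post_init__(self):
        # Reject empty fields so every target is fully specified.
        for f in fields(self):
            if not getattr(self, f.name):
                raise ValueError(f"{f.name} must not be empty")

    def path(self) -> str:
        # Hypothetical naming scheme: fields joined in declaration order.
        return "/".join(getattr(self, f.name) for f in fields(self))


# Example values are made up for illustration.
target = DeploymentTarget("jp", "aws", "ap-northeast-1", "presto", "staging", "blue")
print(target.path())
```

Six fields also sits within the "seven, plus or minus two" limit quoted on the previous slide, which may be part of why the list is this short.
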
  16. 16. HARD WORK AHEAD Reliability sometimes means rolling up your sleeves and getting dirty, working on core infrastructure to create a strong foundation that others can rely upon
  17. 17. FIRST CHANGES Standard Startup Services For our environment, we need: • preinit - discover deployment target • ephemeral - automatic volume mounting • final - bootstrap configuration management
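
The three startup services on slide 17 imply an order: the deployment target must be discovered before configuration management can be bootstrapped. A rough sketch of that ordering, assuming each stage is a plain Python function sharing a context dictionary; the function bodies, placeholder values, and the `run_startup` helper are hypothetical stand-ins, not the real services:

```python
def preinit(ctx):
    # Discover the deployment target (hard-coded placeholder here).
    ctx["target"] = {"service": "example", "stage": "dev"}


def ephemeral(ctx):
    # Mount ephemeral volumes (placeholder path, nothing is mounted).
    ctx["mounts"] = ["/mnt/ephemeral0"]


def final(ctx):
    # Bootstrap configuration management; it needs the discovered target.
    if "target" not in ctx:
        raise RuntimeError("preinit must run before final")
    ctx["bootstrapped"] = True


STAGES = [("preinit", preinit), ("ephemeral", ephemeral), ("final", final)]


def run_startup(stages=STAGES):
    """Run each stage in order; an exception stops the sequence."""
    ctx, completed = {}, []
    for name, step in stages:
        step(ctx)
        completed.append(name)
    return ctx, completed


ctx, completed = run_startup()
print(completed)
```
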
  18. 18. KEEP IT SIMPLE “Complexity is the root cause of the vast majority of problems with software today” — Moseley & Marks
  19. 19. ACCEPTS CHALLENGE Standard Service Definition • Autoscale Group • Optional CodeDeploy Package • Internal Load Balancer • Internal DNS Endpoint • Optional External Load Balancer & DNS Endpoint
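
The standard service definition on slide 19 has three required parts and two optional ones. One way to picture it is as a record with optional fields. This is a sketch only: the field names, the `validate` helper, and the rule that the external load balancer and DNS endpoint are set together are assumptions, not the team's actual schema:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServiceDefinition:
    name: str
    autoscale_group: str
    internal_load_balancer: str
    internal_dns: str
    codedeploy_package: Optional[str] = None      # optional on the slide
    external_load_balancer: Optional[str] = None  # optional on the slide
    external_dns: Optional[str] = None            # optional on the slide

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the definition is OK."""
        problems = []
        for required in ("autoscale_group", "internal_load_balancer", "internal_dns"):
            if not getattr(self, required):
                problems.append(f"missing {required}")
        # Assumption: the external pair is all-or-nothing.
        if bool(self.external_load_balancer) != bool(self.external_dns):
            problems.append("external load balancer and DNS must be set together")
        return problems


# Example values are made up for illustration.
svc = ServiceDefinition(
    name="presto",
    autoscale_group="presto-asg",
    internal_load_balancer="presto-ilb",
    internal_dns="presto.internal.example",
)
print(svc.validate())
```
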
  20. 20. AUTOSCALING PRESTO Attach to the Team Our hero joins a service team Autoscaling Presto Helps to autoscale the entire service Work with Team Helps transition config into artifact
  21. 21. CODEDEPLOY PRESTO Learn from Team Their challenges and needs Artifact Code + Config Transition from simple autoscaling to Code + Config Artifacts Simple is Hard 3+ sources of configuration truth 12+ mostly same but different configurations Complexity was workaround for inflexible Configuration Management
  22. 22. MOVE FAST Direct API Tools • Service API not complete • Team needed compound operations Conductor to manage cluster ops • Built service-specific tools using underlying APIs • Routing and Segmentation
  23. 23. FRIENDS FOR THE JOURNEY • AutoScaling & Launch Configuration • IAM Instance-Profile Roles • Route53 • CodeDeploy • EC2 Security Group • Application Load Balancer & Target Group
  24. 24. MORE FRIENDS Trusting Team Software Engineering teams trusted our hero Outside Experience Engineers with Domain Specific experience helped our hero understand the systems
  25. 25. SLIDE TITLE value of explicitly defined service contracts talk first, software later
  26. 26. DELIVERY STATES Dangerous Shutdown Some services require careful shutdown procedures Delivery cannot hard-fail 14-day running jobs Loose definition of responsibility Delivery is an organic combination of Configuration Management, system service control, release control New Orchestration exposes old assumptions In-place is sub-optimal for 2-week jobs New-cluster is sub-optimal for remaining jobs
  27. 27. MENTOR RETURNS Tools express the process Process should uplift the organization “Tools are necessary but not sufficient. To build a future we all can live with, we have to build it together” — Bridget Kromhout
  28. 28. OUR HERO Service Tool Orchestrate 6 infrastructure APIs with MVP tools: • Leverage immediate gain • orchestration • Paying interest • Learning team needs and behaviour • Liability that must be paid in full • Intend to replace with API + client
  29. 29. SERVICES FIRST All services should look the same Any engineer can • Create a cluster • Update a cluster • Deploy to a cluster • Delete a cluster Safely, using the same tool
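
Slide 29 describes four operations that every service exposes through the same tool. A minimal sketch of what such a uniform interface could look like, using an in-memory dictionary instead of real cloud APIs; the `ClusterTool` class, its method signatures, and the example names are hypothetical and only illustrate the idea:

```python
class ClusterTool:
    """Hypothetical in-memory stand-in for one tool shared by all services."""

    def __init__(self):
        self._clusters = {}

    def create(self, name, size):
        # Refuse to overwrite an existing cluster.
        if name in self._clusters:
            raise ValueError(f"cluster {name!r} already exists")
        self._clusters[name] = {"size": size, "version": None}

    def update(self, name, size):
        self._get(name)["size"] = size

    def deploy(self, name, version):
        self._get(name)["version"] = version

    def delete(self, name):
        self._get(name)  # raises KeyError if the cluster is unknown
        del self._clusters[name]

    def describe(self, name):
        # Return a copy so callers cannot mutate internal state.
        return dict(self._get(name))

    def _get(self, name):
        try:
            return self._clusters[name]
        except KeyError:
            raise KeyError(f"unknown cluster {name!r}") from None


tool = ClusterTool()
tool.create("presto-blue", size=3)
tool.update("presto-blue", size=5)
tool.deploy("presto-blue", version="1.2.3")
print(tool.describe("presto-blue"))
tool.delete("presto-blue")
```

Because each cluster carries its own version, this shape also fits the "Per-Cluster Versioning" benefit listed later in the deck.
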
  30. 30. SLIDE TITLE Survey the Work How deep does the hole go? Start with Friends API and Segmentation Where to Start? Look for the greatest need
  31. 31. COMPLEXITY Complex Service(s) • Manual Post-Start Actions • Service Discovery because no standards Duplication in Many Places • 5 services of the same service • We were pushing the limits of legacy model
  32. 32. COMPLEXITY Unclear boundaries • Configuration ownership shared across teams • Service Discovery because no standards Unclear assumptions • Inconsistent naming and usage • The way it works now is the way it should be
  33. 33. MIGRATION Simplifying Complex Re-evaluate all choices in light of services-first Many Transitional Changes Startup Services Infrastructure to Application Precision Replacement Coordinated Handover Careful work
  34. 34. THE PROCESS Legacy Process • Servers First • Human Orchestration Transition • Services First • Automatic triggers legacy Value • Replace legacy with artifact
  35. 35. VISION Standard Services First With standards, exceptions are hard; without standards, everything is hard
  36. 36. OUR HERO Autoscaling Implemented • Second Services Team: • Launched to Staging last week • Launched to Production yesterday
  37. 37. THE REWARD Service Patterns for Scaling • Deployment Targets • Standard Startup • Standard Services New Powers • On-Demand Clusters • Per-Cluster Versioning • Immediate Feedback
  38. 38. OUR HERO Your team builds it, your team runs it; we can help your team run it better
  39. 39. OUR BLUEPRINT Standard Services • Deployment Target • Internal Hostname • Internal Load Balancer • Autoscale Group • CodeDeploy Artifact Supporting Services Artifacts are easier with: • Configuration support hooks • Service Control hooks • Remote Execution hooks • Metrics, monitors, logs, alerts
  40. 40. REMAINING SERVICES 41+ Services Just 41+ more to go Each one needs conversion 200+ Deployments Just 200+ more to go Each one needs re-deployment Empathy Not all services were designed for multi-cluster environments Not all services were designed for graceful termination Not all services have active improvements planned Challenges • Non-idempotent • Stateful / Diskful • Master/Worker Co-Services • Maintain Service Levels • High Throughput Environment
  41. 41. THE WAY HOME Best Practices Standard Services Standard Delivery Standard Tooling Work for Teams Improve Service as a Service Work with Teams Enable Super Powers Deploy on Demand Per-Cluster Versions
  42. 42. REMAINING SERVICES Service Improvements Target business value: Delivery Velocity High-Trust Services Support Config Management No Big-Bang Replacements Business Depends on Previous Process Strategy to Improve Small Iterations Incremental Value
  43. 43. OUR SERVICE IS NOT YOUR SERVICE All software is created within a context, and trade-offs are made based on that context
  44. 44. RELIABILITY Reliability is: The quality of being trustworthy or performing consistently well
  45. 45. INVESTMENTS Understandable Make every service easy to understand Allow any engineer to quickly operate and improve Consistent Make every service look the same Allow any engineer to work on any system without context Repeatable Practice makes perfect
  46. 46. HEROES ARE FOR STORIES
  47. 47. NO HEROES, ONLY TEAM Yuu Yamashita Takashi Kokubun Yuki Ito Chris Maxwell You? Site Reliability Engineer Robin Bowes You? Site Reliability Engineer You? Infrastructure Engineer You? Site Reliability Engineer
  48. 48. TREASURE DATA BUILDING RELIABLE SERVICES • @WrathOfChris https://twitter.com/WrathOfChris • Chris Maxwell https://www.linkedin.com/in/wrathofchris/ • Careers https://www.treasuredata.co.jp/careers/ • Treasure Data K.K. (トレジャーデータ株式会社) https://www.linkedin.com/company/treasure-data-inc-