Good Things and Hard Things of SaaS Development/Operations

Study Session on Cloud-Based Architecture and DevOps Case Studies


Good Things and Hard Things of SaaS Development/Operations

  1. Good Things and Hard Things of SaaS Development/Operations
     Satoshi Tagomori (@tagomoris), Principal Software Engineer, Arm Treasure Data
     Study Session on Cloud-Based Architecture and DevOps Case Studies
     © 2019 Arm Limited
  2. Satoshi Tagomori (@tagomoris)
     • Treasure Data: 2015~
     • Arm Treasure Data (2018~)
     • Current: Backend Team
     • OSS: Fluentd, MessagePack-Ruby, Woothee, Norikra, ...
     • ISUCON
     • 2011~ (livedoor, NHN Japan, LINE)
  3. (no transcribed text)
  4. What's DevOps?
     • Collaboration between Devs and Ops
       • "Development" and "System Operations"
     • For faster development, deployment, release cycles
     • For quicker improvements
       • Performance
       • Business Values
  5. Treasure Data Platform
     • Distributed Systems
       • Distributed Database Systems
       • Distributed Data Processing Systems
       • Job Queue & Workers
       • API Endpoints
       • Data Transfer, Conversion
       • ...
     • CDP (Customer Data Platform) Application
       • on top of the platform
       • our application, as a Marketing platform for customers
     Diagram labels: Treasure Data Platform; Treasure Data CDP; Customers' Marketing Businesses, Apps; Customers' Data Analytics Workloads
  6. Engineering Teams: Own, Build and Operate Systems
     Teams own components
     • Design
     • Development
     • Configuration
     • Testing
     • Deployment
     • Release
     • Monitoring
     • Alert
     • Operation
     • On-call!
     SRE provides common features
     • OS base images
     • System-wide configurations
     • Shared tools (chatbot, etc.)
     • ...
  7. DevOps...?
     A team owns everything around a component, including Monitoring, Operations, On-call, ... 🤔
  8. Developers do Operations
     Nothing like "Devs vs Ops"! 😆
  9. THE END...?
     No, No, Wait, WE HAVE PROBLEMS 👻
  10. Distributed Systems by Many Teams: Complexity across Teams
      Component Dependencies
      • Components depend on each other
        • Query engine <-> Database
        • Data ingestion <-> Database
        • Worker <-> Query engine
        • API <-> Worker
        • ...
      • Feature dependencies between components
      Data Compatibilities
      • Write data, read data *later*
        • Data ingestion writes data
        • Database subsystem reads data
        • Query engine reads data
        • ...
      • Incompatible data causes crashes *later*
  11. Various Backend Team Components
      Diagram labels: Treasure Data Platform, Backend Team Components, Plazma (Distributed Database), Data Ingestion, Workers, Workflow, API, Hadoop, Presto, Data Connector, CDP, Frontend, Our Customers
  12. Hard Things: Backend Components with Long History, Many Components and Severe Uptime Requirement
      Long History
      • Historic components have stories...
        • Chef-based deployments
        • Old-style configurations
        • Old JVMs
        • Unorganized AWS components
        • Unorganized monitoring/alerts
      Many Components
      • Backend was "not Frontend" in the past
      • Different components
        • Distributed database
        • Data ingestion APIs
        • Data ingestion workers
        • Job workers
        • Workflow manager
      • Many ways to do things
      Uptime Requirement
      • Database, Workers
        • Referred to by all other components
        • Downtime means entire service downtime
      • Data Ingestion
        • Downtime means user-side data loss
  13. The Things That Make Us Slower 😱 😨 😰
      • Outdated Systems
      • Poor System Improvements
      • Bad Stability
      • High Operation Cost
      • Low Business Income
      • Low QoL
  14. Move Faster 😀
      • Quick Delivery
      • Feedback From Customers & Teams
      • Frequent Deployment
      • Modernized Deployment
  15. Modernizing Deployment: Old-style (Chef) vs Modern-style (CodeDeploy)
      Chef Deployed Systems Periodically
      • Developer pushed "production" branch
        • Release: Merge into "production"
      • Chef pulls "production" branch
        • On EC2 instances
        • Once per 30 minutes
      • Then, chef recipe does:
        • Build, Install dependencies
        • Restart processes (if needed)
      Developers Kick CodeDeploy
      • Service repo kicks CircleCI
        • CI creates CodeDeploy packages on S3
      • Developers kick "deploy" or "release"
        • On Slack
        • To trigger CodeDeploy
      • CodeDeploy fetches packages from S3
        • Build, Install, Restarts, etc.
  16. Many, Quick, Frequent Deployments & Releases: Split Release into Small Releases
      Minimize Affected Components
      • Giant release affects many components
        • Hard to say: "it's safe for all components"
      • Many small releases
        • Easy to say: "it's safe for THIS component!"
      Minimize Affected Customers
      • Many customers do various things on our platform
      • Release MAY affect their workloads
        • Query compatibility
        • Query performance
        • Invalid data handling
        • Data ingestion request patterns
      • "A customer says something goes wrong"
        • "What happened during this past week?"
      • Make things clear for support operations
  18. Revisiting: What's DevOps?
      • Collaboration between Devs and Ops
        • "Development" and "System Operations"
      • Is It Only for System Operations?
        • "Operation" means not only system operations in many cases (for example, Chief "Operation" Officer)
      • We need to support many types of operations:
        • Support Operations
        • Sales Operations
        • Audit Operations
  19. Audit
      • Standards Compliance
        • ISO/IEC 27001:2013
        • SOC-2 (Service Organization Controls) type 2
      https://www.treasuredata.co.jp/security/
  20. Executing Infra Operations in an Explicit Way: Terraform Enterprise
      • AWS infra by code
      • Run code on Terraform Enterprise workspaces with history
  21. Executing Release Operations in an Explicit Way: Slack, Jira, GitHub and Automation
      • Trying release automation
      • Communication Central - Slack
      • History Central - GitHub
      • Release Central - Jira
  22. Own Your Service Only, Use Cloud Services: Your service is heavy enough
      Own Your Service Only
      • Many things to do about your service
        • Design
        • Development
        • Configuration
        • Testing
        • Deployment
        • Release
        • Monitoring
        • Alert
        • Operation
        • On-call
      Use Cloud Services
      • No room left to own additional things
      • Cloud services are there to help your work
      • Be Agile, Move Faster!
  23. Thank You / Danke / Merci / 谢谢 / ありがとう / Gracias / Kiitos / 감사합니다 / धन्यवाद / شكرًا / תודה
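
The "Minimize Affected Components" point (slides 10 and 16) can be illustrated with a small sketch. It is not from the talk: the component names follow the "Component Dependencies" slide, but the direction of each dependency, the `DEPENDS_ON` map, and the `affected_by` helper are all assumptions made for illustration. The idea is simply that a release can affect the released component plus everything that depends on it, directly or transitively, so smaller releases of leaf components are easier to call safe.

```python
from collections import deque

# Hypothetical dependency map: component -> components it depends on.
# The pairs come from the "Component Dependencies" slide; the direction
# of each edge is an assumption made for this sketch.
DEPENDS_ON = {
    "database": set(),
    "query_engine": {"database"},
    "data_ingestion": {"database"},
    "worker": {"query_engine"},
    "api": {"worker"},
}


def affected_by(released):
    """Return the released components plus every component that depends
    on them, directly or transitively."""
    # Invert the map so we can walk from a component to its dependents.
    dependents = {name: set() for name in DEPENDS_ON}
    for name, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents[dep].add(name)

    # Breadth-first walk over the inverted graph.
    seen = set(released)
    queue = deque(released)
    while queue:
        current = queue.popleft()
        for nxt in dependents[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    # A small release of a leaf component touches only that component.
    print(sorted(affected_by({"api"})))
    # A release of a core component reaches everything built on it.
    print(sorted(affected_by({"database"})))
```

Under these assumed edges, releasing only the API affects one component, while releasing the database reaches all five, which matches the slide's contrast between "it's safe for THIS component!" and "it's safe for all components".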
