
How to ensure Presto scalability 
in multi use case

Talk at Treasure Data Tech Talk, Spring 2017

Published in: Software

  1. 1. How to ensure Presto scalability in multi use case Kai Sasaki Treasure Data Inc.
  2. 2. Kai Sasaki (@Lewuathe) Software Engineer at Treasure Data Inc. Hadoop/Presto/Spark
  3. 3. Presto in TD • 150,000+ queries / day • 190+ TB processing / day • 10+ MB processing / query * sec • 100+ million processed records / query
  4. 4. Presto in TD [Architecture diagram. Components: BI Tool, TD API, Prestobase Proxy, PerfectQueue, Presto, Plazma. Labels: HTTP, query, data]
  5. 5. How to make it scalable • Prestobase Proxy • Node scheduler • Resource Group
  6. 6. Prestobase proxy
  7. 7. Prestobase proxy The Prestobase proxy provides an interface for BI tools over JDBC/ODBC and is intended to replace Prestogres.
  8. 8. Presto in TD [Architecture diagram. Components: BI Tool, TD API, Prestobase Proxy, PerfectQueue, Presto, Plazma. Labels: HTTP, query, data]
  9. 9. Prestobase proxy • Written in Scala • Finagle-based RPC proxy • Runs as a Docker container • Uses Airframe • VCR-based lightweight test framework
  10. 10. Finagle Finagle is an extensible RPC system for the JVM, used to construct high-concurrency servers. Finagle implements uniform client and server APIs for several protocols, and is designed for high performance and concurrency. see: https://twitter.github.io/finagle/
  11. 11. Finagle
      protected val service: Service[Request, Response] =
        bind[SomeFilter] andThen
        bind[AnotherHandler] andThen
        LastFilter andThen
        prestoClient
      Build the request pipeline by binding filters and handlers with Airframe.
  12. 12. Airframe Airframe is a trait-based dependency injection framework using Scala macros - https://github.com/wvlet/airframe
  13. 13. Airframe - Dependency injection tailored to Scala - Tagged binding with wvlet https://github.com/wvlet/wvlet - Object lifecycle management
  14. 14. Airframe
      import wvlet.airframe._
      val design: Design = newDesign
        .bind[X].toInstance(new X)  // Bind type X to a concrete instance
        .bind[Y].toSingleton        // Bind type Y to a singleton object
        .bind[Z].to[ZImpl]          // Bind type Z to an instance of ZImpl
      trait App {
        val x = bind[X]
        val y = bind[Y]
        val z = bind[Z]
        // Do something with X, Y, and Z
      }
      val session = design.newSession
      val app: App = session.build[App]
  15. 15. VCR testing framework Records the HTTP interactions of the test suite to make tests stable and deterministic. For more detail, see https://testing.googleblog.com/2016/11/what-test-engineers-do-at-google.html
  16. 16. VCR testing framework
      On CI:
      protected val service: Service[Request, Response] =
        bind[SomeFilter] andThen
        bind[AnotherHandler] andThen
        QueryRewriter andThen
        bind[RequestVCR] andThen
        prestoClient
      On Production:
      protected val service: Service[Request, Response] =
        bind[SomeFilter] andThen
        bind[AnotherHandler] andThen
        QueryRewriter andThen
        bind[NoRecording] andThen
        prestoClient
  17. 17. VCR testing framework [Diagram, Recording mode. Components: Prestobase, RequestVCRClient, …, SQLite]
  18. 18. VCR testing framework [Diagram, Replaying mode. Components: Prestobase, RequestVCRClient, …, SQLite]
  19. 19. Prestobase proxy Will be open sourced soon
  20. 20. Node Scheduler
  21. 21. Node Scheduler Query submission proceeds as follows: - Analyze the query AST - Build the logical/physical query plan - Schedule each stage
  22. 22. Node Scheduler [Diagram: a query broken into stages (stage2, stage1, stage0) and their tasks (task2-0, task2-1, task2-0, task1-0, task1-1, task0-0). Labels: Table Scan, output]
  23. 23. Node Scheduler NodeScheduler creates a NodeSelector, which selects the worker nodes on which tasks are scheduled. NodeSelector picks worker nodes when there are available splits.
  24. 24. Node Scheduler in TD Keeps a map of worker nodes that are candidates for launching the next tasks. - Ignore min candidates - Limit by available memory pool
  25. 25. Node Scheduler in TD Memory pool usage returns to normal after the task is completed.
  26. 26. Node Scheduler in TD Challenges - Smoothing CPU time metric - Split type awareness - Avoid problematic worker nodes
  27. 27. Resource Group
  28. 28. Resource Group Resource Groups were introduced in 0.147 → https://prestodb.io/docs/current/admin/resource-groups.html Resource Groups limit resource usage by account/group/query.
  29. 29. Resource Group [Diagram. Groups: rootGroup, general, adhoc. Limits shown: (softMemoryLimit: 100%, maxQueued: 5000, maxRunning: 1000); (softMemoryLimit: 100%, maxQueued: 100, maxRunning: 200); (softMemoryLimit: 100%, maxRunning: 1000)]
  30. 30. Resource Group limits - maxQueued - maxRunning - softMemoryLimit: once exceeded, subsequent queries are queued - softCpuLimit: once exceeded, a penalty is imposed on the maximum number of running queries - hardCpuLimit: once exceeded, subsequent queries are queued
  31. 31. Resource Group scheduling - schedulingPolicy - fair : FIFO - weighted : Selected stochastically - query_priority : Selected according to priority - schedulingWeight
  32. 32. Resource Group Every query must be associated with a resource group. Matching is done by the configured selectors.
      { "user": "bob", "group": "general" },
      { "source": ".*adhoc.*", "group": "global.adhoc.adhoc_${USER}" }
  33. 33. Resource Group [Diagram. Groups: rootGroup, general, adhoc. Limits shown: (softMemoryLimit: 100%, maxQueued: 5000, maxRunning: 1000); (softMemoryLimit: 100%, maxQueued: 100, maxRunning: 200); (softMemoryLimit: 100%, maxRunning: 1000). Labels: Bob's query, Bob's query, …]
  34. 34. Resource Group DI Resource group configuration behavior can be changed easily with Guice injection. - ResourceGroupConfigurationManager - configure(ResourceGroup, SelectionContext) - ResourceGroupSelector - match(Statement, SelectionContext)
  35. 35. SelectionContext SelectionContext holds the information used to associate a submitted query with a resource group. Currently available by default: - Authenticated - User - Source - Query Priority
  36. 36. {
        "runningQueryIds": ["query1", "query2"],
        "accountId": 1,
        "children": [{
          "memoryUsage": 12345,
          "runningQueryIds": ["query1"],
          "children": [],
          "runningQueries": 1,
          "queuedQueries": 0,
          "maxRunningQueries": 2,
          "resourceId": "general"
        }, {
          "memoryUsage": 26296,
          "runningQueryIds": ["query2"],
          "children": [],
          "runningQueries": 1,
          "queuedQueries": 0,
          "maxRunningQueries": 2,
          "resourceId": "scheduled"
        }],
        "runningQueries": 2,
        "maxRunningQueries": 30
      }
      Slide annotations: Queries in parent group; Running query in general; Running query in scheduled
  37. 37. Recap A distributed system often requires each component to be stable and scalable. We can make the Presto ecosystem reliable with: - Reliable code modification through DI - VCR testing - Multi-dimensional resource scheduling - Resource isolation, which makes a multi-tenant distributed SQL engine reliable
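
The selector matching described on the Resource Group slides can be sketched in a few lines of Scala. This is an illustration only, not Presto's implementation: `Selector`, `Context`, and `select` are hypothetical names. The sketch assumes that selectors are tried in order, that the first match wins, that a pattern must match the whole value, and that `${USER}` in the group name is replaced by the submitting user.

```scala
import scala.util.matching.Regex

object ResourceGroupSelectorSketch {
  // Hypothetical model of one selector rule; fields mirror the JSON keys on the slide.
  final case class Selector(user: Option[Regex], source: Option[Regex], group: String)

  // Hypothetical subset of the SelectionContext fields listed on the slide.
  final case class Context(user: String, source: Option[String])

  // A rule without a pattern for a field accepts any value of that field.
  // A rule with a pattern requires the value to be present and to match fully.
  private def fieldMatches(pattern: Option[Regex], value: Option[String]): Boolean =
    pattern match {
      case None    => true
      case Some(p) => value.exists(v => p.pattern.matcher(v).matches())
    }

  // Assumption: the first matching selector wins, and ${USER} is expanded
  // to the name of the user who submitted the query.
  def select(selectors: Seq[Selector], ctx: Context): Option[String] =
    selectors
      .find(s => fieldMatches(s.user, Some(ctx.user)) && fieldMatches(s.source, ctx.source))
      .map(_.group.replace("${USER}", ctx.user))

  // The two selectors shown on the slide.
  val selectors: Seq[Selector] = Seq(
    Selector(user = Some("bob".r), source = None, group = "general"),
    Selector(user = None, source = Some(".*adhoc.*".r), group = "global.adhoc.adhoc_${USER}")
  )
}
```

With these rules, `select(selectors, Context("alice", Some("my-adhoc-tool")))` yields `Some("global.adhoc.adhoc_alice")`, while any query submitted by `bob` lands in `general` because the first rule matches before the second is tried.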

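The limits on the "Resource Group limits" slide suggest a simple admission decision for each incoming query. Below is a minimal sketch for a single group under stated assumptions: `Limits`, `Usage`, and `admit` are hypothetical names, the memory limit is expressed in bytes rather than as a percentage, CPU limits and the parent/child hierarchy are ignored, and a query is rejected once the queue is full (the slides do not state what happens in that case).

```scala
object ResourceGroupAdmissionSketch {
  sealed trait Decision
  case object Run extends Decision
  case object Queue extends Decision
  case object Reject extends Decision

  // Hypothetical per-group limits, named after the settings on the slide.
  final case class Limits(maxRunning: Int, maxQueued: Int, softMemoryLimitBytes: Long)

  // Hypothetical snapshot of the group's current state.
  final case class Usage(running: Int, queued: Int, memoryBytes: Long)

  // Run if there is a free running slot and memory is within the soft limit;
  // otherwise queue if the queue has room; otherwise reject (assumption).
  def admit(limits: Limits, usage: Usage): Decision = {
    val hasRunSlot  = usage.running < limits.maxRunning
    val underMemory = usage.memoryBytes <= limits.softMemoryLimitBytes
    if (hasRunSlot && underMemory) Run
    else if (usage.queued < limits.maxQueued) Queue
    else Reject
  }
}
```

For example, with `Limits(maxRunning = 2, maxQueued = 1, softMemoryLimitBytes = 1000L)`, a group that already runs two queries queues the next one and rejects the one after that. A real deployment would apply the same check at every level of the group tree.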