
Presto as a Service - Tips for operation and monitoring

Published in: Technology


  1. Presto as a Service - Tips for operation and monitoring
      Taro L. Saito, Treasure Data, Inc. (leo@treasure-data.com)
      January 20, 2015, Presto Meetup Japan @ FreakOut, Roppongi
  2. About Me: @taroleo
      • 2007: University of Tokyo, Ph.D.
          – XML DBMS, Transaction Processing
              • Relational-Style XML Query. ACM SIGMOD 2008
      • ~ 2014: Assistant Professor at University of Tokyo
          – Genome Science Research
              • Distributed Computing
      • March 2014 ~: Treasure Data
          – Software Engineer, MPP Team Leader
  3. My Open Source Projects
      • sqlite-jdbc
          – SQLite DBMS for Java
          – 1 file = 1 DB
      • snappy-java
          – Fast compression library
          – More than 100,000 downloads/month
              • Used in Spark, Parquet, etc.
      • msgpack-java
      • UT Genome Browser (UTGB)
          – Visualization of massive amounts of genome science data
  4. Topics
      • Presto as a Service in Treasure Data
          – Error Recovery
          – Presto Deployment
      • Tips for Monitoring Presto
          – JSON API
          – Presto + Fluentd
  5. Treasure Data: Presto as a Service [figure; label: Presto Public Release]
  6. [Architecture diagram; labels: Hive, Presto, TD API / Web Console, interactive query, batch query, td-presto connector, Treasure Data PlazmaDB: MessagePack Columnar Storage]
  7. Deployment
      • Building Presto takes more than 20 minutes.
      • Facebook frequently releases new versions.
      • Let CircleCI build Presto
          – Deploy jar files to a private Maven repository
          – We sometimes use non-release versions
              • for fixing serious bugs
              • hot-fix patches
      • Integration Test
          – td-presto connector
              • PlazmaDB, Multi-tenant query scheduler
              • Query optimizer
          – Run test queries on the staging cluster
  8. Production: Blue-Green Deployment
      • http://martinfowler.com/bliki/BlueGreenDeployment.html
      • 2 Presto Coordinators (Blue/Green)
          – Route Presto queries to the active cluster
          – No downtime upon deployment
      • Launch Presto worker instances with Chef (takes less than 5 min. in AWS)
      • The inactive cluster is used for pre-production testing and customer support
          – Investigation and tuning of customer query performance
          – Troubleshooting
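The routing idea on this slide can be illustrated with a minimal sketch. This is not Treasure Data's implementation: the `BlueGreenRouter` class, its method names, and the coordinator URLs are all hypothetical, and the sketch assumes the standby cluster has already been verified before `switch()` is called.

```python
from dataclasses import dataclass


@dataclass
class BlueGreenRouter:
    """Tracks which of two coordinators should receive new queries."""

    blue: str            # hypothetical coordinator URL
    green: str           # hypothetical coordinator URL
    active: str = "blue"

    def active_coordinator(self) -> str:
        """Coordinator that new queries are routed to."""
        return self.blue if self.active == "blue" else self.green

    def inactive_coordinator(self) -> str:
        """Standby coordinator, free for pre-production testing."""
        return self.green if self.active == "blue" else self.blue

    def switch(self) -> str:
        """Flip the active color and return the new active coordinator."""
        self.active = "green" if self.active == "blue" else "blue"
        return self.active_coordinator()


# Placeholder host names, for illustration only.
router = BlueGreenRouter(
    blue="http://presto-blue.example.internal:8080",
    green="http://presto-green.example.internal:8080",
)
```

Because only the routing target changes, queries submitted after `switch()` go to the newly deployed cluster while the old one stays up, which is what makes a deployment without downtime possible.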
  9. Error Recovery
      • Presto has no fault tolerance
      • Error types
          – User error
              • Syntax errors
                  – SQL syntax, missing function
              • Semantic errors
                  – missing tables/columns
          – Insufficient resource
              • Exceeded task memory size
          – Internal failure
              • I/O error
                  – S3/Riak CS
              • worker failure
              • etc.
      [Callout on slide: Worth a retry!]
  10. Failed Query Rate [chart]
  12. Query Retry Patterns used in TD
      • Error code + message pattern
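The slide does not list the actual patterns, so the following is only a sketch of what matching on "error code + message pattern" could look like. The error code strings, the regular expressions, the `QueryFailed` exception, and the retry limit are all assumptions made for illustration, not TD's rules.

```python
import re

# Hypothetical (error code, message pattern) pairs considered worth a retry.
RETRYABLE_PATTERNS = [
    ("INTERNAL_ERROR", re.compile(r"I/O error", re.IGNORECASE)),
    ("INTERNAL_ERROR", re.compile(r"worker (node )?failure", re.IGNORECASE)),
]


class QueryFailed(Exception):
    """Hypothetical exception carrying an error code and a message."""

    def __init__(self, error_code: str, message: str):
        super().__init__(message)
        self.error_code = error_code
        self.message = message


def is_retryable(error_code: str, message: str) -> bool:
    """True if some pattern matches both the error code and the message."""
    return any(
        code == error_code and pattern.search(message)
        for code, pattern in RETRYABLE_PATTERNS
    )


def run_with_retry(submit, max_attempts: int = 3):
    """Call submit(); retry only failures that match a retryable pattern."""
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except QueryFailed as e:
            if attempt == max_attempts or not is_retryable(e.error_code, e.message):
                raise
```

User errors such as syntax mistakes fall through immediately, since rerunning them cannot succeed; only failures classified as internal are attempted again.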
  13. Monitoring Presto
      • REST API for monitoring Presto state
          – JSON format
      • (presto server IP):8080/v1/query
          – List of recent queries (BasicQueryInfo class)
      • (presto server IP):8080/v1/query/(query id)
          – Detailed query state information
          – Query plan, tasks and running worker IDs
          – Processed rows/data size
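As a rough sketch of polling the query list endpoint named above, the snippet below fetches `/v1/query` and counts queries per state. It assumes each entry in the JSON array carries a `state` field; check the field names against the response of your Presto version. The function names and the sample records are hypothetical.

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch_query_list(base_url: str, timeout: float = 5.0) -> list:
    """GET <base_url>/v1/query and decode the JSON array of recent queries."""
    with urlopen(f"{base_url}/v1/query", timeout=timeout) as resp:
        return json.load(resp)


def count_by_state(queries: list) -> Counter:
    """Count queries per state; the 'state' field name is an assumption."""
    return Counter(q.get("state", "UNKNOWN") for q in queries)


# Made-up records shaped like the assumed response, for illustration only.
sample = [
    {"queryId": "q1", "state": "RUNNING"},
    {"queryId": "q2", "state": "FINISHED"},
    {"queryId": "q3", "state": "RUNNING"},
    {"queryId": "q4"},
]
state_counts = count_by_state(sample)
```

Against a live coordinator the call would look like `count_by_state(fetch_query_list("http://(presto server IP):8080"))`.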
  14. Query List: /v1/query [screenshot]
  15. Detailed Query Info: /v1/query/(query id) [screenshot]
  16. /ui/query-execution/(query id) [screenshot]
  17. Complex Queries [figure]
  19. Presto Coordinator
      • Organizes query execution pipelines
          – Coordinates Presto workers
      • Retrieves table partition and split location from connectors
          – Creates distributed query plans
      • Full GC
          – Stalls coordinator
      • When memory is insufficient
          – Use memory-rich machine
          – GC Tuning
              • CMSInitiatingOccupancyFraction
  20. Monitoring Presto with Fluentd [diagram; labels: Hive, Presto]
  21. presto-metrics (Ruby)
      • https://github.com/xerial/presto-metrics
  24. Detecting Anomaly
      • Started Query Rate (in 5min/15min)
          – If no query has started, the cluster may be down (or not started properly)
      • Processed rows in a query
          – Sum up the number of processed rows from all of the sub stages
          – Simple, but the most reliable measure
      • Send an alert
          – HipChat notification
          – PagerDuty call
              • JP/US team rotation
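Both checks on this slide can be sketched in a few lines. The stage structure below is a simplified, hypothetical stand-in for the detailed query info JSON: the keys `processedRows` and `subStages` are assumptions, as are the function names and the 5-minute default window.

```python
def total_processed_rows(stage: dict) -> int:
    """Recursively sum processed rows over a stage and all of its sub stages."""
    rows = stage.get("processedRows", 0)
    for sub in stage.get("subStages", []):
        rows += total_processed_rows(sub)
    return rows


def no_query_started(start_times, now: float, window_sec: float = 300) -> bool:
    """True if no query start time falls within the last window_sec seconds."""
    return not any(now - window_sec <= t <= now for t in start_times)


# Made-up stage tree, for illustration only.
sample_stage = {
    "processedRows": 100,
    "subStages": [
        {"processedRows": 20, "subStages": [{"processedRows": 3}]},
        {"processedRows": 7},
    ],
}
```

A monitor would raise an alert when `no_query_started(...)` returns `True`, or when the processed-row total of a running query stops increasing between polls.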
  25. Benchmarking
      • Query performance comparison
          – between two versions of Presto
      • Benchmark
          – Run query set multiple times
          – Store the results to TD
          – Report the result with Presto
              • Aggregation query
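The slides do the final aggregation with a Presto query over results stored in TD. As a standalone illustration of the same comparison, the sketch below takes the median of repeated runs per query and reports the ratio between two versions. The query names and timings are made-up numbers, and the choice of the median is an assumption rather than the method used in the talk.

```python
from statistics import median


def summarize(runs: dict) -> dict:
    """Map each query name to the median of its recorded durations (seconds)."""
    return {name: median(durations) for name, durations in runs.items()}


def compare(baseline: dict, candidate: dict) -> dict:
    """Ratio candidate/baseline per query; below 1.0 means the candidate is faster."""
    return {
        name: candidate[name] / baseline[name]
        for name in baseline
        if name in candidate
    }


# Made-up timings for two hypothetical Presto versions, three runs each.
old_version = summarize({"q1": [10.0, 12.0, 11.0], "q2": [4.0, 4.0, 8.0]})
new_version = summarize({"q1": [5.5, 6.0, 5.0], "q2": [4.0, 6.0, 5.0]})
ratios = compare(old_version, new_version)
```

Running each query several times and summarizing with a robust statistic keeps a single slow run from dominating the comparison.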
  26. Presto Operation Tool
      • Prestop
          – Our internal tool for managing multiple Presto clusters
              • written in Scala
          – Query monitoring
          – Benchmarking
          – Workload simulation
              • stress testing
      • Monitoring
          – Librato
          – Datadog
          – ChartIO (query stats)
  27. WE ARE HIRING! Check: www.treasuredata.com
