
Database-Agnostic Workload Management (CIDR 2019)

We present a system to support generalized SQL workload analysis and management for multi-tenant and multi-database platforms. Workload analysis applications are becoming more sophisticated to support database administration, model user behavior, audit security, and route queries, but the methods rely on specialized feature engineering, and therefore must be carefully implemented and reimplemented for each SQL dialect, database system, and application. Meanwhile, the size and complexity of workloads are increasing as systems centralize in the cloud. We model workload analysis and management tasks as variations on query labeling, and propose a system design that can support general query labeling routines across multiple applications and database backends. The design relies on the use of learned vector embeddings for SQL queries as a replacement for application-specific syntactic features, reducing custom code and allowing the use of off-the-shelf machine learning algorithms for labeling. The key hypothesis, for which we provide evidence in this paper, is that these learned features can outperform conventional feature engineering on representative machine learning tasks. We present the design of a database-agnostic workload management and analytics service, describe potential applications, and show that separating workload representation from labeling tasks affords new capabilities and can outperform existing solutions for representative tasks, including workload sampling for index recommendation and user labeling for security audits.

Database-Agnostic Workload Management (CIDR 2019)

  1. 1. Database-Agnostic Workload Management. Shrainik Jain, Jiaqi Yan*, Thierry Cruanes*, Bill Howe. 1/21/2019
  2. 2. Workload Management and Analytics
        Workload Summarization · Index Selection · Query Routing / Resource Allocation · Query Recommendation
        Pick your favorite next challenge: Query Forensics · Multi-Query Optimization · Self-Tuning Databases · Predicting Cache Performance · Modeling User Behavior
  3. 3. Jain et al., CIDR 2019. Q → "High priority?" → (Q, priority) / (Q, normal) → fast server
  4. 4. Jain et al., CIDR 2019. Q → "Heavy hitter?" → (Q, heavy) / (Q, normal) → big cluster
  5. 5. Jain et al., CIDR 2019. Q → "Likely error?" → (Q, error) / (Q, no error) → instrumented cluster
  6. 6. Jain et al., CIDR 2019. Q → "Atypical query?" → (Q, atypical) / (Q, typical) → workload summary for periodic index recommendation
  7. 7. Jain et al., CIDR 2019. Q → "Suspicious query?" → (Q, suspicious) / (Q, not suspicious) → audit log
  8. 8. Jain et al., CIDR 2019. Q → optimizer → (Q, estimated cost) → big cluster
  9. 9. Q → heavy? → (Q, heavy) → suspicious? → (Q, heavy, suspicious) → atypical? → (Q, heavy, suspicious, atypical) → priority? → (Q, heavy, suspicious, atypical, priority) → RDS
        Workload Management = learning and operationalizing a set of query labeling functions (see the sketch below)
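A minimal sketch of this labeling-pipeline view in Python. All names and the rule-based labelers below are hypothetical stand-ins; in the paper the labelers are learned classifiers over query embeddings:

    # Each labeler maps a SQL string to a label; workload management chains
    # a set of such functions over the incoming query stream.
    from typing import Callable, Dict, List, Tuple

    QueryLabeler = Callable[[str], str]  # SQL text -> label

    def label_query(sql: str, labelers: List[Tuple[str, QueryLabeler]]) -> Dict[str, str]:
        """Run every labeling function on one query and collect the labels."""
        return {name: f(sql) for name, f in labelers}

    # Hypothetical stand-in labelers, for illustration only.
    labelers = [
        ("heavy",      lambda q: "heavy" if q.upper().count("JOIN") > 3 else "normal"),
        ("suspicious", lambda q: "suspicious" if "DROP" in q.upper() else "ok"),
    ]

    print(label_query("SELECT * FROM t1 JOIN t2 ON t1.a = t2.b", labelers))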
  10. 10. Workload Management and Analytics
        Workload Summarization · Index Selection · Query Routing / Resource Allocation · Query Recommendation
        Pick your favorite next challenge: Query Forensics · Multi-Query Optimization · Self-Tuning Databases · Predicting Cache Performance · Modeling User Behavior
  11. 11. Jain et al., CIDR 2019. Every workload management task => feature engineering
        ○ Extract query type, count joins, etc. [Chaudhuri et al. 2002]
        ○ Extract fragments [Khoussainova et al. 2010]
        ○ Extract operators and SQL functions [Jain et al. 2016]
        ○ etc.
  12. 12. Manual feature engineering is hopeless: M SQL dialects (PostgreSQL, Snowflake, SQL Server, and so on...) × N tasks (summarization, error prediction, query routing, security audits) = N * M feature extractors, and more if tenant-specific features are important
        ● Many databases, many tasks
        ● Maybe ~10 database services, each with different dialects of SQL
        ● The dialects may change frequently, at different rates:
          ○ Ex: the Snowflake SQL parser changes ~10 times / month on average
        ● 100s of millions of SQL-like queries per day (hour/minute/sec)...
        ● Workloads are diverse (yet structured) due to multi-tenancy
  13. 13. We want a query representation that can support all these learning tasks. Given a query, find a vector in k-dimensional space that represents it:
        SELECT A FROM tableA, tableB WHERE tableA.B = tableB.A AND tableA.C LIKE '%something%'
        → [0.2, 1, 23, 0.01, …]
  14. 14. Totally novel automatic feature learning (Word2Vec / Doc2Vec): predict a token from its context; use the learned weights as a vector to represent the predicted token.
        SELECT D,E,F,G FROM tableA, tableB WHERE tableA.A = tableB.B AND tableA.C = 4Q23
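A minimal sketch of learning query embeddings this way with gensim's Doc2Vec, treating each SQL string as a document of tokens. The tokenizer and hyperparameters are illustrative assumptions, not the paper's exact setup:

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    queries = [
        "SELECT a FROM t1 JOIN t2 ON t1.b = t2.a",
        "SELECT d, e FROM t1 WHERE c LIKE '%something%'",
    ]

    def tokenize(sql):
        # Naive whitespace tokenizer; a real system might reuse the SQL lexer.
        return sql.lower().replace(",", " , ").split()

    corpus = [TaggedDocument(tokenize(q), [i]) for i, q in enumerate(queries)]
    model = Doc2Vec(corpus, vector_size=64, window=5, min_count=1, epochs=40)

    # Embed a previously unseen query into the same vector space.
    vec = model.infer_vector(tokenize("SELECT a FROM t1 WHERE c = 4"))
    print(vec[:5])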
  15. 15. [image-only slide; no text content]
  16. 16. Lots of generic representations… (one is sketched below)
        ● Treat queries (or plans) as sentences (natural language text)
        ● Use representation learning methods for text:
          ○ Doc2Vec
          ○ LSTM autoencoders
          ○ LSTM encoder-classifiers
          ○ TreeLSTM encoder-classifiers on query plans
          ○ CNNs
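One option from this list sketched in PyTorch: an LSTM autoencoder over tokenized queries, where the encoder's final hidden state serves as the query embedding. The architecture and sizes are illustrative assumptions:

    import torch
    import torch.nn as nn

    class QueryAutoencoder(nn.Module):
        """Encode a token sequence with an LSTM; the final hidden state is the
        query embedding. The decoder reconstructs the input (teacher forcing)."""
        def __init__(self, vocab_size, embed_dim=64, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens):
            emb = self.embed(tokens)                # (batch, seq, embed_dim)
            _, (h, c) = self.encoder(emb)
            dec_out, _ = self.decoder(emb, (h, c))  # condition decoder on the code
            return self.out(dec_out), h.squeeze(0)  # logits, query embedding

    model = QueryAutoencoder(vocab_size=1000)
    tokens = torch.randint(0, 1000, (2, 12))        # two fake tokenized queries
    logits, embedding = model(tokens)
    print(embedding.shape)                          # torch.Size([2, 256])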
  17. 17. Sanity check: do generic NLP representations produce anything meaningful?
        [Figure: query representations for a TPC-H workload projected onto two dimensions using t-SNE; each color is a different TPC-H query template.]
        The learned representations are at least minimally coherent.
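A minimal sketch of this sanity check with scikit-learn's t-SNE; the vectors below are synthetic stand-ins for learned query embeddings:

    import numpy as np
    from sklearn.manifold import TSNE
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    # Stand-in for learned query embeddings: 3 "templates" x 50 queries, 64-d.
    vectors = np.vstack([rng.normal(loc=i, scale=0.3, size=(50, 64)) for i in range(3)])
    templates = np.repeat(np.arange(3), 50)

    xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(vectors)
    plt.scatter(xy[:, 0], xy[:, 1], c=templates, cmap="tab10", s=8)
    plt.title("Query embeddings, t-SNE projection")
    plt.show()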
  18. 18. Error prediction on a big, real SQL workload
        [Figure: each point is a query that generated an error; random sample of 4200 error-generating queries over a 7-day period. Colors are selected error codes: OOM error, unknown timezone in date, date parse error, divide by zero.]
  19. 19. Error prediction: clusters are repeated syntactic patterns in the workload; they’re meaningful
  20. 20. DOES THIS ACTUALLY WORK? Jain et al., CIDR 2019 20
  21. 21. Datasets used
        Datasets for training embedders:
          Workload    Total Queries   Distinct Queries
          Snowflake   500,000         175,958
          TPC-H       4,200           2,180
        Datasets for training classifiers:
          Workload               Total Queries   Distinct Queries
          Snowflake-MultiError   100,000         17,311
          Snowflake-OOM          4,491           2,501
  22. 22. Predicting OOM Errors
        Method                                      Precision   Recall   f1-score
        Contains heavy joins                        0.729       0.115    0.198
        Contains window functions                   0.762       0.377    0.504
        Contains heavy joins OR window functions    0.724       0.403    0.518
        Contains heavy joins AND window functions   0.931       0.162    0.276
        Query2Vec-LSTM                              0.983       0.977    0.980
        Query2Vec-Doc2Vec                           0.919       0.823    0.869
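A minimal sketch of the labeling step behind this table: train an off-the-shelf classifier on query embeddings to predict OOM, then measure precision and recall. The embeddings and labels below are synthetic stand-ins; the paper uses Query2Vec vectors:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import precision_recall_fscore_support
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4000, 64))                  # query embeddings
    y = (X[:, 0] + 0.5 * X[:, 1] > 1.0).astype(int)  # 1 = query OOMed (synthetic)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

    p, r, f1, _ = precision_recall_fscore_support(y_te, clf.predict(X_te), average="binary")
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")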
  23. 23. Predicting Other Errors (~90% precision/recall overall)
        ErrorCode       Precision   Recall   f1-score   #queries
        -1 (No Error)   0.986       0.992    0.989      7464
        604             0.878       0.927    0.902      1106
        606             0.929       0.578    0.712      45
        608             0.996       0.993    0.995      3119
        630             0.894       0.864    0.879      88
        2031            0.765       0.667    0.712      39
        90030           1           0.998    0.999      1529
        100035          1           0.71     0.83       31
        100037          1           0.417    0.588      12
        100038          0.981       0.968    0.975      1191
        100040          0.952       0.833    0.889      48
        100046          1           0.923    0.96       13
        100051          0.941       0.913    0.927      104
        100069          0.857       0.5      0.632      12
        100071          0.857       0.5      0.632      12
        100078          1           0.974    0.987      77
        100094          0.833       0.921    0.875      38
        100097          0.923       0.667    0.774      18
  24. 24. Security Audits: predict the user, compare with the actual user
        #queries   #users   Accuracy
        73881      28       49.30%
        55333      10       37.40%
        18487      46       31.80%
        5471       21       96.20%
        4213       6        58.50%
        3894       12       99.70%
        3373       9        99.80%
        2867       6        99.80%
        1953       15       89.10%
        1924       4        98.10%
        1776       9        95.20%
        1699       5        99.80%
        1108       12       98.20%
        Method             Account Labeling   User Labeling
        Doc2Vec            78.8%              39%
        LSTM Autoencoder   99.1%              55.4%
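A minimal sketch of the audit idea: learn to predict which user issued each query from its embedding, then flag queries where the prediction disagrees with the logged user. The classifier choice and all data here are illustrative stand-ins:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(1)
    n_users, dim = 5, 64
    X_hist = rng.normal(size=(2000, dim))             # historical query embeddings
    users_hist = rng.integers(0, n_users, size=2000)  # logged user per query

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_hist, users_hist)

    X_new = rng.normal(size=(10, dim))
    logged = rng.integers(0, n_users, size=10)
    suspicious = clf.predict(X_new) != logged         # mismatch => audit candidate
    print(np.flatnonzero(suspicious))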
  25. 25. Workload Summarization for Index Recommendation (baseline)
        A lot of queries → apply filters (Account_name = 'xyz') → workload → uniform sample → 100 queries → output workload
  26. 26. Workload Summarization for Index Recommendation (this work; see the sketch below)
        A lot of queries → apply filters (Account_name = 'xyz') → workload → summarization using query vectors → 100 queries → output workload
        ** Jiaqi Yan, Qiuye Jin, Shrainik Jain, Stratis D. Viglas, Allison Lee, “Snowtrail: Testing with Production Queries on a Cloud Database”, DBTEST 2018
        ** Jiaqi Yan, Qiuye Jin, Shrainik Jain, Stratis D. Viglas, Allison Lee, “Snowtrail: Testing with Production Queries on a Cloud Database”, US Patent Application No. 62/646,817
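One plausible mechanic for "summarization using query vectors", sketched here as an assumption (the slide does not spell out the algorithm): cluster the embeddings with k-means and keep the query nearest each centroid as the summary:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import pairwise_distances_argmin

    rng = np.random.default_rng(0)
    vectors = rng.normal(size=(5000, 64))  # embeddings for the filtered workload
    k = 100                                # summary budget: 100 queries

    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(vectors)
    representatives = pairwise_distances_argmin(km.cluster_centers_, vectors)
    summary = np.unique(representatives)   # indices of the summary queries
    print(len(summary))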
  27. 27. Evaluation of workload summary: index recommendation (a harness is sketched below)
        ○ Run the full workload with no indexes; record the time (t1)
        ○ Recommend and create indexes on the FULL workload
        ○ Run the full workload again; record the time (t2)
        ○ Generate a small workload summary
        ○ Recommend and create indexes on the SUMMARY workload
        ○ Run the full workload again; record the time (t3)
        ○ Set a time budget for the recommender
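A minimal harness for the protocol above; run_workload(), recommend_indexes(), create_indexes(), and drop_all_indexes() are hypothetical stand-ins for the database under test and its index advisor:

    import time

    def evaluate(db, full_workload, summary_workload, budget_s=600):
        db.drop_all_indexes()
        t0 = time.time(); db.run_workload(full_workload); t1 = time.time() - t0

        db.create_indexes(db.recommend_indexes(full_workload, budget_s))
        t0 = time.time(); db.run_workload(full_workload); t2 = time.time() - t0

        db.drop_all_indexes()
        db.create_indexes(db.recommend_indexes(summary_workload, budget_s))
        t0 = time.time(); db.run_workload(full_workload); t3 = time.time() - t0

        return t1, t2, t3  # compare t3 (summary-driven) against t2 (full-driven)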
  28. 28. Workload Summarization for Index Selection, with transfer learning: we can even learn the embedding model on the Snowflake workload and use it to infer representations for the TPC-H workload
  29. 29. How good is this summary?
  30. 30. Querc: Query Classifier
        Reuse embeddings where possible
        Collect training labels from the databases (cost, error codes)
        Retrain models periodically, or online
  31. 31. Last slide
        ● Every workload management task is query labeling
        ● You don’t need fancy features
        ● You can’t maintain fancy features anyway
        ● SQL strings (and plans) have a lot of signal
        ● There is tons of training data
        ● Your workload is not “all possible queries” – use the patterns
        ● Transfer learning works – you can train on one workload and use on another
        ● Opens up a lot of simple, interesting little applications
          ○ User behavior modeling, resource allocation, …
        ● An external “query labeling service” keeps everything organized
        Shrainik Jain
  32. 32. Query recommendation: predict the next query in a session
  33. 33. [Figure: query recommendation results; up is good.] Learned features are about as good as manual features, even with generous assumptions.
