SlideShare a Scribd company logo
1 of 18
JECB: a Join-Extension,
Code-Based Approach to
OLTP Data Partitioning
Khai Q. Tran -- Oracle Labs1
Jeffrey F. Naughton – UW Madison
Bruhathi Sundarmurthy -- UW Madison
Dimitris Tsirogiannis -- Cloudera2
1
1: Work done at UW Madison
2: Work done at the Microsoft Gray Systems Lab
Motivation
 Modern OLTP applications often require
high transaction throughput
 Scaling out OLTP transactions in
distributed systems is challenging
◦ Communication protocols
(2PC, Paxos) are expensive
 A smart way to partition the
database to avoid distributed
transactions is crucial
2
The problem
Given a database, a transaction
workload, and k processing elements,
horizontally split the database into k
partitions such that the number of
distributed transactions is minimized.
Tuesday, June 24, 2014 3
Work done in the past: parallel
database partitioning
Tuesday, June 24, 2014 4
OLTP Parallel database
Goal
Wants local
execution, avoids
splitting xacts
Wants parallelism
power by splitting
query/xact
Workload
properties
Simple queries,
touching few
records
Complex queries,
large scans
Old techniques may not be
applicable for the new problem
State of the art 1 - Schism
 A statistic-based fine-granularity
approach:
◦ Models training transactions as a tuple graph:
tuples nodes and transactions  edges
connecting the tuples
◦ Uses graph min-cut algorithms to partition the
graph into k partitions
 Mapping tuple  partition id
◦ Uses machine learning techniques to infer
partitioning attributes from the sample
mapping
◦ May not scale well for large databasesTuesday, June 24, 2014 5
State of the art 2 -
Horticulture
 A schema-based approach
◦ Uses schema to search for the best
partitioning attribute for each table
◦ Employs an advanced search technique
(large-neighborhood search) to reduce the
search space
◦ Uses training trace and a data skew-aware
cost function to evaluate the partitioning
quality
◦ May not find good partitioning solutions for
complicated workloadsTuesday, June 24, 2014 6
Our approach - JECB
 Join-extension (idea):
◦ Explicitly uses key-foreign key relationships from
schema to partition through joins
 Code-based (algorithm):
◦ Analyze SQL code from store procedures to find the
best way joining tables together
 Divide-and-conquer (optimization):
◦ Consider each class of transactions separately and
combine results later
Tuesday, June 24, 2014 7
Simple example for Join
Extension
Two tables: R(X, Y) and S(Y, Z)
Query: SELECT * FROM R, S
WHERE R.Y = S.Y AND S.Z = 100
The best way to partition R ?
- Not by any columns of R
- By S.Z and using join R.Y = S.Y
Tuesday, June 24, 2014 8
Our data partitioning
framework
 Phase 1: Pre-processing
◦ Collecting workload trace (sample run) and
identifying read-only tables
 Phase 2: Partitioning individual transaction
types
◦ Finding the best partitioning solution for each
type of transactions using join extensions in
transaction source code
 Phase 3: Merging
◦ Combining results from all transaction types to
find the best global solution.
Tuesday, June 24, 2014 9
Phase 2 demo on TPC-E Customer-Position
transactions
IF (@cust_id = 0)
SELECT @cust_id = C_ID
FROM CUSTOMER
WHERE C_TAX_ID = @tax_id
SELECT CA_ID, CA_BAL,
SUM(ISNULL(HS_QTY, 0) * ISNULL(LT_PRICE, 0))
FROM CUSTOMER_ACCOUNT WITH (UPDLOCK) LEFT LOOP JOIN
HOLDING_SUMMARY ON CA_ID=HS_CA_ID LEFT LOOP JOIN LAST_TRADE ON HS_S_SYMB = LT_S_SYMB
WHERE CA_C_ID = @cust_id
GROUP BY CA_ID,CA_BAL
ORDER BY 3 ASC
SELECT TOP 30
TH_DTS,
T_QTY,
T_S_SYMB,
T_ID,
ST_NAME
FROM (SELECT TOP 10 T_ID AS ID
FROM TRADE
WHERE T_CA_ID = @acct_id
ORDER BY T_DTS DESC) AS T
JOIN TRADE ON T_ID=ID
INNER LOOP JOIN TRADE_HISTORY ON TH_T_ID=T_ID,
STATUS_TYPE
WHERE ST_ID = TH_ST_ID
ORDER BY TH_DTS DESC
Tuesday, June 24, 2014 10
CA_ID CA_C_ID
CUSTOMER_ACCOUNT
HS_S_SYMB, HS_CA_ID HS_CA_ID
HOLDING_SUMMARY
T_ID T_CA_ID
TRADE
HS_S_SYMB
LT_S_SYMB
LAST_TRADE
C_ID C_TAX_ID
CUSTOMER
TH_T_ID, TH_ST_ID TH_T_ID
TRADE_HISTORY
ST_ID
STATUS
TH_ST_ID
CA_ID CA_C_IDCA_C_ID
C_IDC_ID C_TAX_ID
CUSTOMER
Step 1: Identify accessed tables and accessed attributes from SQL codeStep 2: Connect tables with key-foreign joins to form a join graphStep 3: List all possible join trees from the join graphStep 4: Use the workload trace to select the best join tree
Implementation
Tuesday, June 24, 2014 11
Workload
driver
Partitioning
Algorithm
P
Simulation
framework
CostCost Model
Trace
Collector
Training
trace
Database schema
Transaction SQL code
Testing
trace
Partitioning solution
SQL Server 2012
Scalability experiments with
TPC-C
Tuesday, June 24, 2014 12
0
50
100
150
200
250
0
1
2
3
4
5
6
0.2 0.4 0.8 1.6 3.2 Thousand
transactions
GB
Database size (GB)
Memory
Traning
size
Schism JECB
0
50
100
150
200
250
0
1
2
3
4
5
6
0.2 0.4 0.8 1.6 3.2
Thousand
transactions
GB
Database size (GB)
Memory
Quality experiments
Tuesday, June 24, 2014 13
0
10
20
30
40
50
60
%ofdist.transactions
Schism 10%
coverage
Horticulture
JECB
TPC-E partitioning scheme
14
Table Horticulture JECB
ACCOUNT_PERMISSION AP_CA_ID replicated
CUSTOMER_TAXRATE CX_C_ID replicated
DAILY_MARKET DM_DATE replicated
WATCH_LIST WL_C_ID replicated
CASH_TRANSACTION CT_T_ID CT_T_ID = T_ID, T_CA_ID = CA_ID,
CA_C_ID
CUSTOMER_ACCOUNT replicated CA_C_ID
HOLDING H_CA_ID H_T_ID = T_ID, T_CA_ID = CA_ID, CA_C_ID
HOLDING_HISTORY HH_T_ID HH_T_ID=T_ID, T_CA_ID = CA_ID,
CA_C_ID
HOLDING_SUMMARY HS_CA_ID HS_CA_ID = CA_ID, CA_C_ID
SETTLEMENT SE_T_ID SE_T_ID = T_ID, T_CA_ID = CA_ID,
CA_C_ID
TRADE T_CA_ID T_CA_ID = CA_ID, CA_C_ID
TRADE_HISTORY TH_T_ID TH_T_ID = T_ID, T_CA_ID = CA_ID,
CA_C_ID
TRADE_REQUEST replicated TR_T_ID = T_ID, T_CA_ID = CA_ID,
CA_C_ID
Join extensions help to deliver
better results on complicated
workloads
What we learnt from
Experiments
Tuesday, June 24, 2014 15
Schema-based approach
Statistics-based
approach
JECB:
• Divide-and-conquer and join
extensions to break complex
workloads
• Restrictions: all joins are key-
foreign key joins
Workload complexity
SimpleComplicated
Databasesize
SmallLarge
?
Conclusion and future work
 We developed a promising approach
for partitioning complicated OLTP
databases.
 Future work:
◦ Address the data skew problem
◦ Try more complicated cost models rather
than % of distributed transactions
◦ End-to-end performance experiments
Tuesday, June 24, 2014 16
Phase 2 (Background)
 Join path:
◦ Sequence of key-foreign key joins connecting tables to partitioning attributes
◦ Allow a table to be partitioned by attribute from another table
Example:
Tuesday, June 24, 2014 17
CA_ID C_ID
1 1
2 1
6 2
9 2
T_ID T_CA_ID
1 1
2 1
3 2
4 2
5 6
6 6
7 9
8 9
TRADE
CUSTOMER_ACCOUNT SELECT *
FROM TRADE,
CUSTOMER_ACCOUNT
WHERE T_CA_ID = CA_ID AND
C_ID = @cust_id
Join path from TRADE to C_ID:
T_ID T_CA_ID CA_ID CA_C_ID
TRADE CUSTOMER_ACCOUNT
Blue: source
Red: destination
• Join tree:
• Combination of join paths sharing the same destination (the root)
TPC-E transaction class
breakdown
18

More Related Content

Similar to Jecb sigmod2014

Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftSnapLogic
 
Indic threads pune12-nosql now and path ahead
Indic threads pune12-nosql now and path aheadIndic threads pune12-nosql now and path ahead
Indic threads pune12-nosql now and path aheadIndicThreads
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2Mohit Garg
 
MSBI with SQL Server T-SQL | Microsoft BI With Server T-SQL | Realtime Traini...
MSBI with SQL Server T-SQL | Microsoft BI With Server T-SQL | Realtime Traini...MSBI with SQL Server T-SQL | Microsoft BI With Server T-SQL | Realtime Traini...
MSBI with SQL Server T-SQL | Microsoft BI With Server T-SQL | Realtime Traini...SequelGate
 
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA cscpconf
 
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATANEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATAcsandit
 
Specification and Detection of SOA Antipatterns
Specification and Detection of SOA AntipatternsSpecification and Detection of SOA Antipatterns
Specification and Detection of SOA AntipatternsFrancis Palma
 
Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...Grokking VN
 
[DSC Europe 22] Smart approach in development and deployment process for vari...
[DSC Europe 22] Smart approach in development and deployment process for vari...[DSC Europe 22] Smart approach in development and deployment process for vari...
[DSC Europe 22] Smart approach in development and deployment process for vari...DataScienceConferenc1
 
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...Lucas Jellema
 
How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and...
How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and...How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and...
How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and...Denodo
 
Forecasting database performance
Forecasting database performanceForecasting database performance
Forecasting database performanceShenglin Du
 
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Mark Tabladillo
 
Automated Parameterization of Performance Models from Measurements
Automated Parameterization of Performance Models from MeasurementsAutomated Parameterization of Performance Models from Measurements
Automated Parameterization of Performance Models from MeasurementsWeikun Wang
 
Apache Lens at Hadoop meetup
Apache Lens at Hadoop meetupApache Lens at Hadoop meetup
Apache Lens at Hadoop meetupamarsri
 
Performance Evaluation: A Comparative Study of Various Classifiers
Performance Evaluation: A Comparative Study of Various ClassifiersPerformance Evaluation: A Comparative Study of Various Classifiers
Performance Evaluation: A Comparative Study of Various Classifiersamreshkr19
 

Similar to Jecb sigmod2014 (20)

Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
 
Indic threads pune12-nosql now and path ahead
Indic threads pune12-nosql now and path aheadIndic threads pune12-nosql now and path ahead
Indic threads pune12-nosql now and path ahead
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2
 
MSBI with SQL Server T-SQL | Microsoft BI With Server T-SQL | Realtime Traini...
MSBI with SQL Server T-SQL | Microsoft BI With Server T-SQL | Realtime Traini...MSBI with SQL Server T-SQL | Microsoft BI With Server T-SQL | Realtime Traini...
MSBI with SQL Server T-SQL | Microsoft BI With Server T-SQL | Realtime Traini...
 
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA
 
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATANEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA
NEAR-REAL-TIME PARALLEL ETL+Q FOR AUTOMATIC SCALABILITY IN BIGDATA
 
Specification and Detection of SOA Antipatterns
Specification and Detection of SOA AntipatternsSpecification and Detection of SOA Antipatterns
Specification and Detection of SOA Antipatterns
 
Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...Grokking Techtalk #42: Engineering challenges on building data platform for M...
Grokking Techtalk #42: Engineering challenges on building data platform for M...
 
Ssbse10.ppt
Ssbse10.pptSsbse10.ppt
Ssbse10.ppt
 
[DSC Europe 22] Smart approach in development and deployment process for vari...
[DSC Europe 22] Smart approach in development and deployment process for vari...[DSC Europe 22] Smart approach in development and deployment process for vari...
[DSC Europe 22] Smart approach in development and deployment process for vari...
 
Oow2016 review-db-dev-bigdata-BI
Oow2016 review-db-dev-bigdata-BIOow2016 review-db-dev-bigdata-BI
Oow2016 review-db-dev-bigdata-BI
 
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
 
How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and...
How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and...How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and...
How to Achieve Fast Data Performance in Big Data, Logical Data Warehouse, and...
 
Forecasting database performance
Forecasting database performanceForecasting database performance
Forecasting database performance
 
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
 
Automated Parameterization of Performance Models from Measurements
Automated Parameterization of Performance Models from MeasurementsAutomated Parameterization of Performance Models from Measurements
Automated Parameterization of Performance Models from Measurements
 
mod 2.pdf
mod 2.pdfmod 2.pdf
mod 2.pdf
 
Apache Lens at Hadoop meetup
Apache Lens at Hadoop meetupApache Lens at Hadoop meetup
Apache Lens at Hadoop meetup
 
Performance Evaluation: A Comparative Study of Various Classifiers
Performance Evaluation: A Comparative Study of Various ClassifiersPerformance Evaluation: A Comparative Study of Various Classifiers
Performance Evaluation: A Comparative Study of Various Classifiers
 
Nic solution strategy
Nic solution strategyNic solution strategy
Nic solution strategy
 

Recently uploaded

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 

Recently uploaded (20)

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

Jecb sigmod2014

  • 1. JECB: a Join-Extension, Code-Based Approach to OLTP Data Partitioning Khai Q. Tran -- Oracle Labs1 Jeffrey F. Naughton – UW Madison Bruhathi Sundarmurthy -- UW Madison Dimitris Tsirogiannis -- Cloudera2 1 1: Work done at UW Madison 2: Work done at the Microsoft Gray Systems Lab
  • 2. Motivation  Modern OLTP applications often require high transaction throughput  Scaling out OLTP transactions in distributed systems is challenging ◦ Communication protocols (2PC, Paxos) are expensive  A smart way to partition the database to avoid distributed transactions is crucial 2
  • 3. The problem Given a database, a transaction workload, and k processing elements, horizontally split the database into k partitions such that the number of distributed transactions is minimized. Tuesday, June 24, 2014 3
  • 4. Work done in the past: parallel database partitioning Tuesday, June 24, 2014 4 OLTP Parallel database Goal Wants local execution, avoids splitting xacts Wants parallelism power by splitting query/xact Workload properties Simple queries, touching few records Complex queries, large scans Old techniques may not be applicable for the new problem
  • 5. State of the art 1 - Schism  A statistic-based fine-granularity approach: ◦ Models training transactions as a tuple graph: tuples nodes and transactions  edges connecting the tuples ◦ Uses graph min-cut algorithms to partition the graph into k partitions  Mapping tuple  partition id ◦ Uses machine learning techniques to infer partitioning attributes from the sample mapping ◦ May not scale well for large databasesTuesday, June 24, 2014 5
  • 6. State of the art 2 - Horticulture  A schema-based approach ◦ Uses schema to search for the best partitioning attribute for each table ◦ Employs an advanced search technique (large-neighborhood search) to reduce the search space ◦ Uses training trace and a data skew-aware cost function to evaluate the partitioning quality ◦ May not find good partitioning solutions for complicated workloadsTuesday, June 24, 2014 6
  • 7. Our approach - JECB  Join-extension (idea): ◦ Explicitly uses key-foreign key relationships from schema to partition through joins  Code-based (algorithm): ◦ Analyze SQL code from store procedures to find the best way joining tables together  Divide-and-conquer (optimization): ◦ Consider each class of transactions separately and combine results later Tuesday, June 24, 2014 7
  • 8. Simple example for Join Extension Two tables: R(X, Y) and S(Y, Z) Query: SELECT * FROM R, S WHERE R.Y = S.Y AND S.Z = 100 The best way to partition R ? - Not by any columns of R - By S.Z and using join R.Y = S.Y Tuesday, June 24, 2014 8
  • 9. Our data partitioning framework  Phase 1: Pre-processing ◦ Collecting workload trace (sample run) and identifying read-only tables  Phase 2: Partitioning individual transaction types ◦ Finding the best partitioning solution for each type of transactions using join extensions in transaction source code  Phase 3: Merging ◦ Combining results from all transaction types to find the best global solution. Tuesday, June 24, 2014 9
  • 10. Phase 2 demo on TPC-E Customer-Position transactions IF (@cust_id = 0) SELECT @cust_id = C_ID FROM CUSTOMER WHERE C_TAX_ID = @tax_id SELECT CA_ID, CA_BAL, SUM(ISNULL(HS_QTY, 0) * ISNULL(LT_PRICE, 0)) FROM CUSTOMER_ACCOUNT WITH (UPDLOCK) LEFT LOOP JOIN HOLDING_SUMMARY ON CA_ID=HS_CA_ID LEFT LOOP JOIN LAST_TRADE ON HS_S_SYMB = LT_S_SYMB WHERE CA_C_ID = @cust_id GROUP BY CA_ID,CA_BAL ORDER BY 3 ASC SELECT TOP 30 TH_DTS, T_QTY, T_S_SYMB, T_ID, ST_NAME FROM (SELECT TOP 10 T_ID AS ID FROM TRADE WHERE T_CA_ID = @acct_id ORDER BY T_DTS DESC) AS T JOIN TRADE ON T_ID=ID INNER LOOP JOIN TRADE_HISTORY ON TH_T_ID=T_ID, STATUS_TYPE WHERE ST_ID = TH_ST_ID ORDER BY TH_DTS DESC Tuesday, June 24, 2014 10 CA_ID CA_C_ID CUSTOMER_ACCOUNT HS_S_SYMB, HS_CA_ID HS_CA_ID HOLDING_SUMMARY T_ID T_CA_ID TRADE HS_S_SYMB LT_S_SYMB LAST_TRADE C_ID C_TAX_ID CUSTOMER TH_T_ID, TH_ST_ID TH_T_ID TRADE_HISTORY ST_ID STATUS TH_ST_ID CA_ID CA_C_IDCA_C_ID C_IDC_ID C_TAX_ID CUSTOMER Step 1: Identify accessed tables and accessed attributes from SQL codeStep 2: Connect tables with key-foreign joins to form a join graphStep 3: List all possible join trees from the join graphStep 4: Use the workload trace to select the best join tree
  • 11. Implementation Tuesday, June 24, 2014 11 Workload driver Partitioning Algorithm P Simulation framework CostCost Model Trace Collector Training trace Database schema Transaction SQL code Testing trace Partitioning solution SQL Server 2012
  • 12. Scalability experiments with TPC-C Tuesday, June 24, 2014 12 0 50 100 150 200 250 0 1 2 3 4 5 6 0.2 0.4 0.8 1.6 3.2 Thousand transactions GB Database size (GB) Memory Traning size Schism JECB 0 50 100 150 200 250 0 1 2 3 4 5 6 0.2 0.4 0.8 1.6 3.2 Thousand transactions GB Database size (GB) Memory
  • 13. Quality experiments Tuesday, June 24, 2014 13 0 10 20 30 40 50 60 %ofdist.transactions Schism 10% coverage Horticulture JECB
  • 14. TPC-E partitioning scheme 14 Table Horticulture JECB ACCOUNT_PERMISSION AP_CA_ID replicated CUSTOMER_TAXRATE CX_C_ID replicated DAILY_MARKET DM_DATE replicated WATCH_LIST WL_C_ID replicated CASH_TRANSACTION CT_T_ID CT_T_ID = T_ID, T_CA_ID = CA_ID, CA_C_ID CUSTOMER_ACCOUNT replicated CA_C_ID HOLDING H_CA_ID H_T_ID = T_ID, T_CA_ID = CA_ID, CA_C_ID HOLDING_HISTORY HH_T_ID HH_T_ID=T_ID, T_CA_ID = CA_ID, CA_C_ID HOLDING_SUMMARY HS_CA_ID HS_CA_ID = CA_ID, CA_C_ID SETTLEMENT SE_T_ID SE_T_ID = T_ID, T_CA_ID = CA_ID, CA_C_ID TRADE T_CA_ID T_CA_ID = CA_ID, CA_C_ID TRADE_HISTORY TH_T_ID TH_T_ID = T_ID, T_CA_ID = CA_ID, CA_C_ID TRADE_REQUEST replicated TR_T_ID = T_ID, T_CA_ID = CA_ID, CA_C_ID Join extensions help to deliver better results on complicated workloads
  • 15. What we learnt from Experiments Tuesday, June 24, 2014 15 Schema-based approach Statistics-based approach JECB: • Divide-and-conquer and join extensions to break complex workloads • Restrictions: all joins are key- foreign key joins Workload complexity SimpleComplicated Databasesize SmallLarge ?
  • 16. Conclusion and future work  We developed a promising approach for partitioning complicated OLTP databases.  Future work: ◦ Address the data skew problem ◦ Try more complicated cost models rather than % of distributed transactions ◦ End-to-end performance experiments Tuesday, June 24, 2014 16
  • 17. Phase 2 (Background)  Join path: ◦ Sequence of key-foreign key joins connecting tables to partitioning attributes ◦ Allow a table to be partitioned by attribute from another table Example: Tuesday, June 24, 2014 17 CA_ID C_ID 1 1 2 1 6 2 9 2 T_ID T_CA_ID 1 1 2 1 3 2 4 2 5 6 6 6 7 9 8 9 TRADE CUSTOMER_ACCOUNT SELECT * FROM TRADE, CUSTOMER_ACCOUNT WHERE T_CA_ID = CA_ID AND C_ID = @cust_id Join path from TRADE to C_ID: T_ID T_CA_ID CA_ID CA_C_ID TRADE CUSTOMER_ACCOUNT Blue: source Red: destination • Join tree: • Combination of join paths sharing the same destination (the root)