Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Predictive Elastic
Database Systems
May 24th, 2018
1
Rebecca Taft – becca@cockroachlabs.com
Transaction	Performance	
Makes	or	Breaks	a	Business
Online Retail
Financial Applications
Internet of Things
Social Media
…...
Challenges	to	transaction	performance:	
skew	and	workload	variation
3
Database	Elasticity
E-Store	–	manage	skew	and	react	...
Data	Partition	1 Data	Partition	2 Data	Partition	3
Router
4
Data	Partition	1 Data	Partition	2 Data	Partition	3
Router
Query:	
Get	cart_id	758
4
Data	Partition	1 Data	Partition	2 Data	Partition	3
Router Hash(758)	→	Partition	1
4
Data	Partition	1 Data	Partition	2 Data	Partition	3
Router
Query:	
Add	“Harry	Potter”	
to	cart_id	534
5
Data	Partition	1 Data	Partition	2 Data	Partition	3
Router Hash(534)	→	Partition	3
5
TransacRons
0
1
3
4
5
ParRRon	1 ParRRon	2 ParRRon	3
Data	Partition	1 Data	Partition	2 Data	Partition	3
Router
Uniform
Work...
But	many	real	database	workloads	are	
skewed,	not	uniform
7
8
9
TransacRons
0
3
5
8
10
ParRRon	1 ParRRon	2 ParRRon	3
Data	Partition	1 Data	Partition	2 Data	Partition	3
Router
Skewed
Work...
And	real	database	workloads	are	variable
11
12
13
14
TransacRons
0
1
3
4
5
ParRRon	1 ParRRon	2 ParRRon	3
Data	Partition	1 Data	Partition	2 Data	Partition	3
Router
Uniform
Work...
TransacRons
0
3
5
8
10
ParRRon	1 ParRRon	2 ParRRon	3
Data	Partition	1 Data	Partition	2 Data	Partition	3
Router
Uniform
Wor...
TransacRons
0
3
5
8
10
Part	1 Part	2 Part	3
E-Store: Elastic Scaling to Adapt to Workload
Transaction	Arrival	Rate
19
Unif...
TransacRons
0
3
5
8
10
Part	1 Part	2 Part	3
E-Store: Elastic Scaling to Adapt to Workload
Transaction	Arrival	Rate
19
Unif...
E-Store: Elastic Scaling to Adapt to Workload
Transaction	Arrival	Rate
TransacRons
0
3
5
8
10
Part	1 Part	2 Part	3 Part	4 ...
E-Store: Elastic Scaling to Adapt to Workload
Transaction	Arrival	Rate
TransacRons
0
3
5
8
10
Part	1 Part	2 Part	3 Part	4 ...
TransacRons
0
2.5
5
7.5
10
Part	1 Part	2 Part	3 Part	4 Part	5
E-Store: Elastic Scaling to Adapt to Workload
Transaction	Ar...
TransacRons
0
2.5
5
7.5
10
Part	1 Part	2 Part	3 Part	4 Part	5
E-Store: Elastic Scaling to Adapt to Workload
Transaction	Ar...
TransacRons
0
2.5
5
7.5
10
Part	1 Part	2 Part	3 Part	4 Part	5
E-Store: Elastic Scaling to Adapt to Workload
Transaction	Ar...
TransacRons
0
3
5
8
10
Part	1 Part	2 Part	3
TransacRons
0
2.5
5
7.5
10
Part	1 Part	2 Part	3 Part	4 Part	5
E-Store: Elastic...
E-Store Elastic DBMS Architecture
20
Live	Migration
Elastic	Controller
Planner
Load	
Monitor
Shared	Nothing	DBMS	(e.g.	H-S...
A	Case	Study:	B2W	Digital
21
3-Day Online Retail Database Workload
22
3-Day Online Retail Database Workload
22
Elastic Scaling Adapts to Workload
23
Reactive Scaling Causes Latency Spikes
23
P-Store

A Predictive Elastic DBMS
24
P-Store Architecture
25
Live	
Migration
Predictive	Controller
Predictor Planner
Load	
Monitor
Shared	Nothing	DBMS	(e.g.	H-...
Ideal Capacity
Actual Servers
Allocated
Demand
Machine	Capacity
26
Demand
Machine	Capacity
?
27
Options
1. Add	machine(s)
Demand
Machine	Capacity
28
Options
1. Add	machine(s)
Demand
Machine	Capacity
28
Options
1. Add	machine(s)
Demand
Machine	Capacity
28
Options
1. Add	machine(s)
2. Remove	machine(s)
Demand
Machine	Capacity
28
Options
1. Add	machine(s)
2. Remove	machine(s)
3. No	change
Demand
Machine	Capacity
28
Options
1. Add	machine(s)
2. Remove	machine(s)
3. No	change
Demand
Machine	Capacity
28
Options
1. Add	machine(s)
2. Remove	machine(s)
3. No	change
Demand
Machine	Capacity
Effective	Capacity
28
Options
1. Add	machine(s)
2. Remove	machine(s)
3. No	change
Challenges
1. Time	to	Reconfigure
2. Effective	Capacity	
3. Co...
Time to Reconfigure
• Rate	control:	small	chunks	of	data,	spaced	apart
• ∃	some	time	D	–	minimum	time	to	move	entire	datab...
Time to Reconfigure
• Rate	control:	small	chunks	of	data,	spaced	apart
• ∃	some	time	D	–	minimum	time	to	move	entire	datab...
Effective Capacity
30
	
• What	transaction	rate	can	the	system	support?
• Max	rate	per	machine:	Q
Cost
Cost = 7
Example: Scale from 1 to 6 nodes
34
Demand
Machine	Capacity
Effective	Capacity
4	machines

at	t	=	9
Goal:	find	path	with	

minimal	cost	such	that

eff.	capaci...
Demand
Machine	Capacity
Effective	Capacity
Move:	
3→4	machines	
Ending	at	t	=	9
36
Demand
Machine	Capacity
Effective	Capacity
T3→4	=	3
Move:	
3→4	machines	
Ending	at	t	=	9
36
Demand
Machine	Capacity
Effective	Capacity
T3→4	=	3
C3→4	=	12
Move:	
3→4	machines	
Ending	at	t	=	9
36
Demand
Machine	Capacity
Effective	Capacity
T3→4	=	3
C3→4	=	12
Move:	
3→4	machines	
Ending	at	t	=	9
36
Demand
Machine	Capacity
Effective	Capacity
T3→4	=	3
C3→4	=	12
Move:	
3→4	machines	
Ending	at	t	=	9
36
Demand
Machine	Capacity
Effective	Capacity
T3→4	=	3
C3→4	=	12
C3→4	=	∞
Move:	
3→4	machines	
Ending	at	t	=	9
36
Demand
Machine	Capacity
Effective	Capacity
Move:	
4→4	machines	
Ending	at	t	=	9
37
Demand
Machine	Capacity
Effective	Capacity
T4→4	=	1
Move:	
4→4	machines	
Ending	at	t	=	9
37
Demand
Machine	Capacity
Effective	Capacity
T4→4	=	1
C4→4	=	4
Move:	
4→4	machines	
Ending	at	t	=	9
37
Demand
Machine	Capacity
Effective	Capacity
T4→4	=	1
C4→4	=	4 ✔
Move:	
4→4	machines	
Ending	at	t	=	9
37
Demand
Machine	Capacity
Effective	Capacity
Move:	
3→4	machines	
Ending	at	t	=	8
38
Demand
Machine	Capacity
Effective	Capacity
T3→4	=	3
Move:	
3→4	machines	
Ending	at	t	=	8
38
Demand
Machine	Capacity
Effective	Capacity
T3→4	=	3
C3→4	=	12
Move:	
3→4	machines	
Ending	at	t	=	8
38
Demand
Machine	Capacity
Effective	Capacity
T3→4	=	3
C3→4	=	12
Move:	
3→4	machines	
Ending	at	t	=	8
38
Demand
Machine	Capacity
Effective	Capacity
T3→4	=	3
C3→4	=	12
Move:	
3→4	machines	
Ending	at	t	=	8
✔
38
Demand
Machine	Capacity
Effective	Capacity
39
Demand
Machine	Capacity
Effective	Capacity
39
Demand
Machine	Capacity
Effective	Capacity
39
Demand
Machine	Capacity
Effective	Capacity
Feasible	Solution

Total	Cost:	30
39
Demand
Machine	Capacity
Effective	Capacity
Move:	
4→4	machines	
Ending	at	t	=	8
40
Demand
Machine	Capacity
Effective	Capacity
T4→4	=	1
Move:	
4→4	machines	
Ending	at	t	=	8
40
Demand
Machine	Capacity
Effective	Capacity
T4→4	=	1
C4→4	=	4
Move:	
4→4	machines	
Ending	at	t	=	8
40
Demand
Machine	Capacity
Effective	Capacity
T4→4	=	1
C4→4	=	4 ✔
Move:	
4→4	machines	
Ending	at	t	=	8
40
Demand
Machine	Capacity
Effective	Capacity
T4→4	=	1
C4→4	=	4 ✔
Move:	
4→4	machines	
Ending	at	t	=	8
And	So	On….
41
Dynamic Programming Algorithm for Planning
Reconfigurations
• Complexity	∝	#	time	steps	and	max	number	of	machines	needed
...
Time Series Prediction
• Diurnal,	periodic	workload,	some	variation	day	to	day
• Prediction	must	complete	in	seconds-to-mi...
Evaluation
• Can	we	reduce	resource	usage?
• Can	we	prevent	latency	spikes?
44
Experiments
• 3	Days	of	B2W	workload	run	on	H-Store	
• Cluster	of	10	servers,	6	partitions	per	server
• About	1	GB	of	shop...
Results – Static, Peak Provisioning

Machines Used: 10
46
Results – Static, Average Provisioning

Machines Used: 4
47
48
Results – Reactive Scaling

Avg. Machines Used: 4.02
Results – P-Store

Avg. Machines Used: 5.05	
49
Results Comparison: CDF of Top 1% of Latency
50
Summary: P-Store Evaluation
• Can	we	reduce	resource	usage?	
• Saves	50%	of	computing	resources	compared	to	static	allocat...
Simulation
52
• 4.5	months	of	B2W	data
• Test	many	hypothetical	allocation	strategies	+	parameters
Simulation: Example
53
Summary
❖	Real	database	workloads	are	skewed	and	vary	over	time
❖	Elasticity	enables	management	of	skew	and	adaptation	to	...
Upcoming SlideShare
Loading in …5
×

Predictive Elastic Database Systems

58 views

Published on

Predictive Elastic Database Systems by Rebecca Taft

On-line transaction processing (OLTP) database management systems (DBMSs) are a critical part of the operation of many large enterprises. These systems often serve time-varying workloads due to daily, weekly or seasonal fluctuations in demand, or because of rapid growth in demand due to a company’s business success. In addition, many OLTP workloads are heavily skewed to “hot” tuples or ranges of tuples. For example, the majority of NYSE volume involves only 40 stocks. To deal with such fluctuations, an OLTP DBMS needs to be elastic; that is, it must be able to expand and contract resources in response to load fluctuations and dynamically balance load as hot tuples vary over time. In this talk, I will present a system we built at MIT called P-Store, which uses predictive modeling to reconfigure the system in advance of predicted load changes. I will show that when running a real database workload, P-Store achieves comparable performance to a traditional static OLTP DBMS while using 50% fewer computing resources.



Published in: Technology
  • Be the first to comment

  • Be the first to like this

Predictive Elastic Database Systems

  1. 1. Predictive Elastic Database Systems May 24th, 2018 1 Rebecca Taft – becca@cockroachlabs.com
  2. 2. Transaction Performance Makes or Breaks a Business Online Retail Financial Applications Internet of Things Social Media …. 2
  3. 3. Challenges to transaction performance: skew and workload variation 3 Database Elasticity E-Store – manage skew and react to variation P-Store – predictive modeling for time variation
  4. 4. Data Partition 1 Data Partition 2 Data Partition 3 Router 4
  5. 5. Data Partition 1 Data Partition 2 Data Partition 3 Router Query: Get cart_id 758 4
  6. 6. Data Partition 1 Data Partition 2 Data Partition 3 Router Hash(758) → Partition 1 4
  7. 7. Data Partition 1 Data Partition 2 Data Partition 3 Router Query: Add “Harry Potter” to cart_id 534 5
  8. 8. Data Partition 1 Data Partition 2 Data Partition 3 Router Hash(534) → Partition 3 5
  9. 9. TransacRons 0 1 3 4 5 ParRRon 1 ParRRon 2 ParRRon 3 Data Partition 1 Data Partition 2 Data Partition 3 Router Uniform Workload Transaction Arrival Rate 6
  10. 10. But many real database workloads are skewed, not uniform 7
  11. 11. 8
  12. 12. 9
  13. 13. TransacRons 0 3 5 8 10 ParRRon 1 ParRRon 2 ParRRon 3 Data Partition 1 Data Partition 2 Data Partition 3 Router Skewed Workload Transaction Arrival Rate 10
  14. 14. And real database workloads are variable 11
  15. 15. 12
  16. 16. 13
  17. 17. 14
  18. 18. TransacRons 0 1 3 4 5 ParRRon 1 ParRRon 2 ParRRon 3 Data Partition 1 Data Partition 2 Data Partition 3 Router Uniform Workload Transaction Arrival Rate 15
  19. 19. TransacRons 0 3 5 8 10 ParRRon 1 ParRRon 2 ParRRon 3 Data Partition 1 Data Partition 2 Data Partition 3 Router Uniform Workload,
 Increased Load 18 Transaction Arrival Rate
  20. 20. TransacRons 0 3 5 8 10 Part 1 Part 2 Part 3 E-Store: Elastic Scaling to Adapt to Workload Transaction Arrival Rate 19 Uniform Workload, Increasing Load
  21. 21. TransacRons 0 3 5 8 10 Part 1 Part 2 Part 3 E-Store: Elastic Scaling to Adapt to Workload Transaction Arrival Rate 19 Uniform Workload, Increasing Load
  22. 22. E-Store: Elastic Scaling to Adapt to Workload Transaction Arrival Rate TransacRons 0 3 5 8 10 Part 1 Part 2 Part 3 Part 4 Part 5 19 Uniform Workload, Increasing Load
  23. 23. E-Store: Elastic Scaling to Adapt to Workload Transaction Arrival Rate TransacRons 0 3 5 8 10 Part 1 Part 2 Part 3 Part 4 Part 5 19 Uniform Workload, Increasing Load
  24. 24. TransacRons 0 2.5 5 7.5 10 Part 1 Part 2 Part 3 Part 4 Part 5 E-Store: Elastic Scaling to Adapt to Workload Transaction Arrival Rate 19 Uniform Workload, Increasing Load
  25. 25. TransacRons 0 2.5 5 7.5 10 Part 1 Part 2 Part 3 Part 4 Part 5 E-Store: Elastic Scaling to Adapt to Workload Transaction Arrival Rate TransacRons 0 3 5 8 10 Part 1 Part 2 Part 3 Transaction Arrival Rate 19 Uniform Workload, Increasing Load Skewed Workload
  26. 26. TransacRons 0 2.5 5 7.5 10 Part 1 Part 2 Part 3 Part 4 Part 5 E-Store: Elastic Scaling to Adapt to Workload Transaction Arrival Rate TransacRons 0 3 5 8 10 Part 1 Part 2 Part 3 Transaction Arrival Rate 19 Uniform Workload, Increasing Load Skewed Workload
  27. 27. TransacRons 0 3 5 8 10 Part 1 Part 2 Part 3 TransacRons 0 2.5 5 7.5 10 Part 1 Part 2 Part 3 Part 4 Part 5 E-Store: Elastic Scaling to Adapt to Workload Transaction Arrival Rate Transaction Arrival Rate 19 Uniform Workload, Increasing Load Skewed Workload
  28. 28. E-Store Elastic DBMS Architecture 20 Live Migration Elastic Controller Planner Load Monitor Shared Nothing DBMS (e.g. H-Store)
  29. 29. A Case Study: B2W Digital 21
  30. 30. 3-Day Online Retail Database Workload 22
  31. 31. 3-Day Online Retail Database Workload 22
  32. 32. Elastic Scaling Adapts to Workload 23
  33. 33. Reactive Scaling Causes Latency Spikes 23
  34. 34. P-Store
 A Predictive Elastic DBMS 24
  35. 35. P-Store Architecture 25 Live Migration Predictive Controller Predictor Planner Load Monitor Shared Nothing DBMS (e.g. H-Store)
  36. 36. Ideal Capacity Actual Servers Allocated Demand Machine Capacity 26
  37. 37. Demand Machine Capacity ? 27
  38. 38. Options 1. Add machine(s) Demand Machine Capacity 28
  39. 39. Options 1. Add machine(s) Demand Machine Capacity 28
  40. 40. Options 1. Add machine(s) Demand Machine Capacity 28
  41. 41. Options 1. Add machine(s) 2. Remove machine(s) Demand Machine Capacity 28
  42. 42. Options 1. Add machine(s) 2. Remove machine(s) 3. No change Demand Machine Capacity 28
  43. 43. Options 1. Add machine(s) 2. Remove machine(s) 3. No change Demand Machine Capacity 28
  44. 44. Options 1. Add machine(s) 2. Remove machine(s) 3. No change Demand Machine Capacity Effective Capacity 28
  45. 45. Options 1. Add machine(s) 2. Remove machine(s) 3. No change Challenges 1. Time to Reconfigure 2. Effective Capacity 3. Cost Demand Machine Capacity Effective Capacity 28
  46. 46. Time to Reconfigure • Rate control: small chunks of data, spaced apart • ∃ some time D – minimum time to move entire database w/ no performance impact % of Data 0 50 100 Nodes 1 2 3 4 % of Data 0 50 100 Nodes 1 2 3 → 29
  47. 47. Time to Reconfigure • Rate control: small chunks of data, spaced apart • ∃ some time D – minimum time to move entire database w/ no performance impact % of Data 0 50 100 Nodes 1 2 3 4 % of Data 0 50 100 Nodes 1 2 3 → Time = ¼ * D 29
  48. 48. Effective Capacity 30 • What transaction rate can the system support? • Max rate per machine: Q
  49. 49. Cost Cost = 7
  50. 50. Example: Scale from 1 to 6 nodes 34
  51. 51. Demand Machine Capacity Effective Capacity 4 machines
 at t = 9 Goal: find path with 
 minimal cost such that
 eff. capacity > demand Cost of {path to A machines ending at time t}
 =
 Cost of {path to B machines ending at time t – TimeB→A} + CostB→A 35
  52. 52. Demand Machine Capacity Effective Capacity Move: 3→4 machines Ending at t = 9 36
  53. 53. Demand Machine Capacity Effective Capacity T3→4 = 3 Move: 3→4 machines Ending at t = 9 36
  54. 54. Demand Machine Capacity Effective Capacity T3→4 = 3 C3→4 = 12 Move: 3→4 machines Ending at t = 9 36
  55. 55. Demand Machine Capacity Effective Capacity T3→4 = 3 C3→4 = 12 Move: 3→4 machines Ending at t = 9 36
  56. 56. Demand Machine Capacity Effective Capacity T3→4 = 3 C3→4 = 12 Move: 3→4 machines Ending at t = 9 36
  57. 57. Demand Machine Capacity Effective Capacity T3→4 = 3 C3→4 = 12 C3→4 = ∞ Move: 3→4 machines Ending at t = 9 36
  58. 58. Demand Machine Capacity Effective Capacity Move: 4→4 machines Ending at t = 9 37
  59. 59. Demand Machine Capacity Effective Capacity T4→4 = 1 Move: 4→4 machines Ending at t = 9 37
  60. 60. Demand Machine Capacity Effective Capacity T4→4 = 1 C4→4 = 4 Move: 4→4 machines Ending at t = 9 37
  61. 61. Demand Machine Capacity Effective Capacity T4→4 = 1 C4→4 = 4 ✔ Move: 4→4 machines Ending at t = 9 37
  62. 62. Demand Machine Capacity Effective Capacity Move: 3→4 machines Ending at t = 8 38
  63. 63. Demand Machine Capacity Effective Capacity T3→4 = 3 Move: 3→4 machines Ending at t = 8 38
  64. 64. Demand Machine Capacity Effective Capacity T3→4 = 3 C3→4 = 12 Move: 3→4 machines Ending at t = 8 38
  65. 65. Demand Machine Capacity Effective Capacity T3→4 = 3 C3→4 = 12 Move: 3→4 machines Ending at t = 8 38
  66. 66. Demand Machine Capacity Effective Capacity T3→4 = 3 C3→4 = 12 Move: 3→4 machines Ending at t = 8 ✔ 38
  67. 67. Demand Machine Capacity Effective Capacity 39
  68. 68. Demand Machine Capacity Effective Capacity 39
  69. 69. Demand Machine Capacity Effective Capacity 39
  70. 70. Demand Machine Capacity Effective Capacity Feasible Solution
 Total Cost: 30 39
  71. 71. Demand Machine Capacity Effective Capacity Move: 4→4 machines Ending at t = 8 40
  72. 72. Demand Machine Capacity Effective Capacity T4→4 = 1 Move: 4→4 machines Ending at t = 8 40
  73. 73. Demand Machine Capacity Effective Capacity T4→4 = 1 C4→4 = 4 Move: 4→4 machines Ending at t = 8 40
  74. 74. Demand Machine Capacity Effective Capacity T4→4 = 1 C4→4 = 4 ✔ Move: 4→4 machines Ending at t = 8 40
  75. 75. Demand Machine Capacity Effective Capacity T4→4 = 1 C4→4 = 4 ✔ Move: 4→4 machines Ending at t = 8 And So On…. 41
  76. 76. Dynamic Programming Algorithm for Planning Reconfigurations • Complexity ∝ # time steps and max number of machines needed • In practice: runtime < 1 second • Greedy alternatives don’t work 42
  77. 77. Time Series Prediction • Diurnal, periodic workload, some variation day to day • Prediction must complete in seconds-to-minutes • We use Sparse Periodic Auto-Regression (SPAR) – Chen et al. NSDI ‘08 43
  78. 78. Evaluation • Can we reduce resource usage? • Can we prevent latency spikes? 44
  79. 79. Experiments • 3 Days of B2W workload run on H-Store • Cluster of 10 servers, 6 partitions per server • About 1 GB of shopping cart and checkout data • Track machines allocated, throughput, latency, reconfiguration time periods 45
  80. 80. Results – Static, Peak Provisioning
 Machines Used: 10 46
  81. 81. Results – Static, Average Provisioning
 Machines Used: 4 47
  82. 82. 48 Results – Reactive Scaling
 Avg. Machines Used: 4.02
  83. 83. Results – P-Store
 Avg. Machines Used: 5.05 49
  84. 84. Results Comparison: CDF of Top 1% of Latency 50
  85. 85. Summary: P-Store Evaluation • Can we reduce resource usage? • Saves 50% of computing resources compared to static allocation • Can we prevent latency spikes? • Superior performance compared to reactive approach • On a real workload, P-Store reduces resource usage while keeping latency within application requirements 51
  86. 86. Simulation 52 • 4.5 months of B2W data • Test many hypothetical allocation strategies + parameters
  87. 87. Simulation: Example 53
  88. 88. Summary ❖ Real database workloads are skewed and vary over time ❖ Elasticity enables management of skew and adaptation to load changes ❖ Predictive scaling improves performance during load changes compared to reactive scaling Rebecca Taft becca@cockroachlabs.com 54

×