Concurrently processing thousands of web queries, each with a response time of a fraction of a second, requires maintaining and operating massive data centers. For large-scale web search engines, this translates into high energy consumption and a huge electric bill. This work takes on the challenge of reducing the electric bill of commercial web search engines operating data centers that are geographically far apart. Based on the observation that energy prices and query workloads show high spatio-temporal variation, we propose a technique that dynamically shifts the query workload of a search engine between its data centers to reduce the electric bill. Experiments on real-life query workloads obtained from a commercial search engine show that significant financial savings can be achieved by this technique.
1. Energy-Price-Driven Query Processing in Multi-center Web Search Engines
Enver Kayaaslan (Bilkent University)
B. Barla Cambazoglu (Yahoo! Research)
Roi Blanco (Yahoo! Research)
Flavio Junqueira (Yahoo! Research)
Cevdet Aykanat (Bilkent University)
2. Overview
• Large-scale web search engines
• Is it possible to decrease the financial cost of energy?
– Reducing the electric bill (>> $35M per year)
– Shifting workload between data centers
• Agenda
– Motivation
– Problem Definition
– Algorithm
– Experiments
7. Metrics
• Query Response Time
– Under a satisfactory limit (400 ms)
• User-to-data-center latency
• Query processing operations
– Query degradation
• Peak Sustainable Throughput
– Query overflows
– Number of clusters
• Total Energy Consumption
– Query processing operations
• Total Energy Cost (electric bill)
– Energy price
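The response-time metric combines user-to-data-center latency with the time spent on query processing operations. The following is a minimal sketch, assuming a query counts as degraded when that sum would exceed the 400 ms budget (the slides do not spell out the exact rule). `ServedQuery`, `response_time_ms`, and `degradation_rate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ServedQuery:
    """Hypothetical record of one served query (times in milliseconds)."""
    network_latency_ms: float   # user-to-data-center latency
    processing_time_ms: float   # time spent on query processing operations


def response_time_ms(q: ServedQuery) -> float:
    """Response time seen by the user: network latency plus processing time."""
    return q.network_latency_ms + q.processing_time_ms


def degradation_rate(queries: list[ServedQuery], budget_ms: float = 400.0) -> float:
    """Fraction of queries whose full response time would exceed the budget.

    Assumption: these are the queries that must be degraded
    (e.g. processed only partially) to be answered within budget_ms.
    """
    if not queries:
        return 0.0
    degraded = sum(1 for q in queries if response_time_ms(q) > budget_ms)
    return degraded / len(queries)


if __name__ == "__main__":
    sample = [
        ServedQuery(20.0, 60.0),    # processed locally: 80 ms
        ServedQuery(150.0, 120.0),  # forwarded: 270 ms
        ServedQuery(180.0, 300.0),  # forwarded and expensive: 480 ms
    ]
    print(degradation_rate(sample))  # one of the three queries exceeds 400 ms
```

Raising `budget_ms` toward 800 ms drives the rate toward zero in this toy model, which mirrors the trend reported in the results.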
10. Goal
• Decrease the total energy cost by exploiting
– Spatio-temporal variation in energy prices
– Spatio-temporal variation in query workloads
• Constraints:
– Limited hardware
– Bounded response times
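The goal can be written as a sketch of an optimization over time windows t and data centers i. This notation is ours, not the slides': π_i(t) is the energy price, E_i(t) the energy consumed, λ_i(t) the query load after forwarding, C_i the capacity (PST), ℓ_{u,i} the latency from user u to center i, τ_q the processing time of query q, and r the response time limit.

```latex
\begin{aligned}
\min \quad & \sum_{t}\sum_{i} \pi_i(t)\,E_i(t) \\
\text{s.t.} \quad & \lambda_i(t) \le C_i && \forall i,\, t \quad \text{(limited hardware)} \\
& \ell_{u,i} + \tau_q \le r && \text{for each query } q \text{ of user } u \text{ served at } i \quad \text{(bounded response times)}
\end{aligned}
```

In this simplified statement the response-time bound is hard. In the experiments, a small fraction of queries is instead allowed to be degraded.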
19. Workload estimation (no forwarding)
[Figure: three timeline panels, each starting at 0 and marking the current time]
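The slides do not say how future workloads are estimated from the no-forwarding history. The following is a minimal sketch that assumes a seasonal-naive estimator: it averages the load seen at the same hour of day on earlier days. `estimate_workload` and its parameters are hypothetical.

```python
def estimate_workload(hourly_counts: list[float], target_hour: int, period: int = 24) -> float:
    """Seasonal-naive estimate of the query workload for hour index target_hour.

    Averages the counts observed at the same hour of day on earlier days
    (every `period` hours back). Hypothetical helper, not the authors' estimator.
    """
    past = [
        hourly_counts[h]
        for h in range(target_hour - period, -1, -period)
        if h < len(hourly_counts)
    ]
    if not past:
        raise ValueError("need at least one earlier day of history")
    return sum(past) / len(past)


if __name__ == "__main__":
    # Two days of hourly query counts observed with no forwarding (toy values).
    counts = [100.0] * 24 + [120.0] * 24
    print(estimate_workload(counts, target_hour=48))  # mean of hours 24 and 0
```

Any forecaster that produces a per-center, per-window load estimate could be swapped in here.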
20. Generating probabilities
• Estimates future workloads
• Uses current energy prices and capacities
• Shifts workload evenly, assuming every expensive data center will forward queries evenly
• The forwarding probability is the ratio of the forwarding rate to the estimated workload
• Somewhat conservative
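The steps on this slide can be read as follows. Each cheaper center's spare capacity is split evenly among all more expensive centers, and the forwarding probability is the forwarding rate divided by the estimated workload. The following is a minimal sketch under that reading only; it is not the authors' code. `forwarding_probabilities` is hypothetical, and the cap that keeps a center from forwarding more than its own load is our assumption.

```python
def forwarding_probabilities(
    price: dict[str, float],
    load: dict[str, float],
    capacity: dict[str, float],
) -> dict[str, dict[str, float]]:
    """Return probs[i][j]: probability that center i sends a query to center j.

    price: current energy price per center; load: estimated workload
    (queries/s); capacity: queries/s each center can sustain.
    """
    centers = list(price)
    # rate[i][j]: estimated queries/s that center i forwards to center j
    rate = {i: {j: 0.0 for j in centers if j != i} for i in centers}
    for j in centers:
        spare = max(0.0, capacity[j] - load[j])
        expensive = [i for i in centers if price[i] > price[j]]
        for i in expensive:
            # Conservative: assume every more expensive center will forward,
            # so center j's spare capacity is split evenly among them.
            rate[i][j] += spare / len(expensive)
    probs = {}
    for i in centers:
        if load[i] <= 0:
            probs[i] = {i: 1.0}  # nothing to forward
            continue
        out = sum(rate[i].values())
        # Never plan to forward more queries than center i expects to receive.
        scale = min(1.0, load[i] / out) if out > 0 else 1.0
        p = {j: scale * r / load[i] for j, r in rate[i].items() if r > 0}
        p[i] = max(0.0, 1.0 - sum(p.values()))  # keep the rest locally
        probs[i] = p
    return probs


if __name__ == "__main__":
    probs = forwarding_probabilities(
        price={"A": 0.10, "B": 0.05, "C": 0.08},
        load={"A": 90.0, "B": 40.0, "C": 70.0},
        capacity={"A": 100.0, "B": 100.0, "C": 100.0},
    )
    for center, row in probs.items():
        print(center, {k: round(v, 3) for k, v in row.items()})
```

The even split is what makes the scheme conservative. A receiving center never promises more than its spare capacity, even if some expensive centers forward less than their share.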
23. Picking a data center
[Figure: a user query reaches its local center A and is processed there with probability p_A(A); otherwise it goes to one of the remote centers B, C, D]
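Once the local center has its row of probabilities, such as p_A(A), p_A(B), and so on, picking a center is a single weighted draw. The following is a minimal sketch using the standard library's `random.Random.choices`; `pick_data_center` is a hypothetical name.

```python
import random


def pick_data_center(probs: dict[str, float], rng: random.Random) -> str:
    """Sample the center that processes a query, given p_local(X) for every X.

    probs holds one row of forwarding probabilities for the local center,
    e.g. {"A": p_A(A), "B": p_A(B), ...}; weights need not be normalized.
    """
    centers = list(probs)
    weights = [probs[c] for c in centers]
    return rng.choices(centers, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for a reproducible run
    row_a = {"A": 0.5, "B": 0.3, "C": 0.2, "D": 0.0}
    picks = [pick_data_center(row_a, rng) for _ in range(10_000)]
    for center in row_a:
        print(center, picks.count(center) / len(picks))
```

A center with probability 0 is never selected, so a row like the one above keeps D out of the rotation until its probability changes.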
25. Set-up
• 5 data centers
• 38M queries over 4 days
• Caching turned on (lowers the PST of the back end)
• Capacities tuned for a low query overflow rate (< 0.005)
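The set-up tunes capacities so that the query overflow rate stays below 0.005, but the slides do not describe the procedure. The following is a minimal sketch. It assumes a center processes at most `capacity` queries per time slot, excess queries overflow, and nothing is queued across slots. `overflow_rate` and `min_capacity` are hypothetical helpers.

```python
def overflow_rate(arrivals: list[int], capacity: int) -> float:
    """Fraction of queries that overflow a center with the given capacity.

    arrivals[k] is the number of queries arriving in time slot k; at most
    `capacity` of them can be processed per slot and the rest overflow.
    Assumption: no queueing across slots.
    """
    total = sum(arrivals)
    if total == 0:
        return 0.0
    overflow = sum(max(0, a - capacity) for a in arrivals)
    return overflow / total


def min_capacity(arrivals: list[int], target: float = 0.005) -> int:
    """Smallest capacity whose overflow rate is below target (binary search).

    Valid because the overflow rate never increases as capacity grows.
    """
    lo, hi = 0, max(arrivals, default=0)  # at hi nothing overflows
    while lo < hi:
        mid = (lo + hi) // 2
        if overflow_rate(arrivals, mid) < target:
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    per_slot = [10, 10, 10, 30]  # toy arrival counts
    print(min_capacity(per_slot))              # strict 0.005 target
    print(min_capacity(per_slot, target=0.5))  # loose target
```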
26. Results (I)
• 63% of the queries are served by the cache
– Reduces the PST of the back end
• Average query response time increases (from 66 ms to around 100 ms)
• Query degradation rate increases but is kept < 5% for a budget of 400 ms (none if budget > 800 ms)
• Overflow rate is the same as in the non-forwarding scenario
• Forwarding rate is proportional to price variation
• Forwarding remains feasible despite network latencies
27. Aggregate hourly query forwarding rate
[Figure: aggregate hourly query forwarding rate vs. global time]
• Forwarding depends on the ordinal ranking of current prices
• In the spatial (S) setup, the ordinal ranking of prices dominates more than intra-day variations (S correlates with ST)
• Traffic and forwarding are inversely related
• Forwarding takes energy prices into account
28. Savings in electric cost
• Relative to the U setup
• Savings increase with a larger response time limit
• ST > S > T (r = ∞)
• About 35% savings in the ST setup
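Savings here are measured relative to the electric bill of the U setup. The following is a minimal sketch of that comparison. It assumes hourly energy use (kWh) and hourly prices ($/kWh) are known for each data center. `electric_bill` and `relative_savings` are hypothetical helpers, and the numbers are toy values, not the experiment's data.

```python
def electric_bill(
    energy_kwh: dict[str, list[float]],
    price_per_kwh: dict[str, list[float]],
) -> float:
    """Total energy cost: sum over centers and hours of energy times price."""
    return sum(
        e * p
        for center, hourly_energy in energy_kwh.items()
        for e, p in zip(hourly_energy, price_per_kwh[center])
    )


def relative_savings(cost: float, baseline_cost: float) -> float:
    """Savings as a fraction of the baseline bill (0.35 means 35%)."""
    return 1.0 - cost / baseline_cost


if __name__ == "__main__":
    price = {"A": [0.10, 0.20], "B": [0.05, 0.05]}   # toy $/kWh per hour
    baseline = {"A": [10.0, 10.0], "B": [5.0, 5.0]}  # no forwarding
    shifted = {"A": [10.0, 5.0], "B": [5.0, 10.0]}   # half of A's hour-2 work moved to B
    b, s = electric_bill(baseline, price), electric_bill(shifted, price)
    print(f"baseline={b:.2f} shifted={s:.2f} savings={relative_savings(s, b):.1%}")
```

The toy shift moves work into the hour where the gap between the two centers' prices is largest. This is why, in this model, savings grow with price variability.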
29. Savings vs. electric price vs. forwarding rate
[Figure: x-axis is local time, i.e. the time where the query is processed]
• T setup, correlates with temporal effects
• r = 800 ms (~ ∞)
• Query forwarding rate: positive correlation
• Electric price: negative correlation
• What happens at 17:00?
30. Conclusions
• Formulated the reduction of the electric bill in distributed search engines within an online optimization framework
• A practical algorithm, based on shifting query workloads
• Evaluation of potential savings via realistic simulations
• Depending on electric price distribution, energy costs can be
significantly reduced by shifting query workloads to energy-cheap
data centers.
– By maintaining the overflow rate equal to that of a centralized
engine
– By keeping the query degradation rate < 5%
• The higher the variability in the configuration, the higher the savings
31. Future work
• Price-aware crawling
• Price-aware indexer
• Energy-aware caches
• Green search engines!
33. Query forwarding vs. budget
• No bound on response time: r = ∞
• Almost all queries can be answered under 800 ms; all but about 5% under 400 ms
• U does not forward
• T forwards more
34. PST and window size
• PST (for each data center)
• Window size (for each price configuration)
35. Query processing time estimation
[Figure: query q = {t1, t2, t3}; terms t1, t2, t3 map to lists LIST1, LIST2, LIST3]
• results = {r11, r15, r22, r26, r31}
• #operations ~ c + w(|LIST1| + |LIST2| + |LIST3|)
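The estimate above grows linearly with the total length of the lists (presumably the inverted lists) of the query's terms. The following is a minimal sketch of that formula. The constants c and w come from the slide, but their values are not given. `ms_per_operation` is a hypothetical calibration factor for turning operations into time, and treating unknown terms as empty lists is our assumption.

```python
def estimated_operations(
    query_terms: list[str],
    list_lengths: dict[str, int],
    c: float,
    w: float,
) -> float:
    """#operations ~ c + w * (sum of the list lengths of the query terms).

    Terms missing from list_lengths are treated as having empty lists.
    """
    return c + w * sum(list_lengths.get(t, 0) for t in query_terms)


def estimated_time_ms(
    query_terms: list[str],
    list_lengths: dict[str, int],
    c: float,
    w: float,
    ms_per_operation: float,
) -> float:
    """Hypothetical conversion from estimated operations to milliseconds."""
    return estimated_operations(query_terms, list_lengths, c, w) * ms_per_operation


if __name__ == "__main__":
    lengths = {"t1": 2_000, "t2": 500, "t3": 1_500}  # toy |LISTi| values
    q = ["t1", "t2", "t3"]
    print(estimated_operations(q, lengths, c=100.0, w=1.0))
    print(estimated_time_ms(q, lengths, c=100.0, w=1.0, ms_per_operation=0.01))
```

An estimate like this could feed the response-time check, with latency added, when deciding whether a forwarded query can still meet its budget.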