Aims2012
- 1. AIMS, Luxembourg, Luxembourg, June 6, 2012
Cooperative Database Caching within Cloud Environments
Andrei Vancea (1), Guilherme Sperb Machado (1), Laurent d’Orazio (2), Burkhard Stiller (1)
(1) Department of Informatics IFI, Communication Systems Group CSG, University of Zürich UZH, Switzerland
(2) Blaise Pascal University - LIMOS, France
vancea,stiller@ifi.uzh.ch, laurent.dorazio@isima.fr
© 2012 UZH,
- 2. Background
Databases
– Client: asks a query (SQL)
– Server: returns the result (tuples)
Client-side caching
– Page Caching, Tuple Caching
– Semantic Caching
• Clients store the results of old queries
• Old results used for answering new queries
- 3. Background - Semantic Caching
Semantic Regions
– Query description
– Result set
Query rewriting
– Probe
– Remainder
[Figure: query rewriting takes the query and the cached query descriptions and produces a probe, answered from the semantic cache, and a remainder, sent to the server]
- 4. Database Caching & Cloud Computing
Most cloud providers charge for data transferred between the cloud environment and the “outside world” on a pay-as-you-go basis
Database caching within cloud environment
– Improves performance
– Economic benefits
• Amount of data transferred decreases
• Payments for data transfer are reduced
- 6. Cooperative Semantic Caching
Share local semantic caches between clients
Use cache entries of other clients
Performance improvements
- 7. Cooperative Semantic Caching
Q1: select * from persons where age > 10
– Result cached as region R1: age > 10
Q3: select * from persons where age > 7
– Probe: select * from R1
– Remainder: select * from persons where age > 7 and age <= 10
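The decomposition of Q3 can be checked with a small sketch. It uses two in-memory SQLite databases as stand-ins for the database server and for the cache of the client that ran Q1; the toy rows, the `name` column, and the idea of materializing R1 as a table are assumptions made for illustration, not details from the slides.

```python
import sqlite3

# Stand-in for the database server; the rows are hypothetical toy data.
server = sqlite3.connect(":memory:")
server.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
server.executemany(
    "INSERT INTO persons (name, age) VALUES (?, ?)",
    [("a", 5), ("b", 8), ("c", 10), ("d", 11), ("e", 42)],
)

# Stand-in for the cache of the client that answered Q1 earlier:
# region R1 holds the result of "age > 10" (assumed to be stored as a table).
peer_cache = sqlite3.connect(":memory:")
peer_cache.execute("CREATE TABLE R1 (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
peer_cache.executemany(
    "INSERT INTO R1 VALUES (?, ?, ?)",
    server.execute("SELECT * FROM persons WHERE age > 10").fetchall(),
)

# Q3 (age > 7) is answered as: probe on R1 + remainder sent to the server.
probe = peer_cache.execute("SELECT * FROM R1").fetchall()
remainder = server.execute(
    "SELECT * FROM persons WHERE age > 7 AND age <= 10"
).fetchall()
rewritten = sorted(probe + remainder)

# Reference answer: Q3 executed directly on the server.
direct = sorted(server.execute("SELECT * FROM persons WHERE age > 7").fetchall())
```

In this toy setup only the two remainder tuples leave the server; the other two come from the cached region.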
- 8. Potential Use Cases
GIS (Geographic Information System) storage
– Large amount of data (e.g. seismic events)
– Processing done on client side
– Two-dimensional range selections (area)
NetFlow-based architectures
– Routers collect flow records and store them in databases
– Analyzers (intrusion detection, accounting, …) access them
– Range selections (Start Time, IP)
- 9. Query Rewriting
Query rewriting
– Probe
– Remote probes
– Remainder
[Figure: query rewriting takes the query and all query descriptions and produces a probe for the local semantic cache, remote probes for the remote semantic caches, and a remainder for the server]
- 11. CoopSC
Cooperative Semantic Caching
Query types
– Selection (n-Dimensional range predicates)
– Example: select id, name, age from persons where 20 < age and age < 30
Cache organization
– Semantic regions
– Distributed Index – built on top of a P2P overlay
- 12. CoopSC - Query Rewriting
Local Rewriting
– Probe
– Local Remainder
• Portion of the query which is not available in the local cache
Distributed Rewriting
– Remote Probes
– Remainder
[Figure: the query passes through local rewriting against the local cache, yielding a probe and a local remainder; distributed rewriting uses the distributed index to split the local remainder into remote probes and a remainder]
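The two rewriting stages can be sketched for n-dimensional range predicates as box subtraction. This is a minimal illustration, not the CoopSC implementation: the `Box` class, the `rewrite` function, the half-open interval convention, and the example regions are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # assumed half-open: [low, high)


@dataclass(frozen=True)
class Box:
    """n-dimensional range predicate: one interval per attribute."""
    sides: Tuple[Interval, ...]

    def intersect(self, other: "Box") -> Optional["Box"]:
        sides = []
        for (a_lo, a_hi), (b_lo, b_hi) in zip(self.sides, other.sides):
            lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
            if lo >= hi:
                return None  # empty in this dimension, so no overlap at all
            sides.append((lo, hi))
        return Box(tuple(sides))

    def subtract(self, other: "Box") -> List["Box"]:
        """Disjoint boxes covering the part of self that lies outside other."""
        overlap = self.intersect(other)
        if overlap is None:
            return [self]
        pieces, rest = [], list(self.sides)
        for dim, (o_lo, o_hi) in enumerate(overlap.sides):
            lo, hi = rest[dim]
            if lo < o_lo:  # slab below the overlap in this dimension
                pieces.append(Box(tuple(rest[:dim] + [(lo, o_lo)] + rest[dim + 1:])))
            if o_hi < hi:  # slab above the overlap in this dimension
                pieces.append(Box(tuple(rest[:dim] + [(o_hi, hi)] + rest[dim + 1:])))
            rest[dim] = (o_lo, o_hi)  # continue with the clipped box
        return pieces


def rewrite(query: Box, regions: List[Box]) -> Tuple[List[Box], List[Box]]:
    """Split a query into probes (covered by regions) and remainder boxes."""
    probes, remainder = [], [query]
    for region in regions:
        next_remainder = []
        for piece in remainder:
            hit = piece.intersect(region)
            if hit is not None:
                probes.append(hit)
            next_remainder.extend(piece.subtract(region))
        remainder = next_remainder
    return probes, remainder


# "20 < age and age < 30" on integer ages, written as [21, 30).
query = Box(((21, 30),))
# Local rewriting against a hypothetical local region [18, 24).
probes, local_remainder = rewrite(query, [Box(((18, 24),))])
# Distributed rewriting of the local remainder against a hypothetical remote region [26, 40).
remote_probes, remainder = [], []
for piece in local_remainder:
    p, r = rewrite(piece, [Box(((26, 40),))])
    remote_probes += p
    remainder += r
```

Only the final `remainder` boxes would need to be sent to the database server; in a real system the remote regions would be found through the distributed index rather than passed in as a list.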
- 13. Distributed Index
Built on top of P2P overlay
Regions and queries represented as rectangular shapes
MX-CIF Quad Tree
– Efficiently finds intersections between rectangular shapes
Each region is indexed in the smallest quad which totally contains it
Easy to adapt to n-Dimensional regions/queries
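The indexing rule can be illustrated with a small, centralized two-dimensional MX-CIF quad tree. The sketch below ignores the P2P overlay entirely; the `Quad` class, the `max_depth` limit, and the sample rectangles are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


@dataclass
class Quad:
    bounds: Rect
    depth: int = 0
    regions: List[Tuple[str, Rect]] = field(default_factory=list)
    children: Dict[int, "Quad"] = field(default_factory=dict)

    def _child_bounds(self) -> List[Rect]:
        x_lo, y_lo, x_hi, y_hi = self.bounds
        x_mid, y_mid = (x_lo + x_hi) / 2, (y_lo + y_hi) / 2
        return [
            (x_lo, y_lo, x_mid, y_mid), (x_mid, y_lo, x_hi, y_mid),
            (x_lo, y_mid, x_mid, y_hi), (x_mid, y_mid, x_hi, y_hi),
        ]

    def insert(self, name: str, rect: Rect, max_depth: int = 8) -> "Quad":
        """Store rect in the smallest quad that totally contains it."""
        if self.depth < max_depth:
            for i, cb in enumerate(self._child_bounds()):
                if contains(cb, rect):
                    child = self.children.setdefault(i, Quad(cb, self.depth + 1))
                    return child.insert(name, rect, max_depth)
        # No child contains rect completely (or depth limit hit): keep it here.
        self.regions.append((name, rect))
        return self

    def search(self, query: Rect) -> List[str]:
        """Names of all indexed regions that intersect the query rectangle."""
        hits = [n for n, r in self.regions if intersects(r, query)]
        for child in self.children.values():
            if intersects(child.bounds, query):
                hits.extend(child.search(query))
        return hits


root = Quad((0, 0, 100, 100))
small = root.insert("R1", (10, 10, 20, 20))   # fits inside a lower-left sub-quad
large = root.insert("R2", (40, 40, 60, 60))   # straddles the center, stays at the root
hits = root.search((15, 15, 45, 45))
```

A search only descends into quads whose bounds intersect the query, which is what makes the intersection lookup cheap for localized queries.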
- 14. Update Handling
Issues
– Invalidation of old entries
– Combining different snapshots can generate inconsistencies
Quad space division (specified update level)
Virtual timestamps stored in database
Each modification increments the virtual timestamp of the corresponding quad
Regions store the virtual timestamps of the quads that they intersect
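The timestamp bookkeeping above can be sketched as follows. This is a simplified in-memory model: `VirtualClock` stands in for the timestamp table kept in the database, and the grid-cell quad identifiers and method names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

QuadId = Tuple[int, int]  # hypothetical identifier of a quad at the update level


class VirtualClock:
    """In-memory stand-in for the virtual timestamps stored in the database."""

    def __init__(self) -> None:
        self._stamps: Dict[QuadId, int] = {}

    def record_update(self, quad: QuadId) -> None:
        # Each modification increments the timestamp of the affected quad.
        self._stamps[quad] = self._stamps.get(quad, 0) + 1

    def snapshot(self, quads: Iterable[QuadId]) -> Dict[QuadId, int]:
        return {q: self._stamps.get(q, 0) for q in quads}


class Region:
    """Cached region that remembers the timestamps of the quads it intersects."""

    def __init__(self, quads: Iterable[QuadId], clock: VirtualClock) -> None:
        self.seen = clock.snapshot(quads)

    def is_valid(self, clock: VirtualClock) -> bool:
        # Valid only if none of the intersected quads changed since caching.
        return clock.snapshot(self.seen) == self.seen


clock = VirtualClock()
region = Region([(0, 0), (0, 1)], clock)
valid_initially = region.is_valid(clock)

clock.record_update((3, 3))            # update in a quad the region does not touch
valid_after_unrelated = region.is_valid(clock)

clock.record_update((0, 1))            # update inside one of the region's quads
valid_after_overlap = region.is_valid(clock)
```

A coarser update level means fewer quads to track but more regions invalidated per update; a finer level has the opposite trade-off.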
- 16. Cloud Scenario A
Database server running outside the cloud
Clients located inside the cloud
Non-operational use cases
– Example: cloud environment used for running scientific experiments
- 17. Cloud Scenario B
Database server running inside the cloud
Clients located inside the cloud
Operational use cases
– Example: corporation using cloud environment as an alternative to building a datacenter
- 19. Experiment Design
Measurements
– Response time
– Amount of data transferred
– Payments for data transfer
Experiments
– Cache size
– Update level
Testing sessions
– 5 select testing sessions (50 queries each)
– Update sessions interleaved
- 20. Evaluation
Wisconsin benchmark dataset (10,000,000 tuples)
Scenario A
– Database server: Zurich testbed
– 5 Clients: Rackspace
Scenario B
– Database server: Amazon EC2
– 5 Clients: EmanicsLab
Queries
– About 10,000 tuples
– Semantic locality
- 22. Data transferred/Payments
CoopSC significantly reduces the number of tuples sent by the database server
Amount of money paid is also reduced
- 23. Response Time
Rackspace performance was unstable
No performance improvements were observed
- 25. Data transferred/Payments
CoopSC significantly reduces the number of tuples sent by the database server
Bandwidth payments are also reduced
- 29. Summary & Conclusion
Summary
– Cooperative caching approach used for reducing the load of the database server
– Update statements supported
– CoopSC applied in the context of cloud environments
CoopSC reduces the amount of data transferred between the cloud and the outside world, which has economic benefits
Performance benefits are obtained as long as the cloud provider is stable
- 31. Update Handling - Algorithm
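procedure Execute(query)
  // Quads (at the configured update level) that the query intersects
  quads = query.getIntersectedQuads(updateLevel);
  before = database.getTimestamps(quads);
  plan = rewrite(query, before);
  result = plan.execute();
  after = database.getTimestamps(quads);
  if (before == after)
    return result;                    // no update touched these quads while the plan ran
  else
    return database.execute(query);   // timestamps changed: fall back to the server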