AIMS, Luxembourg, Luxembourg, June 6, 2012




      Cooperative Database Caching within
             Cloud Environments
          Andrei Vancea¹, Guilherme Sperb Machado¹, Laurent d’Orazio²,
                                   Burkhard Stiller¹
        ¹ Department of Informatics IFI, Communication Systems Group CSG,
                         University of Zürich UZH, Switzerland
        ² Blaise Pascal University - LIMOS, France
             {vancea,stiller}@ifi.uzh.ch, laurent.dorazio@isima.fr




© 2012 UZH,
Background
    Databases
      – Client: issues a query (SQL)
      – Server: returns the result (tuples)
    Client-side caching
      – Page Caching, Tuple Caching
      – Semantic Caching
              • Clients store the results of old queries
              • Old results used for answering new queries




Background - Semantic Caching
    Semantic Regions
       – Query description
       – Result set
    Query rewriting
       – Probe
       – Remainder
    [Figure: a query enters QUERY REWRITING, which uses the stored query descriptions to split it into a probe, answered by the semantic cache, and a remainder, sent to the server]
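The probe/remainder split can be sketched for the simplest case: one cached semantic region and one new query, both ranges on a single attribute. The `Interval` type, the half-open bounds and the helper names below are assumptions made for illustration, not CoopSC's actual data structures.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Interval:
    """Half-open range low <= value < high on a single attribute."""
    low: float
    high: float

    def is_empty(self) -> bool:
        return self.low >= self.high


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Overlap of two ranges, or None when they are disjoint."""
    overlap = Interval(max(a.low, b.low), min(a.high, b.high))
    return None if overlap.is_empty() else overlap


def subtract(a: Interval, b: Interval) -> List[Interval]:
    """Parts of range a not covered by range b (zero, one or two ranges)."""
    pieces = [Interval(a.low, min(a.high, b.low)),
              Interval(max(a.low, b.high), a.high)]
    return [p for p in pieces if not p.is_empty()]


def rewrite(query: Interval, region: Interval) -> Tuple[Optional[Interval], List[Interval]]:
    """Split a query into a probe (cached region) and a remainder (server)."""
    return intersect(query, region), subtract(query, region)


# Cached region: 20 <= age < 30; new query: 25 <= age < 40.
probe, remainder = rewrite(Interval(25, 40), Interval(20, 30))
print(probe)      # Interval(low=25, high=30)
print(remainder)  # [Interval(low=30, high=40)]
```

The remainder can contain two pieces when the cached region lies strictly inside the query. Only those pieces would be sent to the server.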

Database Caching & Cloud Computing

    Most cloud providers charge for data transferred between
     the cloud environment and the “outside world” on a
     pay-as-you-go basis
    Database caching within the cloud environment
       – Improves performance
       – Economic benefits
              • Amount of data transferred decreases
                  → Payments for data transferred reduced




Approach




Cooperative Semantic Caching
     Share local semantic
     caches between clients
     Use cache entries of
     other clients
    Performance
     improvements




Cooperative Semantic Caching



    Q1 : select * from persons where age > 10
       – Result cached as semantic region R1 : age > 10
    Q3 : select * from persons where age > 7
       – Probe: select * from R1
       – Remainder: select * from persons where age > 7 and age <= 10
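The slide's example can be reproduced for the special case where both the cached region and the new query are one-sided predicates of the form `age > c`. Q1, Q3, R1 and the `persons` table come from the slide. The function `rewrite_lower_bound`, its parameters and the exact SQL strings it builds are hypothetical.

```python
from typing import Optional, Tuple


def rewrite_lower_bound(query_bound: int, region_bound: int, region: str = "R1",
                        table: str = "persons", column: str = "age") -> Tuple[str, Optional[str]]:
    """Rewrite 'column > query_bound' against a cached region 'column > region_bound'.

    Returns the probe (run against the cached region) and the remainder
    (sent to the database server), or None when the cache covers the whole query.
    """
    if query_bound >= region_bound:
        # The query lies inside the region: the cache alone answers it.
        return f"select * from {region} where {column} > {query_bound}", None
    # The region lies inside the query: reuse all of it, fetch the gap from the server.
    probe = f"select * from {region}"
    remainder = (f"select * from {table} where {column} > {query_bound} "
                 f"and {column} <= {region_bound}")
    return probe, remainder


# Q1 (age > 10) is cached as R1; Q3 asks for age > 7.
probe, remainder = rewrite_lower_bound(query_bound=7, region_bound=10)
print(probe)      # select * from R1
print(remainder)  # select * from persons where age > 7 and age <= 10
```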

Potential Use Cases
    GIS (Geographic Information System) storage
      – Large amount of data (e.g. seismic events)
      – Processing done on client side
      – Two-dimensional range selections (area)
    NetFlow-based architectures
      – Routers collect flow records and store them in databases
      – Analyzers (intrusion detection, accounting,… ) access them
      – Range selections (Start Time, IP)




Query Rewriting
 Query rewriting
  – Probe
  – Remote probes
  – Remainder
 [Figure: QUERY REWRITING uses the descriptions of all cached queries to split a query into a probe (local semantic cache), one or more remote probes (remote semantic caches) and a remainder (server)]
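The cascade of probe, remote probes and remainder can be illustrated with a deliberately simplified model. The sketch assumes a small discrete attribute domain, so that each query and each cache is just a Python set of attribute values. Caches are consulted in order, local first, and whatever no cache holds becomes the remainder for the server. Cache names and contents are made up for the example.

```python
from typing import List, Set, Tuple

Plan = List[Tuple[str, Set[int]]]


def build_plan(query: Set[int], caches: List[Tuple[str, Set[int]]]) -> Plan:
    """Assign each requested value to the first cache holding it; the rest goes to the server.

    `caches` is ordered: the local cache first, then the remote caches.
    """
    remaining = set(query)
    plan: Plan = []
    for name, cached in caches:
        probe = remaining & cached
        if probe:
            plan.append((name, probe))
            remaining -= probe
    if remaining:
        plan.append(("server", remaining))  # the remainder
    return plan


query = set(range(20, 40))                       # 20 <= age < 40
caches = [("local", set(range(20, 25))),
          ("remote-A", set(range(22, 30))),
          ("remote-B", set(range(35, 50)))]
for source, values in build_plan(query, caches):
    print(source, min(values), max(values))
# local 20 24
# remote-A 25 29
# remote-B 35 39
# server 30 34
```

A real semantic cache would work on range descriptions rather than enumerated values. The ordering choice (local before remote) reflects that local probes avoid network traffic.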

System Design




CoopSC
   Cooperative Semantic Caching
    Query types
    – Selection (n-dimensional range predicates)
    – Example: select id, name, age from persons where 20 < age and age < 30
    Cache organization
    – Semantic regions
    – Distributed Index – built on top of a P2P overlay
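An n-dimensional range selection can be viewed as a box with one open range per attribute. The sketch below renders such a box back into the slide's SQL form and computes box intersection and containment, which a rewriter needs. The `Box` representation is an assumption of this sketch, as is the requirement that both boxes constrain the same attributes.

```python
from typing import Dict, List, Optional, Tuple

# A box maps each attribute to an open range (low, high), i.e. low < attr < high.
Box = Dict[str, Tuple[float, float]]


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Overlap of two boxes over the same attributes, or None if they are disjoint."""
    overlap: Box = {}
    for attr in a:
        low, high = max(a[attr][0], b[attr][0]), min(a[attr][1], b[attr][1])
        if low >= high:
            return None
        overlap[attr] = (low, high)
    return overlap


def contains(outer: Box, inner: Box) -> bool:
    """True if every range of `inner` lies within the matching range of `outer`."""
    return all(outer[k][0] <= inner[k][0] and inner[k][1] <= outer[k][1] for k in outer)


def to_sql(table: str, columns: List[str], box: Box) -> str:
    """Render a box as a conjunction of strict range predicates."""
    conditions = " and ".join(f"{low} < {attr} and {attr} < {high}"
                              for attr, (low, high) in box.items())
    return f"select {', '.join(columns)} from {table} where {conditions}"


query: Box = {"age": (20, 30)}
print(to_sql("persons", ["id", "name", "age"], query))
# select id, name, age from persons where 20 < age and age < 30
```

An attribute left unconstrained by a query could be represented as the range (-inf, inf), so that every box covers the same dimensions.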




CoopSC - Query Rewriting
    Local Rewriting
       – Probe
       – Local Remainder
              • Portion of the query which is not available in the local cache
    Distributed Rewriting
       – Remote Probes
       – Remainder
    [Figure: Local Rewriting consults the local cache and splits the query into a probe and a local remainder; Distributed Rewriting consults the distributed index and splits the local remainder into remote probes and a remainder]
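In two dimensions, the local remainder of a query rectangle minus one cached region can be expressed as at most four disjoint rectangles around the probe. This is a sketch assuming half-open rectangles and a single cached region. The `Rect` type and `local_rewrite` are illustrative names. With several local regions, one would repeat the subtraction on each remaining piece before handing the pieces to distributed rewriting (not shown).

```python
from typing import List, NamedTuple, Optional, Tuple


class Rect(NamedTuple):
    """Half-open rectangle [x1, x2) x [y1, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def is_empty(self) -> bool:
        return self.x1 >= self.x2 or self.y1 >= self.y2


def local_rewrite(query: Rect, region: Rect) -> Tuple[Optional[Rect], List[Rect]]:
    """Return (probe, local remainder) for a query against one cached region."""
    probe = Rect(max(query.x1, region.x1), max(query.y1, region.y1),
                 min(query.x2, region.x2), min(query.y2, region.y2))
    if probe.is_empty():
        return None, [query]                           # nothing cached: all remainder
    pieces = [
        Rect(query.x1, query.y1, probe.x1, query.y2),  # strip left of the probe
        Rect(probe.x2, query.y1, query.x2, query.y2),  # strip right of the probe
        Rect(probe.x1, query.y1, probe.x2, probe.y1),  # below the probe
        Rect(probe.x1, probe.y2, probe.x2, query.y2),  # above the probe
    ]
    return probe, [p for p in pieces if not p.is_empty()]


probe, remainder = local_rewrite(Rect(0, 0, 10, 10), Rect(5, 5, 15, 15))
print(probe)      # Rect(x1=5, y1=5, x2=10, y2=10)
print(remainder)  # [Rect(x1=0, y1=0, x2=5, y2=10), Rect(x1=5, y1=0, x2=10, y2=5)]
```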




Distributed Index
                  Built on top of a P2P overlay
                  Regions and queries represented as
                   rectangular shapes
                  MX-CIF quad tree
                   – Efficiently finds intersections between
                     rectangular shapes
                  Each region is indexed in the smallest
                   quad that fully contains it
                  Easy to adapt to n-dimensional
                   regions/queries
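The MX-CIF placement rule ("the smallest quad which totally contains it") can be sketched as a descent from the root quad. At each level, the descent stops as soon as the rectangle straddles a split line. Several details are assumptions of this sketch: coordinates normalized to the unit square, the child digit order, the `max_depth` cut-off, and using the path string as the key a peer would store the region under. How the key maps to a peer in the P2P overlay is not shown. Searching for regions that intersect a query, which means visiting all intersecting quads, is also not shown.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2), half-open, inside [0, 1) x [0, 1)


def smallest_quad(rect: Rect, max_depth: int = 16) -> Tuple[str, Tuple[float, float, float]]:
    """Path key and (x, y, size) of the smallest quad that fully contains `rect`.

    Child digits: 0 = lower-left, 1 = lower-right, 2 = upper-left, 3 = upper-right.
    """
    x1, y1, x2, y2 = rect
    qx, qy, size = 0.0, 0.0, 1.0
    key = ""
    for _ in range(max_depth):
        half = size / 2
        mid_x, mid_y = qx + half, qy + half
        if x2 <= mid_x:
            col = 0
        elif x1 >= mid_x:
            col = 1
        else:
            break                       # rect straddles the vertical split line
        if y2 <= mid_y:
            row = 0
        elif y1 >= mid_y:
            row = 1
        else:
            break                       # rect straddles the horizontal split line
        key += str(2 * row + col)
        qx, qy, size = qx + col * half, qy + row * half, half
    return key, (qx, qy, size)


print(smallest_quad((0.1, 0.1, 0.2, 0.2)))  # ('00', (0.0, 0.0, 0.25))
print(smallest_quad((0.4, 0.4, 0.6, 0.6)))  # ('', (0.0, 0.0, 1.0))
```

A rectangle crossing the centre lines stays at the root. This is the known weakness of MX-CIF placement: small regions on split lines are stored high in the tree.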

Update Handling
    Issues
       – Invalidation of old entries
       – Combining different snapshots can generate inconsistencies
    Space divided into quads at a specified update level
    Virtual timestamps (one per quad) stored in the database
    Each modification increments the virtual timestamp of the
     corresponding quad
    Regions store the virtual timestamps of the quads they
     intersect
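The timestamp bookkeeping can be sketched with a fixed grid of 2^level × 2^level quads over the unit square. The `VirtualTimestamps` class below is a hypothetical in-memory stand-in for the timestamps the slide says are stored in the database. The coordinate normalization and the method names are assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Cell = Tuple[int, int]
Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2), half-open, inside [0, 1) x [0, 1)


def intersected_quads(rect: Rect, level: int) -> List[Cell]:
    """Cells of the 2**level x 2**level grid that the rectangle overlaps."""
    n = 2 ** level
    x1, y1, x2, y2 = rect
    cols = range(max(0, math.floor(x1 * n)), min(n, math.ceil(x2 * n)))
    rows = range(max(0, math.floor(y1 * n)), min(n, math.ceil(y2 * n)))
    return [(i, j) for i in cols for j in rows]


class VirtualTimestamps:
    """Stand-in for the per-quad virtual timestamps kept by the database server."""

    def __init__(self, level: int) -> None:
        self.level = level
        self.stamps: DefaultDict[Cell, int] = defaultdict(int)

    def on_modify(self, x: float, y: float) -> None:
        """Called for each modified tuple located at (x, y)."""
        n = 2 ** self.level
        cell = (min(int(x * n), n - 1), min(int(y * n), n - 1))
        self.stamps[cell] += 1

    def snapshot(self, rect: Rect) -> Dict[Cell, int]:
        """Timestamps a new semantic region stores for the quads it intersects."""
        return {c: self.stamps[c] for c in intersected_quads(rect, self.level)}

    def is_valid(self, stored: Dict[Cell, int]) -> bool:
        """A cached region is still current if none of its quads changed."""
        return all(self.stamps[c] == t for c, t in stored.items())


db = VirtualTimestamps(level=1)              # 2 x 2 quads
region = db.snapshot((0.0, 0.0, 0.5, 0.5))   # {(0, 0): 0}
db.on_modify(0.75, 0.75)                     # other quad: region stays valid
print(db.is_valid(region))                   # True
db.on_modify(0.25, 0.25)                     # same quad: region is stale
print(db.is_valid(region))                   # False
```

A higher update level gives finer quads, so fewer cached regions are invalidated by an unrelated modification. The cost is more timestamps to store and compare.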


Cloud Computing Scenarios




Cloud Scenario A

                         Database server running
                          outside the cloud
                         Clients located inside
                          the cloud
                         Non-operational use cases
                          – Example: cloud environment
                            used for running scientific
                            experiments




Cloud Scenario B

                         Database server running
                          inside the cloud
                         Clients located outside
                          the cloud
                         Operational use cases
                          – Example: corporation
                            using cloud environment
                            as an alternative to
                            building a datacenter


Evaluation




Experiment Design
    Measurements
       – Response time
       – Amount of data transferred
       – Payments for data transfer
    Experiments
       – Cache size
       – Update level
    Testing sessions
       – 5 select testing sessions (50 queries each)
       – Update sessions interleaved


Evaluation
    Wisconsin benchmark dataset (10,000,000 tuples)
    Scenario A
       – Database server: Zurich testbed
       – 5 clients: Rackspace
    Scenario B
       – Database server: Amazon EC2
       – 5 clients: EmanicsLab
    Queries
       – About 10,000 tuples
       – Semantic locality

Scenario A




Data transferred/Payments

                                 CoopSC
                                  significantly reduces
                                  the number of
                                  tuples sent by
                                  database server
                                 Money paid for data
                                  transfer also reduced




Response Time

                           Rackspace
                            performance is unstable
                           No performance
                            improvements
                            observed




Scenario B




Data transferred/Payments

                                 CoopSC
                                  significantly reduces
                                  the number of
                                  tuples sent by
                                  database server
                                 Bandwidth
                                  payments also
                                  reduced



Response Time

                           CoopSC improves
                            response time




Data transferred/Payments (Updates)
                               Good behavior at
                                low update rates
                               Economic and
                                performance
                                benefits




Response Times (Updates)
                                 Response time
                                  increases as the
                                  update rate
                                  grows




Summary & Conclusion
    Summary
       – Cooperative caching approach used to reduce the load on
         the database server
       – Update statements supported
       – CoopSC applied in the context of cloud environments
    CoopSC reduces the amount of data transferred
     between the cloud and the outside world, which yields economic
     benefits
    Performance benefits as long as cloud providers are
     stable

Questions?




Update Handling - Algorithm
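 procedure Execute(query)
     // quads at the update level that the query intersects
     quads = query.getIntersectedQuads(updateLevel);
     // virtual timestamps before answering from the caches
     before = database.getTimestamps(quads);

     plan = rewrite(query, before);
     result = plan.execute();

     // virtual timestamps after the cooperative plan has run
     after = database.getTimestamps(quads);

     if (before == after)
           return result;                   // no concurrent modification
     else
           return database.execute(query);  // update detected: ask the server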
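A runnable Python reading of the Execute procedure follows. It reads the quad timestamps, runs the cooperative plan, re-reads the timestamps, and falls back to the server if anything changed in between. `Database`, its methods, and passing the intersected quads in as a ready-made list are hypothetical stand-ins. `rewrite` is supplied by the caller and returns a zero-argument plan.

```python
from typing import Callable, Dict, Hashable, List, Sequence

Row = Dict[str, int]
Query = Callable[[Row], bool]


class Database:
    """Hypothetical stand-in for the database server."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.timestamps: Dict[Hashable, int] = {}   # virtual timestamp per quad

    def get_timestamps(self, quads: Sequence[Hashable]) -> Dict[Hashable, int]:
        return {q: self.timestamps.get(q, 0) for q in quads}

    def execute(self, query: Query) -> List[Row]:
        return [r for r in self.rows if query(r)]


def execute(query: Query, quads: Sequence[Hashable], database: Database,
            rewrite: Callable[[Query, Dict[Hashable, int]], Callable[[], List[Row]]]) -> List[Row]:
    """Answer `query` from the caches, falling back to the server if an update interfered."""
    before = database.get_timestamps(quads)   # timestamps of the quads the query touches
    plan = rewrite(query, before)             # probe + remote probes + remainder
    result = plan()
    after = database.get_timestamps(quads)
    if before == after:
        return result                         # no modification while the plan ran
    return database.execute(query)            # timestamps moved: re-run on the server


db = Database(rows=[{"age": 5}, {"age": 15}])
older_than_10 = lambda row: row["age"] > 10
cached = [{"age": 15}]
print(execute(older_than_10, [(0, 0)], db, lambda q, ts: lambda: list(cached)))
# [{'age': 15}]
```

The check is optimistic: the caches are used without locking, and the price of a concurrent update is one extra query to the server. This would match the slides' observation that response time grows with the update rate.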

