Aaron J. Elmore, Sudipto Das,
  Divyakant Agrawal, Amr El Abbadi
               Distributed Systems Lab
University of California Santa Barbara
   Serve thousands of applications (tenants)
    ◦ AppEngine, Azure, Force.com
   Tenants are (typically)
    ◦   Small
    ◦   SLA sensitive
    ◦   Erratic load patterns
    ◦   Subject to flash crowds
           i.e. the fark, digg, slashdot, reddit effect (for now)
   Support for Multitenancy is critical
   Our focus: DBMSs serving these platforms


                                    Sudipto Das {sudipto@cs.ucsb.edu}
What the tenant wants…
What the service provider wants…
Static provisioning for peak is inelastic

[Figure: resources over time, plotting capacity against demand, for traditional infrastructures and for deployment in the cloud, with unused resources marked. Slide credits: Berkeley RAD Lab]
[Figure: load balancer, application/web/caching tier, and database tier]
   Migrate a tenant’s database in a live system
    ◦ A critical operation to support elasticity
   Different from
    ◦ Migration between software versions
    ◦ Migration in case of schema evolution




   VM migration [Clark et al., NSDI 2005]
   One tenant-per-VM
    ◦ Pros: allows fine-grained load balancing
    ◦ Cons
      Performance overhead
      Poor consolidation ratio [Curino et al., CIDR 2011]
   Multiple tenants in a VM
    ◦ Pros: good performance
    ◦ Cons: Migrate all tenants → coarse-grained load
      balancing




   Multiple tenants share the same
    database process
    ◦ Shared process multitenancy
    ◦ Example systems: SQL Azure, ElasTraS, RelationalCloud,
      and many more

   Migrate individual tenants
       VM migration cannot be used for fine-grained
        migration
   Target architecture: Shared Nothing
    ◦ Shared storage architectures: see our VLDB 2011 Paper



   How to ensure no downtime?
      Need to migrate the persistent database image
       (tens of MBs to GBs)
   How to guarantee correctness during
    failures?
      Nodes can fail during migration
      How to ensure transaction atomicity and durability?
      How to recover migration state after failure?
        Nodes recover after a failure
   How to guarantee serializability?
      Transaction correctness equivalent to normal
       operation
   How to minimize migration cost? …

   Downtime
    ◦ Time tenant is unavailable
   Service Interruption
    ◦ Number of operations failing/transactions aborting
   Migration Overhead/Performance
    impact
    ◦ During normal operation, migration, and after
      migration
   Additional Data Transferred
    ◦ Data transferred in addition to DB’s persistent image



   Migration executed in phases
      Starts with transfer of minimal information to destination
       (“wireframe”)
   Source and destination concurrently execute
    transactions in one migration phase
   Database pages used as granule of migration
      Pages “pulled” by destination on-demand
   Minimal transaction synchronization
      A page is uniquely owned by either source or destination
      Leverage page level locking
   Logging and handshaking protocols to
    tolerate failures
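
The ownership idea in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Zephyr's implementation: the `Node` class, `pull_page`, and the in-memory dictionaries are hypothetical names chosen for the example, and locking, logging, and failure handling are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A database node holding the pages it currently owns (hypothetical model)."""
    name: str
    pages: dict = field(default_factory=dict)  # page id -> page contents

    def owns(self, page_id):
        return page_id in self.pages


def pull_page(source, destination, page_id):
    """Transfer ownership of one page from source to destination.

    The page is removed from the source in the same step, so it is
    owned by exactly one node before and after the call.
    """
    if destination.owns(page_id):
        return destination.pages[page_id]  # already pulled earlier
    if not source.owns(page_id):
        raise KeyError(f"page {page_id} is owned by neither node")
    destination.pages[page_id] = source.pages.pop(page_id)
    return destination.pages[page_id]


# Example data is made up for illustration.
source = Node("source", {"P1": "rows 0-99", "P2": "rows 100-199", "P3": "rows 200-299"})
destination = Node("destination")

# A destination transaction touches P3, so P3 is pulled on demand.
value = pull_page(source, destination, "P3")
owners = {
    page: [node.name for node in (source, destination) if node.owns(page)]
    for page in ("P1", "P2", "P3")
}
```

After the pull, `owners` lists exactly one node per page, which is the property the page-level synchronization relies on.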

   For this talk
    ◦ Small tenants
      i.e. not sharded across nodes
    ◦ No replication
    ◦ No structural changes to indices
   Extensions in the paper
    ◦ Relax these assumptions




[Figure: the source owns pages P1, P2, P3, …, Pn and runs active transactions TS1,…,TSk; the destination is shown empty]
Freeze index wireframe and migrate

[Figure: the source owns pages P1, P2, P3, …, Pn and runs active transactions TS1,…,TSk; the destination holds P1, P2, P3, …, Pn as un-owned pages]
[Figure: source and destination]
Requests for un-owned pages can block

[Figure: new transactions TD1,…,TDm run at the destination while old, still active transactions TSk+1,…,TSl run at the source. P3 is accessed by TDi and pulled from the source. Index wireframes remain frozen.]
Pages can be pulled by the destination, if needed

[Figure: transactions at the source have completed; pages P1, P2, … are pushed from the source while the destination runs TDm+1,…,TDn]
Index wireframe un-frozen

[Figure: the destination owns pages P1, P2, P3, …, Pn and runs transactions TDn+1,…,TDp]
   Once migrated, pages are never pulled
    back by source
    ◦ Transactions at source accessing migrated pages are
      aborted
   No structural changes to indices during
    migration
    ◦ Transactions (at both nodes) that make structural
      changes to indices abort
   Destination “pulls” pages on-demand
    ◦ Transactions at the destination experience higher
      latency compared to normal operation
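
The three rules above can be read as a small decision table. The function below is a sketch of that table only; `Outcome` and `access_outcome` are hypothetical names introduced for this example, and the real system applies these rules inside the database engine with locking and logging that are not modeled here.

```python
from enum import Enum


class Outcome(Enum):
    PROCEED = "proceed"
    PULL_THEN_PROCEED = "pull page from source, then proceed"
    ABORT = "abort"


def access_outcome(node, page_owner, structural_change=False):
    """Decide what happens to a page access during migration.

    node: "source" or "destination", where the transaction runs.
    page_owner: "source" or "destination", the page's current owner.
    structural_change: True if the access would change the index structure.
    """
    if structural_change:
        return Outcome.ABORT  # no structural index changes during migration
    if node == page_owner:
        return Outcome.PROCEED  # the page is local
    if node == "destination":
        return Outcome.PULL_THEN_PROCEED  # pulled on demand, adds latency
    return Outcome.ABORT  # the source never pulls a migrated page back
```

For instance, `access_outcome("source", "destination")` yields `Outcome.ABORT`, while the same access issued at the destination for a page still at the source yields `Outcome.PULL_THEN_PROCEED`.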




   Only concern is “dual mode”
    ◦ Init and Finish: only one node is executing transactions
   Local predicate locking of internal index
    and exclusive page level locking
    between nodes → no phantoms
   Strict 2PL → transactions are locally
    serializable
   Pages transferred only once
    ◦ No Tdest → Tsource conflict dependency
   Guaranteed serializability
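
One way to see why one-way page transfer matters: if every cross-node conflict edge points from a source transaction to a destination transaction, the edges between the two nodes cannot close a cycle. The sketch below checks a hand-made conflict graph for cycles using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). The transaction names and edges are made up for illustration, and `is_serializable` is a hypothetical helper, not part of Zephyr.

```python
from graphlib import CycleError, TopologicalSorter


def is_serializable(conflict_edges):
    """Return True if the conflict graph has no cycle.

    conflict_edges: iterable of (earlier, later) transaction pairs.
    """
    predecessors = {}
    for earlier, later in conflict_edges:
        predecessors.setdefault(later, set()).add(earlier)
        predecessors.setdefault(earlier, set())
    try:
        # static_order() raises CycleError if the graph contains a cycle.
        tuple(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return False
    return True


# Local orders at each node, plus source -> destination edges only.
one_way = [("TS1", "TS2"), ("TD1", "TD2"), ("TS1", "TD1"), ("TS2", "TD2")]

# Hypothetical counter-example: if a page could be pulled back, a
# destination transaction could precede a source transaction on one page
# and follow it on another, closing a cycle.
with_pull_back = one_way + [("TS2", "TD1"), ("TD1", "TS2")]

one_way_ok = is_serializable(one_way)
pull_back_ok = is_serializable(with_pull_back)
```

The first graph is acyclic and the second is not, which matches the reasoning on the slide: local serializability plus the absence of Tdest → Tsource dependencies rules out cycles.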


   Transaction recovery
    ◦ For every database page, transactions at source
      ordered before transactions at destination
    ◦ After failure, conflicting transactions replayed in
      the same order
   Migration recovery
    ◦ Atomic transitions between migration modes
       Logging and handshake protocols
    ◦ Every page has exactly one owner
      Bookkeeping at the index level
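
As a rough picture of mode transitions that survive failures, the sketch below has each node append the new mode to a log before acting on it and, after a crash, roll the lagging node forward to the other node's logged mode. Everything here is an assumption made for illustration: the mode list (the slides name Init, dual, and Finish; "normal" is added), the `MigrationLog` class standing in for durable storage, and the roll-forward recovery policy. The actual protocol is described in the paper.

```python
MODES = ["normal", "init", "dual", "finish"]  # assumed ordering of modes


class MigrationLog:
    """Stand-in for a durable log; a real system would force writes to disk."""

    def __init__(self):
        self.records = []

    def append(self, mode):
        self.records.append(mode)

    def last_mode(self):
        return self.records[-1] if self.records else "normal"


def advance(source_log, destination_log):
    """Move both nodes to the next mode, logging at each node."""
    current = source_log.last_mode()
    if current == MODES[-1]:
        return current  # already in the last mode
    next_mode = MODES[MODES.index(current) + 1]
    source_log.append(next_mode)       # source logs the transition
    destination_log.append(next_mode)  # destination acknowledges by logging it
    return next_mode


def recover(source_log, destination_log):
    """After a crash, bring a lagging node up to the other node's mode."""
    s, d = source_log.last_mode(), destination_log.last_mode()
    if MODES.index(s) > MODES.index(d):
        destination_log.append(s)
    elif MODES.index(d) > MODES.index(s):
        source_log.append(d)
    return source_log.last_mode()


src, dst = MigrationLog(), MigrationLog()
advance(src, dst)     # both nodes are now in "init"
src.append("dual")    # simulated crash before the destination logs "dual"
mode_after_recovery = recover(src, dst)
```

After `recover`, both logs end in the same mode, which mirrors the requirement that source and destination agree on the migration mode.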



   In the presence of arbitrary repeated
    failures, Zephyr ensures:
    ◦ Updates made to database pages are consistent
    ◦ A failure does not leave a page without an owner
    ◦ Both source and destination are in the same
      migration mode
   Guaranteed termination and
    starvation freedom



   Replicated Tenants
   Sharded Tenants
   Allow structural changes to the indices
    ◦ Using shared lock managers in the dual mode




   Prototyped using an open source OLTP
    database H2
    ◦   Supports standard SQL/JDBC API
    ◦   Serializable isolation level
    ◦   Tree Indices
    ◦   Relational data model
   Modified the database engine
    ◦ Added support for freezing indices
    ◦ Page migration status maintained using index
    ◦ Details in the paper…
   Tungsten SQL Router migrates JDBC
    connections during migration


   Two database nodes, each with a DB
    instance running
   Synthetic benchmark as load
    generator
    ◦ Modified YCSB to add transactions
      Small read/write transactions
   Compared against Stop and Copy
    (S&C)



[Figure: system controller, metadata, and the migrate command]
   Default transaction parameters
    ◦ 10 operations per transaction
    ◦ 80% Read, 15% Update, 5% Inserts
   Workload
    ◦ 60 sessions
    ◦ 100 transactions per session
   Hardware
    ◦ 2.4 GHz Intel Core 2 Quads, 8 GB RAM
    ◦ 7200 RPM SATA HDs with 32 MB cache
    ◦ Gigabit Ethernet
   Default DB size: 100k rows (~250 MB)
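
The default parameters above imply a few derived quantities, worked out below as plain arithmetic. The variable names are chosen for this example, the operation counts are expected values from the stated mix rather than measured results, and the per-row size assumes 1 MB = 1000 KB.

```python
sessions = 60
transactions_per_session = 100
operations_per_transaction = 10
mix = {"read": 0.80, "update": 0.15, "insert": 0.05}

# Total work issued by the default workload.
total_transactions = sessions * transactions_per_session
total_operations = total_transactions * operations_per_transaction

# Expected number of operations of each kind under the stated mix.
expected_operations = {
    op: round(total_operations * share) for op, share in mix.items()
}

# Rough average size of a row in the default database.
rows = 100_000
database_mb = 250
approx_kb_per_row = database_mb * 1000 / rows
```

This gives 6,000 transactions and 60,000 operations in total, and roughly 2.5 KB per row.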
   Downtime (tenant unavailability)
    ◦ S&C: 3 – 8 seconds (needed to migrate,
      unavailable for updates)
    ◦ Zephyr: No downtime. Either source or destination
      is available
   Service interruption (failed operations)
    ◦ S&C: ~100s – 1,000s. All transactions with updates
      are aborted
    ◦ Zephyr: ~10s – 100s. Orders of magnitude less
      interruption



   Average increase in transaction latency
    (compared to the 6,000 transaction
    workload without migration)
    ◦ S&C: 10 – 15%. Cold cache at destination
    ◦ Zephyr: 10 – 20%. Pages fetched on-demand
   Data transfer
    ◦ S&C: Persistent database image
    ◦ Zephyr: 2 – 3% additional data transfer (messaging
      overhead)
   Total time taken to migrate
    ◦ S&C: 3 – 8 seconds. Unavailable for any writes
    ◦ Zephyr: 10 – 18 seconds. No unavailability


[Figure: orders of magnitude fewer failed operations]
   Proposed Zephyr, a live database
    migration technique with no downtime
    for shared nothing architectures
    ◦ The first end-to-end solution with safety, correctness,
      and liveness guarantees
   Prototype implementation on a
    relational OLTP database
   Low cost on a variety of workloads



[Figures: six animation frames showing transactions (Txns) at the source and destination]
   Either source or destination is serving the
    tenant
    ◦ No downtime
   Serializable transaction execution
    ◦ Unique page ownership
    ◦ Local multi-granularity locking
   Safety in the presence of failures
    ◦ Transactions are atomic and durable
    ◦ Migration state is recovered from log
      Ensure consistency of the database state



   Wireframe copy
      Typically orders of magnitude smaller than data
   Operational overhead during
    migration
      Extra data (in addition to database pages)
       transferred
   Transactions aborted during migration




[Figure: failures due to attempted modification of index structure]
   Only committed transactions reported
   Loss of cache for both migration types
   Zephyr results in a remote fetch

Zephyr: Live Migration in Shared Nothing Databases for Elastic Cloud Platforms

  • 1. Aaron J. Elmore, Sudipto Das, Divyakant Agrawal, Amr El Abbadi Distributed Systems Lab University of California Santa Barbara
  • 2. Serve thousands of applications (tenants) ◦ AppEngine, Azure, Force.com  Tenants are (typically) ◦ Small ◦ SLA sensitive ◦ Erratic load patterns ◦ Subject to flash crowds  i.e. the fark, digg, slashdot, reddit effect (for now)  Support for Multitenancy is critical  Our focus: DBMSs serving these platforms Sudipto Das {sudipto@cs.ucsb.edu}
  • 3. What the service What the tenant wants… provider wants… Sudipto Das {sudipto@cs.ucsb.edu}
  • 4. Static provisioning for peak is inelastic Capacity Resources Resources Capacity Demand Demand Time Time Traditional Infrastructures Deployment in the Cloud Unused resources Slide Credits: Berkeley RAD Lab Sudipto Das {sudipto@cs.ucsb.edu}
  • 5. Load Balancer Application/ Web/Caching tier Database tier Sudipto Das {sudipto@cs.ucsb.edu}
  • 6. Migrate a tenant’s database in a Live system ◦ A critical operation to support elasticity  Different from ◦ Migration between software versions ◦ Migration in case of schema evolution Sudipto Das {sudipto@cs.ucsb.edu}
  • 7. VM migration [Clark et al., NSDI 2005]  One tenant-per-VM ◦ Pros: allows fine-grained load balancing ◦ Cons  Performance overhead  Poor consolidation ratio [Curino et al., CIDR 2011]  Multiple tenants in a VM ◦ Pros: good performance ◦ Cons: Migrate all tenants  Coarse-grained load balancing Sudipto Das {sudipto@cs.ucsb.edu}
  • 8. Multiple tenants share the same database process ◦ Shared process multitenancy ◦ Example systems: SQL Azure, ElasTraS, RelationalCloud, and may more  Migrate individual tenants  VM migration cannot be used for fine-grained migration  Target architecture: Shared Nothing ◦ Shared storage architectures: see our VLDB 2011 Paper Sudipto Das {sudipto@cs.ucsb.edu}
  • 10. How to ensure no downtime?  Need to migrate the persistent database image (tens of MBs to GBs)  How to guarantee correctness during failures?  Nodes can fail during migration  How to ensure transaction atomicity and durability?  How to recover migration state after failure?  Nodes recover after a failure  How to guarantee serializability?  Transaction correctness equivalent to normal operation  How to minimize migration cost? … Sudipto Das {sudipto@cs.ucsb.edu}
  • 11. Downtime ◦ Time tenant is unavailable  Service Interruption ◦ Number of operations failing/transactions aborting  Migration Overhead/Performance impact ◦ During normal operation, migration, and after migration  Additional Data Transferred ◦ Data transferred in addition to DB’s persistent image Sudipto Das {sudipto@cs.ucsb.edu}
  • 12. Migration executed in phases  Starts with transfer of minimal information to destination (“wireframe”)  Source and destination concurrently execute transactions in one migration phase  Database pages used as granule of migration  Pages “pulled” by destination on-demand  Minimal transaction synchronization  A page is uniquely owned by either source or destination  Leverage page level locking  Logging and handshaking protocols to tolerate failures Sudipto Das {sudipto@cs.ucsb.edu}
  • 13.  For this talk ◦ Small tenants  i.e. not sharded across nodes. ◦ No replication ◦ No structural changes to indices  Extensions in the paper ◦ Relaxes these assumptions Sudipto Das {sudipto@cs.ucsb.edu}
  • 14. P1 P2 Owned Pages P3 Pn Active transactions TS1,…, TSk Source Destination Page owned by Node Page not owned by Node Sudipto Das {sudipto@cs.ucsb.edu}
  • 15. Freeze index wireframe and migrate P1 P1 P2 P2 Owned Pages P3 P3 Un-owned Pages Pn Pn TS1,…, Active transactions TSk Source Destination Page owned by Node Page not owned by Node Sudipto Das {sudipto@cs.ucsb.edu}
  • 16. Source Destination Sudipto Das {sudipto@cs.ucsb.edu}
  • 17. Requests for un-owned pages can block P1 P3 accessed P1 P2 by TDi P2 P3 P3 P3 pulled Pn from Pn source Old, still active TSk+1,… TD1,…, New transactions transactions , TSl TDm Source Destination Page owned by Node Index wireframes remain frozen Page not owned by Node Sudipto Das {sudipto@cs.ucsb.edu}
  • 18. Pages can be pulled by the destination, if needed P1 P1 P2 P2 P3 P3 P1, P2, … pushed Pn from source Pn Completed TDm+1, …, TDn Source Destination Page owned by Node Page not owned by Node Sudipto Das {sudipto@cs.ucsb.edu}
  • 19. Index wireframe un-frozen P1 P2 P3 Pn TDn+1,… , TDp Source Destination Page owned by Node Page not owned by Node Sudipto Das {sudipto@cs.ucsb.edu}
  • 20.
      ▪ Once migrated, pages are never pulled back by source
        ◦ Transactions at source accessing migrated pages are aborted
      ▪ No structural changes to indices during migration
        ◦ Transactions (at both nodes) that make structural changes to indices abort
      ▪ Destination “pulls” pages on-demand
        ◦ Transactions at the destination experience higher latency compared to normal operation
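
The two abort rules above can be sketched as follows. This is an illustrative model only: `TransactionAborted`, `read_at_source`, `insert_row_during_migration`, and the `capacity` parameter are hypothetical names, and a page split stands in for "a structural change to an index".

```python
class TransactionAborted(Exception):
    """Raised when a transaction must abort during migration."""


def read_at_source(source_pages, page_id):
    # Once a page has migrated, the source never pulls it back,
    # so a source transaction touching it aborts.
    if page_id not in source_pages:
        raise TransactionAborted(f"{page_id} has already migrated")
    return source_pages[page_id]


def insert_row_during_migration(page_rows, row, capacity):
    # A full page would need a split, which changes the index structure.
    # During migration the wireframe is frozen, so the transaction aborts.
    if len(page_rows) >= capacity:
        raise TransactionAborted("structural index change during migration")
    page_rows.append(row)
    return page_rows
```

In this model an aborted transaction would simply be retried by the client, for example at the destination once the page has moved.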
  • 21.
      ▪ Only concern is “dual mode”
        ◦ Init and Finish: only one node is executing transactions
      ▪ Local predicate locking of internal index and exclusive page-level locking between nodes → no phantoms
      ▪ Strict 2PL → transactions are locally serializable
      ▪ Pages transferred only once
        ◦ No Tdest → Tsource conflict dependency
      ▪ Guaranteed serializability
  • 22.
      ▪ Transaction recovery
        ◦ For every database page, transactions at source ordered before transactions at destination
        ◦ After failure, conflicting transactions replayed in the same order
      ▪ Migration recovery
        ◦ Atomic transitions between migration modes: logging and handshake protocols
        ◦ Every page has exactly one owner: bookkeeping at the index level
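
The per-page ordering property above (source transactions before destination transactions) can be checked with a short sketch. The history format, a list of `(node, txn_id)` pairs in access order for one page, is an assumption made for illustration and is not the log format used by the prototype.

```python
def source_before_destination(page_history):
    """Return True if, for one page, every source access precedes every destination access.

    page_history: list of (node, txn_id) tuples in the order the page was accessed,
    where node is "source" or "destination" (illustrative format).
    """
    seen_destination = False
    for node, _txn_id in page_history:
        if node == "destination":
            seen_destination = True
        elif seen_destination:
            # A source access after a destination access would mean
            # the page went back to the source, which is not allowed.
            return False
    return True
```

A history such as `[("source", "TS1"), ("destination", "TD1")]` satisfies the property, while the reverse order does not.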
  • 23.
      ▪ In the presence of arbitrary repeated failures, Zephyr ensures:
        ◦ Updates made to database pages are consistent
        ◦ A failure does not leave a page without an owner
        ◦ Both source and destination are in the same migration mode
      ▪ Guaranteed termination and starvation freedom
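
Two of these guarantees, page ownership and agreement on the migration mode, can be phrased as a checkable invariant. The function below is a sketch under assumed names (`check_invariants` and its parameters); it models pages as sets of identifiers and modes as plain strings, and it also checks the unique-ownership property from the design overview.

```python
def check_invariants(all_pages, source_pages, destination_pages,
                     source_mode, destination_mode):
    """Return True if every page has exactly one owner and both nodes agree on the mode."""
    unowned = all_pages - (source_pages | destination_pages)
    doubly_owned = source_pages & destination_pages
    same_mode = source_mode == destination_mode
    return not unowned and not doubly_owned and same_mode
```

For example, with pages `{"P1", "P2"}` split as `{"P1"}` at the source and `{"P2"}` at the destination, and both nodes in `"dual"` mode, the check passes.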
  • 24.
      ▪ Replicated tenants
      ▪ Sharded tenants
      ▪ Allow structural changes to the indices
        ◦ Using shared lock managers in the dual mode
  • 25.
      ▪ Prototyped using an open source OLTP database, H2
        ◦ Supports standard SQL/JDBC API
        ◦ Serializable isolation level
        ◦ Tree indices
        ◦ Relational data model
      ▪ Modified the database engine
        ◦ Added support for freezing indices
        ◦ Page migration status maintained using index
        ◦ Details in the paper…
      ▪ Tungsten SQL Router migrates JDBC connections during migration
  • 26.
      ▪ Two database nodes, each with a DB instance running
      ▪ Synthetic benchmark as load generator
        ◦ Modified YCSB to add transactions (small read/write transactions)
      ▪ Compared against Stop and Copy (S&C)
  • 27.
      ▪ Default transaction parameters: 10 operations per transaction; 80% read, 15% update, 5% insert
      ▪ Workload: 60 sessions, 100 transactions per session
      ▪ Hardware: 2.4 GHz Intel Core 2 Quads, 8 GB RAM, 7200 RPM SATA HDs with 32 MB cache, Gigabit Ethernet
      ▪ Default DB size: 100k rows (~250 MB)
      [Figure labels: System Metadata Controller; Migrate]
  • 28.
      ▪ Downtime (tenant unavailability)
        ◦ S&C: 3 – 8 seconds (needed to migrate, unavailable for updates)
        ◦ Zephyr: No downtime. Either source or destination is available
      ▪ Service interruption (failed operations)
        ◦ S&C: ~100s – 1,000s. All transactions with updates are aborted
        ◦ Zephyr: ~10s – 100s. Orders of magnitude less interruption
  • 29.
      ▪ Average increase in transaction latency (compared to the 6,000-transaction workload without migration)
        ◦ S&C: 10 – 15%. Cold cache at destination
        ◦ Zephyr: 10 – 20%. Pages fetched on-demand
      ▪ Data transfer
        ◦ S&C: Persistent database image
        ◦ Zephyr: 2 – 3% additional data transfer (messaging overhead)
      ▪ Total time taken to migrate
        ◦ S&C: 3 – 8 seconds. Unavailable for any writes
        ◦ Zephyr: 10 – 18 seconds. No unavailability
  • 30. Orders of magnitude fewer failed operations
  • 31.
      ▪ Proposed Zephyr, a live database migration technique with no downtime for shared nothing architectures
        ◦ The first end-to-end solution with safety, correctness and liveness guarantees
      ▪ Prototype implementation on a relational OLTP database
      ▪ Low cost on a variety of workloads
  • 32.
  • 33–38. [Figure sequence; labels: Txns, Source, Destination]
  • 39.
      ▪ Either source or destination is serving the tenant
        ◦ No downtime
      ▪ Serializable transaction execution
        ◦ Unique page ownership
        ◦ Local multi-granularity locking
      ▪ Safety in the presence of failures
        ◦ Transactions are atomic and durable
        ◦ Migration state is recovered from log
      ▪ Ensure consistency of the database state
  • 40.
      ▪ Wireframe copy
      ▪ Typically orders of magnitude smaller than data
      ▪ Operational overhead during migration
      ▪ Extra data (in addition to database pages) transferred
      ▪ Transactions aborted during migration
  • 41. Failures due to attempted modification of index structure
  • 42.
      ▪ Only committed transactions reported
      ▪ Loss of cache for both migration types
      ▪ Zephyr results in a remote fetch

Editor's Notes

  1. Good afternoon. Today I’ll be presenting our paper entitled “Zephyr: Live Migration in Shared Nothing Databases for Elastic Cloud Platforms”. This is joint work with my colleague Aaron Elmore and our advisors Divy Agrawal and Amr El Abbadi at UC Santa Barbara.
  2. Many of us in this room are familiar with popular cloud application platforms such as Google AppEngine, MS Azure, and Force.com. These platforms serve thousands of applications (or tenants) that are typically small, are sensitive to SLAs, and have erratic or unpredictable load patterns, often resulting from flash crowds. To use resources effectively and to optimize the system’s operating cost, it is important to share resources between these tenants. Support for multitenancy in these systems is therefore critical. Our focus for this talk is multitenancy in the database systems that serve these application platforms.
  3. When we talk about multitenancy, the tenants’ and the providers’ interests often conflict. For instance, if we consider a phone booth as the service, the tenant would like to have the entire phone booth to herself to make the call. The provider, on the other hand, would want to pack as many tenants as possible into the phone booth, so that every tenant can just barely make a call.