OpenStack Tokyo Talk Application Data Protection Service


  1. Cloud DR Orchestration: Beyond volume replication. OpenStack Summit Tokyo 2015. Wang Hao, Software Engineer, Huawei IT Product Line; Eran Gampel, Cloud Chief Architect, Huawei European Research Center; Oshrit Feder, IBM Research - Haifa
  2. Agenda: Why do we need disaster recovery? Replication in Cinder. Hypervisor-based DR. ADPaaS: Project Smaug. Demo
  3. Why do we need disaster recovery? Customers want 24x7 service availability. Hardware failures. Human error. Accidents and natural disasters
  4. Existing Data Protection Mechanisms in OpenStack. Cinder: volume replication, backup, snapshot
  5. Status of Replication in Cinder. Icehouse summit: work begins with a design summit session on volume replication. Juno release: implemented, upstream OpenStack code merged, support for the IBM Storwize/SVC driver. Liberty release: version 2 of replication, improved to make it more widely usable by other backend devices; no driver supports it yet
  6. Use Case of Replication. The main use of volume replication is resiliency in the presence of failures. [Diagram labels: OpenStack, Cinder, Storage Backend, DC#1, DC#2, Data Replication]
  7. Replication v1.0: Workflow. 1) Create Volume Type, 2) Create Volume, 3) Schedule Backend Replication, 4) Setup Replication Pair, 5) Replication Status, 6) Promote Replica, 7) Recover from Replica, 8) Fail-Back, 9) Test Replication
  8. Replication v2.0: Workflow. 1) Create Volume Type, 2) Create Volume, 3) Schedule Backend Replication, 4) Driver Selects Target & Setup Replication Pair, 5) Replication Status via Driver Report, 6) Failover to Secondary via API, 7) Recover from Replica, 8) Enable/Disable Replication, 9) Query Volume Replication Targets
  9. Hypervisor-Level Replication: software-based alternative for replication
  10. Replication Solution Types. Case in point: Hardware vs. Hypervisor. [Diagram labels: Hardware Level, Hypervisor Level; Source, Target; VM, Hypervisor, IO Mirroring, Replication Agent, Volume, Storage HW]
  11. Another choice: Hypervisor DR. [Diagram labels: Production Site, DR Site, DR Manager, OpenStack, Host, VM, Storage hypervisor, IO Mirror, Write Agent, VRG, WAN. Legend: OpenStack® Component, New Component, Vendor Component, Protected VM, Control Path, Data Path]
  12. Hypervisor DR: IO Mirroring. [Diagram labels: Production Site (Guest OS, IO Mirror, VRG) and DR Site (VRG, Write Agent); IO Commands, IO Capture, Write as normal, Write ACK, IO Completion, IO replication Queue, IO Forwarding, Compression and Encryption, IO cache, Decompression and Decryption, IO Parsing, Write]
  13. Hypervisor DR: IO Mirroring State Machine. [Diagram labels: Start, Setup Connection with vRG, CBT Data Replication, Consistency Check, Queue Data Replication, Finished, Stop; transitions: CBT done, Queue overflow, 1. Host abnormal restart, 2. Swap (re-protect)]
  14. Hypervisor DR: Simplified Workflow. 1) Configure Hypervisor, 2) Create VMs Protected Group, 3) Protection Policy, 4) Replication Start, 5) Create Recovery Plan, 6) Fail-Over, 7) Re-Protect, 8) Fail-Back
  15. Hypervisor DR: HW (Array) vs. Hypervisor. [Comparison table of HW Array Replication vs. Hypervisor Replication; per-row marks not preserved. Rows: Multi-Vendor Hardware Agnostic; No Impact on Compute Performance; No Special Network/Storage Privileges; No Special Admin Skillset Required; Transparent Deduplication; Virtualization-Ready; Cross VM Consistency Grouping Support; Cross Array Consistency Group Support]
  16. Multiple Use Cases, Multiple Protection Plans. Users need to be able to choose the right protection plan. Vendors need a way to plug in different implementations
  17. One API To Rule Them All
  18. Is Data == Storage? Data Protection Service
  19. DPaaS Architecture. [Diagram labels: DPaaS, Service APIs (REST), File-Level Restore, Policy, Verification, Replication, Tiering, Cinder Controller, Cinder-API, Cinder-Volume, Cinder-Backup, Message Queue, iSCSI, FusionStorage, eBackup, Swift, Nova, Horizon, Metadata Backend, Metadata. Legend: OpenStack® Component, New Component, Huawei/Commercial Product, Future release]
  20. But… We want to protect Applications, Services, Resources…
  21. Case in point: Typical 3-tier Cloud App
  22. Case in point: Typical 3-tier Cloud App. [Diagram labels: Project, Router, Web Net, App Net, DB Net, SG, Web Srv 1, Web Srv 2, App Server, DB Server, Image, Volume]
  23. Data >> Storage. We need to protect all resources
  24. Introducing Smaug: Application Data Protection as a Service
  25. Smaug: Mission Statement. Formalize Application Data Protection in OpenStack (APIs, Services, Plugins, …). Be able to protect any resource in OpenStack (as well as its dependencies). Allow diversity of vendor solutions, capabilities and implementations without compromising usability
  26. Smaug: Highlights. Open architecture: vendors create plugins that implement protection mechanisms for different OpenStack resources. User perspective (protect app deployment): configure and manage custom protection plans on the deployed resources (topology, VMs, volumes, images, …). Admin perspective (define protectable resources): decide which plugins protect which resources and what is available to the user; decide where users can protect their resources
  27. Smaug: Application Data Protection as a Service. What is protected? (Protected Resources). How to protect? (Protection Plans). Where to protect? (Protection Banks). What was protected? (Protection Transactions). Who protects? (Protection Providers). [Diagram labels: Plan API, Protection Resource API, Protection Transaction API, Bank API, Pluggable Plan Enforcer Service, Resource Protection Service, Bank, Vault, Resource Protection Plugin, Orchestrate]
  28. Overview. What is protected? (Protected Resources): VM, Image, Topology, Volume. How to protect? (Protection Plans): Protection Plan with Name, ID, Protected Resource, Trigger, Retries, Bank, Options. Who protects? (Protection Providers): Volume Protection Plugin (Backup, Replication, Snapshot), VM Protection Plugin, Image Protection Plugin, Topology Protection Plugin. Protection API: Protect, Restore, Verify, OptionSchema, ResultsSchema. Where to protect? (Protection Banks): Bank, Vault; Bank API: Read, Write. What was protected? (Protection Transactions): Ledger, ProtectionTransaction. [Other diagram labels: Swift, S3, …; Cinder, Nova, …; implements; Manual, Time, Event]
  29. Help us Build Smaug: join the project. https://launchpad.net/smaug; IRC (gampel); eran.gampel@huawei.com; oshritf@il.ibm.com; Download Link
  30. Demo Time. Video: Application DR with IBM Cloud Manager. References: Paris summit talk & demo; European FP7 ORBIT research project; IBM Cloud Manager with OpenStack
  31. Thanks
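
The plugin and bank roles named on the Smaug slides (Protect, Restore, Verify; Bank Read, Write) can be pictured with a minimal sketch. Everything below is hypothetical: the class names, method signatures, key layout and the in-memory bank are illustrative assumptions, not Smaug's actual API, and triggers and retries are left out.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class Bank(ABC):
    """Hypothetical 'where to protect' interface: read and write only."""

    @abstractmethod
    def write(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...


class InMemoryBank(Bank):
    """Toy bank that stands in for an object store."""

    def __init__(self):
        self._objects = {}

    def write(self, key, value):
        self._objects[key] = value

    def read(self, key):
        return self._objects[key]


class ProtectionPlugin(ABC):
    """Hypothetical 'who protects' interface for one resource type."""

    @abstractmethod
    def protect(self, resource_id: str, bank: Bank, options: dict) -> str: ...

    @abstractmethod
    def restore(self, resource_id: str, bank: Bank) -> None: ...

    @abstractmethod
    def verify(self, resource_id: str, bank: Bank) -> bool: ...


class ToyVolumePlugin(ProtectionPlugin):
    """Copies volume bytes into the bank; real plugins would call a backend."""

    def __init__(self, volumes: dict):
        self._volumes = volumes

    def protect(self, resource_id, bank, options):
        key = f"volume/{resource_id}"
        bank.write(key, self._volumes[resource_id])
        return key

    def restore(self, resource_id, bank):
        self._volumes[resource_id] = bank.read(f"volume/{resource_id}")

    def verify(self, resource_id, bank):
        return bank.read(f"volume/{resource_id}") == self._volumes.get(resource_id)


@dataclass
class ProtectionPlan:
    """'How to protect': which resources go to which bank, with what options."""
    name: str
    resources: list  # (resource_type, resource_id) pairs
    bank: Bank
    options: dict = field(default_factory=dict)


def enforce(plan: ProtectionPlan, plugins: dict) -> list:
    """Dispatch every resource in the plan to the plugin for its type."""
    return [plugins[rtype].protect(rid, plan.bank, plan.options)
            for rtype, rid in plan.resources]


volumes = {"vol-1": b"data"}
plugin = ToyVolumePlugin(volumes)
plan = ProtectionPlan("nightly", [("volume", "vol-1")], InMemoryBank())
keys = enforce(plan, {"volume": plugin})
print(keys)
```

The point of the split is that a plan only names resources and a bank, so a vendor could swap the plugin or the bank without the plan changing.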

Editor's Notes

  • Service continuity
    Hardware can fail, sometimes
    People make mistakes, sometimes
    Natural Calamities, or cataclysmic events (like fire, tornado, etc.)
  • Replication is for critical data and has a relatively shorter lifespan
    Backup has a longer lifespan, but is snapshot-based, so its RPO is not as good.
  • Cloud admin creates a volume type with capabilities:replication="<is> True"
    End users use this volume type to create volumes
    The Cinder scheduler chooses a backend that supports replication
    The backend creates a volume replica and sets up replication between the two volumes
    Cinder has a periodic task that updates the volumes' replication status
    When a disaster happens, the cloud admin promotes the replica
    Users can then use those volumes in the secondary data center with its storage
    As part of the fail-back process, replication between the primary and secondary volumes is re-enabled
    Users can test replication by creating a volume with --source-replica
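
The scheduling step in the workflow above (a volume type carrying capabilities:replication="<is> True" is matched against what each backend reports) can be sketched as follows. This is a simplified illustration, not Cinder's scheduler code: the `Backend` class, the backend names and the handling of only the `<is>` operator are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Hypothetical stand-in for a storage backend's reported capabilities."""
    name: str
    capabilities: dict = field(default_factory=dict)


def satisfies(extra_specs: dict, backend: Backend) -> bool:
    """Return True if the backend meets every 'capabilities:' extra spec.

    Only the '<is>' boolean operator is modeled; other values are
    compared as plain strings.
    """
    for key, required in extra_specs.items():
        scope, _, cap_name = key.partition(":")
        if scope != "capabilities":
            continue  # other scopes are ignored in this sketch
        reported = backend.capabilities.get(cap_name)
        if required.startswith("<is>"):
            wanted = required[len("<is>"):].strip().lower() == "true"
            if bool(reported) != wanted:
                return False
        elif str(reported) != required:
            return False
    return True


def schedule(extra_specs: dict, backends: list) -> list:
    """Keep only the backends able to host a volume of this type."""
    return [b.name for b in backends if satisfies(extra_specs, b)]


replicated_type = {"capabilities:replication": "<is> True"}
backends = [
    Backend("storwize-dc1", {"replication": True}),
    Backend("lvm-dc1", {"replication": False}),
    Backend("generic-dc1"),  # reports nothing about replication
]
print(schedule(replicated_type, backends))  # ['storwize-dc1']
```

A backend that does not report the capability at all is treated the same as one that reports False, which is the conservative choice for a replicated volume type.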

  • 4. According to the configuration in cinder.conf, the driver chooses a replication target device, creates the replica and sets up replication between the two volumes
    5. If replication is enabled in the driver, the replication status is updated by the driver's periodic report task
    6. When a disaster happens, the cloud admin fails over a replicating volume to its secondary via the "failover_replication" API
    8. The cloud admin can also enable or disable replication on a replication-capable volume for some use cases, such as maintenance
    9. The cloud admin can also query a volume for its list of configured replication targets
  • IO Mirror state machine:
    CBT (Changed Block Tracking) replication: based on a bitmap
    Queue replication: in this state, the user can create a snapshot of the replicated data
    Consistency check

    Start
    Set up connection with the Virtual Replication Gateway (VRG)
    Initial replication
    On normal host restart, data left in the queue during shutdown is written to disk using the CBT bitmap
    CBT data replication
    When the CBT bitmap is clear, proceed to queue-based replication
    If the queue overflows, switch back to CBT
    On abnormal host restart or swap (re-protect),
    do a consistency check and then CBT data replication
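
The state machine described in this note can be written down as a small transition table. This is one reading of the notes, not the product's implementation: the event names are hypothetical labels, and treating "stop" as valid from any state and a normal restart as a move from queue-based back to CBT replication are assumptions.

```python
from enum import Enum, auto


class State(Enum):
    START = auto()
    SETUP_CONNECTION = auto()
    CBT_REPLICATION = auto()
    QUEUE_REPLICATION = auto()
    CONSISTENCY_CHECK = auto()
    STOPPED = auto()


# (current state, event) -> next state; event names are illustrative
TRANSITIONS = {
    (State.START, "start"): State.SETUP_CONNECTION,
    (State.SETUP_CONNECTION, "connected"): State.CBT_REPLICATION,  # initial replication
    (State.CBT_REPLICATION, "cbt_done"): State.QUEUE_REPLICATION,  # bitmap is clear
    (State.QUEUE_REPLICATION, "queue_overflow"): State.CBT_REPLICATION,
    (State.QUEUE_REPLICATION, "normal_restart"): State.CBT_REPLICATION,
    (State.CONSISTENCY_CHECK, "check_done"): State.CBT_REPLICATION,
}
# An abnormal host restart or a swap (re-protect) forces a consistency check.
for replicating in (State.CBT_REPLICATION, State.QUEUE_REPLICATION):
    for event in ("abnormal_restart", "swap"):
        TRANSITIONS[(replicating, event)] = State.CONSISTENCY_CHECK


def step(state: State, event: str) -> State:
    """Apply one event; 'stop' is accepted from any state (assumption)."""
    if event == "stop":
        return State.STOPPED
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state.name}") from None


def run(events, state=State.START) -> State:
    """Feed a sequence of events through the machine and return the final state."""
    for event in events:
        state = step(state, event)
    return state


final = run(["start", "connected", "cbt_done", "queue_overflow",
             "cbt_done", "abnormal_restart", "check_done"])
print(final.name)  # CBT_REPLICATION
```

Keeping the transitions in a table makes the fallback rule visible at a glance: every path out of trouble (overflow, restart, swap) leads back through CBT replication before queue-based replication resumes.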
  • Install and configure a hypervisor with replication capabilities.
    DR admin creates a Protected Group for VMs in the dashboard
    DR admin can define the Protection Policy (encryption, compression, RPO, etc.)
    When the admin creates the protected group, replication starts and IO Mirror sends IO data to the VRG.
    DR admin creates a Recovery Plan for fail-over, replication test and fail-back
    When a disaster happens, the DR admin runs the fail-over recovery plan, using either a snapshot or the newest data at the DR site
    DR admin can use re-protect to swap the production site and the DR site. The system then replicates data from the new production site to the new DR site.
    If fail-back is needed, the DR admin chooses the recovery plan that makes data consistent between the production site and the DR site.
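
The fail-over and re-protect steps in this workflow amount to a role swap between the two sites. The sketch below shows only that bookkeeping; the `ProtectedGroup` class, its fields and the rule that re-protect follows a fail-over are assumptions for illustration, not the product's data model.

```python
from dataclasses import dataclass


@dataclass
class ProtectedGroup:
    """Hypothetical record of a group of protected VMs and its two sites."""
    vms: list
    production_site: str
    dr_site: str
    failed_over: bool = False

    def fail_over(self) -> None:
        """VMs are brought up at the DR site; site roles are not swapped yet."""
        self.failed_over = True

    def re_protect(self) -> None:
        """Swap roles so the former DR site becomes the production site."""
        if not self.failed_over:
            raise RuntimeError("this sketch only allows re-protect after fail-over")
        self.production_site, self.dr_site = self.dr_site, self.production_site
        self.failed_over = False

    def replication_direction(self) -> tuple:
        """Data always flows from the current production site to the DR site."""
        return (self.production_site, self.dr_site)


group = ProtectedGroup(["web", "db"], production_site="site-A", dr_site="site-B")
before = group.replication_direction()
group.fail_over()
group.re_protect()
after = group.replication_direction()
print(before, after)
```

Running fail-over and re-protect a second time returns the group to its original direction, which is one way to picture fail-back once the data at both sites is consistent.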
  • So… what do we need?
  • Is data only storage?
    If it were so, we would need just Data Protection.
    For example… (move slide)
  • We start by defining the APIs and the service frameworks