Create a backup site with opensourceFabrizio Manfredi Furuholmen                                Beolink.org
Agenda                            Beolink.org           Introduction (Part I)                Disaster Recovery Plan     ...
Disaster Statistics & PotentialImplications                                                                          Beoli...
What is a Disaster Recovery Plan ?      Beolink.org                 Disaster recovery plan (DRP)                    Disast...
Classification of Risk                                Beolink.org Natural disasters    Such as tornadoes, floods, blizzard...
BIA                                                    Beolink.org  How many transactions can be lost without   significa...
Business Continuity Plan           Beolink.orgDisaster recovery planning (DRP)  is a subset of a larger process  known as ...
Technology problem ?                                                            Beolink.org                               ...
Disaster recovery starting point               Beolink.org    Recovery point objective (RPO)      Maximum data you can lo...
Basic Steps                    Beolink.org  ystem Duplication S     ardware    H     etwork    N     perating System  ...
Disaster Recovery Components             Beolink.org                               Elements                               ...
Standby                                                 Beolink.orgType        OS        Database        Application      ...
Bottlenecks                    Beolink.org Latency (speed of light)    Asynchronous operation    MultiStream/Parallel ...
Build a Disaster recovery site        Beolink.org                 What do you need ?                             14       ...
Infrastructure Software                     Beolink.org Componets         Software Virtualization   Vmware, XEN, KVM,     ...
Data Layer                                          Beolink.org       Off-Line                                            ...
Data Layer: Amanda                                                  Beolink.orgCompression                  Amanda.conf:  ...
Data Layer: rsync                                                          Beolink.orgDiffs Only actual changed pieces of ...
Data Layer: GlusterFS                         Beolink.org                                Features                         ...
Data Layer: GlusterFS                                                          Beolink.orgReplication        volume posix ...
Data Layer: Database              Beolink.org Builtin     Mysql Master – Slave     Mysql Multi-Master External     Tung...
Application Layer                       Beolink.org Data replication    Cluster JDBC    Group messaging (spread,     jg...
Application Layer: spread                                   Beolink.org     Spread     functions as a unified message bus f...
Application Layer: Example          Beolink.orgSession    Distributed file system    Replicated Database    AppServer C...
Routing Layer                              Beolink.org DNS: Bind                            Several                      ...
Routing Layer: Quagga                                   Beolink.org Quagga    CISCO IOS syntax    All routing protocol ...
Routing Layer: Heart               Beolink.orgMonitoring System (outside) Probe/check external Remote Invocation scripts...
Important Things                             Beolink.org      Business                        How the service             ...
Case Studies                  Beolink.org                Case Studies                      29                             ...
Case study                              Beolink.org Company type:    Heavy industry Location:        Single production sit...
Case study                               Beolink.orgWhat happened ?Perfect plan, everything OKBut…Production restart after...
Base Solution: small-middle companies   Beolink.org Switch by hand     Configuration     Restore     Starting point R...
Medium/High Solution: Enterprise   Beolink.org Automatic Switch     IP migration (one IP      per service)     Change F...
Conclusion                                           Beolink.org “Everything                Only Business          Right R...
I look forward to meeting you…                             Beolink.org                  XVI European AFS meeting 2009     ...
Thank youmanfred@freemails.chwww.beolink.org                        Beolink.org
Upcoming SlideShare
Loading in …5
×

Disaster recovery

709 views

Published on

Disaster recovery and business continuity planning are processes which help organizations prepare for disruptive events. The talk explains the basic concepts of business continuity, giving a brief overview on the business continuity plan and more detail informations (technical) on how to setup a Disaster Recovery site . We show two different approaches for creating a disaster recovery (DR) site, one the based on operating system layer and one based on the right design of the applications . The common elements on the two approaches are network design, data replication, monitoring system and system/configuration management. All these elements can be implemented with open source software, we explain advantages and disadvantages, performances and layouts on each solutions.

Published in: Business, Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
709
On SlideShare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
49
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Disaster recovery

  1. 1. Create a backup site with opensourceFabrizio Manfredi Furuholmen Beolink.org
  2. 2. Agenda Beolink.org   Introduction (Part I)   Disaster Recovery Plan   Business continuity Plan   Disaster Recovery Components (Part II)   Starting point   Software list   Design Pattern   Case Studies (Part III) 2 8/23/09
  3. 3. Disaster Statistics & PotentialImplications Beolink.org  90% of businesses that lose data from a disaster are forced to close down within 2 years since the disaster.  80% of businesses without a well structured recovery plan are forced to close down within 12 months since the flood or fire.  43% of companies experiencing disasters never recover.  50% of companies experiencing a computer outage will be forced to close down within 5 years  Companies experiencing a computer outage lasting longer than 10 days will never recover its full financial capacity.  Less than 50% of all organizations in the UK have a business continuity plan or disaster recovery plan.  One out of 500 data centers experience a severe disaster every year. 3 8/23/09
  4. 4. What is a Disaster Recovery Plan ? Beolink.org Disaster recovery plan (DRP) Disaster recovery plan consists of the precautions taken so that the effects of a disaster will be minimized and the organization will be able to either maintain or quickly resume mission critical functions 4 8/23/09
  5. 5. Classification of Risk Beolink.org Natural disasters Such as tornadoes, floods, blizzards, earthquakes and fire Accidents Sabotage  Power and energy disruptions Service sector failure Such as communications, transportation … Environmental disasters Such as pollution and hazardous materials spills Cyber attacks and hacker activity 5 8/23/09
  6. 6. BIA Beolink.org  How many transactions can be lost without significantly impacting revenue or productivity?  Does a majority of the businesses depend upon one or more mission critical applications?  How much revenue per hour would be lost when these critical applications remain unavailable?  Are there periods of time, for example the end of fiscal quarters or holiday seasons, when an outage would cause a greater disruption?  How will productivity be affected if critical applications become unavailable?  In the event of an unexpected outage, how will your partners, vendors and customers be affected ?  Historically, what has been the total cost of lost productivity and revenue during downtime? 6 8/23/09
  7. 7. Business Continuity Plan Beolink.orgDisaster recovery planning (DRP) is a subset of a larger process known as Business Continuity Plan.Business Continuity Plan (BCP) A business continuity plan enables critical services or products to be continually delivered to clients. 7 8/23/09
  8. 8. Technology problem ? Beolink.org Understand your business Organizational Strategy Business Impacy Analysis Risk Assessment & Control BCM Strategies Organization BCM Strategy Process Level BCM Strategy Resource Recovery BCM Strategy Developing & Implementing BCM Response Crisis Management Business Continuity Plans Business Unit Plan Developing BCM Culture Assessing Designing and Delivery Measuring Results Exercise, Maintain & Audit Exercising BCM Controls BS 25999 compliance 8 8/23/09
  9. 9. Disaster recovery starting point Beolink.org  Recovery point objective (RPO) Maximum data you can loose  Recovery time objective (RTO) The amount of time you need to restart from the RPO 9 8/23/09
  10. 10. Basic Steps Beolink.org  ystem Duplication S   ardware H   etwork N   perating System O   pplication A  ata Replication D   ilesystem F   atabase D   pplication Data A  witch Procedure S   Activation  ollback R   ow to switch back H   ow to merge data H 10 8/23/09
  11. 11. Disaster Recovery Components Beolink.org Elements  Routing  Application  Data  Infrastructures Working mode  Off-line  Near on-line  Online 11 8/23/09
  12. 12. Standby Beolink.orgType OS Database Application Several minutesOff Line OFF OFF OFFNear ON OFF/ON OFFOnlineOnline ON ON ONLow Cost High Cost Keep your DR site Zero unreachable until switch 12 8/23/09
  13. 13. Bottlenecks Beolink.org Latency (speed of light)  Asynchronous operation  MultiStream/Parallel Bandwidth   Caching   Compression (AVG 60%)   Diffs transmission 13 8/23/09
  14. 14. Build a Disaster recovery site Beolink.org What do you need ? 14 8/23/09
  15. 15. Infrastructure Software Beolink.org Componets Software Virtualization Vmware, XEN, KVM, Cloud computing, jeos Firewall Pfsense, iptable System Cfengine, puppet Configuration Snapshoot ZFS,LVM Montoring Nagios, Zenoss, Zabbix 15 8/23/09
  16. 16. Data Layer Beolink.org Off-Line Days • Amanda, Bacula, tar : Tape incremenetal, os dump Backup Periodic Hours Replication Near on-line • rsync: compression, incremental Asynchronous Mins Replication On Line • GlousterFS, openAFS, DB Replication Synchronous Secs Replication 16 8/23/09
  17. 17. Data Layer: Amanda Beolink.orgCompression Amanda.conf: inparallel 4 # maximum dumpers that will run in parallel (max 63)  Client Fast/Best # this maximum can be increased at compile-time,  Server Fast/Best # modifying MAX_DUMPERS in server-src/driverio.hMulti Stream: netusage 600 Kbps se # maximum net bandwidth for Amanda, in KB per  Number of parrells dump …Diskbase define dumptype comp-high { global  Holding disk comment "very important partitions on fast machines"  changer file base compress client best kencrypt noBandwith priority high }  Global  Per interface define interface lo0 { comment "a local disk"Encryption } use 1000 kbps  SSH  KRB5 DiskList: Dbserver /dumps comp-user-tar … 17 8/23/09
  18. 18. Data Layer: rsync Beolink.orgDiffs Only actual changed pieces of files aretransferred, rather than the whole file. Compression The tiny pieces of diffs are then compressed onthe fly, further saving you file transfer time andreducing the load on the network. rsync --verbose --progress --stats Encryption --compress The stream from rsync is passed through the --rsh=/usr/local/bin/sshssh protocol to encrypt your session. --recursive --times Latancy --perms Pipelining of file transfers to minimize latency --links costs --delete --exclude "*bak" --exclude "*~” Simple Usage /www/* webserver:path_name 18 8/23/09
  19. 19. Data Layer: GlusterFS Beolink.org Features   Automatic Replication   Aggregation   Scalable Striping   Distributed Locking   Performance Modules   Pluggable I/O Scheduler   Pluggable Transport   Pluggable Auth Administration   NFS-like Backend   Self-healing   Staking Vol/modules And much more .. 19 8/23/09
  20. 20. Data Layer: GlusterFS Beolink.orgReplication volume posix volume remote1 type protocol/client   Client side type storage/posix option transport-type tcp   Server side option directory /data/export option remote-host end-volume storage1.example.com option remote-subvolume brickRaid volume locks end-volume  Raid 1 type features/locks volume remote2 subvolumes posix  Stripe … end-volume end-volume  Raid 10 Volume brick volume replicate Performance type performance/io-threads type cluster/replicate subvolumes remote1 remote2  I/O modules option thread-count 8   Cache modules subvolumes locks end-volume end-volume volume writebehind type performance/write-behind volume server option window-size 1MB type protocol/server subvolumes replicate option transport-type tcp end-volume option auth.addr.brick.allow * subvolumes brick volume cache type performance/io-cache end-volume option cache-size 512MB subvolumes writebehind end-volume 20 8/23/09
  21. 21. Data Layer: Database Beolink.org Builtin  Mysql Master – Slave  Mysql Multi-Master External  Tungsten Replicator  Slony-I (postgresql) 21 8/23/09
  22. 22. Application Layer Beolink.org Data replication  Cluster JDBC  Group messaging (spread, jgroup, ..)  Custom protocol Distributed Application Design  Data must be Unique  Primary key unique for all sites (hash, mixing key ..) 22 8/23/09
  23. 23. Application Layer: spread Beolink.org Spread functions as a unified message bus for distributed applications, and provides highly tuned application-level multicast, group communication, and point to point support.  Reliable and scalable messaging and group communication  Easy to use, deploy and maintain.  Highly scalable from one local area network to complex wide area networks.  Supports thousands of groups with different sets of members.  Enables message reliability in the presence of machine failures, process crashes and recoveries, and network partitions and merges.  Provides a range of reliability, ordering and stability guarantees for messages.  .Completely distributed algorithms with no central point of failure. 23 8/23/09
  24. 24. Application Layer: Example Beolink.orgSession   Distributed file system   Replicated Database   AppServer Cluster 24 8/23/09
  25. 25. Routing Layer Beolink.org DNS: Bind Several minutes  keep low value of TTL  dynamic update MPLS  Change MPLS configuration, keep the same ip configuration Routing Table : quagga  OSPF/BGP announce new routing for your side Seconds 25 8/23/09
  26. 26. Routing Layer: Quagga Beolink.org Quagga  CISCO IOS syntax  All routing protocol  Device base interface Quagga.conf: interface eth0 ip address XXX.XXX.XXX.XXX !! Work only for ! … Ospf.conf  Autonomous System  Large network router ospf ospf router-id XXX.XXX.XXX.XX log-adjacency-changes compatible rfc1583 auto-cost reference-bandwidth 10000 network XXX.XXX.XXX.XXX/XX area … 26 8/23/09
  27. 27. Routing Layer: Heart Beolink.orgMonitoring System (outside) Probe/check external Remote Invocation scripts Manual/AutomaticLinux HA (inside) Cluster definition with Quorum Server Local Scripts / Fencing script Manual/Automatic 27 8/23/09
  28. 28. Important Things Beolink.org Business How the service Which order to critical is linked to do things Application others Keep close When to run When to run disaster them (2) them (1) recovery site Where have you put employees ? 28 8/23/09
  29. 29. Case Studies Beolink.org Case Studies 29 8/23/09
  30. 30. Case study Beolink.org Company type: Heavy industry Location: Single production site Web Farm/IT: Inside the production site APPS: Order management, production control RPO/RTO: 2h DR site: distance 400 km Cost: a Lot Disaster Event: Flooding of the production site GOOD Solution ? 30 8/23/09
  31. 31. Case study Beolink.orgWhat happened ?Perfect plan, everything OKBut…Production restart after 6 months …WRONG SOLUTION  Disaster not related to Business  Misunderstanding between BCP and DRPIT Manager was Fired 31 8/23/09
  32. 32. Base Solution: small-middle companies Beolink.org Switch by hand   Configuration   Restore   Starting point RTO/RPO Hours   Linear to storage dimension Synchronization   Backup No more licenses needed 32 8/23/09
  33. 33. Medium/High Solution: Enterprise Beolink.org Automatic Switch   IP migration (one IP per service)   Change Firewall role RTO/RPO seconds Synchronization   Multi-master FS   Database Replication   Configuration Management Double licenses needed 33 8/23/09
  34. 34. Conclusion Beolink.org “Everything Only Business Right RTO/right now” is Keep it Simple Application RPO hard Are you sure Manual the data is on Latency Bandwidth rollback the filesystem ? Donʼt forget Periodic Donʼt forget How Far Away the power of Testing your time is Far Enough? rsync 34 8/23/09
  35. 35. I look forward to meeting you… Beolink.org XVI European AFS meeting 2009 Rome: September 28-30 Who should attend:   Everyone interested in deploying a globally accessible file system   Everyone interested in learning more about real world usage of Kerberos authentication in single realm and federated single sign-on environments   Everyone who wants to share their knowledge and experience with other members of the AFS and Kerberos communities   Everyone who wants to find out the latest developments affecting AFS and Kerberos More Info: www.openafs.it 35 8/23/09
  36. 36. Thank youmanfred@freemails.chwww.beolink.org Beolink.org

×