Parmigiano, a Monastery, Love and Faith: Technical lessons on how to do Backup and Disaster Recovery in the Cloud

Published in: Technology

  1. 1. Parmigiano, a Monastery, Love and Faith: Technical lessons on how to do backup and disaster recovery in the cloud. Simone Brunozzi, Senior Technology Evangelist, Amazon Web Services @simon
  2. 2. "The mind is not a vessel to be filled, but a fire to be ignited." - Plutarch
  3. 3. Agenda: I. Prologue: The story of Monte Cassino. II. Lessons: Backup. III. Customer Story: Shaw Media. IV. Earthquake: What happened to my Parmigiano? V. Lessons: Disaster Recovery. VI. Conclusions ... And a little surprise!
  4. 4. Part I: Prologue
  5. 5. Abbey of Monte Cassino
  6. 6. [ Why is Monte Cassino important? ]
  7. 7. [ The Treasure of Monte Cassino ]
  8. 8. 800 papal documents; 20,500 volumes in the Old Library; 60,000 in the New Library; 200 manuscripts on parchment; 100,000 prints and paintings (including 11 Titians); 500 incunabula. Titian: one of the most influential painters ever. Incunabulum: a book printed before 1501 C.E. (Gutenberg’s Bible was printed in 1455 C.E.) [ The Treasure of Monte Cassino ]
  9. 9. High availability, Backup storage, Disaster recovery [ Business continuity continuum ]
  10. 10. High Availability: Keeping services alive. [ Business continuity continuum ]
  11. 11. High Availability: Keeping services alive. Backing up: Process of copying and archiving data so it may be used to restore the original after a data loss event. [ Business continuity continuum ]
  12. 12. High Availability: Keeping services alive. Backing up: Process of copying and archiving data so it may be used to restore the original after a data loss event. Disaster recovery: Recovery of technology infrastructure critical to an organization after a natural or human-induced disaster. [ Business continuity continuum ]
  13. 13. Monastery: Brilliant, scalable, low-cost, highly durable backup system. Origin of Universities (Charlemagne, 814 C.E.): The Empire needs educated people. Let’s ask the Church! Edict: Free education in cathedrals and monasteries. Lots of books (and backups). [ Origin of Backup ]
  14. 14. Monastery: Brilliant, scalable, low-cost, highly durable backup system. Origin of Universities (Charlemagne, 814 C.E.). Barbarians, pestilences, fires, invasions, wars, famines, revolts, etc. Indoctrination: One of the first critical functions within an organization (Catholic Church) that needed continuation after any natural or human-induced disaster. It needed backups of books (Bibles, etc.) in order to function. [ Origin ]
  15. 15. [ Why is Monte Cassino important? ]
  16. 16. [ World War II ]
  17. 17. Dec 1942: Many “treasures” are transported from Rome and other places to Monte Cassino, for safety. [ The Treasure of Monte Cassino ]
  18. 18. Intercepted German message: “Ist der Abt noch im Kloster?” “Ja.” (“Is the Abt still in the monastery?” “Yes.”) “Abt” means “Abbot”. It is also the abbreviation for “Military Division”. [ Lost in translation ]
  19. 19. [ Abbey of Monte Cassino ]
  20. 20. Feb 1944: Schlegel and Becker (Panzer-Division Hermann Göring) had the treasures transferred to the Vatican. [ The Treasure of Monte Cassino ]
  21. 21. [ Escape from Monte Cassino ]
  22. 22. Lt. Col. Julius Schlegel (an Austrian Roman Catholic); Capt. Maximilian Becker (a Protestant surgeon) [ Escape from Monte Cassino ]
  23. 23. “Biggest bombing against a single target of all time”
  24. 24. [ Monte Cassino after bombing (1944) ]
  25. 25. [ Restoration in 1954 ]
  26. 26. [ The Abbey of Monte Cassino today ]
  27. 27. End of Prologue
  28. 28. Part II: Lessons from Monte Cassino
  29. 29. 1. My backup should be accessible (a.k.a. the pain of physical data transfer)
  30. 30. 1. My backup should be accessible: AWS Direct Connect; API; AWS Storage Gateway; AWS Import/Export; Customer owns the data; Redundancy
  31. 31. GW-stored volumes [ AWS Storage Gateway ]
  33. 33. GW-cached volumes; GW-stored volumes; “Cold” and “Cool” storage
  34. 34. Public / AWS Direct Connect; VPN; AWS Import/Export
  35. 35. 2. My backup should be able to scale
  36. 36. 2. My backup should be able to scale • “Infinite” scale with Amazon S3 and Amazon Glacier • Scale to multiple regions • Seamless • No need to provision • Cost tiers (cheaper at scale) [ Lessons from Monte Cassino ]
  37. 37. Regions (8), GovCloud Regions (1) [ Global AWS Infrastructure ] (as of Nov 27th, 2012)
  38. 38. Availability Zones (23) [ Global AWS Infrastructure ] (as of Nov 27th, 2012)
  39. 39. Edge Locations (38): Seattle, South Bend, New York (2), London, Amsterdam (2), Newark, Dublin, Stockholm, Palo Alto, Tokyo, San Jose, Paris, Frankfurt (2), Ashburn (2), Milan, Los Angeles (2), Jacksonville, Madrid, Osaka, Dallas (2), Hong Kong, St. Louis, Miami, Singapore (2), Sydney, São Paulo [ Global AWS Infrastructure ] (as of Nov 27th, 2012)
  40. 40. 3. My backup should be safe
  41. 41. 3. My backup should be safe • SSL endpoints (Amazon S3 and Amazon Glacier) • Signed API calls • Store encrypted files • Server-side encryption • Durability: multiple copies across different data centers • Local/cloud with AWS Storage Gateway [ Lessons from Monte Cassino ]
  42. 42. 3. My backup should be safe
  43. 43. 4. My backup should work with a DR policy (I don’t want to wait 10 years…)
  44. 44. 4. My backup should work with a DR policy • Easy to integrate within AWS or hybrid • AWS Storage Gateway: Run services on Amazon EC2 (DR) • Clear costs • Reduced costs • I decide redundancy/availability in relation to costs [ Lessons from Monte Cassino ]
  45. 45. 5. Someone should care about it • Clear ownership • Permissions with IAM: Users, groups -> roles • Logs • AWS support [ Lessons from Monte Cassino ]
  46. 46. 1. My backup should be accessible 2. My backup should be able to scale 3. My backup should be safe 4. My backup should work with a DR policy 5. Someone should care about it [ Lessons from Monte Cassino ]
  47. 47. Part III: A customer story
  48. 48. Augusto Rosa, Manager, Server Operations - Shaw Media. augusto.rosa @ shawmedia.ca
  49. 49. [ Shaw Media ]
  50. 50. [ Who we are ] • Shaw Media: Division of Shaw Communications Inc. • It reaches almost 100% of Canadians; 18 specialty channels • Global national newscast: 1+ million viewers every weekday • Access to full episodes: 20 websites, 4 video-on-demand • It engages with 25+ million Canadians per week
  51. 51. [ Before AWS ] • Data centers in Winnipeg and Toronto • Challenge to manage, frequent power outages, downtime • Expensive hosting fees inherited from parent company • Technology was old and in disarray (total revamp needed)
  52. 52. [ Mission Impossible? ]
  53. 53. [ Mission ] • Implement a new CMS • Empower the editorial team • Business objectives • Time frame of 9 months • Be agile and cost effective
  54. 54. AWS
  55. 55. Amazon EC2, Amazon EMR, Auto Scaling, Elastic Load Balancing, Amazon CloudFront, Amazon RDS, Amazon DynamoDB, Amazon SimpleDB, Amazon ElastiCache, AWS IAM, Amazon CloudWatch, AWS Elastic Beanstalk, AWS CloudFormation, Amazon CloudSearch, Amazon SWF, Alexa WIS and Alexa Top Sites, Amazon SQS, Amazon SNS, Amazon SES, AWS Marketplace, Amazon FPS, Amazon DevPay, Amazon Mechanical Turk, Amazon Route 53, Amazon VPC, AWS Direct Connect, Amazon S3, Amazon Glacier, Amazon EBS, AWS Import/Export, AWS Storage Gateway, AWS Support
  56. 56. [ Phase One ] • Fast deployment of servers, network rules, load balancers • First site under new CMS: Live in 4 weeks from scratch • Full migration of 29 sites from a physical DC in 9 months
  57. 57. [ Phase Two ] • Full migration of 6 other websites and web services • From 2nd physical DC into AWS in 2 months • Migration: Windows Server 2003/SQL Server 2005 -> Windows Server 2008/SQL Server 2008 • Creating new web farms takes 1 to 5 days (versus months) • It takes longer to procure licenses than the infrastructure • Ability to scale and automate
  58. 58. [ Benefits of Using AWS ] • Increased uptime from 98.8% to 99.99% • Scale to success, quicker response to business needs • $1M+ saved in capital and operational cost • No physical investment, smaller teams • Allowed using third-party service management companies • Easy backup on AWS -> 3 years retention (tax credits)
  59. 59. [ AWS Architecture ]
  60. 60. [ Some Numbers ] • 50+ EC2 instances (various sizes) • 25+ TB traffic/month • 40M+ Route 53 queries • 10+ TB backup on Amazon S3... And growing!
  61. 61. [ Lessons Learned ] • Architect with AWS in mind from the start • Use all Availability Zones in the area you choose to host; divide across all • Plan for failures: Be crazy about it (things fail) • Backup backup backup • Monthly AMI • Windows/SQL Server workarounds (failover cluster, AD, etc.) • Engage with AWS Solutions Architects early
  62. 62. [ Disaster Recovery ] • Learn from outages all the time • Implement changes to prevent failures at cloud level • Document how you recover from failures • Single component may fail; architecture shouldn’t
  63. 63. [ Backup ] • Daily snapshots of all volumes automatically • VIP volumes: snapshots every 4 hours • Keep the last 10 snapshots • Dell Replay: backs up file system files every hour • Volumes replicated to Amazon S3 (Oregon) every 2 hours • SQL Server backup every 30 minutes • SQL Server backup volumes moved to Amazon S3 every 2 hours
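The "keep the last 10 snapshots" rule on the Backup slide can be sketched as a small shell filter. This is only an illustration, not Shaw Media's actual tooling: the function name `snapshots_to_delete` and the snapshot names are hypothetical, and the input is assumed to be one snapshot ID per line, sorted oldest first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads snapshot IDs (oldest first) on stdin and prints
# the ones that fall outside the "keep the last N" window.
snapshots_to_delete() {
  local keep="$1"
  awk -v keep="$keep" '
    { line[NR] = $0 }                                   # remember every ID in order
    END { for (i = 1; i <= NR - keep; i++) print line[i] }  # all but the newest N
  '
}

# Example with 12 made-up snapshot names: keeping 10 lists the 2 oldest.
for i in $(seq -w 1 12); do echo "snap-day$i"; done | snapshots_to_delete 10
```

The output of such a filter would then be fed to whatever delete command the backup tooling provides. When fewer than N snapshots exist, nothing is printed.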
  64. 64. [ Future ] • Move from public cloud to VPC • Auto Scaling on Amazon EC2 • Amazon S3 as image repository for all sites • Second cloud vendor as DR (instead of in-house) • Amazon ElastiCache for central caching for ASP.NET apps
  65. 65. Augusto Rosa, Manager, Server Operations - Shaw Media. augusto.rosa @ shawmedia.ca
  66. 66. Part IV: The 2012 Emilia Earthquake
  67. 67. [ May 20th, 2012: Earthquake in Italy ]
  69. 69. [ Parmigiano warehouse (€0.5B damage) ]
  70. 70. [ “Let’s do something NOW” ]
  71. 71. [ Buy 1 kg of Parmigiano for 1 Euro ]
  72. 72. [ Everybody helped ]
  73. 73. Part V: Lessons from an Earthquake
  74. 74. 1. You NEED a DR in place! 2. Testing your DR 3. Reducing costs 4. You can have different DR solutions [ Lessons from an Earthquake ]
  75. 75. 1. You NEED a DR in place!
  76. 76. DR with High Availability
  77. 77. App DR with Standby
  78. 78. Business Impact Analysis (RTO, RPO)
  79. 79. Business Impact Analysis (RTO, RPO) • RTO (Recovery Time Objective): 1) Time for trying to fix the problem 2) The recovery itself 3) Testing 4) Tell users • RPO (Recovery Point Objective): how much data I can lose [ Lessons from an Earthquake ]
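The RTO and RPO definitions above reduce to simple arithmetic, sketched here in shell. All durations are made-up illustration values in minutes, the function names `rto_minutes` and `meets_rpo` are hypothetical, and the RPO check assumes the simplification that the worst-case data loss equals the interval between backups.

```bash
#!/usr/bin/env bash
# RTO as the sum of the four components listed on the slide (minutes).
rto_minutes() {
  local fix_attempt="$1" recovery="$2" testing="$3" tell_users="$4"
  echo $(( fix_attempt + recovery + testing + tell_users ))
}

# Succeeds when the backup interval does not exceed the RPO target
# (assumption: worst-case loss = time since the last backup).
meets_rpo() {
  local backup_interval="$1" rpo_target="$2"
  [ "$backup_interval" -le "$rpo_target" ]
}

# Illustration: 30 min diagnosing, 120 min restoring, 45 min testing, 15 min notifying.
echo "RTO: $(rto_minutes 30 120 45 15) minutes"

# Illustration: backups every 30 minutes against a 60-minute RPO target.
if meets_rpo 30 60; then echo "RPO met"; else echo "RPO missed"; fi
```

Writing the numbers down this way makes it easier to see which component dominates the recovery time before choosing a DR architecture.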
  80. 80. Different Types of DR Architecture 1) Backup and Restore 2) “Pilot light” for quick recovery into AWS (Cold standby) 3) Warm standby solution on AWS 4) Multi-site hybrid solution (AWS + on premises) [ Lessons from an Earthquake ]
  81. 81. Service               Cost ($/GB/month)   Performance   Durability
        Amazon S3             0.125               ***           *****
        Amazon Glacier        0.01                *             *****
        AWS Storage Gateway   0.125 (+ 125/GW)    ****          ***
        Amazon EBS            0.10                ****          ***
        Amazon EBS (PIOPS)    0.125               *****         ***
  82. 82. 2. Testing your DR
  83. 83. 2. Testing your DR • Dev/test in the cloud is super easy • Spin up capacity only for the test • Regularly test your DR • Cost is minimal • What about data transfer speed? [ Lessons from an Earthquake ]
  84. 84. s3cmd ls --recursive s3://datasets.elasticmapreduce/ngrams/books/ | awk '{print $4; sub(/s3:\/\/datasets.elasticmapreduce/, "/array", $4); print $4}' | parallel -j0 -N2 --progress /usr/bin/s3cmd --no-progress get {1} {2} (Special thanks to Craig Carl, AWS Solutions Architect)
  85. 85. s3cmd ls --recursive s3://datasets.elasticmapreduce/ngrams/books/ : Lists every object in the bucket
  86. 86. awk '{print $4; sub(/s3:\/\/datasets.elasticmapreduce/, "/array", $4); print $4}' : Gets the path to the Amazon S3 object and the local destination path
  87. 87. parallel -j0 -N2 --progress : Runs GNU Parallel with as many jobs as possible; -N2 tells parallel to read two arguments at a time from stdin and assign them to {1} and {2}
  88. 88. /usr/bin/s3cmd --no-progress get {1} {2} : The command that GNU Parallel will run; {1} is substituted with the Amazon S3 object path, {2} with the local destination path
  89. 89. Result: copying 2.4 TB went down from 48 hours to 9 hours (about 5x faster)
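To see what the awk stage hands to GNU Parallel, it can be run on its own against a sample listing line. The sketch below assumes `s3cmd ls` prints four whitespace-separated columns (date, time, size, object URL); the sample line and the function name `pair_paths` are made up for illustration, and the awk program is written with the quoting and escaped slashes a shell needs.

```bash
#!/usr/bin/env bash
# A made-up line in the assumed four-column shape of `s3cmd ls` output.
sample='2012-11-27 10:00 1048576 s3://datasets.elasticmapreduce/ngrams/books/example-file'

# For each listing line, print the S3 URL, then the local destination path
# obtained by swapping the bucket prefix for /array.
pair_paths() {
  awk '{ print $4; sub(/s3:\/\/datasets.elasticmapreduce/, "/array", $4); print $4 }'
}

printf '%s\n' "$sample" | pair_paths
```

Each object therefore yields two consecutive lines, source then destination, which is why the parallel stage consumes its input two arguments at a time.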
  90. 90. 3. Reducing costs
  91. 91. 3. Reducing costs 1) AWS cost reduction (e.g., S3 cost reduction on Nov 28th) 2) Reduced redundancy (Amazon S3) 3) Retention policy 4) Hot/warm/cool/cold backup 5) Reserved capacity/tiers [ Lessons from an Earthquake ]
  92. 92. Amazon S3 pricing ($/GB/month)
        Tier            Standard   Reduced Redundancy
        0-1 TB          0.125      0.093
        1-50 TB         0.110      0.083
        50-500 TB       0.095      0.073
        500-1,000 TB    0.090      0.063
        1-5 PB          0.080      0.053
        5+ PB           0.055      0.037
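Tiered pricing means each slice of stored data is billed at its own tier's rate. The sketch below estimates a monthly Standard storage bill from the table, under two stated assumptions: 1 TB is treated as 1,000 GB for simplicity, and the larger tiers are read as 0.095, 0.090, 0.080 and 0.055 $/GB/month so that prices fall as volume grows. The function name `s3_standard_monthly_cost` is hypothetical, and the prices are those quoted on the slide, not current ones.

```bash
#!/usr/bin/env bash
# Estimate monthly cost in dollars for a given number of GB stored.
s3_standard_monthly_cost() {
  awk -v gb="$1" 'BEGIN {
    gb = gb + 0
    # Tier widths in GB (1 TB = 1000 GB assumed) and their $/GB/month prices.
    n = split("1000 49000 450000 500000 4000000", width, " ")
    split("0.125 0.110 0.095 0.090 0.080", price, " ")
    last_price = 0.055                 # everything above 5 PB
    cost = 0
    for (i = 1; i <= n && gb > 0; i++) {
      w = width[i] + 0
      used = (gb < w) ? gb : w         # portion of the data billed in this tier
      cost += used * price[i]
      gb -= used
    }
    cost += gb * last_price            # remainder, if any, in the top tier
    printf "%.2f\n", cost
  }'
}

# Example: 10 TB, roughly the backup volume quoted in the customer story.
s3_standard_monthly_cost 10000
```

For 10 TB the first 1,000 GB are billed at 0.125 and the remaining 9,000 GB at 0.110, which is the "cheaper at scale" effect mentioned in the scaling lesson.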
  93. 93. 4. You can have different DR solutions
  94. 94. 4. You can have different DR solutions • Easy to integrate existing vendors with DR on AWS • Approach: One vendor/hybrid/multiple vendors • One region/multi-regions (if you need geodiversity) [ Lessons from an Earthquake ]
  95. 95. 1. You NEED a DR in place! 2. Testing your DR 3. Reducing costs 4. You can have different DR solutions [ Lessons from an Earthquake ]
  96. 96. Part VI: Conclusions
  97. 97. Action items: Backups, Disaster Recovery; Agility, Control, Cost savings
  98. 98. Parmigiano, a Monastery, Love and Faith: Technical lessons on how to do Backup and Disaster Recovery in the Cloud. Simone Brunozzi, Senior Technology Evangelist, Amazon Web Services @simon
