"The mind is not a vessel to be filled,
but a fire to be ignited."

- Plutarch




Agenda
I. Prologue           The story of Monte Cassino
II. Lessons           Backup
III. Customer Story   Shaw Media
IV. Earthquake        What happened to my Parmigiano?
V. Lessons            Disaster Recovery
VI. Conclusions       ... And a little surprise!
Part I
Prologue
Abbey of Monte Cassino
[   Why is Monte Cassino important?   ]
[   The Treasure of Monte Cassino   ]
800 papal documents
20,500 volumes in the Old Library
60,000 in the New Library
200 manuscripts on parchment
100,000 prints and paintings (including 11 Titians)
500 incunabula

Titian: one of the most influential painters ever.
Incunabulum: a book printed before 1501 C.E. (Gutenberg’s Bible was printed in 1455 C.E.)

[   The Treasure of Monte Cassino   ]
High availability   |   Backup storage   |   Disaster recovery

[   Business continuity continuum   ]
High availability:
Keeping services alive.

Backing up:
The process of copying and archiving data so it may be used to
restore the original after a data loss event.

Disaster recovery:
Recovery of technology infrastructure critical to an
organization after a natural or human-induced disaster.

[   Business continuity continuum   ]
Monastery:
Brilliant, scalable, low-cost, highly durable backup system
Origin of universities (Charlemagne, 814 C.E.)

The Empire needs educated people -> “Let’s ask the Church!” -> Edict: free education in cathedrals and monasteries -> Lots of books (and backups)

[   Origin of Backup   ]
Monastery:
Brilliant, scalable, low-cost, highly durable backup system.
Origin of universities (Charlemagne, 814 C.E.)

Barbarians, pestilences, fires, invasions, wars, famines, revolts, etc.

Indoctrination:
One of the first critical functions within an organization
(the Catholic Church) that needed to continue after any natural or
human-induced disaster.
It needed backups of books (Bibles, etc.) in order to function.

[   Origin   ]
[   Why is Monte Cassino important?   ]
[   World War II   ]
Dec 1942: Many “treasures” are transported from Rome and other places to Monte Cassino, for safety

[   The Treasure of Monte Cassino   ]
Intercepted German message:
“Ist der Abt noch im Kloster?”
“Ja.”

“Abt” is the abbreviation of “Abteilung” (military division); it also means “abbot”.

[   Lost in translation   ]
[   Abbey of Monte Cassino   ]
Feb 1944: Schlegel and Becker (Panzer-Division Hermann Göring) had the treasures transferred to the Vatican

[   The Treasure of Monte Cassino   ]
[   Escape from Monte Cassino   ]
Lt. Col. Julius Schlegel (an Austrian Roman Catholic)
Capt. Maximilian Becker (a Protestant surgeon)

[   Escape from Monte Cassino   ]
“Biggest bombing against a single target of all time”
[   Monte Cassino after bombing (1944)   ]
[   Restoration in 1954   ]
[   The Abbey of Monte Cassino today   ]
End of Prologue
Part II
Lessons from Monte Cassino
1. My backup should be accessible
   a.k.a. the pain of physical data transfer
1. My backup should be accessible

(Diagram: AWS Direct Connect; API; AWS Storage Gateway; customer owns the data; redundancy; AWS; AWS Import/Export)

[   AWS Storage Gateway   ]
(Diagram: gateway-stored volumes)
(Diagram: gateway-cached volumes; gateway-stored volumes; “cold” storage; “cool” storage)
(Diagram: public / AWS Direct Connect; VPN; AWS Import/Export)
2. My backup should be able to scale

• “Infinite” scale with Amazon S3 and Amazon Glacier
• Scale to multiple regions
• Seamless
• No need to provision
• Cost tiers (cheaper at scale)




[   Lessons from Monte Cassino   ]
Regions (8), GovCloud Regions (1)
Availability Zones (23)
Edge Locations (38):
• North America: Seattle, South Bend, New York (2), Newark, Palo Alto, San Jose, Ashburn (2), Los Angeles (2), Jacksonville, Dallas (2), St. Louis, Miami
• South America: São Paulo
• Europe: London, Amsterdam (2), Dublin, Stockholm, Paris, Frankfurt (2), Milan, Madrid
• Asia Pacific: Tokyo, Osaka, Hong Kong, Singapore (2), Sydney

[   Global AWS Infrastructure   ]   (as of Nov 27th, 2012)
3. My backup should be safe

• SSL Endpoints (Amazon S3 and Amazon Glacier)
• Signed API calls
• Store encrypted files
• Server-side encryption
• Durability: multiple copies across different data centers
• Local/cloud with AWS Storage Gateway




[   Lessons from Monte Cassino   ]
4. My backup should work with a DR policy

• Easy to integrate within AWS or Hybrid
• AWS Storage Gateway: Run services on Amazon EC2 (DR)
• Clear costs
• Reduced costs
• I decide redundancy/availability in relation to costs




[   Lessons from Monte Cassino   ]
5. Someone should care about it

• Clear ownership
• Permissions with IAM: users, groups -> roles
• Logs
• AWS support

[   Lessons from Monte Cassino   ]
1. My backup should be accessible
2. My backup should be able to scale
3. My backup should be safe
4. My backup should work with a DR policy
5. Someone should care about it

[   Lessons from Monte Cassino   ]
Part III
A customer story
Augusto Rosa
Manager, Server Operations - Shaw Media
     augusto.rosa @ shawmedia.ca




[   Shaw Media   ]
[                      Who we are                              ]
• Shaw Media: Division of Shaw Communications Inc.
• It reaches almost 100% of Canadians; 18 specialty channels
• Global national newscast: 1+ million viewers every weekday
• Access to full episodes: 20 websites, 4 video-on-demand
• It engages with 25+ million Canadians per week




[   Before AWS   ]
• Data centers in Winnipeg and Toronto
• Challenging to manage: frequent power outages, downtime
• Expensive hosting fees inherited from parent company
• Technology was old and in disarray (total revamp needed)

[   Mission Impossible?   ]
[   Mission   ]
• Implement a new CMS
• Empower the editorial team
• Business objectives
• Time frame of 9 months
• Be agile and cost effective
AWS
Amazon EC2                      Amazon SQS
Amazon EMR                      Amazon SNS
Auto Scaling                    Amazon SES
Elastic Load Balancing
                                AWS Marketplace
Amazon CloudFront               Amazon FPS
Amazon RDS                      Amazon DevPay
Amazon DynamoDB                 Amazon Mechanical Turk
Amazon SimpleDB
Amazon ElastiCache              Amazon Route 53
                                Amazon VPC
AWS IAM                         AWS Direct Connect
Amazon CloudWatch
AWS Elastic Beanstalk           Amazon S3
AWS CloudFormation              Amazon Glacier
                                Amazon EBS
Amazon CloudSearch              AWS Import/Export
Amazon SWF                      AWS Storage Gateway
Alexa WIS and Alexa Top Sites   AWS Support
[                         Phase One                             ]
•   Fast deployment of servers, network rules, load balancers
•   First site under new CMS: Live in 4 weeks from scratch
•   Full migration of 29 sites from a physical DC in 9 months




[                       Phase Two                            ]
• Full migration of 6 other websites and web services
• From 2nd physical DC into AWS in 2 months
• Migration: Windows ‘03/SQL ‘05 -> Windows ‘08/SQL ’08
• Creating new web farms takes 1 to 5 days (versus months)
• Takes longer to procure licenses than the infrastructure
• Ability to scale and automate




[   Benefits of Using AWS   ]
• Increased uptime from 98.8% to 99.99%
• Scale to success, quicker response to business needs
• $1M+ saved in capital and operational costs
• No physical investment, smaller teams
• Allowed the use of third-party service management companies
• Easy backup on AWS -> 3 years retention (tax credits)
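To put the uptime figures in perspective, the two percentages can be converted into yearly downtime. This is only a back-of-the-envelope sketch: the 365-day year and the function name are assumptions made for illustration, not something stated on the slide.

```python
# Convert an uptime percentage into the downtime it implies per year.
# Assumption: a 365-day year (8,760 hours); the slide gives only percentages.
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours of downtime per year implied by an uptime percentage."""
    return HOURS_PER_YEAR * (100.0 - uptime_percent) / 100.0


before = downtime_hours_per_year(98.8)
after = downtime_hours_per_year(99.99)

print(f"98.8%  uptime: about {before:.1f} hours of downtime per year")
print(f"99.99% uptime: about {after * 60:.0f} minutes of downtime per year")
```

Under these assumptions, the change is from roughly four and a half days of downtime per year to under an hour.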
[   AWS Architecture   ]
[   Some Numbers   ]
• 50+ EC2 instances (various sizes)
• 25+ TB traffic/month
• 40M+ Route 53 queries
• 10+ TB backup on Amazon S3

... And growing!
[   Lessons Learned   ]
• Architect with AWS in mind from the start
• Use all Availability Zones in the area you choose to host in; spread across all of them
• Plan for failures: be crazy about it (things fail)
• Backup, backup, backup
• Monthly AMI
• Windows/SQL Server workarounds (failover cluster, AD, etc.)
• Engage with AWS Solutions Architects early
[                  Disaster Recovery                     ]
• Learn from outages all the time
• Implement changes to prevent failures at cloud level
• Document how you recover from failures
• Single component may fail; architecture shouldn’t




[   Backup   ]
• Daily snapshots of all volumes automatically
• VIP volumes: snapshots every 4 hours
• Keep the last 10 snapshots
• Dell Replay: backs up file system files every hour
• Volumes replicated to Amazon S3 (Oregon) every 2 hours
• SQL Server backup every 30 minutes
• SQL Server backup volumes moved to Amazon S3 every 2 hours
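The "keep the last 10 snapshots" rule can be sketched as a small retention function. This is an illustrative sketch only: the function name and the sample dates are hypothetical, and it works on plain timestamps rather than calling any AWS API, so it says nothing about how Shaw Media actually implemented the policy.

```python
from datetime import datetime, timedelta

KEEP_LAST = 10  # from the slide: "Keep the last 10 snapshots"


def snapshots_to_delete(snapshot_times, keep_last=KEEP_LAST):
    """Return the snapshot timestamps that fall outside the retention window.

    snapshot_times: iterable of datetime objects, in any order.
    The newest `keep_last` snapshots are kept; everything older is returned.
    """
    newest_first = sorted(snapshot_times, reverse=True)
    return newest_first[keep_last:]


# Hypothetical example: 14 daily snapshots, the newest taken on 2012-11-27.
latest = datetime(2012, 11, 27)
daily = [latest - timedelta(days=n) for n in range(14)]

expired = snapshots_to_delete(daily)
print(f"{len(expired)} snapshots to delete, oldest from {min(expired).date()}")
```

With daily snapshots this rule gives about ten days of history; with the 4-hour schedule used for VIP volumes the same ten snapshots cover only about 40 hours, which is worth keeping in mind when choosing the count.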
[   Future   ]
• Move from public cloud to VPC
• Auto Scaling on Amazon EC2
• Amazon S3 as image repository for all sites
• Second cloud vendor as DR (instead of in-house)
• Amazon ElastiCache for central caching for ASP.NET apps
Part IV
The 2012 Emilia Earthquake
[   May 20th, 2012: Earthquake in Italy   ]
[   Parmigiano warehouse (0.5B € damage)   ]
[   “Let’s do something NOW”   ]
[   Buy 1 Kg of Parmigiano for 1 Euro   ]
[   Everybody helped   ]
Part V
Lessons from an Earthquake
1. You NEED a DR in place!
2. Testing your DR
3. Reducing costs
4. You can have different DR solutions

[   Lessons from an Earthquake   ]

1. You NEED a DR in place!

DR with High Availability
App DR with Standby
Business Impact Analysis (RTO, RPO)

• RTO (Recovery Time Objective):
  1) Time spent trying to fix the problem
  2) The recovery itself
  3) Testing
  4) Telling users

• RPO (Recovery Point Objective): how much data I can lose

[   Lessons from an Earthquake   ]
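The RTO and RPO definitions above can be made concrete with a little arithmetic. The sketch below is an illustration under stated assumptions: the class, the function names, and all the hour and minute figures are hypothetical, and the worst-case data loss is approximated as one full backup interval (ignoring how long the backup itself takes).

```python
from dataclasses import dataclass


@dataclass
class RecoveryEstimate:
    """The four RTO components listed on the slide, in hours."""
    fix_attempt_hours: float   # 1) time spent trying to fix the problem
    recovery_hours: float      # 2) the recovery itself
    testing_hours: float       # 3) testing
    notify_users_hours: float  # 4) telling users

    def rto_hours(self) -> float:
        """Total recovery time: the sum of all four phases."""
        return (self.fix_attempt_hours + self.recovery_hours
                + self.testing_hours + self.notify_users_hours)


def worst_case_data_loss_minutes(backup_interval_minutes: float) -> float:
    """Assumption: with periodic backups, up to one full interval can be lost."""
    return backup_interval_minutes


def meets_rpo(backup_interval_minutes: float, rpo_minutes: float) -> bool:
    """True if the backup schedule can satisfy the stated RPO."""
    return worst_case_data_loss_minutes(backup_interval_minutes) <= rpo_minutes


# Hypothetical figures, for illustration only.
estimate = RecoveryEstimate(1.0, 2.5, 1.0, 0.5)
print("Estimated RTO (hours):", estimate.rto_hours())
print("30-minute backups meet a 60-minute RPO:", meets_rpo(30, 60))
print("4-hour snapshots meet a 60-minute RPO:", meets_rpo(240, 60))
```

The point of writing it down this way is that the RTO is not just the restore time: diagnosis, testing, and communication all count against it.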
Different Types of DR Architecture

1) Backup and Restore
2) “Pilot light” for quick recovery into AWS (Cold standby)
3) Warm standby solution on AWS
4) Multi-site hybrid solution (AWS + on premises)




[   Lessons from an Earthquake   ]
                       Cost ($/GB/month)    Performance   Durability
Amazon S3              0.125                ***           *****
Amazon Glacier         0.01                 *             *****
AWS Storage Gateway    0.125 (+ 125/GW)     ****          ***
Amazon EBS             0.10                 ****          ***
Amazon EBS (PIOPS)     0.125                *****         ***
2. Testing your DR

• Dev/test in the cloud is super easy
• Spin up capacity only for the test
• Regularly test your DR
• Cost is minimal
• What about data transfer speed?




[   Lessons from an Earthquake   ]
s3cmd ls --recursive s3://datasets.elasticmapreduce/ngrams/books/ \
  | awk '{print $4; sub(/s3:\/\/datasets.elasticmapreduce/, "/array", $4); print $4}' \
  | parallel -j0 -N2 --progress /usr/bin/s3cmd --no-progress get {1} {2}

Special thanks to Craig Carl, AWS Solutions Architect

• s3cmd ls --recursive: Lists every object in the bucket
• awk: Gets the path to the Amazon S3 object and the local destination path
• parallel -j0 -N2: Runs parallel with as many jobs as possible; '-N2' tells parallel there are two arguments on stdin and assigns them to {1} and {2}
• /usr/bin/s3cmd --no-progress get {1} {2}: The command that GNU Parallel will run; '{1}' is substituted with the Amazon S3 object path, '{2}' is substituted with the local destination path

Copying 2.4 TB: down from 48 hours to 9 hours (5x faster)
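The quoted result can be turned into average throughput figures to make the speedup easier to compare with a link's capacity. This is a rough sketch: it assumes decimal units (1 TB = 10^12 bytes), which the slide does not specify, and the names are illustrative.

```python
# Average throughput implied by "2.4 TB in 48 hours" versus "2.4 TB in 9 hours".
# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
TB = 10**12
MB = 10**6


def throughput_mb_per_s(total_bytes: float, hours: float) -> float:
    """Average transfer rate in MB/s for a copy of total_bytes taking `hours`."""
    return total_bytes / (hours * 3600) / MB


size_bytes = 2.4 * TB
serial_rate = throughput_mb_per_s(size_bytes, 48)
parallel_rate = throughput_mb_per_s(size_bytes, 9)
speedup = 48 / 9

print(f"Serial copy:   about {serial_rate:.1f} MB/s")
print(f"Parallel copy: about {parallel_rate:.1f} MB/s")
print(f"Speedup:       about {speedup:.1f}x")
```

The exact ratio is a little over 5, consistent with the "5x faster" figure quoted on the slide.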
3. Reducing costs

1) AWS cost reduction (e.g., S3 cost reduction on Nov 28th)
2) Reduced redundancy (Amazon S3)
3) Retention policy
4) Hot/warm/cool/cold backup
5) Reserved capacity/tiers




[   Lessons from an Earthquake   ]
Amazon S3         Standard       Reduced Redundancy
                  $/GB/month     $/GB/month
0-1 TB            0.125          0.093
1-50 TB           0.110          0.083
50-500 TB         0.095          0.073
500-1,000 TB      0.090          0.063
1-5 PB            0.080          0.053
5+ PB             0.055          0.037
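A tiered price list like this one can be applied with a short calculation. The sketch below is a hedged illustration, not an official pricing formula: it assumes the rates are marginal (each rate applies only to the data inside its tier), that the standard-storage column reads 0.125, 0.110, 0.095, 0.090, 0.080 and 0.055 $/GB/month, that 1 TB = 1,024 GB and 1 PB = 1,000 TB, and it ignores request and transfer fees. The 10 TB example echoes the "10+ TB backup on Amazon S3" figure from the customer story.

```python
# Monthly storage cost under marginal tiered pricing (illustrative sketch).
GB_PER_TB = 1024  # assumption; the slide does not define the units

# (upper bound of the tier in TB, price in $/GB/month) for standard storage,
# read from the table as it stood in November 2012.
STANDARD_TIERS = [
    (1, 0.125),
    (50, 0.110),
    (500, 0.095),
    (1000, 0.090),
    (5000, 0.080),
    (float("inf"), 0.055),
]


def monthly_cost(stored_tb: float, tiers=STANDARD_TIERS) -> float:
    """Sum the cost of the data that falls inside each tier."""
    cost = 0.0
    lower = 0.0
    for upper, rate in tiers:
        if stored_tb <= lower:
            break  # nothing left to bill in this or any higher tier
        in_tier_tb = min(stored_tb, upper) - lower
        cost += in_tier_tb * GB_PER_TB * rate
        lower = upper
    return cost


print(f"10 TB of standard storage: ${monthly_cost(10):,.2f} per month")
```

Passing a different tier list (for example, the reduced-redundancy column) to `monthly_cost` gives the comparison between the two storage classes.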
4. You can have different DR solutions

• Easy to integrate existing vendors with DR on AWS
• Approach: One vendor/hybrid/multiple vendors
• One region/multi-regions (if you need geodiversity)




[   Lessons from an Earthquake   ]
1. You NEED a DR in place!
2. Testing your DR
3. Reducing costs
4. You can have different DR solutions

[   Lessons from an Earthquake   ]
Part VI
Conclusions
Action items: Backups, Disaster Recovery

Agility   |   Cost savings   |   Control
Parmigiano, a Monastery, Love and Faith
Technical lessons on how to do Backup and Disaster Recovery in the Cloud

Simone Brunozzi
Senior Technology Evangelist, Amazon Web Services
@simon

We are sincerely eager to hear your feedback on this presentation and on re:Invent.
Please fill out an evaluation form when you have a chance.
STG202 Parmigiano, a Monastery, Love and Faith: Technical Lessons on how to do Backup and Disaster Recovery in the Cloud - AWS re:Invent 2012

  • 1.
  • 2. "The mind is not a vessel to be filled, but a fire to be ignited." - Plutarch 2
  • 3. Agenda I. Prologue The story of Monte Cassino II. Lessons Backup III. Customer Story Shaw Media IV. Earthquake What happened to my Parmigiano? V. Lessons Disaster Recovery VI. Conclusions ... And a little surprise!
  • 5. 5
  • 6. Abbey of Monte Cassino 6
  • 7. [ Why is Monte Cassino important? ] 7
  • 8. [ The Treasure of Monte Cassino ] 8
  • 9. 800 papal documents 20,500 volumes in the Old Library Titian, one of the 60,000 in the New Library most influential painters ever 200 manuscripts on parchment 100,000 prints and paintings (including 11 Titians) 500 incunabula Gutenberg’s Bible was printed in 1455 C.E. A book printed before 1501 C.E. [ The Treasure of Monte Cassino ] 9 x
  • 10. High Backup Disaster availability storage recovery [ Business continuity continuum ] 10
  • 11. High Availability : Keeping services alive [ Business continuity continuum ] 11
  • 12. High Availability : Keeping services alive Backing up : Process of copying and archiving of data so it may be used to restore the original after a data loss event [ Business continuity continuum ] 12
  • 13. High Availability : Keeping services alive. Backing up : Process of copying and archiving of data so it may be used to restore the original after a data loss event. Disaster recovery : Recovery of technology infrastructure critical to an organization after a natural or human-induced disaster. [ Business continuity continuum ] 13
  • 14. Monastery : Brilliant, scalable, low-cost, highly durable backup system Origin of Universities (Charlemagne, 814 C.E.) The Empire Edict: Free needs educated education in people cathedrals and Let’s ask the monasteries Church! Lots of books (and backups) [ Origin of Backup ] 14 x
  • 15. Monastery : Barbarians, Brilliant, scalable, low-cost, highly durable backup system. pestilences, fires, Origin of Universities (Charlemagne, 814 a.C.) invasions, wars, famines, revolts, etc. Indoctrination : One of the first critical function within an organization (Catholic Church) that needed continuation after any natural or human-induced disaster. It needed backup of books (Bibles, etc.) in order to function. [ Origin ] 15
  • 16. [ Why is Monte Cassino important? ] 16
  • 17. [ World War II ] 17
  • 18. Dec 1942: Many “treasures” are transported from Rome and other places to Monte Cassino, for safety [ The Treasure of Monte Cassino ] 18
  • 19. Intercepted German message: “Ist der Abt noch im Kloster?” “Ja.” It means “Military Division” It also means (abbreviated) “Abbot” (abbreviated) [ Lost in translation ] 19
  • 20. [ Abbey of Monte Cassino ] 20
  • 21. Feb 1944: Schlegel and Becker (Panzer-Division Hermann Göring) had the treasures transferred to the Vatican [ The Treasure of Monte Cassino ] 21 x
  • 22. [ Escape from Monte Cassino ] 22
  • 23. Lt. Col. Julius Schlegel (an Austrian Roman Catholic) Capt. Maximilian Becker (a Protestant surgeon) [ Escape from Monte Cassino ] 23
  • 24. “Biggest bombing against a single target of all time”
  • 25. [ Monte Cassino after bombing (1944) ] 25
  • 26. [ Restoration in 1954 ] 26
  • 27. [ The Abbey of Monte Cassino today ] 27
  • 29. Part II Lessons from Monte Cassino
  • 30. 1. My backup should be accessible a.k.a. the pain of physical data transfer 30
  • 31. 1. My backup should be accessible AWS Direct Connect API AWS Storage Gateway Customer owns the data Redundancy AWS AWS Import/Export
  • 32. GW-stored volumes [ AWS Storage Gateway ]
  • 33. z
  • 34. GW-Cached volumes GW-stored volumes “Cold” “Cool” storage w
  • 35. Public / AWS Direct Connect VPN AWS Import/Export z
  • 36. 2. My backup should be able to scale 36
  • 37. 2. My backup should be able to scale • “Infinite” scale with Amazon S3 and Amazon Glacier • Scale to multiple regions • Seamless • No need to provision • Cost tiers (cheaper at scale) [ Lessons from Monte Cassino ] 37
  • 38. Regions (8) GovCloud Regions (1) [ Global AWS Infrastructure ] 38 (as of Nov 27th, 2012)
  • 39. Availability Zones (23) [ Global AWS Infrastructure ] 39 (as of Nov 27th, 2012)
  • 40. Seattle South Bend New York (2) London Amsterdam (2) Newark Dublin Stockholm Palo Alto Tokyo San Jose Paris Frankfurt (2) Ashburn (2) Milan Los Angeles (2) Jacksonville Madrid Osaka Dallas (2) Hong Kong St.Louis Miami Singapore (2) Sydney São Paulo Edge Locations (38) [ Global AWS Infrastructure ] 40 (as of Nov 27th, 2012)
  • 41. 3. My backup should be safe 41
  • 42. 3. My backup should be safe • SSL Endpoints (Amazon S3 and Amazon Glacier) • Signed API calls • Store encrypted files • Server-side encryption • Durability: multiple copies across different data centers • Local/cloud with AWS Storage Gateway [ Lessons from Montecassino ] 42
  • 43. 3. My backup should be safe 43
  • 44. 4. My backup should work with a DR policy 44
  • 45. 4. My backup should work with a DR policy • Easy to integrate within AWS or Hybrid • AWS Storage Gateway: Run services on Amazon EC2 (DR) • Clear costs • Reduced costs • I decide redundancy/availability in relation to costs [ Lessons from Monte Cassino ] 45
  • 46.
  • 47. 5. Someone should care about it • Clear ownership • Permissions with IAM: Users, groups -> roles • Logs • AWS support [ Lessons from Monte Cassino ] 47
  • 48. 1. My backup should be accessible 2. My backup should be able to scale 3. My backup should be safe 4. My backup should work with a DR policy 5. Someone should care about it [ Lessons from Monte Cassino ] 48
  • 50. Augusto Rosa Manager, Server Operations - Shaw Media augusto.rosa @ shawmedia.ca 50
  • 51. [ Shaw Media ] 51
  • 52. [ Who we are ] • Shaw Media: Division of Shaw Communications Inc. • It reaches almost 100% of Canadians; 18 specialty channels • Global national newscast: 1+ million viewers every weekday • Access to full episodes: 20 websites, 4 video-on-demand • It engages with 25+ million Canadians per week 52
  • 53. [ Before AWS ] • Data centers in Winnipeg and Toronto • Challenge to manage, frequent power outages, downtime • Expensive hosting fees inherited from parent company • Technology was old and in disarray (total revamp needed) 53
  • 54. [ Mission Impossible? ] 54
  • 55. [ Mission ] • Implement a new CMS • Empower the editorial team • Business objectives • Time frame of 9 months • Be agile and cost effective 55
  • 56. AWS
  • 57. Amazon EC2, Amazon SQS, Amazon EMR, Amazon SNS, Auto Scaling, Amazon SES, Elastic Load Balancing, AWS Marketplace, Amazon CloudFront, Amazon FPS, Amazon RDS, Amazon DevPay, Amazon DynamoDB, Amazon Mechanical Turk, Amazon SimpleDB, Amazon ElastiCache, Amazon Route 53, Amazon VPC, Amazon IAM, Amazon Direct Connect, Amazon CloudWatch, Amazon Elastic Beanstalk, Amazon S3, Amazon CloudFormation, Amazon Glacier, Amazon EBS, Amazon CloudSearch, AWS Import/Export, Amazon SWF, AWS Storage Gateway, Alexa WIS and Alexa Top Sites, AWS Support
  • 58. [ Phase One ] • Fast deployment of servers, network rules, load balancers • First site under new CMS: Live in 4 weeks from scratch • Full migration of 29 sites from a physical DC in 9 months 58
  • 59. [ Phase Two ] • Full migration of 6 other websites and web services • From 2nd physical DC into AWS in 2 months • Migration: Windows ’03/SQL ’05 -> Windows ’08/SQL ’08 • Creating new web farms takes 1 to 5 days (versus months) • Procuring licenses takes longer than provisioning the infrastructure • Ability to scale and automate 59
  • 60. [ Benefits of Using AWS ] • Increased uptime from 98.8% to 99.99% • Scale to success, quicker response to business needs • $1M+ saved in capital and operational costs • No physical investment, smaller teams • Made it possible to use third-party service management companies • Easy backup on AWS -> 3 years retention (tax credits) 60
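To put the uptime figures in perspective, the short calculation below converts an uptime percentage into hours of downtime per year. It assumes a 365-day year; the function name is made up for this example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Convert an uptime percentage into expected hours of downtime per year."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


if __name__ == "__main__":
    for uptime in (98.8, 99.99):
        print(f"{uptime}% uptime -> {downtime_hours_per_year(uptime):.2f} hours down per year")
```

Under that assumption, 98.8% uptime corresponds to roughly 105 hours of downtime a year, while 99.99% corresponds to a little under one hour.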
  • 61. [ AWS Architecture ] 61
  • 62. [ Some Numbers ] • 50+ EC2 instances (various sizes) • 25+ TB traffic/month • 40M+ Route53 queries • 10+ TB backup on Amazon S3 ... And growing! 62
  • 63. [ Lessons Learned ] • Architect with AWS in mind from the start • Use all Availability Zones in the region you choose to host in; spread resources across all of them • Plan for failures: Be crazy about it (things fail) • Backup, backup, backup • Monthly AMI • Windows/SQL Server workarounds (failover cluster, AD, etc.) • Engage with AWS Solutions Architects early 63
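The advice to divide resources across all Availability Zones can be sketched as a simple round-robin placement. The zone and instance names below are placeholders, and round-robin is only one possible strategy, not necessarily the one Shaw Media used.

```python
from collections import defaultdict
from itertools import cycle


def spread_across_zones(instances, zones):
    """Assign instances to zones in round-robin order and return zone -> instances."""
    placement = defaultdict(list)
    # cycle() repeats the zone list so every zone gets a near-equal share.
    for instance, zone in zip(instances, cycle(zones)):
        placement[zone].append(instance)
    return dict(placement)


if __name__ == "__main__":
    # Hypothetical names, used only to show the resulting layout.
    servers = ["web1", "web2", "web3", "web4", "web5"]
    print(spread_across_zones(servers, ["zone-a", "zone-b", "zone-c"]))
```

With this layout, losing a single zone takes out at most about one third of the web farm rather than all of it.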
  • 64. [ Disaster Recovery ] • Learn from outages all the time • Implement changes to prevent failures at cloud level • Document how you recover from failures • Single component may fail; architecture shouldn’t 64
  • 65. [ Backup ] • Automatic daily snapshots of all volumes • VIP volumes: snapshots every 4 hours • Keep the last 10 snapshots • Dell Replay: backs up file system files every hour • Volumes replicated to Amazon S3 (Oregon) every 2 hours • SQL Server backup every 30 minutes • SQL Server backup volumes moved to Amazon S3 every 2 hours 65
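The "keep the last 10 snapshots" rule can be sketched as a small retention function. It works on plain timestamps rather than real snapshot objects, and the function name is hypothetical; how the snapshots are actually listed and deleted is outside this sketch.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshot_times, keep=10):
    """Return the snapshot timestamps that fall outside the newest `keep` ones."""
    newest_first = sorted(snapshot_times, reverse=True)
    return newest_first[keep:]


if __name__ == "__main__":
    # Thirteen snapshots taken every 4 hours, as for the VIP volumes.
    start = datetime(2012, 11, 27)
    times = [start + timedelta(hours=4 * i) for i in range(13)]
    for expired in snapshots_to_delete(times):
        print("would delete snapshot from", expired)
```

At one snapshot every 4 hours, ten snapshots cover the last 36 to 40 hours; at one per day, they cover about ten days.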
  • 66. [ Future ] • Move from public cloud to VPC • Auto Scaling on Amazon EC2 • Amazon S3 as image repository for all sites • Second cloud vendor as DR (instead of in-house) • Amazon ElastiCache for central caching for ASP.NET apps 66
  • 68. Part IV The 2012 Emilia Earthquake
  • 69. [ May 20th, 2012: Earthquake in Italy ] 69
  • 70. 70
  • 71. [ Parmigiano warehouse (€0.5B damage) ] 71
  • 72. [ “Let’s do something NOW” ] 72
  • 73. [ Buy 1 kg of Parmigiano for 1 euro ] 73
  • 74. [ Everybody helped ] 74
  • 75. Part V Lessons from an Earthquake
  • 76. 1. You NEED a DR in place! 2. Testing your DR 3. Reducing costs 4. You can have different DR solutions [ Lessons from an Earthquake ] 76
  • 77. 1. You NEED a DR in place! 77
  • 78. DR with High Availability
  • 79. App DR with Standby
  • 80. Business Impact Analysis (RTO, RPO) 80
  • 81. Business Impact Analysis (RTO, RPO) • RTO (Recovery Time Objective): the total time to 1) try to fix the problem, 2) perform the recovery itself, 3) test, 4) tell users • RPO (Recovery Point Objective): how much data I can afford to lose [ Lessons from an Earthquake ] 81
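A small worked example may help make RTO and RPO concrete. The sketch below adds up the four RTO components listed above and treats the backup interval as the worst-case data loss; all durations and objectives in it are hypothetical numbers chosen for illustration.

```python
from datetime import timedelta


def recovery_time(diagnose, recover, test, notify):
    """Total recovery time: fixing attempts + recovery + testing + telling users."""
    return diagnose + recover + test + notify


def worst_case_data_loss(backup_interval):
    """With periodic backups, at most one full interval of data can be lost."""
    return backup_interval


def meets_objective(actual, objective):
    """True when the measured or estimated value is within the objective."""
    return actual <= objective


if __name__ == "__main__":
    # Hypothetical durations for one incident.
    actual_rto = recovery_time(
        diagnose=timedelta(minutes=30),
        recover=timedelta(hours=2),
        test=timedelta(minutes=45),
        notify=timedelta(minutes=15),
    )
    print("recovery time:", actual_rto)
    print("RTO of 4 hours met:", meets_objective(actual_rto, timedelta(hours=4)))
    print("RPO of 1 hour met:",
          meets_objective(worst_case_data_loss(timedelta(minutes=30)), timedelta(hours=1)))
```

The point of the exercise is that time spent trying to fix the problem and notifying users counts against the RTO just as much as the restore itself.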
  • 82. Different Types of DR Architecture 1) Backup and Restore 2) “Pilot light” for quick recovery into AWS (Cold standby) 3) Warm standby solution on AWS 4) Multi-site hybrid solution (AWS + on premises) [ Lessons from an Earthquake ] 82
  • 83.
      Service                Cost ($/GB/month)    Performance   Durability
      Amazon S3              0.125                ***           *****
      Amazon Glacier         0.01                 *             *****
      AWS Storage Gateway    0.125 (+ 125/GW)     ****          ***
      Amazon EBS             0.10                 ****          ***
      Amazon EBS (PIOPS)     0.125                *****         ***
  • 85. 2. Testing your DR • Dev/test in the cloud is super easy • Spin up capacity only for the test • Regularly test your DR • Cost is minimal • What about data transfer speed? [ Lessons from an Earthquake ] 85
  • 86. s3cmd ls --recursive s3://datasets.elasticmapreduce/ngrams/books/ | awk '{print $4; sub(/s3:\/\/datasets.elasticmapreduce/, "/array", $4); print $4}' | parallel -j0 -N2 --progress /usr/bin/s3cmd --no-progress get {1} {2} 86 Special thanks to Craig Carl, AWS Solutions Architect
  • 87. s3cmd ls --recursive s3://datasets.elasticmapreduce/ngrams/books/ : Lists every object under that path in the bucket 87
  • 88. awk '{print $4; sub(/s3:\/\/datasets.elasticmapreduce/, "/array", $4); print $4}' : Gets the path to the Amazon S3 object and the local destination path 88
  • 89. parallel -j0 -N2 --progress : Runs GNU Parallel with as many jobs at once as possible; '-N2' tells parallel to read two arguments per job from stdin and assign them to {1} and {2} 89
  • 90. /usr/bin/s3cmd --no-progress get {1} {2} : The command that GNU Parallel will run; '{1}' is substituted with the Amazon S3 object path, '{2}' with the local destination path 90
  • 91. Result: copying 2.4 TB went down from 48 hours to 9 hours (about 5x faster) 91
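The same idea as the s3cmd and GNU Parallel pipeline above can be sketched in Python: derive a local destination for every object, then run the downloads concurrently. The `fetch` callable is a stand-in supplied by the caller (for example, a wrapper around whatever download tool is in use); it, the function names, and the worker count are assumptions, not part of the original command.

```python
from concurrent.futures import ThreadPoolExecutor


def to_local_path(s3_uri, bucket_prefix="s3://datasets.elasticmapreduce", local_root="/array"):
    """Map an S3 URI to a local path, like the awk sub() step in the pipeline."""
    if not s3_uri.startswith(bucket_prefix):
        raise ValueError(f"unexpected URI: {s3_uri}")
    return local_root + s3_uri[len(bucket_prefix):]


def fetch_all(s3_uris, fetch, max_workers=8):
    """Call fetch(source, destination) for every URI using a pool of worker threads."""
    pairs = [(uri, to_local_path(uri)) for uri in s3_uris]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces all downloads to finish and re-raises any worker error.
        list(pool.map(lambda pair: fetch(*pair), pairs))
    return pairs


if __name__ == "__main__":
    # A fake fetch that only prints, so the sketch runs without network access.
    uris = ["s3://datasets.elasticmapreduce/ngrams/books/part-0",
            "s3://datasets.elasticmapreduce/ngrams/books/part-1"]
    fetch_all(uris, lambda src, dst: print("get", src, "->", dst))
```

Threads suit this workload because each download spends most of its time waiting on the network, which is also why the parallel version of the command was so much faster than a sequential copy.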
  • 93. 3. Reducing costs 1) AWS cost reduction (e.g., S3 cost reduction on Nov 28th) 2) Reduced redundancy (Amazon S3) 3) Retention policy 4) Hot/warm/cool/cold backup 5) Reserved capacity/tiers [ Lessons from an Earthquake ] 93
  • 94. Amazon S3 storage pricing:
      Tier            Standard ($/GB/month)    Reduced Redundancy ($/GB/month)
      0-1 TB          0.125                    0.093
      1-50 TB         0.110                    0.083
      50-500 TB       0.095                    0.073
      500-1,000 TB    0.090                    0.063
      1-5 PB          0.080                    0.053
      5+ PB           0.055                    0.037
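Tiered prices like these are charged per slice: the first terabyte at the first rate, the next 49 TB at the second rate, and so on. The sketch below works through that arithmetic. It assumes 1 TB = 1024 GB and 1 PB = 1000 TB, and it assumes the Standard prices for the 50-500 TB, 500-1,000 TB, and 1-5 PB tiers are 0.095, 0.090, and 0.080, since prices that rise with volume would not fit the rest of the table.

```python
GB_PER_TB = 1024  # assumption: binary terabytes

# (tier upper bound in TB, Standard $/GB/month, Reduced Redundancy $/GB/month)
# The Standard values 0.095, 0.090 and 0.080 are assumed readings of the table.
TIERS = [
    (1, 0.125, 0.093),
    (50, 0.110, 0.083),
    (500, 0.095, 0.073),
    (1000, 0.090, 0.063),
    (5000, 0.080, 0.053),
    (float("inf"), 0.055, 0.037),
]


def monthly_cost(total_tb, reduced=False):
    """Sum the cost of each tier slice that the stored volume reaches."""
    cost = 0.0
    lower = 0.0
    for upper, standard_price, reduced_price in TIERS:
        if total_tb <= lower:
            break
        tb_in_tier = min(total_tb, upper) - lower
        price = reduced_price if reduced else standard_price
        cost += tb_in_tier * GB_PER_TB * price
        lower = upper
    return cost


if __name__ == "__main__":
    # 10 TB is roughly the backup volume mentioned in the customer story.
    print(f"Standard: ${monthly_cost(10):.2f} per month")
    print(f"Reduced Redundancy: ${monthly_cost(10, reduced=True):.2f} per month")
```

Comparing the two results for the same volume shows how much of the bill the redundancy level alone accounts for.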
  • 95. 4. You can have different DR solutions 95
  • 96. 4. You can have different DR solutions • Easy to integrate existing vendors with DR on AWS • Approach: One vendor/hybrid/multiple vendors • One region/multi-regions (if you need geodiversity) [ Lessons from an Earthquake ] 96
  • 97. 1. You NEED a DR in place! 2. Testing your DR 3. Reducing costs 4. You can have different DR solutions [ Lessons from an Earthquake ] 97
  • 100. Action items: Backups, Disaster Recovery, Agility, Cost savings, Control
  • 101. Parmigiano, a Monastery, Love and Faith Technical lessons on how to do Backup and Disaster Recovery in the Cloud Simone Brunozzi Senior Technology Evangelist, Amazon Web Services @simon
  • 102. We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.