Azug
                    Successfully breeding rabits


Yves Goeleven
Capgemini
@YvesGoeleven
http://cloudshaper.wordpress.com
Agenda
•   Introduction
•   Rabits?
•   Cloud Power
•   The Weakest Link
•   Understanding Capacity
•   Self Everything
Rabits?
• Rapid bits
  – Small public apps like websites & phone apps
  – They want to live outside, in the wild
  – They need to get there fast
  – Once they are there, they’ll need some space to
    multiply, scale
  – And they move on quickly
Examples
• Apps
  – Personal & work related
• Branding websites
  – Product launches, Special campaigns
• Predictable big events
  – Olympics, Elections
• Unpredictable events
  – Disasters, Terrorist attacks
  – Celebrity Death
Business context
• The world has changed over the past decade
  – Consumerization of IT
  – Technology in everyday life
  – Globalization
• New, large-scale business opportunities
  appear
  – 2.1 billion internet users
  – 4.6 billion phones
  – 1.4 billion households with TVs
• Elasticity needed as demand varies wildly
Key Success Factors
How to prevent road kill?
   Global,
   Short time to market,
   Performant,
   Highly scalable,
   Highly available,
   Relatively cheap,
   Easy
Good thing rabits have cloud power
Fast Time To Market - Services
• Easy authorization to applications
• Caching & Workflow & …
Global - Spread your rabits all over the world
Performance - Impact of global
[Figure labels: 50ms, 100ms]
Traffic manager to the rescue
The weakest link
• Overall scalability and availability
  – Limited by the weakest component
• If the backend can only handle
  30 users
  – It doesn’t matter that the front-end
    could handle 1,000,000
The weakest link
 • Typically the weakest link is one of the
   following:
   – Integration points
   – Data stores
   – Long processes
Remember
• Everything has limits!
• Including Azure resources
  – Storage account: 5000 requests/sec
  – Storage container: 500 requests/sec
  – Bandwidth depending on instance size
  – Etc...
• Luckily you can get multiple of these
But what if you can’t?
 • Keep them out of the critical path
   – Cache view model data or output
   – Queue commands
Cache
• Windows Azure AppFabric Cache
  – Distributed cache
• Reduces queries
  – To less scalable components


• Store data close to the app
  – Otherwise the whole point is moot
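
A minimal cache-aside sketch in Python of the idea on this slide. It is an illustration under assumptions, not the Windows Azure AppFabric Cache API: `TtlCache` is a hypothetical in-process stand-in for the distributed cache, and `slow_backend_query` is a hypothetical query against a less scalable component.

```python
import time


class TtlCache:
    """Tiny in-memory stand-in for a distributed cache (assumption, not the AppFabric API)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value; call loader(key) only on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value


backend_calls = 0


def slow_backend_query(key):
    """Hypothetical query against a less scalable component."""
    global backend_calls
    backend_calls += 1
    return f"view model for {key}"


cache = TtlCache(ttl_seconds=60)
for _ in range(1000):
    cache.get_or_load("home", slow_backend_query)
print(backend_calls)  # 1000 reads reach the backend only once
```

The time-based expiry is only one option; an event-based update would overwrite the entry when the underlying data changes.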
Queued command processing
• Avoid being swarmed by incoming
  commands
  – Use a queue to throttle
• Handle commands at a controlled speed
  – that of the least scalable component
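
The throttling idea can be sketched with Python's standard `queue` module. This is an illustration only: the in-process queue stands in for a durable queue such as Windows Azure Queue storage, and `run_worker`, the rate of 100 commands per second, and the command names are assumptions made for the example.

```python
import queue
import threading
import time


def run_worker(commands, handle, max_per_second, stop):
    """Drain the queue at no more than max_per_second, however fast producers enqueue."""
    interval = 1.0 / max_per_second
    while not (stop.is_set() and commands.empty()):
        try:
            command = commands.get(timeout=0.05)
        except queue.Empty:
            continue
        handle(command)
        commands.task_done()
        time.sleep(interval)  # pace the worker to the least scalable component


commands = queue.Queue()
handled = []
stop = threading.Event()
worker = threading.Thread(
    target=run_worker, args=(commands, handled.append, 100, stop)
)
worker.start()

# A burst of 20 commands arrives at once; enqueueing returns immediately.
for i in range(20):
    commands.put(f"command-{i}")

commands.join()  # wait until the worker has handled everything
stop.set()
worker.join()
print(len(handled))
```

The producers never wait on the slow component; only the worker's pace is tied to it, so a burst lengthens the queue instead of overloading the backend.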
Recommended Architecture: CQRS
[Diagram labels: Queries, Cache, View Model Updater, Publish, Input, Handler, Web, Persistence, Validation, Storage, Rules, Worker, Commands]
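
One possible reading of the CQRS diagram, as a minimal Python sketch. All class and field names (`RenameProduct`, `ProductRenamed`, `CommandHandler`, `ViewModelUpdater`) are hypothetical, plain dicts stand in for storage and cache, and the event is published synchronously here, whereas a real deployment would more likely publish through a queue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenameProduct:
    """A command: an intent to change state."""
    product_id: str
    new_name: str


@dataclass(frozen=True)
class ProductRenamed:
    """An event: a fact published after the change was persisted."""
    product_id: str
    new_name: str


class CommandHandler:
    """Write side: validate, persist, then publish an event."""

    def __init__(self, storage, publish):
        self._storage = storage
        self._publish = publish

    def handle(self, command):
        if not command.new_name.strip():
            raise ValueError("new_name must not be empty")
        self._storage[command.product_id] = command.new_name
        self._publish(ProductRenamed(command.product_id, command.new_name))


class ViewModelUpdater:
    """Read side: turn events into precomputed view models held in the cache."""

    def __init__(self, cache):
        self._cache = cache

    def on_event(self, event):
        self._cache[event.product_id] = {"title": event.new_name.upper()}


storage, cache = {}, {}
updater = ViewModelUpdater(cache)
handler = CommandHandler(storage, publish=updater.on_event)

handler.handle(RenameProduct("p1", "Carrot"))
print(cache["p1"])  # queries read the view model from the cache, not from storage
```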
Recommended Architecture: EDA
[Diagram labels: Event generators, Event Stream, Time, Event consumers]
Side effects of these architectures
• Caches need to be updated regularly
  – Time based
  – Event based
• User interface must be adapted
  – Task orientation required
  – ISO 9241-151 requires this anyway
What if things break?
• Make sure you have a backup instance!
• Fabric controller
  – At least 2 instances in separate fault domains
• Traffic manager
  – Spread over multiple datacenters
• Azure storage
  – Automatically replicated across datacenters
• SQL Azure
  – Replicate using data sync
Multiple instances
• Don’t rely on machine dependencies
  – Avoid reliance on memory (except as cache)
  – Session state is evil
  – WCF default WSDL addressing behavior
  – Ensure encryption algorithms use service
    certificates
  – ...
Technology can help
• Windows Azure Tech
   – Queue storage
   – AppFabric Service Bus
• Framework Tech
   – NServiceBus
   – SignalR
Keeping it cheap
  • Understanding capacity
         – Pay for what you can ‘potentially’ use, aka the capacity
         – Instances are baskets of capacity: CPU, Memory, …
         – Ensure everything is efficiently used before scaling out

Compute Instance Size   CPU           Memory    Instance Storage   I/O Performance       Cost per hour
Extra Small             1.0 GHz       768 MB    20 GB              Low (5 Mbps)          $0.05
Small                   1.6 GHz       1.75 GB   225 GB             Moderate (100 Mbps)   $0.12
Medium                  2 x 1.6 GHz   3.5 GB    490 GB             High (200 Mbps)       $0.24
Large                   4 x 1.6 GHz   7 GB      1,000 GB           High (400 Mbps)       $0.48
Extra Large             8 x 1.6 GHz   14 GB     2,040 GB           High (800 Mbps)       $0.96
Example
• 1 XS web role instance (1 GHz, 768 MB, 5 Mbps)
   – Dynamic home page but with relatively static content
• Limited to 50 concurrent users, yet only
   –   10% CPU used
   –   80% Memory used (by OS)
   –   Plenty of free disk space
   –   Limited by bandwidth IO
• Scaling out to 2 instances
   – Moves the tipping point
   – But wastes 90% CPU, 20% Memory
   – Twice
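
The arithmetic behind this example, as a small Python sketch. The percentages and the $0.05 hourly price come from the slides; treating bandwidth as 100% used is an assumption based on the slide naming it as the limit, and the function names are made up for illustration.

```python
# Utilization of one Extra Small instance at its 50-user limit, in percent.
# Bandwidth is set to 100 because the example names it as the limiting resource.
utilization = {"cpu": 10, "memory": 80, "bandwidth": 100}


def bottleneck(utilization):
    """Return the resource that saturates first."""
    return max(utilization, key=utilization.get)


def unused_capacity(utilization):
    """Percentage of each resource that is paid for but idle at the limit."""
    return {name: 100 - used for name, used in utilization.items()}


print(bottleneck(utilization))       # bandwidth
print(unused_capacity(utilization))  # {'cpu': 90, 'memory': 20, 'bandwidth': 0}

# Scaling out to 2 instances doubles the cost and, assuming the same traffic
# per user, doubles the user limit; the idle share per instance is unchanged.
instances = 2
hourly_cost = instances * 0.05  # Extra Small price per hour from the table
user_limit = instances * 50
```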
Demo: Hammering the rabit


Offload static content
• Better to remove the bottleneck
     – In this case IO
• Offload static content to
     – Blob storage, CDN
• Leaves more power to handle
  dynamic workload
     – Increases number of users served
     – Better utilization of CPU & Memory
     – Relative to bandwidth

 CDN = Content Delivery Network
 •   Content cache near internet edges (24 datacenters), static content close to user
 •   Great response times, > 200% performance improvement in my test
Cache, cache, cache
• The internet has multiple levels of cache
  –   Browser & proxy cache
  –   Kernel & output cache
  –   Memory
  –   Windows Azure AppFabric Cache
• Ensures low latency
  – Memory is faster than IO
  – Less time waiting for IO
  – Means more resources to handle requests
Demo: Hammering the rabit again


Balance your workloads
• Visual Studio projects force you into a 1 logical role
  = 1 physical role instance mindset
   – Website = web role, Background process = Worker
     role
   – Becomes expensive and wastes a lot of capacity
• Combine different types of workload
  in the same web role instance
   – Website (Bandwidth heavy)
   – Background process (CPU heavy)
• Immediate 50% cost reduction!
[Diagram: Website and Background Process inside one Web Role Instance]
Monitor your roles
• Ideally all roles operate at 80% overall capacity
  utilisation
   – Leaves room for sudden peaks
   – Still efficient use of the capacity you rented
• Monitoring your roles is key
   – Add performance counters for CPU, Memory, …
   – Store measurements in Windows Azure Storage
• On-premises monitoring software
   – Polls storage for metrics
   – E.g. Cerebrata Diagnostics Manager
The holy grail
• Smart auto scaling & dynamic workload
  allocation
[Diagram: three Roles, each with Bandwidth, Memory, CPU and Disk levels; scale out at 80%]
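
A hedged sketch of a scale-out decision at the 80% mark, in Python. The slides do not say how "overall" utilisation is computed; this example assumes the busiest resource, averaged over a sampling window, drives the decision. The function names and sample values are made up for illustration.

```python
SCALE_OUT_THRESHOLD = 0.80  # target from the slides: scale out at 80%


def busiest_resource(samples):
    """Average each resource over the window and return (name, average) of the busiest one."""
    averages = {
        resource: sum(values) / len(values)
        for resource, values in samples.items()
    }
    name = max(averages, key=averages.get)
    return name, averages[name]


def should_scale_out(samples, threshold=SCALE_OUT_THRESHOLD):
    """True when the busiest resource has reached the threshold (assumed policy)."""
    _, average = busiest_resource(samples)
    return average >= threshold


# Hypothetical performance counter samples for one role, as fractions of capacity.
busy_role = {
    "cpu": [0.35, 0.40, 0.45],
    "memory": [0.60, 0.62, 0.61],
    "bandwidth": [0.78, 0.85, 0.86],
}
quiet_role = {
    "cpu": [0.10, 0.12, 0.11],
    "memory": [0.50, 0.50, 0.50],
    "bandwidth": [0.30, 0.40, 0.50],
}

print(should_scale_out(busy_role))   # True: bandwidth averages about 83%
print(should_scale_out(quiet_role))  # False: nothing is near 80%
```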
Issues of scale
• Rabits join millions of people all over the world
• Some traditional tasks suddenly become very
  hard
• How do you handle:
   –   End user training
   –   Helpdesk & support
   –   User acceptance tests
   –   …
Self everything
• Self service
   – Sign up, pay, use, maintain…
• Self marketing
   – Use the power of social networks
• Self supporting
   – Easy to use, intuitive UI
   – Build a community for support
• Self educating, testing
   – Offer early betas to the public
   – Provide means for feedback
Questions?


