Software architectures
    for the cloud
       Georgios Gousios
          TU Delft
What is the “cloud”?




           http://siliconangle.com/blog/2011/10/01/this-week-in-the-cloud-from-backup-to-mobile/cloud-computing-3-2/
Characteristics

• “Infinite” virtualized resources
• Elasticity
• Services / APIs
• Global distribution, replication
• Pay as you go
Enabling technologies
• Multicores, cheaper RAM
• Virtualization
  • Xen, VMware, KVM, etc.
  • Hardware assisted
• Tiered / Distributed / Replicated storage
• Datacenter capacity increasing in superlinear
  fashion
* as a service
•   Infrastructure
    •   Compute power
    •   Storage
    •   Networking
•   Platform
    •   Architectural components
    •   Security
    •   Data processing
•   Software
IaaS


•   Advantages
    •   Full control of stack
    •   Can work with existing apps
•   Drawbacks
    •   More difficult to set up and maintain
    •   Scaling needs to be handled explicitly
PaaS

•   Advantages
    •   Very fast to bootstrap
    •   No need to take care of infrastructure
    •   Scaling can be automatic
•   Drawbacks
    •   Vendor lock-in
    •   Not all tools, frameworks, and languages are available
    •   Limited ability to implement custom solutions
Cloud Applications
• Selection of IaaS and/or PaaS platform
• Providing access to data through APIs
• Cloud-specific non-functional requirements
 • High availability
 • Performance
 • Scalability
High availability
•   Minimize downtime
•   Usually reported as a number of nines
    •   99.9% availability means ~8.8 hr/year downtime; 99.999% means ~5 min/year
•   Achieved using redundancy
    •   Hardware, data center locations
    •   Data replication
    •   Application design to handle failovers
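
The link between a number of nines and yearly downtime is simple arithmetic. The sketch below is only an illustration: the function name `downtime_per_year` is an assumption, and a year is taken as 365 days.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_per_year(availability_percent: float) -> float:
    """Return the allowed downtime in hours per year for a given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * HOURS_PER_YEAR


if __name__ == "__main__":
    for level in (99.0, 99.9, 99.99, 99.999):
        hours = downtime_per_year(level)
        print(f"{level}% -> {hours:.2f} h/year ({hours * 60:.1f} min/year)")
```

Under these assumptions, three nines allow about 8.76 hours of downtime per year and five nines a little over 5 minutes.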
Performance

• Minimize response time
• Concurrency and parallelism
• Hide interaction times with external
  services
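
One way to hide interaction times with external services is to issue independent calls concurrently. The sketch below assumes three hypothetical services (`auth`, `maps`, `payments`) and simulates their latency with `asyncio.sleep`; no real network calls are made.

```python
import asyncio
import time
from typing import List


async def call_service(name: str, latency: float) -> str:
    """Stand-in for a remote call; sleeps instead of doing network I/O."""
    await asyncio.sleep(latency)
    return f"{name}: ok"


async def handle_request() -> List[str]:
    # Issue all three calls at once; the total wait is close to the slowest
    # call rather than the sum of the three latencies.
    return await asyncio.gather(
        call_service("auth", 0.05),
        call_service("maps", 0.10),
        call_service("payments", 0.08),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(handle_request())
    print(results, f"{time.perf_counter() - start:.2f}s")
```

`asyncio.gather` returns the results in the order the calls were passed in, regardless of which one finishes first.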
Scalability
[Figure: vertical scaling (more HW per machine) on the y-axis against horizontal scaling (more machines) on the x-axis, contrasting “Expectation” with “Reality”]
Architectural Components
• Not only SQL databases
• Queues
• Caches
• Load balancers
• Content delivery networks
• Service APIs
NoSQL
•   Document stores
•   Key-value stores
•   Object, Graph databases
•   ACID properties vs performance
    •   Eventual consistency
•   Distribution and replication
•   Ad-hoc or API based querying
DB Layout (MongoDB)
             http://www.mongodb.org/display/DOCS/Sharding+Introduction
> d = {foo: 1, bar: {baz :2}}
{ "foo" : 1, "bar" : { "baz" : 2 } }
> db.test.insert(d)

> d = {foo: 5, bar: {baz :2}}
{ "foo" : 5, "bar" : { "baz" : 2 } }
> db.test.insert(d)

> db.test.find({"foo": 5})
{ "_id" : ObjectId("507aab34c502814115c6f61a"), "foo" : 5, "bar" : { "baz" : 2 } }

> db.test.find({"bar.baz":2})
{ "_id" : ObjectId("..."), "foo" : 1, "bar" : { "baz" : 2 } }
{ "_id" : ObjectId("507aab34c502814115c6f61a"), "foo" : 5, "bar" : { "baz" : 2 } }




        Session example (MongoDB)
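
The dotted query `"bar.baz"` in the session reaches into a nested document. The sketch below models that lookup over plain Python dictionaries. It is not how MongoDB implements queries, it ignores arrays, operators and indexes, and the names `get_path` and `find` are made up for this illustration.

```python
from typing import Any, Dict, List

_MISSING = object()  # sentinel: distinguishes "no such field" from a stored None


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as "bar.baz" through nested dictionaries."""
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def find(collection: List[Dict[str, Any]], query: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the documents whose fields equal every value in the query."""
    return [
        doc for doc in collection
        if all(get_path(doc, path) == value for path, value in query.items())
    ]


if __name__ == "__main__":
    test = [{"foo": 1, "bar": {"baz": 2}}, {"foo": 5, "bar": {"baz": 2}}]
    print(find(test, {"foo": 5}))
    print(find(test, {"bar.baz": 2}))
```

With the two documents inserted in the session, the query on `foo` matches one document, while the query on `bar.baz` matches both.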
Queues

• Producers, consumers and messages
• FIFO processing
• Message durability
• Exchanges and message routing
• Good for offline or batch jobs
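
The producer/consumer pattern with FIFO processing can be sketched in-process with Python's standard `queue` module. This is only a model of the idea: a real message broker adds durability, routing and delivery across machines. The names `producer`, `consumer` and `run` are assumptions of this example.

```python
import queue
import threading

SENTINEL = None  # tells the consumer that no more messages will arrive


def producer(q: queue.Queue, count: int) -> None:
    """Publish `count` messages, then signal completion."""
    for i in range(count):
        q.put(f"job-{i}")
    q.put(SENTINEL)


def consumer(q: queue.Queue, processed: list) -> None:
    """Take messages in FIFO order until the sentinel is seen."""
    while True:
        msg = q.get()
        if msg is SENTINEL:
            break
        processed.append(msg.upper())  # stand-in for real work


def run(count: int = 5) -> list:
    """Run one producer and one consumer thread and return the processed messages."""
    q: queue.Queue = queue.Queue()
    processed: list = []
    worker = threading.Thread(target=consumer, args=(q, processed))
    worker.start()
    producer(q, count)
    worker.join()
    return processed


if __name__ == "__main__":
    print(run())
```

Because the producer never waits for the consumer, slow work moves off the request path, which is what makes queues a good fit for offline or batch jobs.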
Application layout
               http://www.rabbitmq.com/tutorials/tutorial-five-python.html
require 'amqp'

AMQP.start(:host => '127.0.0.1', ...) do |connection|
  channel  = AMQP::Channel.new(connection)
  exchange = channel.topic("test", :durable => true, :auto_delete => false)
  # Bind the queue so it receives messages whose routing key matches "hello.*"
  queue    = channel.queue("hello", :auto_delete => true)
                    .bind(exchange, :routing_key => "hello.*")

  # Consumer: acknowledge on success; on failure requeue once, then drop
  queue.subscribe(:ack => true) do |headers, msg|
    begin
      process_msg(msg)
      headers.ack
    rescue
      headers.reject(:requeue => !headers.redelivered?)
    end
  end

  # Producer: publish ten messages with routing keys hello.1 to hello.10
  (1..10).each do |i|
    exchange.publish "Hello, world!", :routing_key => "hello.#{i}"
  end
end




    Processing messages (AMQP)
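
The binding key `"hello.*"` relies on topic-exchange matching: routing keys are dot-separated words, `*` stands for exactly one word and `#` for zero or more words. The sketch below models that rule in plain Python. It is not the broker's implementation, and `topic_matches` is a name invented for this example.

```python
from typing import List


def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if routing_key matches an AMQP-style topic pattern."""
    return _match(pattern.split("."), routing_key.split("."))


def _match(pattern: List[str], words: List[str]) -> bool:
    if not pattern:
        return not words  # both exhausted means a full match
    head, rest = pattern[0], pattern[1:]
    if head == "#":
        # '#' may absorb zero words, or one word and stay active for more
        return _match(rest, words) or (bool(words) and _match(pattern, words[1:]))
    if not words:
        return False
    if head == "*" or head == words[0]:
        return _match(rest, words[1:])
    return False


if __name__ == "__main__":
    for key in ("hello.1", "hello.1.2", "hello", "bye.1"):
        print(key, topic_matches("hello.*", key))
```

Under this rule the keys `hello.1` to `hello.10` all reach the queue, while a key with more than two words, such as `hello.1.2`, would need a `hello.#` binding.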
Caches
•   In memory key/value pairs
•   Distributed
•   Timeout based
•   Store results of expensive computations
    •   Dynamic page renderings
    •   Database queries
    •   Lookups to external services
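MAX_AGE = 60 # seconds; illustrative value

# `cache` is assumed to be an existing cache client with get/set
def expensive_computation(id)
  result = cache.get(id)
  # Serve from the cache only if the entry exists and is still fresh
  return result.value if !result.nil? && result.age < MAX_AGE

  val = do_expensive_computation(id)
  cache.set(id, Result.new(val))

  val
end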




             Using caching
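
The same timeout-based pattern can be written as a small self-contained class. This is a minimal single-process sketch: `TTLCache` and `get_or_compute` are hypothetical names, and a distributed cache would replace the dictionary with a remote store.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory key/value cache whose entries expire after max_age seconds."""

    def __init__(self, max_age: float, clock: Callable[[], float] = time.monotonic):
        self.max_age = max_age
        self.clock = clock  # injectable so that expiry can be tested without sleeping
        self._store: Dict[Any, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Any, compute: Callable[[Any], Any]) -> Any:
        """Return a fresh cached value, or compute, store and return a new one."""
        entry = self._store.get(key)
        if entry is not None:
            stored_at, value = entry
            if self.clock() - stored_at < self.max_age:
                return value  # cache hit: entry is still fresh
        value = compute(key)  # cache miss or stale entry
        self._store[key] = (self.clock(), value)
        return value


if __name__ == "__main__":
    cache = TTLCache(max_age=30.0)
    print(cache.get_or_compute(21, lambda k: k * 2))  # computed
    print(cache.get_or_compute(21, lambda k: k * 2))  # served from the cache
```

Passing the clock in as a parameter is a design choice that keeps the expiry logic testable; in production code the default monotonic clock would be used.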
Load Balancers
• Distribute incoming requests to processing
  nodes
• Handle node failures via heartbeats
• Operate anywhere from the network layer to the application layer
• Adapt resources to load
 • Can start/stop instances
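
Request distribution combined with failure handling can be sketched as a round-robin balancer that skips nodes marked unhealthy. Round-robin is only one possible policy, and the class and method names below are assumptions of this example. In a real deployment, `mark` would be driven by periodic heartbeat checks.

```python
from typing import Dict, List


class RoundRobinBalancer:
    """Cycle through nodes, skipping any that are marked unhealthy."""

    def __init__(self, nodes: List[str]):
        self.nodes = list(nodes)
        self.healthy: Dict[str, bool] = {n: True for n in self.nodes}
        self._next = 0

    def mark(self, node: str, healthy: bool) -> None:
        """Record the result of a heartbeat check for one node."""
        self.healthy[node] = healthy

    def pick(self) -> str:
        """Return the next healthy node, or raise if none is available."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if self.healthy[node]:
                return node
        raise RuntimeError("no healthy nodes available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
    lb.mark("web-2", False)  # simulate a failed heartbeat
    print([lb.pick() for _ in range(4)])
```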
Content delivery networks
• Optimized for delivery of static files
• Geographic proximity to clients
• A higher level form of caching
• Usually an external service
Typical CDN architecture
Service APIs
•   Messaging
•   Managing in-application payments
•   Indexing application datastores
•   Authentication - OAuth
•   Logging and monitoring
•   Big data analytics, Map/Reduce
•   Maps, geolocation
OpenID login
               https://developers.google.com/accounts/docs/OpenID
<script src="https://maps.googleapis.com/maps/api/js"></script>
<script>
  var map;
  function initialize() {
    var mapOptions = {
      zoom: 8,
      center: new google.maps.LatLng(-34.397, 150.644),
      mapTypeId: google.maps.MapTypeId.ROADMAP
    };
    map = new google.maps.Map(
      document.getElementById('map_canvas'),
      mapOptions);
  }

  google.maps.event.addDomListener(window, 'load', initialize);
</script>




                            Google Maps
POST: http://www.bugsense.com/api/errors
  {
      "client": {
         "name": "bugsense-android",
         "version": "0.6"
      },
      "request": {
         "remote_ip": "10.0.0.1",
         "custom_data": {
            "key1": "value1",
            "key2": "value2"
         }
      },
      "exception": {
         "message": "java.lang.RuntimeException",
         "where": "MainActivity.java:47",
         "klass": "java.lang.RuntimeException",
         "backtrace": "..."
      },
      "application_environment": {
         "phone": "android",
         "appver": "1.2",
         "appname": "com.sfalma",
         "osver": "2.3",
         "wifi_on": "true",
         "mobile_net_on": "true",
         "gps_on": "true",
         "screen_dpi(x:y)": "120.0:120.0",
         "screen:width": "240",
         "screen:height": "400",
         "screen:orientation": "normal"
      }
  }




             App monitoring: Bugsense
                                                    http://www.bugsense.com/features
AWS Map/Reduce
                                              http://commons.wikimedia.org/wiki/File:Text-x-java-source.svg
        http://commons.wikimedia.org/wiki/File:Gnuplot_tcp_analysis.png http://aws.amazon.com/articles/1632
Case study
Background
• Photo sharing service
• 80M users in less than 2 years
 • 10M users in 10 days when Android
    version came out
• (Apr 2012) Team: 5 engineers
• Runs on the AWS cloud
Non-functional requirements
• Never lose data, never go out of service
• Withstand network effects
 • more users will bring in more users
 • updates by famous users
• Minimal costs
Functional requirements
• Applications for all mobile platforms
• Store large static files
• Store metadata
 • tagging, geolocation, friends
• Indexing to support search
• API with support for callbacks
How would you architect it?
• Compute node
• Machine cluster
• Queue
• Load balancer
• Cache
• Key/value database
• Object store
• Relational database
• Content delivery network
Basic architecture
• Django/Nginx application server
• Photos stored in Amazon S3
• Metadata stored in Postgres
Scaling web traffic
• Elastic load balancer in front of the Django/Nginx servers
• Photos stored in Amazon S3
• Metadata stored in Postgres
Scaling the data layer
• Elastic load balancer in front of the Django/Nginx servers
• Photos stored in Amazon S3
• DB sharding across three Postgres instances
 • Keys: A-H
 • Keys: I-O
 • Keys: P-Z
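
The key ranges in the diagram (A-H, I-O, P-Z) suggest range-based sharding. The sketch below routes a key to a shard by its first letter. The shard names are hypothetical, and the deck does not say which attribute is actually used as the sharding key.

```python
import bisect
from typing import List, Tuple

# Upper bound (inclusive) of each key range and the shard that owns it.
# Shard names are hypothetical.
SHARDS: List[Tuple[str, str]] = [
    ("H", "postgres-1"),  # keys A-H
    ("O", "postgres-2"),  # keys I-O
    ("Z", "postgres-3"),  # keys P-Z
]
_UPPER_BOUNDS = [upper for upper, _ in SHARDS]


def shard_for(key: str) -> str:
    """Return the shard responsible for a key, based on its first letter."""
    if not key or not key[0].isascii() or not key[0].isalpha():
        raise ValueError(f"cannot shard key {key!r}")
    first = key[0].upper()
    # bisect_left finds the first range whose upper bound is >= the letter
    index = bisect.bisect_left(_UPPER_BOUNDS, first)
    return SHARDS[index][1]


if __name__ == "__main__":
    for user in ("alice", "ivan", "zoe"):
        print(user, "->", shard_for(user))
```

Alphabetic ranges are easy to reason about but can be unbalanced if keys cluster around a few letters; hashing the key is a common alternative.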
Ensuring adequate DB performance
• Elastic load balancer in front of the Django/Nginx servers
• Photos stored in Amazon S3
• DB sharding across three Postgres instances (keys A-H, I-O, P-Z)
• Postgres instances use Elastic Block Store
Improving responsiveness
• Elastic load balancer in front of the Django/Nginx servers
• Photos stored in Amazon S3
• DB sharding across three Postgres instances (keys A-H, I-O, P-Z)
• Postgres instances use Elastic Block Store
• Redis for sessions and data feeds
• Memcache
Improving responsiveness (simplified)
• Content delivery
• Object store
• Application servers behind a load balancer
• DB with sharding
• Sessions, data feeds
Handling background jobs
• Content delivery
• Amazon S3
• Application servers behind a load balancer
• DB with sharding
• Sessions, data feeds
• Job queue
• Job workers (indexing, push notifications)
Benefits of running on the cloud
•   Always available hardware
    •   120 nodes of different RAM/CPU
        configurations
•   Reasonably guaranteed data availability
    •   Object store (Amazon S3) is designed for 99.999999999% durability
•   Automated load balancing, scaling
•   0 startup hardware costs
To sum up
•   The cloud is a collection of technologies
•   Facilitates business development with 0 upfront
    hardware investment
•   Facilitates application development by means of
    services and APIs
•   Applications must be
    •   Distributed by design
    •   Fault tolerant by design
References
•   M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D.
    Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “A view of cloud computing,”
    Commun. ACM, vol. 53, pp. 50–58, Apr. 2010.

•   G. Garrison, S. Kim, and R. L. Wakefield, “Success factors for deploying cloud
    computing,” Commun. ACM, vol. 55, pp. 62–68, Sept. 2012.

•   P. Louridas, “Up in the air: Moving your applications to the cloud,” Software, IEEE,
    vol. 27, no. 4, pp. 6–11, 2010.

•   B. Wilder, Cloud Architecture Patterns. O’Reilly Media, Inc., Sep 2012.

•   Instagram Engineering. What powers instagram: Hundreds of instances, dozens of
    technologies, 2011.
