Lorenzo Alberton
@lorenzoalberton


Scalable Architectures:
Taming the Twitter Firehose
Patterns for scalable real-time platforms


PHPDay 2012
Verona, 18th May 2012
Outline

1) SOAs
   scaling the platform

2) Message Queues
   scaling the communication

3) Monitoring
   scaling the maintainability
DataSift Architecture
           High-level overview
DataSift Architecture

[Diagram: DataSift platform architecture, walked through in four parts]

1/4) Ingestion of Input Streams
2/4) Filtering
3/4) Delivery / Frontend
4/4) Monitoring / Historics / Analytics

   http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
DataSift

           350+ Million
            input messages/day *

                  ~330 Million
           from the Twitter Firehose alone


           2 Terabytes
      of messages processed in real time
           and stored every day

               ~1 Petabyte
              of storage available


           Thousands
    of concurrent, custom output streams

            all crafted with tender love
                and surgical precision
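For a rough sense of scale, the daily figures above work out to the following averages. This is a back-of-the-envelope sketch. It assumes decimal units, an even spread over 24 hours, and that the 2 TB consists only of these messages. Real traffic is bursty, so peak rates are considerably higher:

```latex
\frac{350 \times 10^{6}\ \text{messages/day}}{86\,400\ \text{s/day}} \approx 4\,050\ \text{messages/s}
\qquad
\frac{2 \times 10^{12}\ \text{bytes/day}}{350 \times 10^{6}\ \text{messages/day}} \approx 5.7\ \text{kB/message}
```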
SOA
Service-Oriented Architectures
Service-Oriented Architectures

[Diagram: Consumers, a Proxy Cache and an Orchestrator in front of Service A, Service B and Service C; services communicate via JSON, XML or Thrift]

   Loose Coupling - Separation of Responsibilities
   Separate Consumers From Service Implementation
   Aggressive caching at application level
   Orchestration of distinct units accessible over a network
   Communication via a well-defined, interoperable format

                 http://en.wikipedia.org/wiki/Service-oriented_architecture
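To make "orchestration of distinct units" and "a well-defined, interoperable format" concrete, here is a minimal in-process sketch. The two services (`tokenizer`, `counter`) and the `orchestrate()` helper are hypothetical. In a real SOA each call would be a network request to a separately deployed service, but the orchestrator would still depend only on the JSON contract, not on how each service is implemented:

```php
<?php
// Hypothetical services: each accepts and returns a JSON document, so callers
// depend only on the message format, not on the implementation behind it.
$services = [
    'tokenizer' => function (string $json): string {
        $req = json_decode($json, true);
        return json_encode(['tokens' => preg_split('/\s+/', trim($req['text']))]);
    },
    'counter' => function (string $json): string {
        $req = json_decode($json, true);
        return json_encode(['count' => count($req['tokens'])]);
    },
];

// The orchestrator only chains calls and passes JSON between them; in
// production each call would go over HTTP/Thrift to an independently
// scaled service.
function orchestrate(array $services, string $text): array
{
    $tokens = $services['tokenizer'](json_encode(['text' => $text]));
    $count  = $services['counter']($tokens);
    return json_decode($count, true);
}

print_r(orchestrate($services, 'taming the twitter firehose'));
```

Swapping a service for a different implementation (another language, another host) leaves `orchestrate()` untouched, as long as the JSON contract holds.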
Independent Horizontal Scaling

[Diagram: Orchestrator -> Rev.Proxy -> Service A; Orchestrator -> Load Balancer -> Service B1, Service B2]

   Rev.Proxy (Service A): better single-node performance with
   application-level caching

   Load Balancer (Service B): load balancing across multiple nodes
Cell Architectures

   N + 1 design: ensure that everything you develop has at least one
   additional instance of that system in the event of failure.

   Multiple live nodes: have multiple live, isolated nodes of the same
   type to distribute the load.

              http://highscalability.com/blog/2012/5/9/cell-architectures.html
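The N + 1 rule can be turned into a simple sizing calculation: provision enough nodes to carry the peak load, then add at least one spare. A minimal sketch follows. The function name `nodesNeeded()` and the example figures are hypothetical, so measure real per-node capacity before relying on the result:

```php
<?php
// N + 1 sizing sketch: enough nodes to carry the peak load, plus spares so
// the service survives the loss of a node. $peakLoad and $perNodeCapacity
// must use the same unit (e.g. messages/s).
function nodesNeeded(float $peakLoad, float $perNodeCapacity, int $spares = 1): int
{
    if ($perNodeCapacity <= 0) {
        throw new InvalidArgumentException('Per-node capacity must be positive');
    }
    return (int) ceil($peakLoad / $perNodeCapacity) + $spares;
}

echo nodesNeeded(10000, 3000), PHP_EOL; // 4 nodes for the load + 1 spare = 5
```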
Cardinality of Nodes on Each Service

[Diagram: DataSift architecture annotated with the number of nodes running each service, from 2 up to 60+]

   http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
Load-Balancing

           Example
Load-Balancing with HAProxy
 # /etc/haproxy.cfg

 global
    daemon
    maxconn 256

 defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

 frontend http-in
    bind *:80
    default_backend mysvc

 backend mysvc
    server s1 10.0.1.10:7474 maxconn 32
    server s2 10.0.1.11:7474 maxconn 32

 listen admin
    bind *:8080
    stats enable

Start by running
/usr/sbin/haproxy -f /etc/haproxy.cfg

                             http://haproxy.1wt.eu/
Load-Balancing with Varnish
backend node01 {
  .host = "svc01.myhost.com";
  .probe = {
     .url = "/";
     .interval = 1s;
     .timeout = 50 ms;
     .window = 2;
     .threshold = 2;
  }
}
backend node02 {
  .host = "svc02.myhost.com";
  .probe = {
     .url = "/";
     .interval = 1s;
     .timeout = 50 ms;
     .window = 2;
     .threshold = 2;
  }
}
director mysvcdir round-robin {
   {.backend = node01;}
   {.backend = node02;}
}
sub vcl_recv {
   set req.backend = mysvcdir;
   return(pass);  // pass: skip the cache, only load-balance
}

[Diagram: Request -> Varnish -> round-robin: 50% to mysvc node 01, 50% to mysvc node 02]

                         http://varnish-cache.org
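The round-robin director can be modelled in plain PHP 8 to show the 50/50 split and the role of the health probes: a backend that fails its probe is skipped until it recovers. The `RoundRobinDirector` class is a hypothetical illustration, not Varnish's actual implementation:

```php
<?php
// Round-robin "director" sketch: requests rotate over the backends,
// skipping any marked unhealthy (Varnish decides health via .probe).
final class RoundRobinDirector
{
    private int $next = 0;

    /** @param array<string,bool> $backends backend name => healthy? */
    public function __construct(private array $backends) {}

    public function setHealthy(string $name, bool $healthy): void
    {
        $this->backends[$name] = $healthy;
    }

    public function pick(): string
    {
        $names = array_keys($this->backends);
        $n = count($names);
        // try each backend at most once, starting after the last one used
        for ($i = 0; $i < $n; $i++) {
            $name = $names[($this->next + $i) % $n];
            if ($this->backends[$name]) {
                $this->next = ($this->next + $i + 1) % $n;
                return $name;
            }
        }
        throw new RuntimeException('No healthy backend available');
    }
}

$director = new RoundRobinDirector(['node01' => true, 'node02' => true]);
$hits = array_count_values(array_map(fn () => $director->pick(), range(1, 10)));
print_r($hits); // node01 => 5, node02 => 5
```

After `$director->setHealthy('node02', false)`, every request goes to `node01` until the probe marks `node02` healthy again.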
Caching

          Example
Caching with Varnish
No special directives are required to cache normal requests:
just use the defaults and set Cache-Control headers.

  <?php
  $ttl = 300; // cache for 5 minutes
  // HTTP dates must be expressed in GMT ("Thu, 01 Jan 1970 00:05:00 GMT")
  header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $ttl) . ' GMT');
  header("Cache-Control: max-age=$ttl, must-revalidate");
  ?>


 Warning: by default, requests carrying cookies (and responses that
 set cookies) are not cached
Application Programming Interfaces

                APIs
      Software-to-Software Contract
API Docs Guidelines

Simple (RESTful verbs, actions on resources)


Well defined (action, endpoint, parameters, response)


Discoverable (self-describing endpoint)


Working documentation
APIs everywhere: Internal & External

[Images: I/O Docs and the DataSift API console]

    http://mashery.com/solution/iodocs   http://console.datasift.com/
Service API discovery
GET /<servicename>/api
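A self-describing endpoint can be as simple as a JSON document listing the service's actions, which clients and interactive documentation can read. A minimal sketch follows. The `describeApi()` helper, the endpoint list and the JSON layout are all assumptions made for illustration:

```php
<?php
// Hypothetical payload for GET /<servicename>/api: a machine-readable list
// of the service's actions (verb, resource path, parameters).
function describeApi(string $service): string
{
    $api = [
        'service'   => $service,
        'endpoints' => [
            ['method' => 'GET',    'path' => "/$service/items",      'params' => ['limit']],
            ['method' => 'POST',   'path' => "/$service/items",      'params' => ['payload']],
            ['method' => 'DELETE', 'path' => "/$service/items/{id}", 'params' => []],
        ],
    ];
    return json_encode($api, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
}

// A front controller would route GET /mysvc/api here and send
// "Content-Type: application/json" before echoing the result.
echo describeApi('mysvc'), PHP_EOL;
```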
Service Host Discovery - Config Mgr
GET /configuration/<servicename>/hosts

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8


{
    "service": "<servicename>",
    "hosts": [
         "10.0.1.33:80",
         "10.0.1.34:80"
    ],
    "base_path": "/svc/xyz/"
}
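On the client side, the configuration-manager response only needs to be decoded, and one of the advertised hosts picked, to build the service base URL. A minimal sketch follows. The `serviceBaseUrl()` helper and the random host choice are assumptions; a real client might round-robin instead, or prefer hosts in its own data centre:

```php
<?php
// Decode the config-manager response, pick one advertised host and build
// the base URL for calls to that service.
function serviceBaseUrl(string $json, ?callable $pick = null): string
{
    $config = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    if (empty($config['hosts'])) {
        throw new RuntimeException("No hosts registered for {$config['service']}");
    }
    // default strategy: random host, to spread clients across nodes
    $pick ??= fn (array $hosts) => $hosts[array_rand($hosts)];
    return 'http://' . $pick($config['hosts']) . $config['base_path'];
}

$response = '{"service": "mysvc", "hosts": ["10.0.1.33:80", "10.0.1.34:80"],'
          . ' "base_path": "/svc/xyz/"}';
echo serviceBaseUrl($response), PHP_EOL; // one of the two hosts + /svc/xyz/
```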
Service Host Discovery - Zookeeper
 ZooKeeper is a centralized service for maintaining configuration
 information, naming, providing distributed synchronization, and
 providing group services.                  http://zookeeper.apache.org/

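<?php
$zk = new Zookeeper();
$zk->connect('localhost:2181');
                                                              //server
// world-readable/writable ACL: fine for a demo, too open for production
$params = array(array(
     'perms'  => Zookeeper::PERM_ALL,
     'scheme' => 'world',
     'id'     => 'anyone'
));
if (!$zk->exists('/services/mysvc/host')) {
    // parent znodes must exist before their children can be created
    if (!$zk->exists('/services')) {
        $zk->create('/services', 'config for internal services', $params);
    }
    if (!$zk->exists('/services/mysvc')) {
        $zk->create('/services/mysvc', 'config for mysvc', $params);
    }
    $zk->create('/services/mysvc/host', 'http://my.site.com', $params);
}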
Service Host Discovery - Zookeeper

<?php
$zk = new Zookeeper();
$zk->connect('localhost:2181');
                                                                   //client
$host = $zk->get('/services/mysvc/host');
...
SOA - Scale Each Component

[Image: Matryoshka nesting dolls]
       http://www.thisnext.com/item/647CD0BE/Matryoshkas-Nesting-Dolls

[Diagram: DataSift architecture]
SOA: Independently scalable services.
Example on distributing processing load:

  http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
Workers for sharing processing load

[Diagram: processing load shared among workers]

   Distribute processing load among workers.

   Lightweight orchestration, heavy lifting in separate,
   asynchronous processes
Scale all things!

[Diagram: DataSift architecture]
Example on scaling large data volumes:

   http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
In Case of “Big Data”...

                With lots of data,
               move the processing
                 logic itself to the
                   storage nodes
                (I/O is expensive)

                  Map/Reduce,
                Parallel Processing
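Moving the logic to the data is the core idea behind Map/Reduce. Each storage node runs the map step (plus a local combine) over its own shard, so only small partial results cross the network, and a reducer then merges them. A minimal single-process sketch follows. The shard contents and the `mapShard()`/`reducePartials()` names are hypothetical:

```php
<?php
// Map step with local combine: runs on each storage node, next to its data,
// and returns only word => count pairs instead of the raw documents.
function mapShard(array $documents): array
{
    $counts = [];
    foreach ($documents as $doc) {
        foreach (str_word_count(strtolower($doc), 1) as $word) {
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }
    return $counts;
}

// Reduce step: merge the small partial results shipped from each node.
function reducePartials(array $partials): array
{
    $total = [];
    foreach ($partials as $partial) {
        foreach ($partial as $word => $n) {
            $total[$word] = ($total[$word] ?? 0) + $n;
        }
    }
    arsort($total); // most frequent words first
    return $total;
}

$shards = [
    'node1' => ['php rocks', 'scaling php'],
    'node2' => ['php at scale', 'taming the firehose'],
];
$partials = array_map('mapShard', $shards); // one call per node
$total = reducePartials($partials);
print_r($total); // php => 3 first, every other word => 1
```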
Message Queues
  Asynchronous Communication
Messaging
ZeroMQ: PUSH-PULL, REQ-REP, PUB-SUB (multicast, broadcast)

       Internal communication: pass messages to the next processing
       stage in the pipeline, control events, monitoring.
       Very high throughput. Socket library.

Kafka/Redis: PUSH-PULL with persistence*

       Internal message / workload buffering and distribution

Node.js: WebSockets / HTTP Streaming

       Message delivery (output)
Message queues as Buffers (Decoupling)

[Diagram: producer P and consumer C, with and without a message queue between them]

   Without a buffer: unpredictable load spikes hit the consumer directly

   With a queue in between:
         Load normalisation / smoothing
         Batching ⇒ higher throughput
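A toy simulation shows how a queue absorbs a load spike. The producer enqueues a bursty number of messages per tick and never waits, while the consumer drains at most one batch per tick at its own steady pace. The arrival figures, the batch size and the `simulate()` helper are made up for illustration:

```php
<?php
// Returns the queue backlog at the end of each tick.
function simulate(array $arrivals, int $batchSize): array
{
    $queue = new SplQueue();
    $backlog = [];
    $next = 0;
    foreach ($arrivals as $n) {
        for ($i = 0; $i < $n; $i++) {
            $queue->enqueue($next++); // producer never blocks on the consumer
        }
        // consumer takes one batch per tick
        $batch = [];
        while (!$queue->isEmpty() && count($batch) < $batchSize) {
            $batch[] = $queue->dequeue();
        }
        // ... process $batch here (one round-trip per batch, not per message) ...
        $backlog[] = count($queue);
    }
    return $backlog;
}

// a spike of 25 messages in tick 2; the consumer handles 10 per tick
print_r(simulate([5, 25, 5, 0, 0], 10)); // backlog: 0, 15, 10, 0, 0
```

The spike turns into a short-lived backlog instead of overloading the consumer. Handling messages in batches also amortises per-message overhead, which is where the throughput gain comes from.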
Redis Buffer

           Example
Redis Processing Queue
<?php                                                              //producer(s)
$redis = new Redis();
$redis->connect('127.0.0.1', 6379, 1.5); // timeout 1.5 seconds
...
// push items to the queue as they are produced
$redis->lPush('queue:xyz', $item);
...


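<?php
...                                                               //consumer(s)
while (true) {
    // read items off the queue as they become available (LPUSH + BRPOP = FIFO),
    // blocking for up to 2 seconds; an empty array is returned on timeout
    $result = $redis->brPop(array('queue:xyz'), 2);
    if (empty($result)) {
        continue;
    }
    list(, $item) = $result; // $result is array(queueName, item)
    ...
}

     https://github.com/nicolasff/phpredis   https://github.com/chrisboulton/php-resque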
Kafka Buffer

           Example
Kafka Processing Queue
Producer
<?php
$host = '127.0.0.1';
$port = 9092;
$producer = new Kafka_Producer($host, $port);
$messages = array(
    'aaa',
    'bbb',
    'ccc',
);
$topic = 'test';

// send a batch of messages (MessageSet)
$bytes_sent = $producer->send($messages, $topic);

         https://github.com/apache/kafka/tree/trunk/clients/php/src/examples
Kafka Processing Queue
Consumer
<?php
$timeout = 2;        $maxSize = 1000000;
$host = '127.0.0.1'; $port = 9092;
$partition = 0;      $offset = 0;
$topic = 'test';

$consumer = new Kafka_SimpleConsumer($host, $port, $timeout,
                                     $maxSize);
while (true) {
    $request = new Kafka_FetchRequest($topic, $partition,
                                      $offset, $maxSize);
    $messages = $consumer->fetch($request);
    foreach ($messages as $msg) { echo $msg->payload(); }
    $offset += $messages->validBytes();
}
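0mq PUSH-PULL (Workload Distribution)

[Diagram: Producer --push--> Consumer 1, Consumer 2, Consumer 3 (each pulls)]

   push: each message is delivered to one (and only one) worker,
   round-robin across the connected workers; the send blocks while
   no worker can accept it (none connected, or high-water mark reached)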
Workload Distribution

            Example
ZeroMQ Producer (PUSH)

<?php
$context = new ZMQContext();
$producer = $context->getSocket(ZMQ::SOCKET_PUSH);
$producer->bind('tcp://*:5555');

// send tasks to workers.
foreach ($tasks as $task) {
    // Each message is received by one (and only one) worker, round-robin;
    // send() blocks only while no worker is available to accept it
    $producer->send($task);
}
...

                http://zguide.zeromq.org/php:all
ZeroMQ Consumers (PULL)

<?php
$context = new ZMQContext();
$worker = $context->getSocket(ZMQ::SOCKET_PULL);
$worker->connect('tcp://myhost:5555');

// process tasks forever
while (true) {
    // receive a message (blocking operation)
    $task = $worker->recv();
    ...
}
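PUSH-PULL distribution can be modelled without ZeroMQ to show which worker gets which task. `distribute()` is a hypothetical helper. It assumes all workers are connected before the first send and keep up with the load. With real sockets, a worker that connects first may receive a larger share until the others join:

```php
<?php
// Each task goes to exactly one worker, rotating round-robin.
function distribute(array $tasks, int $workers): array
{
    $assigned = array_fill(0, $workers, []);
    foreach (array_values($tasks) as $i => $task) {
        $assigned[$i % $workers][] = $task;
    }
    return $assigned;
}

print_r(distribute(['T1', 'T2', 'T3', 'T4', 'T5'], 3));
// worker 0: T1, T4 / worker 1: T2, T5 / worker 2: T3
```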
0mq PUSH-PULL (Mux)

 Producer 1 --push R1, R2, R3--\
 Producer 2 --push R4-----------+--pull--> Consumer
 Producer 3 --push R5, R6------/

 fair-queuing: the consumer receives R1, R4, R5, R2, R6, R3
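The fair-queuing order can be reproduced with a small model. The PULL side takes one message from each producer in turn and skips producers with nothing pending, so a busy producer cannot starve the others. `fairQueue()` is a hypothetical helper; real 0MQ ordering also depends on when messages actually arrive:

```php
<?php
// Take one message per producer per round, dropping exhausted producers.
function fairQueue(array $producerQueues): array
{
    $received = [];
    $producerQueues = array_filter($producerQueues); // ignore idle producers
    while ($producerQueues) {
        foreach (array_keys($producerQueues) as $p) {
            $received[] = array_shift($producerQueues[$p]);
            if (!$producerQueues[$p]) {
                unset($producerQueues[$p]); // nothing left from this producer
            }
        }
    }
    return $received;
}

print_r(fairQueue([
    ['R1', 'R2', 'R3'], // Producer 1
    ['R4'],             // Producer 2
    ['R5', 'R6'],       // Producer 3
])); // R1, R4, R5, R2, R6, R3
```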
0mq PUB-SUB (High Availability)

[Diagram: Publisher 1 and Publisher 2 broadcasting to Listener 1, Listener 2 and Listener 3]

              [Broadcast]   [Dynamic Subscriptions]

[Diagram: Publisher 1 and Publisher 2 feeding subscribers in two data centres, DC 1 and DC 2]
High Availability - Replication

            Example
ZeroMQ Producer (PUB)
<?php
$context = new ZMQContext();
$producer = $context->getSocket(ZMQ::SOCKET_PUB);
$producer->bind('tcp://*:5555');
$messages = array(
    // array(topic, msg): sent as two frames, topic frame first
    array('painters', 'Michelangelo'),
    array('painters', 'Raffaello'),
    array('sculptors', 'Donatello'),
);
// send messages to listeners.
foreach ($messages as $msg) {
    // Non-blocking operation. No ACK.
    // Message sent to ALL subscribers whose subscription matches the topic
    $producer->sendMulti($msg);
}
ZeroMQ Consumer (SUB)
<?php
$context = new ZMQContext();
$subscriber = $context->getSocket(ZMQ::SOCKET_SUB);
$subscriber->connect('tcp://myhost:5555');
$topic = 'painters'; // ignore sculptors
$subscriber->setSockOpt(
    ZMQ::SOCKOPT_SUBSCRIBE,
    $topic
);

// Listen to messages with given topic
while (true) {
    list($t, $m) = $subscriber->recvMulti();
    // $t is the topic: any topic starting with 'painters' matches (prefix match)
}
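Subscription filtering in PUB-SUB is a prefix match on the first frame, and every matching subscriber gets its own copy. The pure-PHP model below shows which listeners receive a given message. The subscriber names and the `publish()` helper are hypothetical:

```php
<?php
// Deliver [topic, msg] to every subscriber with a prefix matching the topic.
function publish(array $subscriptions, string $topic, string $msg): array
{
    $delivered = [];
    foreach ($subscriptions as $subscriber => $prefixes) {
        foreach ($prefixes as $prefix) {
            if (strncmp($topic, $prefix, strlen($prefix)) === 0) {
                $delivered[$subscriber] = [$topic, $msg];
                break; // one copy per subscriber, even if several prefixes match
            }
        }
    }
    return $delivered;
}

$subscriptions = [
    'listener1' => ['painters'],
    'listener2' => ['painters', 'sculptors'],
    'listener3' => [''],          // empty prefix = subscribe to everything
];
print_r(array_keys(publish($subscriptions, 'sculptors', 'Donatello')));
// listener2, listener3
```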
Interesting Ideas
     Some Architecture Ideas
Internal “Firehose”

[Diagram: Publishers (X, Y, Z) all connected to a shared Data Bus; subscribers such as
 Alice’s timeline, John’s Inbox, System Monitor, Fred’s Followers and Tech Blog Feed
 each subscribe to the topics they need (e.g. "subscribe to topic X", "subscribe to topic Y")]

   Published data: Data Feeds, User-generated content, System events, ...
   Participants: Applications, Services, Monitors, Routers, Repeaters, ...

   Everyone connected to the data bus, no directed graph
Monitoring
                 Measure Anything, Measure Everything

http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
Monitoring: Measure Everything

                                                      StatsD

 1. Is there a problem?    User experience / Business metrics monitors

 2. Where is the problem?  System monitors (threshold - variance)

 3. What is the problem?   Application monitors

                Keep Signal vs. Noise ratio high
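The "threshold - variance" distinction for system monitors can be sketched as two checks. A threshold check fires when a value crosses a fixed limit. A variance check fires when a value strays too far from recent history, even while it is still under the limit. The 3-sigma rule and the `checkMetric()` helper are assumptions, not the checks used at DataSift:

```php
<?php
// Returns which checks fire for $value: 'threshold' and/or 'variance'.
function checkMetric(array $history, float $value, float $limit, float $k = 3.0): array
{
    $alerts = [];
    if ($value > $limit) {
        $alerts[] = 'threshold';
    }
    $n = count($history);
    if ($n >= 2) {
        $mean = array_sum($history) / $n;
        $var = array_sum(array_map(fn ($x) => ($x - $mean) ** 2, $history)) / $n;
        $std = sqrt($var);
        // more than $k standard deviations away from the recent mean
        if ($std > 0 && abs($value - $mean) > $k * $std) {
            $alerts[] = 'variance';
        }
    }
    return $alerts;
}

$history = [100, 102, 98, 101, 99];      // e.g. messages/s, last 5 minutes
print_r(checkMetric($history, 60, 500)); // under the limit, but anomalous
```

A sudden drop in throughput (60/s against a steady ~100/s) never crosses a high-water threshold, yet it is often the first sign that an upstream service is failing.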
Instrumentation

[Image: Rhybudd mobile monitoring app]
     https://play.google.com/store/apps/details?id=net.networksaremadeofstring.rhybudd
Look! Monitors!

[Images]
StatsD + Graphite

                       Example
 StatsD: Node.js daemon. Listens for messages over a UDP port and
 extracts metrics, which are flushed to Graphite for further processing
 and visualisation.

 Graphite: Real-time graphing system. Data is sent to carbon
 (the processing back-end), which stores it in Graphite’s database.
 Data is visualised via Graphite’s web interface.
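StatsD's wire format is simple enough to produce without a library. Each metric is one plain-text UDP datagram, `<name>:<value>|<type>`, with `ms` for timers and `c` for counters. A minimal PHP 8 sketch follows. The `TinyStatsD` class is hypothetical and is not the `StatsD` class used in the example below:

```php
<?php
final class TinyStatsD
{
    public function __construct(
        private string $host = '127.0.0.1',
        private int $port = 8125
    ) {}

    // build the StatsD wire format for one metric
    public static function format(string $name, int|float $value, string $type): string
    {
        return "{$name}:{$value}|{$type}";
    }

    public function timing(string $name, int $ms): void
    {
        $this->send(self::format($name, $ms, 'ms'));
    }

    public function increment(string $name, int $by = 1): void
    {
        $this->send(self::format($name, $by, 'c'));
    }

    private function send(string $packet): void
    {
        // UDP is fire-and-forget: no reply is awaited, and if the daemon
        // is down the metric is silently lost
        $fp = @fsockopen("udp://{$this->host}", $this->port, $errno, $errstr);
        if ($fp !== false) {
            @fwrite($fp, $packet);
            fclose($fp);
        }
    }
}

echo TinyStatsD::format('workerX.processing_time', 42, 'ms'), PHP_EOL;
// prints workerX.processing_time:42|ms
```

Because sends are fire-and-forget UDP, instrumenting a hot loop (for example with `(new TinyStatsD('yourhost'))->increment('workerX.received.type.tweet')`) adds little latency, and a dead metrics daemon cannot stall the workers.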
StatsD Metrics
<?php
$statsTypePrefix = 'workerX.received.type.';
$statsTimeKey = 'workerX.processing_time';

while (true) {
  $batch = $worker->getBatchOfWork();
  foreach ($batch as $item) {
    // time how long it takes to process this item...
    $time_start = microtime(true);
    // ... process item here ...
    $time = (int)(1000 * (microtime(true) - $time_start));
    StatsD::timing($statsTimeKey, $time); // time in ms

    // count items by type
    StatsD::increment($statsTypePrefix . $item['type']);
  }
}

; statsd.ini
[statsd]
host = yourhost
port = 8125

                      https://github.com/etsy/statsd/
Graphite Output

[Graphs: workerX.processing_time.mean and workerX.processing_time.upper_90]

                                  monitor average, percentiles,
                                  standard deviation

                           http://graphite.wikidot.com/
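The slide recommends watching percentiles and standard deviation, not just the average. The sketch below shows why by computing per-flush timer statistics. The `upper_90` rule (largest value among the fastest 90% of samples) follows how StatsD is generally understood to compute its percentile metric. Treat the `timerStats()` name and the sample timings as hypothetical:

```php
<?php
function timerStats(array $samples, int $pct = 90): array
{
    if (!$samples) {
        throw new InvalidArgumentException('No samples');
    }
    sort($samples);
    $count = count($samples);
    $mean = array_sum($samples) / $count;
    $std = sqrt(array_sum(array_map(fn ($x) => ($x - $mean) ** 2, $samples)) / $count);
    // number of samples inside the percentile (at least one)
    $inPct = max(1, (int) round($pct / 100 * $count));
    return [
        'mean'       => $mean,
        'upper'      => $samples[$count - 1],
        "upper_$pct" => $samples[$inPct - 1],
        'std'        => $std,
    ];
}

$samples = [12, 15, 11, 13, 200, 14, 12, 16, 13, 14]; // ms, one slow outlier
$s = timerStats($samples);
print_r($s); // mean 32, upper 200, upper_90 16, std ~56
```

With one slow outlier, the mean (32 ms) is double every typical sample. `upper_90` (16 ms) still reflects the bulk of requests, and the standard deviation (about 56 ms) exposes the spread.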
Look! Rib cages! Network Load Viz

[Image: network weathermap]

                                                         Not enough!

                                             Contextualise metrics

       http://www.network-weathermap.com/   http://cacti.net
Cacti + WeatherMap



                       Example
 Cacti: Network graphing solution harnessing the power of RRDTool’s
 data storage and graphing functionality. Provides a fast poller, graph
 templating, multiple data acquisition methods.

 Weathermap: Cacti plugin to integrate network maps into the
 Cacti web UI. Includes a web-based map editor.
Network Load Visualisation

[Weathermap: message rates on each link between services, from 13/s up to 8432/s.
 Annotations: "augmentation service timing out?", "filtering server slightly overloaded?"]

          Graphite datasource for Weathermap: https://github.com/alexforrow/php-weathermap-graphite
Network Load Visualisation



                                                                                  345/s
                                           8432/s                                 225/s
                                                                                  296/s
                                                                                  335/s
                                                                    7312/s        311/s
                                                                                  289/s
                                                                        145/s

          4410/s                           5320/s




                                                                         1331/s

                                                                                                    consumer
   80/s
                      5320/s


                                                               5320/s                              slower than
     13/s
                                                                                                    producer?
                                           2954/s       44/s
                               3296/s                                                     4322/s
   219/s


                                  2954/s       5320/s            832/s

                                                                                          5320/s




          Graphite datasource for Weathermap: https://github.com/alexforrow/php-weathermap-graphite              63
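The last annotation, a consumer slower than its producer, can also be checked numerically instead of by eye. This is a minimal sketch that assumes you can read each queue's producer and consumer rates (messages per second) from your metrics store. `LinkRates`, `backlog_growth`, `seconds_until_full` and `slow_consumers` are hypothetical names. They are not part of Weathermap or Graphite, and the queue names below are made up. The idea is that when a consumer drains a queue more slowly than it is filled, the backlog grows by the difference every second. Given a known capacity, you can then estimate how long the queue has before it overflows.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LinkRates:
    """Throughput observed on one queue, in messages per second (hypothetical type)."""
    name: str
    produced_per_s: float
    consumed_per_s: float


def backlog_growth(link: LinkRates) -> float:
    """Messages added to the backlog each second; negative means the queue is draining."""
    return link.produced_per_s - link.consumed_per_s


def seconds_until_full(link: LinkRates, current_depth: int, capacity: int) -> Optional[float]:
    """Time until the queue reaches `capacity` at constant rates, or None if it is not growing."""
    growth = backlog_growth(link)
    if growth <= 0:
        return None
    return (capacity - current_depth) / growth


def slow_consumers(links: List[LinkRates], tolerance: float = 0.0) -> List[str]:
    """Names of queues whose consumer lags the producer by more than `tolerance` msg/s."""
    return [link.name for link in links if backlog_growth(link) > tolerance]


if __name__ == "__main__":
    queues = [
        LinkRates("filter->delivery", produced_per_s=5000, consumed_per_s=5000),
        LinkRates("delivery->consumer", produced_per_s=5000, consumed_per_s=4000),
    ]
    print(slow_consumers(queues))                       # ['delivery->consumer']
    print(seconds_until_full(queues[1], 0, 1_000_000))  # 1000.0
```

For example, a queue filled at 5000/s and drained at 4000/s grows by 1000 messages/s, so an empty 1,000,000-message buffer fills in about 1000 seconds. The estimate assumes constant rates. Real traffic is bursty, so read it as a trend rather than a deadline.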
Monitoring Reporting Guidelines

Make the subtle obvious
Make the complex/busy simple/clean
Group information by context
Detect anomalies/deviation from norm
Turn raw numbers into graphs
Appeal to intuition
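"Detect anomalies/deviation from norm" can start very simply. The following is a minimal sketch and not the detector DataSift used. It keeps a rolling window of recent samples for one metric and flags any new sample whose z-score (its distance from the window mean, measured in standard deviations) exceeds a threshold. The window of 30 samples and the threshold of 3.0 are arbitrary assumptions that you would tune per metric. `DeviationDetector` is a hypothetical name.

```python
from collections import deque
from statistics import mean, pstdev


class DeviationDetector:
    """Flags samples that stray from a rolling baseline (a z-score sketch, not a production detector)."""

    def __init__(self, window: int = 30, threshold: float = 3.0):
        self.samples = deque(maxlen=window)  # oldest samples fall off automatically
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Record `value` and return True if it deviates from the recent norm."""
        anomalous = False
        if len(self.samples) >= 2:  # need at least two points to measure spread
            mu = mean(self.samples)
            sigma = pstdev(self.samples)
            if sigma == 0:
                # Perfectly flat history: any change at all counts as a deviation.
                anomalous = value != mu
            else:
                anomalous = abs(value - mu) / sigma > self.threshold
        self.samples.append(value)  # flagged values still join the baseline
        return anomalous


if __name__ == "__main__":
    detector = DeviationDetector(window=10)
    for rate in [100, 102, 101, 99, 100, 101, 98, 102, 500]:
        if detector.observe(rate):
            print(f"{rate}/s deviates from the recent norm")
```

Flagged samples still enter the window, so a sustained level shift gradually becomes the new norm and stops alerting. That is usually what you want on a dashboard. The same property means a rolling baseline can hide slow drifts.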
                                       64
We’re Hiring!




http://datasift.com/whoweare/jobs
                                65
References
  http://www.slideshare.net/quipo/the-art-of-scalability-managing-growth
  https://bitly.com/vCSd49 (DataSift architecture on HighScalability)
  http://www.slideshare.net/combell/varnish-in-action-phpday2011
  https://vimeo.com/couchmode/chariottechcast/videos/sort:date/40988625
  http://blog.stuartherbert.com/php/2011/09/21/real-time-graphing-with-graphite/
  http://zguide.zeromq.org/page:all




Image credits:
http://accidental-entrepreneur.com/wp-content/uploads/2011/04/fire-hose.jpg
http://www.alibaba.com/product-free/103854677/Q_FIRE_FIRE_HOSE.html

                                                                           66
Lorenzo Alberton
                  @lorenzoalberton




   Thank you!
       lorenzo@alberton.info
http://www.alberton.info/talks




           https://joind.in/6372
                                     67


Scalable Architectures - Taming the Twitter Firehose

  • 1. Lorenzo Alberton (@lorenzoalberton). Scalable Architectures: Taming the Twitter Firehose. Patterns for scalable real-time platforms. PHPDay 2012, Verona, 18th May 2012.
  • 2. Outline: 1) SOAs: scaling the platform.
  • 3. Outline: 1) SOAs: scaling the platform. 2) Message Queues: scaling the communication.
  • 4. Outline: 1) SOAs: scaling the platform. 2) Message Queues: scaling the communication. 3) Monitoring: scaling the maintainability.
  • 5. DataSift Architecture: high-level overview.
  • 6. DataSift Architecture. http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
  • 7. 1/4) Ingestion of Input Streams. http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
  • 8. 2/4) Filtering. http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
  • 9. 3/4) Delivery / Frontend. http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
  • 10. 4/4) Monitoring / Historics / Analytics. http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
  • 11. DataSift: 350+ million input messages/day*, ~330 million of them from the Twitter Firehose alone.
  • 12. DataSift: 2 terabytes of messages processed in real time and stored every day; ~1 petabyte of storage available.
  • 13. DataSift: thousands of concurrent, custom output streams, all crafted with tender love and surgical precision.
  • 15. Service-Oriented Architectures: Service A, Service B, Service C. Loose coupling; separation of responsibilities. http://en.wikipedia.org/wiki/Service-oriented_architecture
  • 16. Service-Oriented Architectures: a Consumer in front of Services A, B and C. Loose coupling; separation of responsibilities; separate consumers from service implementation. http://en.wikipedia.org/wiki/Service-oriented_architecture
  • 17. Service-Oriented Architectures: Consumers reach Services A, B and C through a Proxy Cache. Loose coupling; separation of responsibilities; separate consumers from service implementation; aggressive caching at application level. http://en.wikipedia.org/wiki/Service-oriented_architecture
  • 18. Service-Oriented Architectures: an Orchestrator in front of Services A, B and C. Loose coupling; separation of responsibilities; separate consumers from service implementation; aggressive caching at application level; orchestration of distinct units accessible over a network. http://en.wikipedia.org/wiki/Service-oriented_architecture
  • 19. Service-Oriented Architectures: the Orchestrator talks to Services A, B and C in JSON, Thrift or XML. Loose coupling; separation of responsibilities; separate consumers from service implementation; aggressive caching at application level; orchestration of distinct units accessible over a network; communication via a well-defined, interoperable format. http://en.wikipedia.org/wiki/Service-oriented_architecture
  • 20. Independent Horizontal Scaling: the Orchestrator talks to Service A and Service B.
  • 22. Independent Horizontal Scaling: the Orchestrator talks to Service A directly, and to Service B through a Load Balancer in front of multiple nodes (B1, B2). Load balancing: multiple nodes per service.
  • 23. Independent Horizontal Scaling: a Reverse Proxy in front of Service A gives better single-node performance with application-level caching; a Load Balancer in front of Service B spreads the load over multiple nodes (B1, B2).
  • 24. Cell Architectures. N + 1 design: ensure that everything you develop has at least one additional instance of that system in the event of failure. Multiple live nodes: have multiple live, isolated nodes of the same type to distribute the load. http://highscalability.com/blog/2012/5/9/cell-architectures.html
  • 25. Cardinality of Nodes on Each Service: diagram of the DataSift architecture annotated with the number of nodes running each service (from 2 to 60+). http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
  • 26. Load-Balancing Example
  • 27. Load-Balancing with HAProxy:
      # /etc/haproxy.cfg
      global
          daemon
          maxconn 256
      defaults
          mode http
          timeout connect 5000ms
          timeout client 50000ms
          timeout server 50000ms
      frontend http-in
          bind *:80
          default_backend mysvc
      backend mysvc
          # "check" enables health checks, so traffic only goes to healthy nodes
          server s1 10.0.1.10:7474 maxconn 32 check
          server s2 10.0.1.11:7474 maxconn 32 check
      listen admin
          bind *:8080
          stats enable
    Start by running /usr/sbin/haproxy -f /etc/haproxy.cfg (http://haproxy.1wt.eu/)
  • 28. Load-Balancing with Varnish: requests for mysvc are split round-robin, 50% to node 01 and 50% to node 02.
      backend node01 {
          .host = "svc01.myhost.com";
          .probe = {
              .url = "/";
              .interval = 1s;
              .timeout = 50 ms;
              .window = 2;
              .threshold = 2;
          }
      }
      backend node02 {
          .host = "svc02.myhost.com";
          .probe = {
              .url = "/";
              .interval = 1s;
              .timeout = 50 ms;
              .window = 2;
              .threshold = 2;
          }
      }
      director mysvcdir round-robin {
          { .backend = node01; }
          { .backend = node02; }
      }
      sub vcl_recv {
          set req.backend = mysvcdir;
          return(pass);
      }
    http://varnish-cache.org
  • 29. Caching Example
  • 30. Caching with Varnish: no special directives are required to cache normal requests. Just use the defaults, and set Cache-Control headers.
      <?php
      $ttl = 300; // cache for 5 minutes
      // HTTP dates must be expressed in GMT
      header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $ttl) . ' GMT');
      header("Cache-Control: max-age=$ttl, must-revalidate");
      ?>
    Warning: by default, pages with cookies are not cached.
  • 31. Application Programming Interfaces (APIs): the software-to-software contract.
  • 32. API Docs Guidelines: Simple (RESTful verbs, actions on resources); Well defined (action, endpoint, parameters, response); Discoverable (self-describing endpoint); Working documentation.
  • 33. APIs everywhere: internal & external. http://mashery.com/solution/iodocs http://console.datasift.com/
  • 34. Service API discovery: GET /<servicename>/api
  • 35. Service Host Discovery - Config Mgr:
      GET /configuration/<servicename>/hosts

      HTTP/1.1 200 OK
      Content-Type: application/json; charset=UTF-8

      {
          "service": "<servicename>",
          "hosts": [
              "10.0.1.33:80",
              "10.0.1.34:80"
          ],
          "base_path": "/svc/xyz/"
      }
  • 36. Service Host Discovery - Zookeeper. ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. http://zookeeper.apache.org/
      <?php
      $zk = new Zookeeper();
      $zk->connect('localhost:2181'); // server
      $params = array(array(
          'perms'  => Zookeeper::PERM_ALL,
          'scheme' => 'world',
          'id'     => 'anyone',
      ));
      if (!$zk->exists('/services/mysvc/host')) {
          $zk->create('/services', 'config for internal services', $params);
          $zk->create('/services/mysvc', 'config for mysvc', $params);
          $zk->create('/services/mysvc/host', 'http://my.site.com', $params);
      }
  • 38. Service Host Discovery - Zookeeper. ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. http://zookeeper.apache.org/
      <?php
      $zk = new Zookeeper();
      $zk->connect('localhost:2181'); // client
      $host = $zk->get('/services/mysvc/host');
      ...
  • 39. SOA - Scale Each Component (image: Matryoshka nesting dolls, http://www.thisnext.com/item/647CD0BE/Matryoshkas-Nesting-Dolls)
  • 40. SOA - Scale Each Component. http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
  • 41. SOA - Scale Each Component. SOA: independently scalable services. Example of distributing the processing load: http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
  • 42. Workers for sharing processing load
  • 43. Workers for sharing processing load: distribute the processing load among workers. Lightweight orchestration, heavy lifting in separate, asynchronous processes.
  • 44. Scale all things! http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
  • 45. Scale all things! Example of scaling large data volumes: http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html
  • 46. In Case of “Big Data”...
  • 47. In Case of “Big Data”... With lots of data, move the processing logic itself to the storage nodes (I/O is expensive): Map/Reduce, parallel processing.
  • 48. Message Queues: asynchronous communication
  • 49. Messaging:
    ZeroMQ: PUSH-PULL, REQ-REP, PUB-SUB (multicast, broadcast). Internal communication: pass messages to the next processing stage in the pipeline, control events, monitoring. Very high throughput. Socket library.
    Kafka/Redis: PUSH-PULL with persistence*. Internal message / workload buffering and distribution.
    Node.js: WebSockets / HTTP Streaming. Message delivery (output).
  • 50. Message queues as Buffers (Decoupling): producer (P) feeding consumer (C) directly: unpredictable load spikes.
  • 51. Message queues as Buffers (Decoupling): P feeding C directly: unpredictable load spikes. P, queue, C: load normalisation / smoothing.
  • 52. Message queues as Buffers (Decoupling): P feeding C directly: unpredictable load spikes. P, queue, C: load normalisation / smoothing. Batching ⇒ higher throughput.
  • 53. Redis Buffer Example
  • 54. Redis Processing Queue
      <?php
      // producer(s)
      $redis = new Redis();
      $redis->connect('127.0.0.1', 6379, 1.5); // timeout 1.5 seconds
      ...
      // push items to the queue as they are produced
      $redis->lPush('queue:xyz', $item);
      ...

      <?php
      // consumer(s)
      ...
      while (true) {
          // read items off the queue as they become available,
          // blocking for up to 2 seconds (timeout)
          $result = $redis->brPop('queue:xyz', 2);
          if (!$result) {
              continue; // timed out: nothing to read yet
          }
          list(, $item) = $result; // brPop returns array(key, value)
          ...
      }
    https://github.com/nicolasff/phpredis https://github.com/chrisboulton/php-resque
  • 55. Kafka Buffer Example
  • 56. Kafka Processing Queue: Producer
      <?php
      $host = '127.0.0.1';
      $port = 9092;
      $producer = new Kafka_Producer($host, $port);
      $messages = array(
          'aaa',
          'bbb',
          'ccc',
      );
      $topic = 'test';
      // send a batch of messages (MessageSet)
      $bytes_sent = $producer->send($messages, $topic);
    https://github.com/apache/kafka/tree/trunk/clients/php/src/examples
  • 57. Kafka Processing Queue: Consumer
      <?php
      $timeout = 2;
      $maxSize = 1000000;
      $host = '127.0.0.1';
      $port = 9092;
      $partition = 0;
      $offset = 0;
      $topic = 'test';
      $consumer = new Kafka_SimpleConsumer($host, $port, $timeout, $maxSize);
      while (true) {
          $request = new Kafka_FetchRequest($topic, $partition, $offset, $maxSize);
          $messages = $consumer->fetch($request);
          foreach ($messages as $msg) {
              echo $msg->payload();
          }
          $offset += $messages->validBytes();
      }
  • 58. 0mq PUSH-PULL (Workload Distribution): the Producer pushes, Consumers 1, 2 and 3 pull (blocking operation: each message is handed to exactly one worker).
  • 59. Workload Distribution Example
  • 60. ZeroMQ Producer (PUSH)
      <?php
      $context = new ZMQContext();
      $producer = $context->getSocket(ZMQ::SOCKET_PUSH);
      $producer->bind('tcp://*:5555');
      // send tasks to workers
      foreach ($tasks as $task) {
          // Each message is delivered to one (and only one) worker;
          // send() blocks while no worker is connected or the queues are full
          $producer->send($task);
      }
      ...
    http://zguide.zeromq.org/php:all
  • 61. ZeroMQ Consumers (PULL)
      <?php
      $context = new ZMQContext();
      $worker = $context->getSocket(ZMQ::SOCKET_PULL);
      $worker->connect('tcp://myhost:5555');
      // process tasks forever
      while (true) {
          // receive a message (blocking operation)
          $task = $worker->recv();
          ...
      }
  • 62. 0mq PUSH-PULL (Mux): Producer 1 pushes R1, R2, R3; Producer 2 pushes R4; Producer 3 pushes R5, R6. The Consumer pulls with fair-queuing and receives R1, R4, R5, R2, R6, R3.
  • 63. 0mq PUB-SUB (High Availability): Publishers 1 and 2 broadcast to Listeners 1, 2 and 3 [Broadcast] [Dynamic Subscriptions].
  • 64. 0mq PUB-SUB (High Availability): Publisher 1 in DC 1, Publisher 2 in DC 2.
  • 65. High Availability - Replication Example
  • 66. ZeroMQ Producer (PUB)
      <?php
      $context = new ZMQContext();
      $producer = $context->getSocket(ZMQ::SOCKET_PUB);
      $producer->bind('tcp://*:5555');
      $messages = array( // array(topic, msg): sent as two frames
          array('painters', 'Michelangelo'),
          array('painters', 'Raffaello'),
          array('sculptors', 'Donatello'),
      );
      // send messages to listeners
      foreach ($messages as $msg) {
          // Non-blocking operation. No ACK.
          // Message sent to ALL subscribers of the topic
          $producer->sendMulti($msg);
      }
  • 67. ZeroMQ Consumer (SUB)
      <?php
      $context = new ZMQContext();
      $subscriber = $context->getSocket(ZMQ::SOCKET_SUB);
      $subscriber->connect('tcp://myhost:5555');
      $topic = 'painters'; // ignore sculptors
      $subscriber->setSockOpt(ZMQ::SOCKOPT_SUBSCRIBE, $topic);
      // Listen to messages with the given topic
      while (true) {
          list($t, $m) = $subscriber->recvMulti();
          // $t is the topic (subscriptions match by prefix: any topic starting with 'painters')
      }
  • 68. Interesting Ideas: some architecture ideas.
  • 69. Internal “Firehose”: Publishers (X, Y, Z) send to a Data Bus; Subscribers (Alice’s timeline, John’s inbox, System Monitor, Fred’s followers, Tech Blog Feed) subscribe to topics (e.g. topic X, topic Y).
  • 70. Internal “Firehose”: publishers are data feeds, user-generated content, system events, ...
  • 71. Internal “Firehose”: subscribers are applications, services, monitors, routers, repeaters, ...
  • 72. Internal “Firehose”: everyone is connected to the data bus; no directed graph.
  • 74. Monitoring: Measure Anything, Measure Everything. http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
  • 76. Monitoring: Measure Everything. 1. Is there a problem? User experience / business metrics monitors. 2. Where is the problem? System monitors (threshold / variance). 3. What is the problem? Application monitors.
  • 77. Monitoring: Measure Everything. 1. Is there a problem? User experience / business metrics monitors. 2. Where is the problem? System monitors (threshold / variance). 3. What is the problem? Application monitors. Keep the signal-to-noise ratio high.
  • 78. Monitoring: Measure Everything (StatsD). 1. Is there a problem? User experience / business metrics monitors. 2. Where is the problem? System monitors (threshold / variance). 3. What is the problem? Application monitors. Keep the signal-to-noise ratio high.
  • 79. Instrumentation. https://play.google.com/store/apps/details?id=net.networksaremadeofstring.rhybudd
  • 82. StatsD + Graphite Example. StatsD: Node.js daemon. Listens for messages over a UDP port and extracts metrics, which are dumped to Graphite for further processing and visualisation. Graphite: real-time graphing system. Data is sent to carbon (the processing back-end), which stores it in Graphite’s db. Data is visualised via Graphite’s web interface.
  • 83. StatsD Metrics
      ; statsd.ini
      [statsd]
      host = yourhost
      port = 8125

      <?php
      $statsTypePrefix = 'workerX.received.type.';
      $statsTimeKey = 'workerX.processing_time';
      while (true) {
          $batch = $worker->getBatchOfWork();
          foreach ($batch as $item) {
              // time how long it takes to process this item...
              $time_start = microtime(true);
              // ... process item here ...
              $time = (int)(1000 * (microtime(true) - $time_start));
              StatsD::timing($statsTimeKey, $time); // time in ms
              // count items by type
              StatsD::increment($statsTypePrefix . $item['type']);
          }
      }
    https://github.com/etsy/statsd/
  • 85. Graphite Output: workerX.processing_time.mean, workerX.processing_time.upper_90. http://graphite.wikidot.com/
  • 86. Graphite Output: monitor average, percentiles, standard deviation (workerX.processing_time.mean, workerX.processing_time.upper_90). http://graphite.wikidot.com/
  • 87. Look! Rib cages! Network Load Viz. http://www.network-weathermap.com/ http://cacti.net
  • 88. Look! Rib cages! Network Load Viz. Not enough! Contextualise metrics. http://www.network-weathermap.com/ http://cacti.net
  • 89. Cacti + WeatherMap Example. Cacti: network graphing solution harnessing the power of RRDTool’s data storage and graphing functionality. Provides a fast poller, graph templating, multiple data acquisition methods. Weathermap: Cacti plugin to integrate network maps into the Cacti web UI. Includes a web-based map editor.
  • 90. Network Load Visualisation: Weathermap diagram of the platform, each link labelled with its message rate (from 13/s to 8432/s). Graphite datasource for Weathermap: https://github.com/alexforrow/php-weathermap-graphite
  • 91. Network Load Visualisation: the same diagram, annotated "augmentation service timing out?". Graphite datasource for Weathermap: https://github.com/alexforrow/php-weathermap-graphite
  • 92. Network Load Visualisation: the same diagram, annotated "filtering server slightly overloaded?". Graphite datasource for Weathermap: https://github.com/alexforrow/php-weathermap-graphite
  • 93. Network Load Visualisation: the same diagram, annotated "consumer slower than producer?". Graphite datasource for Weathermap: https://github.com/alexforrow/php-weathermap-graphite
  • 94. Monitoring Reporting Guidelines: make the subtle obvious; make the complex/busy simple/clean; group information by context; detect anomalies/deviation from the norm; turn raw numbers into graphs; appeal to intuition.
  • 96. References:
    http://www.slideshare.net/quipo/the-art-of-scalability-managing-growth
    https://bitly.com/vCSd49 (DataSift architecture on HighScalability)
    http://www.slideshare.net/combell/varnish-in-action-phpday2011
    https://vimeo.com/couchmode/chariottechcast/videos/sort:date/40988625
    http://blog.stuartherbert.com/php/2011/09/21/real-time-graphing-with-graphite/
    http://zguide.zeromq.org/page:all
    Image credits:
    http://accidental-entrepreneur.com/wp-content/uploads/2011/04/fire-hose.jpg
    http://www.alibaba.com/product-free/103854677/Q_FIRE_FIRE_HOSE.html
  • 97. Lorenzo Alberton (@lorenzoalberton). Thank you! lorenzo@alberton.info http://www.alberton.info/talks https://joind.in/6372
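The N + 1 rule from slide 24 can be turned into a quick capacity estimate. This is a minimal sketch, assuming every node of a service sustains the same known throughput and that one spare node must absorb the failure of any single node; the function name and the 2,000 msg/s per-node figure are hypothetical, while 9,000 msg/s is the peak rate quoted in the speaker notes.

```php
<?php
// Estimate the live nodes a service needs under the N + 1 rule.
// Assumption: every node sustains the same known throughput, and one
// spare node must be able to absorb the failure of any single node.
function nodesForNPlusOne(int $peakMsgPerSec, int $msgPerSecPerNode): int
{
    if ($msgPerSecPerNode <= 0) {
        throw new InvalidArgumentException('per-node capacity must be positive');
    }
    // N = ceil(peak / capacity), computed with integer arithmetic
    $n = intdiv($peakMsgPerSec + $msgPerSecPerNode - 1, $msgPerSecPerNode);
    // at least one working node, plus one additional instance for failures
    return max(1, $n) + 1;
}

// Hypothetical capacity of 2,000 msg/s per node against a 9,000 msg/s peak.
echo nodesForNPlusOne(9000, 2000), PHP_EOL; // prints 6
```

In a cell architecture the same arithmetic applies per cell, so adding capacity means adding whole cells rather than growing one cluster.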
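Slides 27 and 28 delegate node selection to HAProxy or a Varnish round-robin director. To show what those do conceptually, here is an in-process sketch of round-robin selection that skips nodes marked unhealthy (as a failed health probe would). The class and method names are hypothetical; this is a model of the behaviour, not how either tool is implemented.

```php
<?php
// Round-robin backend selection that skips unhealthy nodes.
// RoundRobinBalancer, setHealthy() and pick() are illustrative names.
final class RoundRobinBalancer
{
    /** @var array<string, bool> backend address => healthy? */
    private array $backends = [];
    private int $next = 0;

    public function __construct(array $addresses)
    {
        foreach ($addresses as $address) {
            $this->backends[$address] = true;
        }
    }

    // Called with the outcome of a health check (probe).
    public function setHealthy(string $address, bool $healthy): void
    {
        $this->backends[$address] = $healthy;
    }

    public function pick(): string
    {
        $addresses = array_keys($this->backends);
        $count = count($addresses);
        // Try each backend at most once, starting after the last one used.
        for ($i = 0; $i < $count; $i++) {
            $candidate = $addresses[($this->next + $i) % $count];
            if ($this->backends[$candidate]) {
                $this->next = ($this->next + $i + 1) % $count;
                return $candidate;
            }
        }
        throw new RuntimeException('no healthy backend available');
    }
}

$lb = new RoundRobinBalancer(['10.0.1.10:7474', '10.0.1.11:7474']);
$lb->setHealthy('10.0.1.11:7474', false); // e.g. a failed probe
echo $lb->pick(), PHP_EOL; // prints 10.0.1.10:7474
```

Weighted, client-based or hash-based directors differ only in how `pick()` chooses among the healthy candidates.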
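On the consumer side of the config-manager lookup from slide 35, a client only needs to decode the JSON and build a URL for one of the registered hosts. A minimal sketch: the function name, the `http://` scheme and the rotation across hosts on retries are assumptions, not part of the API described in the slides.

```php
<?php
// Decode the JSON returned by GET /configuration/<servicename>/hosts
// and build the base URL of one host (hypothetical helper).
function serviceBaseUrl(string $jsonBody, int $attempt = 0): string
{
    $config = json_decode($jsonBody, true, 512, JSON_THROW_ON_ERROR);
    $hosts = $config['hosts'] ?? [];
    if ($hosts === []) {
        $service = $config['service'] ?? 'unknown service';
        throw new RuntimeException("no hosts registered for $service");
    }
    // rotate through the registered hosts on successive attempts
    $host = $hosts[$attempt % count($hosts)];
    return 'http://' . $host . ($config['base_path'] ?? '/');
}

$body = '{"service": "xyz", "hosts": ["10.0.1.33:80", "10.0.1.34:80"], "base_path": "/svc/xyz/"}';
echo serviceBaseUrl($body), PHP_EOL;    // prints http://10.0.1.33:80/svc/xyz/
echo serviceBaseUrl($body, 1), PHP_EOL; // prints http://10.0.1.34:80/svc/xyz/
```

Caching the decoded host list for a short time keeps the config manager off the hot path of every request.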
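Slide 47's advice to move processing to the storage nodes is the core idea of Map/Reduce. The toy sketch below shows it in miniature: each node runs the map step over its own shard, and only the small partial counts are merged by the reduce step. The hashtag-counting job, function names and sample shards are all made up for illustration.

```php
<?php
// Map step: runs where the data lives, over a single shard.
function mapShard(array $messages): array
{
    $counts = [];
    foreach ($messages as $text) {
        preg_match_all('/#(\w+)/', $text, $matches);
        foreach ($matches[1] as $tag) {
            $tag = strtolower($tag);
            $counts[$tag] = ($counts[$tag] ?? 0) + 1;
        }
    }
    return $counts;
}

// Reduce step: merges the per-node partial counts.
function reducePartials(array $partials): array
{
    $total = [];
    foreach ($partials as $partial) {
        foreach ($partial as $tag => $n) {
            $total[$tag] = ($total[$tag] ?? 0) + $n;
        }
    }
    arsort($total); // most frequent first
    return $total;
}

$shards = [
    'node1' => ['Live at #phpday', '#PHPDay in #Verona'],
    'node2' => ['#verona by night', 'no tags here'],
];
// Only the partial counts, not the raw messages, would cross the network.
$partials = array_map('mapShard', $shards);
print_r(reducePartials($partials));
```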
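Slides 50 to 52 describe queues as buffers that smooth load spikes and enable batching. The effect can be modelled in-process. In this sketch `SplQueue` stands in for Redis or Kafka, and the batch size of 3 is arbitrary: a burst of messages is enqueued at once, and the consumer drains it in bounded batches at its own pace.

```php
<?php
// Take up to $maxBatch items off the buffer (hypothetical helper).
function drainBatch(SplQueue $queue, int $maxBatch): array
{
    $batch = [];
    while (!$queue->isEmpty() && count($batch) < $maxBatch) {
        $batch[] = $queue->dequeue();
    }
    return $batch;
}

$queue = new SplQueue();
foreach (range(1, 7) as $i) {
    $queue->enqueue("msg$i"); // load spike: 7 messages arrive together
}
while (!$queue->isEmpty()) {
    // each batch could be processed, or written, in a single round trip
    echo implode(',', drainBatch($queue, 3)), PHP_EOL;
}
```

This prints three lines: `msg1,msg2,msg3`, `msg4,msg5,msg6` and `msg7`. The producer never waits for the consumer, and larger batches trade a little latency for throughput.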
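The receive order on slide 62 (R1, R4, R5, R2, R6, R3) comes from fair-queuing: a PULL socket takes one pending message from each connected producer in turn. The function below models that behaviour to show where the order comes from. It is not ZeroMQ code, and the function name is hypothetical.

```php
<?php
// Model of fair-queuing: one message per producer per round, so a
// chatty producer cannot starve the others.
function fairQueue(array $producerQueues): array
{
    $producerQueues = array_filter($producerQueues); // drop empty producers
    $received = [];
    while ($producerQueues !== []) {
        foreach (array_keys($producerQueues) as $name) {
            $received[] = array_shift($producerQueues[$name]);
            if ($producerQueues[$name] === []) {
                unset($producerQueues[$name]);
            }
        }
    }
    return $received;
}

// The three producers from slide 62
$order = fairQueue([
    'producer1' => ['R1', 'R2', 'R3'],
    'producer2' => ['R4'],
    'producer3' => ['R5', 'R6'],
]);
echo implode(', ', $order), PHP_EOL; // prints R1, R4, R5, R2, R6, R3
```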
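The internal "firehose" data bus (slides 69 to 72) and the PUB-SUB example (slides 66 and 67) rely on topic subscriptions. The in-process sketch below mimics ZeroMQ's subscription semantics: a message reaches every subscription whose prefix matches the start of the topic, and an empty prefix matches everything. The class and method names are hypothetical, and each subscription stands for one subscriber.

```php
<?php
// Topic-based data bus with prefix-matching subscriptions.
final class DataBus
{
    /** @var array<int, array{prefix: string, handler: callable}> */
    private array $subscriptions = [];

    public function subscribe(string $prefix, callable $handler): void
    {
        $this->subscriptions[] = ['prefix' => $prefix, 'handler' => $handler];
    }

    // Returns how many subscribers received the message.
    public function publish(string $topic, string $message): int
    {
        $delivered = 0;
        foreach ($this->subscriptions as $sub) {
            // prefix match, as with ZMQ::SOCKOPT_SUBSCRIBE ('' matches all)
            if (strncmp($topic, $sub['prefix'], strlen($sub['prefix'])) === 0) {
                ($sub['handler'])($topic, $message);
                $delivered++;
            }
        }
        return $delivered;
    }
}

$bus = new DataBus();
$bus->subscribe('painters', function (string $topic, string $msg): void {
    echo "[$topic] $msg", PHP_EOL;
});
$bus->publish('painters', 'Michelangelo'); // delivered
$bus->publish('sculptors', 'Donatello');   // ignored: no matching subscription
```

Publishers never know who is listening, which is what lets new consumers (a monitor, a repeater) join the bus without touching the producers.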
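The `StatsD::timing()` and `StatsD::increment()` helpers on slide 83 are not shown. Here is a minimal sketch of what such a client might send. The `name:value|type` line format (`|ms` for timers, `|c` for counters) is the StatsD protocol; the function names, the default host, and the `tweet` item type in the usage lines are hypothetical.

```php
<?php
// Format one StatsD metric line, e.g. "workerX.processing_time:42|ms".
function statsdLine(string $key, int $value, string $type): string
{
    return "$key:$value|$type";
}

// Fire-and-forget UDP send (hypothetical helper).
function statsdSend(string $line, string $host = '127.0.0.1', int $port = 8125): void
{
    // UDP: no handshake and no ACK, so a missing or slow StatsD daemon
    // cannot block the instrumented worker.
    $socket = @stream_socket_client("udp://$host:$port", $errno, $errstr);
    if ($socket === false) {
        return; // metrics are best-effort
    }
    @fwrite($socket, $line);
    fclose($socket);
}

statsdSend(statsdLine('workerX.processing_time', 42, 'ms'));     // timer
statsdSend(statsdLine('workerX.received.type.tweet', 1, 'c'));   // counter
```

Using UDP is the design choice that makes "measure everything" cheap: instrumentation never becomes a new point of failure for the worker.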
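The `mean` and `upper_90` series on slides 85 and 86 summarise all timer samples received during one flush interval. This sketch is modelled on how Etsy's StatsD derives them: sort the samples, keep the lowest 90% (rounded), and report the largest value left. Exact edge-case handling in StatsD may differ, and the sample values are made up.

```php
<?php
// Compute mean and upper_<pct> for one flush interval of timer samples.
function timerStats(array $valuesMs, int $pct = 90): array
{
    if ($valuesMs === []) {
        throw new InvalidArgumentException('no samples');
    }
    sort($valuesMs);
    $count = count($valuesMs);
    // number of samples inside the percentile threshold (at least one)
    $numInThreshold = max(1, (int) round($pct / 100 * $count));
    return [
        'mean' => array_sum($valuesMs) / $count,
        "upper_$pct" => $valuesMs[$numInThreshold - 1],
    ];
}

// One 250 ms outlier among ten samples
print_r(timerStats([12, 15, 11, 14, 13, 10, 16, 18, 17, 250]));
```

Here the single outlier pulls the mean up to 37.6 ms, while `upper_90` stays at 18 ms. That gap is why the slides suggest watching percentiles alongside the average.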

Editor's Notes

  1. I’m Lorenzo; I’m Italian but live in the UK. I’ve worked on several large-scale websites, such as the BBC, Channel 5, Ladbrokes and iPlayer. I spent the past two years as Chief Architect at DataSift, a hot big-data startup.
  4. I’m going to introduce DataSift to explain what we do and how we do it. Don’t worry, this is not a sales pitch: I’m just using DataSift as an example of how to build a scalable architecture based on lessons learnt in the past.
  5. Some architecture porn.
  6. Sources are Twitter, Facebook, YouTube, Flickr, boards, forums, etc., plus news agencies: Thomson Reuters, Associated Press, Al-Jazeera, NYT, Chicago Tribune, etc. Data normalisation + augmentation: make the data rich and structured. Language detection, demographics (gender detection), trend analysis, sentiment analysis, influence ranking, topic analysis, entities.
  7. 2nd stage: the core filtering engine, a scalable, highly parallel, custom-built C++ virtual machine. It can process thousands of incoming messages per second against thousands of custom filters.
  8. Web site, public API, output streams (HTTP Streaming, WebSockets), buffered streams (batches of messages), and finally...
  9. ...storage. We record everything in our Hadoop cluster (historical access, analytics). We also have watchdogs to keep track of usage limits, licences, etc.
  10. Here are some numbers to give you a sense of the scale we’re operating at: between 3K and 9K messages per second, depending on the time of day.
  13. Now, everyone here has heard about service-oriented architectures, but I’m going to share some of the lessons I learnt in the past on how to scale a platform, which helped me design and scale DataSift and other large enterprise sites before it.
  14. The first characteristic of a SOA is having several loosely-coupled services. Separate consumers from service implementation. Orchestration of distinct units accessible over a network. Communication with data in a well-defined, interoperable format.
  45. Having decoupled services means you can scale each one horizontally. If a service is under heavy load (on fire), you can add more nodes of that service to keep it up, without having to duplicate the entire monolithic platform.
  54. Avoid failover (hot-swap) configuration. They don&amp;#x2019;t work well and usually involve downtime or data loss.\nCells provide a unit of parallelization that can be adjusted to any size as the user base grows.\nCell are added in an incremental fashion as more capacity is required.\nCells isolate failures. One cell failure does not impact other cells.\nCells provide isolation as the storage and application horsepower to process requests is independent of other cells.\nCells enable nice capabilities like the ability to test upgrades, implement rolling upgrades, and test different versions of software.\nCells can fail, be upgraded, and distributed across datacenters independent of other cells.\n\n
  55. As an example, this diagram shows the current number of servers we run for each service: each box has between 2 and 60+ nodes.
  56. Let's look at how to implement load balancing and application caching in practice.
  57. You can buy a hardware appliance (excellent, but expensive) or use software such as HAProxy.
      Set the service nodes as backend servers.
      HAProxy will run health checks and route traffic only to the healthy nodes.
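To make the health-check behaviour concrete, here is a toy Python model of round-robin balancing that skips unhealthy backends. It is not HAProxy code or configuration; the `HealthCheckedPool` class, the `check` callable and the backend addresses are hypothetical.

```python
import itertools


class HealthCheckedPool:
    """Toy round-robin balancer that skips backends failing their health check."""

    def __init__(self, backends, check):
        self.backends = list(backends)
        self.check = check                 # hypothetical: callable(backend) -> bool
        self.healthy = set(self.backends)
        self._ring = itertools.cycle(self.backends)

    def run_health_checks(self):
        # A real balancer polls on a timer; here it is triggered explicitly.
        self.healthy = {b for b in self.backends if self.check(b)}

    def next_backend(self):
        # Walk the ring at most once, skipping unhealthy nodes.
        for _ in range(len(self.backends)):
            backend = next(self._ring)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backend available")
```

When a node recovers, the next `run_health_checks()` puts it back into rotation without any change on the client side.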
  58. Use the random director to assign weights (i.e. send more load to a more powerful machine).
      The random director uses a random number to seed the backend selection.
      The client director picks a backend based on the client's identity. You can set the VCL variable client.identity to identify the client, e.g. from the value of a session cookie.
      The hash director picks a backend based on the URL hash value (req.hash).
      The fallback director picks the first healthy backend, considering the backends in the order in which they are listed in its definition.
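The selection strategies described above can be modelled in a few lines of Python. This illustrates the behaviour only; it is not Varnish's implementation or VCL. The backend names and weights are hypothetical, SHA-1 from the standard library stands in for whatever hash Varnish uses internally, and weights are ignored by the keyed (client/hash) directors for brevity.

```python
import hashlib
import random

# Hypothetical backends: name -> (weight, healthy), in definition order.
BACKENDS = {
    "big": (3, True),
    "small1": (1, True),
    "small2": (1, True),
}


def healthy_backends(backends):
    names = [name for name, (_, ok) in backends.items() if ok]
    if not names:
        raise RuntimeError("no healthy backend")
    return names


def random_director(backends, rng=random):
    # Weighted random choice: "big" receives about 3x the traffic of each small node.
    names = healthy_backends(backends)
    return rng.choices(names, weights=[backends[n][0] for n in names], k=1)[0]


def _pick_by_key(backends, key):
    # Same key -> same backend, as long as the healthy set does not change.
    names = sorted(healthy_backends(backends))
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return names[int.from_bytes(digest[:8], "big") % len(names)]


def client_director(backends, client_identity):
    # client_identity plays the role of client.identity, e.g. a session cookie value.
    return _pick_by_key(backends, client_identity)


def hash_director(backends, url):
    return _pick_by_key(backends, url)


def fallback_director(backends):
    # First healthy backend, in the order the backends were defined.
    return healthy_backends(backends)[0]
```

The keyed directors give cache locality (the same client or URL keeps hitting the same node), while the random director spreads load according to machine capacity.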
  60. It works out of the box: just set the Cache-Control headers.
      It supports ETags, and can cache several versions of the same page for different customers.
      Edge Side Includes (ESI). Thijs
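As a rough illustration of why setting `Cache-Control` is enough, here is a tiny Python cache that keeps a response for as long as its `max-age` allows. It is a simplification (no `Vary`, ETag revalidation or `s-maxage` handling), and `TTLCache`, `max_age` and the `fetch` callable are hypothetical names, not Varnish's API.

```python
import re
import time

_MAX_AGE = re.compile(r"\bmax-age=(\d+)")


def max_age(cache_control):
    """Seconds a response may be cached for; 0 means "don't cache"."""
    directives = cache_control.lower()
    if "no-store" in directives or "private" in directives:
        return 0
    match = _MAX_AGE.search(directives)
    return int(match.group(1)) if match else 0


class TTLCache:
    """Hypothetical shared cache keyed by URL, honouring Cache-Control max-age."""

    def __init__(self, fetch, clock=time.monotonic):
        self.fetch = fetch          # callable(url) -> (body, headers dict)
        self.clock = clock
        self.store = {}             # url -> (expires_at, body)

    def get(self, url):
        entry = self.store.get(url)
        if entry is not None and entry[0] > self.clock():
            return entry[1]                      # fresh: serve from cache
        body, headers = self.fetch(url)          # miss or stale: ask the backend
        ttl = max_age(headers.get("Cache-Control", ""))
        if ttl > 0:
            self.store[url] = (self.clock() + ttl, body)
        return body
```

The application only decides *how long* a page may be reused; the cache layer does the rest, which is what makes the approach work "out of the box".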
  61. We've seen some characteristics of Service Oriented Architectures: what they are and why they are useful.
      There's another incredibly important defining characteristic of SOAs: the API, i.e. the contract between any two services. It's a software-to-software interface, not a user interface.
  62. Keep it simple: RESTful verbs, actions on resources, simple data structures in the data-exchange format.
      Define the action, the endpoint, the parameters and the response.
      Reserve an endpoint for the description of the service's API.
      Use that response to generate the API docs.
      Feed it to a test console as configuration.
  63. I recommend a tool that really brings your API docs to life: Mashery IO Docs, an example of working documentation.
      Define an API for all services (internal AND external).
      Reserve an endpoint to describe the API of the service itself.
      RESTful. Personal preference for a plain-text format (XML or JSON).
  64. Reserve the root endpoint (or a /discovery or /self endpoint) for a description of the service's API.
      Bonus: if the response is in the Mashery IO Docs format, you get a web interface to document and test the API.
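A minimal sketch of the idea, assuming a hand-rolled route registry: every endpoint is registered with its method, parameters and a short description, and `GET /` returns that registry as JSON. The `Service` class, the `route` decorator, the `filter` service and the description schema are hypothetical; in particular, this is not the Mashery IO Docs format.

```python
import json


class Service:
    """Hypothetical service whose root endpoint describes its own API."""

    def __init__(self, name):
        self.name = name
        self.routes = {}  # (method, path) -> (handler, metadata)

    def route(self, method, path, params=(), description=""):
        def register(handler):
            self.routes[(method, path)] = (handler, {
                "method": method,
                "path": path,
                "parameters": list(params),
                "description": description,
            })
            return handler
        return register

    def describe(self):
        return {"service": self.name,
                "endpoints": [meta for _, meta in self.routes.values()]}

    def handle(self, method, path, **params):
        # GET / is reserved for the self-description.
        if (method, path) == ("GET", "/"):
            return 200, json.dumps(self.describe())
        if (method, path) not in self.routes:
            return 404, json.dumps({"error": "not found"})
        handler, _ = self.routes[(method, path)]
        return 200, json.dumps(handler(**params))


svc = Service("filter")


@svc.route("GET", "/streams", params=["user_id"], description="List a user's streams")
def list_streams(user_id):
    return {"user_id": user_id, "streams": []}
```

Because the description is generated from the same registry that dispatches requests, the docs cannot drift from what the service actually exposes.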
  65. Instead of hard-coding the configuration of all the services everywhere, expose the configuration via a separate service.
  66. ZooKeeper is a centralized service for maintaining configuration information and naming, providing distributed synchronization, and providing group services.
      It looks like a distributed file system: each node can have children and associated data.
      Each service can register itself at startup and become available to receive requests.
  67. The consumer simply reads the data stored at a node (addressed by its path, like a file).
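The register/read flow can be modelled in memory as below. This is a toy model of ZooKeeper's data model (a tree of paths, each node holding data and children, with ephemeral nodes that vanish when the owning session ends), not the ZooKeeper client API; real Python code would use a client library such as kazoo. `Registry`, the `/services/filter` paths and the session names are hypothetical.

```python
class Registry:
    """Toy in-memory model of a ZooKeeper-style tree of nodes (not a client API)."""

    def __init__(self):
        self.data = {"/": b""}     # path -> bytes stored at that node
        self.owner = {}            # ephemeral path -> owning session id

    def create(self, path, value=b"", ephemeral_session=None):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.data:
            self.create(parent)    # create missing parent nodes automatically
        if path in self.data:
            raise KeyError("node already exists: " + path)
        self.data[path] = value
        if ephemeral_session is not None:
            self.owner[path] = ephemeral_session

    def get(self, path):
        return self.data[path]

    def children(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.data
                      if p.startswith(prefix) and p != prefix
                      and "/" not in p[len(prefix):])

    def close_session(self, session):
        # Ephemeral nodes vanish when their session ends (e.g. the service dies).
        for path in [p for p, s in self.owner.items() if s == session]:
            del self.data[path]
            del self.owner[path]
```

A service registers with `create("/services/filter/node-1", b"10.0.0.1:9000", ephemeral_session="s1")` at startup; a consumer lists `children("/services/filter")` and reads each node's address with `get()`. If the service dies, its session closes and it drops out of the list without any manual cleanup.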
  68. As we saw, each component should be able to scale horizontally.
  69. There are two possible problems:
      - when processing itself is expensive
      - when there's too much data
  71. Internally: use queues and workers to make processes asynchronous and to distribute data to parallel workers.
      Use curl-multi (parallel requests) with low timeouts.
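curl-multi lets a PHP process issue several HTTP requests in parallel and stop waiting after a short timeout. A rough Python analogue using the standard library's `concurrent.futures` is sketched below; `fan_out` and the lambda stand-ins are hypothetical, and real code would make HTTP requests that also carry their own connect/read timeouts.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def fan_out(calls, timeout):
    """Run independent calls in parallel and keep only those done within `timeout` s.

    `calls` maps a name to a zero-argument callable (hypothetical stand-ins for
    HTTP requests). Slow or failing calls yield None instead of blocking.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(calls)))
    futures = {pool.submit(fn): name for name, fn in calls.items()}
    done, _not_done = wait(futures, timeout=timeout)
    # Stop waiting for stragglers; Python threads cannot be killed, so they
    # finish in the background and their results are discarded.
    pool.shutdown(wait=False)
    results = {}
    for future, name in futures.items():
        ok = future in done and future.exception() is None
        results[name] = future.result() if ok else None
    return results


if __name__ == "__main__":
    demo = {"fast": lambda: "ok", "slow": lambda: time.sleep(1) or "late"}
    print(fan_out(demo, timeout=0.2))   # {'fast': 'ok', 'slow': None}
```

The design choice is to degrade rather than stall: a slow dependency turns into a missing value, and the total latency is bounded by the timeout instead of by the slowest call.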
  75. Don't move the data to the processing nodes: I/O is very expensive.
  76. Second part of the talk: moving data around (communication across services).
      - Asynchronous communication
      - Decoupling (buffers)
      - Load balancing
      - Distribution
      - High throughput
      - In memory, persistent, distributed
  77. At DataSift we use different message systems, depending on volume, destination and communication type.
  78. Source/sink, producer/consumer:
      - Asynchronous communication
      - Decoupling (buffers)
      - Load balancing
      - Distribution
      - High throughput
      - In memory, persistent, distributed
  88. http://www.justincarmony.com/blog/2012/01/10/php-workers-with-redis-solo/
      http://blog.meltingice.net/programming/creating-processing-queues-redis/
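The linked posts describe PHP workers consuming a Redis list. The same producer/worker loop in Python might look like this, assuming a redis-py style client (`lpush(name, *values)`, and `brpop(keys, timeout)` returning a `(key, value)` pair, or `None` on timeout). The `enqueue`/`work` functions, the queue name and the `process` callback are hypothetical.

```python
import json


def enqueue(client, queue, job):
    # Producer: push a JSON-encoded job onto the head of the Redis list.
    client.lpush(queue, json.dumps(job))


def work(client, queue, process, timeout=1, max_jobs=None):
    """Worker loop: block on BRPOP and process jobs in FIFO order.

    Returns once the queue has been idle for `timeout` seconds or after
    `max_jobs` jobs, so a supervisor can restart workers periodically.
    """
    handled = 0
    while max_jobs is None or handled < max_jobs:
        item = client.brpop(queue, timeout=timeout)
        if item is None:                 # timed out: nothing left to do
            break
        _key, payload = item
        process(json.loads(payload))
        handled += 1
    return handled
```

LPUSH at one end plus BRPOP at the other gives FIFO order. BRPOP removes the job before it is processed, so a crashing worker loses it; if that matters, the reliable-queue variant moves each job to a "processing" list with BRPOPLPUSH (BLMOVE on newer Redis) and deletes it only when done.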
  92. We've seen simple buffering. Let's now look at a few more useful patterns.
      The first example shows how to move from one processor to several nodes, to distribute the data and process it in parallel.
      PUSH-PULL is an efficient pattern for workload distribution.
  95. Workload distribution with workers.
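A ZeroMQ PUSH socket hands each message to one connected PULL peer in round-robin order, so N workers each get roughly 1/N of the stream. The stdlib-only model below, with threads standing in for worker nodes, mimics that distribution; it is not the ZeroMQ API (with pyzmq you would create `zmq.PUSH`/`zmq.PULL` sockets instead). `Pusher`, `worker` and `run` are hypothetical names.

```python
import itertools
import queue
import threading


class Pusher:
    """PUSH-like distributor: each message goes to exactly one worker inbox."""

    def __init__(self, inboxes):
        self._next = itertools.cycle(inboxes)

    def send(self, msg):
        next(self._next).put(msg)       # round-robin across workers


def worker(inbox, results, name):
    # PULL side: each worker processes only the messages it was handed.
    while True:
        msg = inbox.get()
        if msg is None:                 # sentinel: shut down
            break
        results.append((name, msg * msg))


def run(messages, n_workers=3):
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = []                        # list.append is thread-safe in CPython
    threads = [threading.Thread(target=worker, args=(q, results, i))
               for i, q in enumerate(inboxes)]
    for t in threads:
        t.start()
    pusher = Pusher(inboxes)
    for m in messages:
        pusher.send(m)
    for q in inboxes:
        q.put(None)
    for t in threads:
        t.join()
    return results
```

Each message is processed exactly once, by a single worker; adding a worker simply adds one more inbox to the rotation.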
  96. You can also invert producers and consumers, using a multiplexer to join the messages coming from several nodes back into a single stream.
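The inverse direction, many producers feeding one consumer, can be modelled with a single shared queue that every producer writes to. In ZeroMQ terms the consumer would bind one PULL socket and the producers would connect PUSH sockets to it; this stdlib sketch only mimics that shape, and `producer`/`multiplex` are hypothetical names.

```python
import queue
import threading


def producer(node_id, out, count):
    # Each node emits its own numbered messages, then an end marker.
    for seq in range(count):
        out.put((node_id, seq))
    out.put(("done", node_id))


def multiplex(n_producers=3, count=4):
    """Join the streams of several producers into a single consumer."""
    merged = queue.Queue()
    threads = [threading.Thread(target=producer, args=(i, merged, count))
               for i in range(n_producers)]
    for t in threads:
        t.start()
    received, finished = [], 0
    while finished < n_producers:
        item = merged.get()
        if item[0] == "done":
            finished += 1
        else:
            received.append(item)
    for t in threads:
        t.join()
    return received
```

Messages from different nodes interleave in arrival order, but each node's own messages keep their relative order, because every producer writes sequentially into the same FIFO queue.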
  97. The second pattern shows how to distribute data in a non-exclusive way: each consumer gets a copy of the same data, and items are not removed from the queue when one consumer receives them.
      The producer doesn't need to know who's listening: it doesn't need to keep a registry of the addresses of connected consumers.
      Mongrel2
  98. You can also broadcast to different datacenters.
      Listeners subscribe only to the topic(s) they are interested in, giving you different output channels.
      ZeroMQ v3: filtering is done on the publisher side.
  99. Broadcasting.
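ZeroMQ subscriptions are prefix matches on the start of each message, and every matching subscriber receives its own copy. The toy bus below models those semantics (one copy per subscriber, prefix filtering before delivery, a producer that only talks to the bus); it is not pyzmq, where you would use `zmq.PUB`/`zmq.SUB` sockets and `setsockopt(zmq.SUBSCRIBE, prefix)`. `Bus`, `Subscriber` and the topic names are hypothetical.

```python
class Subscriber:
    """A listener with its own inbox and set of topic prefixes."""

    def __init__(self, prefixes):
        self.prefixes = set(prefixes)
        self.inbox = []


class Bus:
    """Toy PUB/SUB bus: each matching subscriber receives its own copy."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, *prefixes):
        # An empty prefix "" matches every topic, as in ZeroMQ.
        sub = Subscriber(prefixes)
        self._subscribers.append(sub)
        return sub

    def publish(self, topic, payload):
        # The producer doesn't know who is listening; matching happens here,
        # before delivery, so non-matching subscribers never see the message.
        for sub in self._subscribers:
            if any(topic.startswith(p) for p in sub.prefixes):
                sub.inbox.append((topic, payload))
```

A subscriber with several overlapping prefixes still receives a single copy of each matching message, and adding a new consumer (e.g. another datacenter) requires no change on the producer side.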
  103. If you have a highly dynamic site or service, where each update affects several other users or pages, an interesting idea is to have an internal data bus that carries all the information, with updates labelled by topic and all the services/users subscribing to the relevant topics.
       Tumblr: internal firehose. Each service subscribes to the events it is interested in.
  109. Statistics are better than logs. At certain volumes, logs are just noise (and a waste of space): make your application dynamically configurable, so that logging is turned on only when strictly necessary. StatsD / Graphite.
       Monitor everything. Set alerts based on deviation from the norm, not just on absolute thresholds.
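Alerting on deviation from the norm, rather than on an absolute threshold, can be sketched with a rolling mean and standard deviation. This is a minimal Python illustration, not how Graphite or Zenoss compute alerts; `DeviationAlert`, the window size, the 3-sigma limit and the minimum sample count are assumptions.

```python
import statistics
from collections import deque


class DeviationAlert:
    """Alert when a sample strays more than `sigmas` std devs from the recent mean."""

    def __init__(self, window=30, sigmas=3.0, min_samples=5):
        self.history = deque(maxlen=window)   # rolling window of recent samples
        self.sigmas = sigmas
        self.min_samples = min_samples

    def observe(self, value):
        alert = False
        if len(self.history) >= self.min_samples:
            recent = list(self.history)
            mean = statistics.mean(recent)
            stdev = statistics.pstdev(recent)
            # max() guards against a perfectly flat history (stdev == 0).
            alert = abs(value - mean) > self.sigmas * max(stdev, 1e-9)
        self.history.append(value)
        return alert
```

Fed with per-second throughput samples, this flags a sudden spike *and* a sudden drop, whereas a fixed "above 150" threshold would never fire on a drop in traffic.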
  110. Logging at scale is useless: too much noise. Instrumentation is essential.
       You need to identify bottlenecks quickly, or suffer prolonged and painful outages. The question "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question, "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?", addresses the people and the processes that allowed the event you just had, and every other event for which you didn't have appropriate monitoring.
       Designing to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user-experience and business-metric monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the drawback is that these usually rely on threshold alerts, i.e. checking whether something is behaving outside of our expectations, rather than alerting when it's performing significantly differently than in the past). Finally, application monitoring helps you identify what the problem is.
       Not all monitoring data is valuable: too much of it only creates noise, while wasting time and resources. It's advisable to save only a summary of the reports over time, to keep costs down while still providing value. In an ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.
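One way to build monitoring into the application rather than around it is to instrument code paths directly, so that every call reports its count, errors and latency. A minimal Python sketch with an in-process registry follows; `instrumented`, `METRICS`, `match` and the metric names are hypothetical, and in production the numbers would be shipped to a collector such as StatsD rather than kept in a dict.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-process registry: counters and latency samples per metric name.
METRICS = {"count": defaultdict(int), "errors": defaultdict(int),
           "latency_ms": defaultdict(list)}


def instrumented(name):
    """Decorator recording calls, failures and latency under `name`."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            METRICS["count"][name] += 1
            try:
                return fn(*args, **kwargs)
            except Exception:
                METRICS["errors"][name] += 1
                raise
            finally:
                elapsed = (time.perf_counter() - start) * 1000.0
                METRICS["latency_ms"][name].append(elapsed)
        return wrapper
    return decorate


@instrumented("filter.match")
def match(text, keyword):
    if text is None:
        raise ValueError("empty message")
    return keyword in text
```

Because the measurements live next to the code they describe, "what is the problem?" can be answered per function rather than inferred from system-level graphs.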
  124. We collect millions of events every second.
The importance of people: devops engineers who know what to monitor and how, who can use and write tools, and who are 100% dedicated. Useful: mobile phone apps that receive alerts from Zenoss.
We use different technologies. It's very easy to set up a new ZeroMQ listener.
We use StatsD (from Flickr / Etsy), Zenoss and Graphite.
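Graphite's Carbon daemon accepts metrics in a simple plaintext format: one `<metric path> <value> <timestamp>` line per data point, by default on TCP port 2003. A minimal sketch of formatting and sending such lines follows; the host name and metric paths are hypothetical, and in a setup like the one described, StatsD usually sits between the application and Graphite.

```python
import socket
import time
from typing import Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one data point in Carbon's plaintext protocol."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"{path} {value} {ts}\n"


def send_to_graphite(lines: list[str],
                     host: str = "graphite.example.local",  # hypothetical host
                     port: int = 2003) -> None:
    """Open a TCP connection to Carbon and write a batch of lines at once."""
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall("".join(lines).encode("ascii"))


line = graphite_line("pipeline.filter.messages_per_sec", 1208, timestamp=1337328000)
print(repr(line))  # 'pipeline.filter.messages_per_sec 1208 1337328000\n'
```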
  125. Here's a photo of our monitoring wall. We even have emergency lighting with a siren, triggered by Zenoss alerts.
  127. http://www.apievangelist.com/2011/06/23/api-ecosystem-tracking-with-statsd-and-graphite/
http://mat.github.com/statsd-railscamp2011-slides/
  128. With the Etsy library, you can sample the sending rate (metrics are sent over UDP). We created a wrapper that buffers and aggregates stats in memory for a while and then flushes them at regular intervals, to save a LOT of bandwidth.
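The wrapper itself is not shown in the talk. As a rough idea of the buffering approach, here is a minimal sketch assuming the standard StatsD line format (`name:value|c` for counters, `|g` for gauges, a `|@rate` suffix for sampled counters) and UDP on StatsD's default port 8125; the class and metric names are hypothetical.

```python
import random
import socket
import threading
from collections import defaultdict


class BufferedStatsd:
    """Hypothetical wrapper: aggregates counters and gauges in memory and
    sends one StatsD line per metric on flush(), instead of one UDP packet
    per event."""

    def __init__(self, host="127.0.0.1", port=8125, sample_rate=1.0):
        self.addr = (host, port)
        self.sample_rate = sample_rate
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.lock = threading.Lock()
        self.counters = defaultdict(int)
        self.gauges = {}

    def incr(self, key, n=1):
        # Client-side sampling: record only a fraction of events; the StatsD
        # server scales the count back up using the "@rate" suffix.
        if self.sample_rate < 1.0 and random.random() >= self.sample_rate:
            return
        with self.lock:
            self.counters[key] += n

    def gauge(self, key, value):
        with self.lock:
            self.gauges[key] = value  # only the latest value matters

    def drain(self):
        """Return the buffered metrics as StatsD lines and reset the buffer."""
        with self.lock:
            counters, self.counters = self.counters, defaultdict(int)
            gauges, self.gauges = self.gauges, {}
        suffix = f"|@{self.sample_rate}" if self.sample_rate < 1.0 else ""
        lines = [f"{k}:{v}|c{suffix}" for k, v in sorted(counters.items())]
        lines += [f"{k}:{v}|g" for k, v in sorted(gauges.items())]
        return lines

    def flush(self):
        # Call this from a timer (e.g. every few seconds); UDP is fire-and-forget.
        for line in self.drain():
            self.sock.sendto(line.encode("ascii"), self.addr)


stats = BufferedStatsd()
for _ in range(10_000):
    stats.incr("pipeline.ingest.tweets")
stats.gauge("pipeline.ingest.queue_depth", 42)
print(stats.drain())  # ['pipeline.ingest.tweets:10000|c', 'pipeline.ingest.queue_depth:42|g']
```

Ten thousand increments collapse into a single line per flush interval, which is where the bandwidth saving comes from.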
  131. Monitor at the application, system and infrastructure levels, with a heatmap of every link of the pipeline (physical and logical). Network "rib-cage" graphs like this one are NOT ENOUGH! You want to contextualise the metrics you receive.
+ Cacti
  133. When you process real-time data in a complex pipeline made of several stages, you need a way of telling immediately IF there is a problem and WHERE it is. You don't have time to debug; you need to SEE.
Measure throughput and latency.
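Measuring throughput and latency per stage is what turns "something is slow" into "the filter stage is slow". A minimal in-process sketch follows, assuming each stage can report how long it spent on a message; the class, stage names and timings are hypothetical, and the p95 is a simple nearest-rank approximation.

```python
import time
from collections import defaultdict


class StageMonitor:
    """Hypothetical per-stage probe: counts messages and records how long
    each stage took, so a dashboard can show WHERE the pipeline slows down."""

    def __init__(self):
        self.count = defaultdict(int)
        self.latencies = defaultdict(list)
        self.started = time.monotonic()

    def record(self, stage, seconds):
        self.count[stage] += 1
        self.latencies[stage].append(seconds)

    def timed(self, stage, fn, *args):
        # Wrap a stage's work and record its wall-clock duration.
        t0 = time.monotonic()
        result = fn(*args)
        self.record(stage, time.monotonic() - t0)
        return result

    def report(self):
        elapsed = max(time.monotonic() - self.started, 1e-9)
        out = {}
        for stage, lat in self.latencies.items():
            ordered = sorted(lat)
            p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
            out[stage] = {
                "throughput_per_sec": self.count[stage] / elapsed,
                "p95_latency_ms": p95 * 1000,
            }
        return out


mon = StageMonitor()
for i in range(100):
    mon.record("ingest", 0.001)
    mon.record("filter", 0.002 if i < 90 else 0.050)  # a few slow messages
report = mon.report()
slowest = max(report, key=lambda s: report[s]["p95_latency_ms"])
print(slowest)  # filter
```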
  138. Information density is important, but don't overdo it: keep the signal-to-noise ratio high.
Use colours. Cognitive process: let the visual cortex do the work. Normalise.
Intuition is involuntary, fast, effortless, invisible.
Attention is voluntary, slow, difficult, visible.
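"Normalise" and "use colours" can be combined by expressing every metric relative to its usual level and mapping the deviation onto a small colour scale, so panels with different units read the same way at a glance. The sketch below is one possible scheme, not the one behind the wall shown in the talk; the 25% / 50% cut-offs and panel values are assumptions.

```python
def normalise(value, baseline):
    """Express a metric as a ratio of its normal level (1.0 = as usual),
    so panels with very different units can share one colour scale."""
    return value / baseline if baseline else 0.0


def colour(ratio, warn=0.25, crit=0.5):
    """Map deviation from normal to a traffic-light colour."""
    deviation = abs(ratio - 1.0)
    if deviation >= crit:
        return "red"
    if deviation >= warn:
        return "amber"
    return "green"


panels = {  # (current, usual) pairs, hypothetical numbers
    "ingest msg/s": (118_000, 120_000),
    "filter latency ms": (95, 40),
    "delivery queue": (7_000, 5_000),
}
for name, (current, usual) in panels.items():
    print(f"{name:20} {colour(normalise(current, usual))}")
```

Deviation in either direction is flagged, since a sudden drop in throughput is as suspicious as a spike.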
  140. Happy to talk about any of them.
  141.
- N+1 design: ensure that everything you develop has at least one additional instance of that system in the event of failure.
- Design for rollback: building the capability to roll back into an app helps limit the scalability impact of any given release.
- Design to disable features: this adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.
- Design to be monitored: you want your system to identify when it's performing differently than it normally does, in addition to telling you when it's not functioning properly.
- Design for multiple live sites: it usually costs less than operating a hot site plus a cold disaster-recovery site.
- Use mature technology: early adopters take on a lot of risk finding the bugs; availability and reliability are important.
- Asynchronous design: asynchronous systems tend to be more tolerant of extreme load.
- Stateless systems (if necessary, store state with the end users).
- Buy when non-core.
- Scale out, not up (with commodity hardware; split horizontally in terms of data, transactions and customers).
- Design for any technology, not for a specific product/vendor.
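"Design to disable features" is commonly implemented with runtime feature flags (kill switches). A minimal sketch follows, assuming flags are read from a JSON config that operations can change without a redeploy; the class, flag name and rendering function are hypothetical.

```python
import json


class FeatureFlags:
    """Hypothetical kill-switch registry: flags are read from config at
    runtime, so a misbehaving feature can be turned off without a redeploy."""

    def __init__(self, config_json: str):
        self.flags = json.loads(config_json)

    def enabled(self, name: str) -> bool:
        # Unknown flags default to off, so a missing entry fails safe.
        return bool(self.flags.get(name, False))

    def reload(self, config_json: str) -> None:
        self.flags = json.loads(config_json)


def render_results(flags: FeatureFlags, items: list[str]) -> list[str]:
    results = list(items)
    if flags.enabled("sentiment_scores"):  # new feature behind a flag
        results = [f"{item} [sentiment]" for item in results]
    return results


flags = FeatureFlags('{"sentiment_scores": true}')
print(render_results(flags, ["tweet-1"]))  # ['tweet-1 [sentiment]']
flags.reload('{"sentiment_scores": false}')  # ops flips the switch
print(render_results(flags, ["tweet-1"]))  # ['tweet-1']
```

The rest of the release stays in production; only the offending feature is switched off.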
  159. Synchronous calls, if used excessively or incorrectly, place an undue burden on the system and prevent it from scaling.
Systems designed to interact synchronously have a higher failure rate than asynchronous ones: their ability to scale is tied to the slowest system in the chain of communication. It's better to use callbacks, with timeouts so that callers can recover gracefully should they not receive a response in a timely fashion.
Synchronisation is when two or more pieces of work must happen in a specific order to accomplish a task. Asynchronous coordination between the original method and the invoked method requires a mechanism by which the original method determines when, or if, the called method has finished executing (callbacks).
A related problem is stateful versus stateless applications. An application that uses state relies on the current condition of execution to determine the next action to be performed.
There are 3 basic approaches to the complexity of scaling an application that uses session data: 1) Avoidance: use no sessions, or sticky sessions, to avoid replication (share-nothing architecture); 2) Decentralisation: store session data in the browser's cookie, or in a db whose key is referenced by a hash in the cookie; 3) Centralisation: store session data centrally, in the db / memcached.
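The callback-plus-timeout advice can be sketched with Python's `asyncio`, which uses coroutines in place of explicit callbacks but expresses the same idea: fire calls concurrently, give each a deadline, and fall back gracefully instead of letting the slowest service stall the chain. The "enrichment" service and the 0.1 s deadline are hypothetical.

```python
import asyncio


async def slow_enrichment(msg: str) -> str:
    """Stand-in for a remote service that is having a bad day."""
    await asyncio.sleep(2.0)
    return msg + " +geo"


async def call_with_timeout(coro, timeout: float, fallback):
    """Wait at most `timeout` seconds; on timeout, degrade gracefully
    (the pending call is cancelled) instead of blocking the pipeline."""
    try:
        return await asyncio.wait_for(coro, timeout)
    except asyncio.TimeoutError:
        return fallback


async def process(messages):
    # All calls run concurrently; each one has its own deadline.
    tasks = [call_with_timeout(slow_enrichment(m), 0.1, fallback=m) for m in messages]
    return await asyncio.gather(*tasks)


print(asyncio.run(process(["tweet-1", "tweet-2"])))  # ['tweet-1', 'tweet-2']
```

Here the messages pass through un-enriched after 0.1 s rather than waiting 2 s each; whether a fallback is acceptable is a per-feature decision.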
  160. You must be able to isolate and limit the effects of failures within any system by segmenting its components. Decouple, decouple, decouple! A swim lane represents both a barrier and a guide: it ensures that swimmers don't interfere with each other, and it helps guide each swimmer toward their objective with minimal effort. AKA shard.
Swim lanes increase availability by limiting the impact of failures to a subset of functionality, and they make incidents easier to detect, identify and resolve. The fewer things are shared between lanes, the more isolated the lanes are and the more beneficial they become to both scalability and availability. There should be no lines of communication crossing lane boundaries, and transactions should always move along the lane, in the direction of the communication. When designing swim lanes, always address the transactions that make the company money first (e.g. Search & Browse vs Shopping Cart), then move functions causing repetitive problems into swim lanes; finally, consider the natural layout or topology of the site for swim-lane opportunities (e.g. customer boundaries within an app / environment: if you have a very busy tenant, assign it its own swim lane; tenants with low utilisation can all be put into another swim lane).
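Routing tenants to swim lanes, with a dedicated lane for a very busy tenant and shared lanes for everyone else, can be sketched as a small lookup. The lane names and tenant IDs are hypothetical; `zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomised per process, and a tenant must land in the same lane on every server.

```python
import zlib

DEDICATED_LANES = {"bigcorp": "lane-bigcorp"}  # hypothetical very busy tenant
SHARED_LANES = ["lane-shared-1", "lane-shared-2"]


def lane_for(tenant_id: str) -> str:
    """Route every request of a tenant to the same swim lane.
    Busy tenants get a dedicated lane; the rest are spread across shared
    lanes with a stable hash, so one lane's failure affects only its tenants."""
    if tenant_id in DEDICATED_LANES:
        return DEDICATED_LANES[tenant_id]
    bucket = zlib.crc32(tenant_id.encode("utf-8")) % len(SHARED_LANES)
    return SHARED_LANES[bucket]


print(lane_for("bigcorp"))    # lane-bigcorp
print(lane_for("smallshop"))  # one of the shared lanes, always the same one
```

Note that a plain modulo reshuffles tenants when the number of shared lanes changes; consistent hashing avoids that at the cost of extra complexity.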
  165. What is the best way to handle large volumes of traffic? Answer: “Establish the right organisation, implement the right processes and follow the right architectural principles.” That is correct, but the best way is not to have to handle the traffic at all, and the key to achieving this is pervasive use of caching. The cache hit ratio (hits divided by total lookups) is the main measure of a cache’s effectiveness. A cache can be populated or refreshed by a batch job, or on demand when a lookup misses. When the cache is full, an eviction algorithm (LRU, MRU, ...) decides which entry to remove. When the underlying data changes, the cache can be kept up to date with a write-through policy (cache and backing store are written together) or a write-back policy (the backing store is written later). There are three types of cache:
       - Object caches: store objects, usually serialized, so the application can reuse them. The application must be aware of them; they sit as a layer in front of the database or external services. Marshalling is the process of transforming an object into a data format suitable for transmission or storage.
       - Application caches: A) proxy caches, usually run by ISPs, universities or corporations, cache for a limited number of users and an unlimited number of sites; B) reverse proxy caches do the opposite, caching for an unlimited number of users and a limited number of applications, and the configuration of each application determines what can be cached. HTTP headers (Last-Modified, ETag, Cache-Control) give fine-grained control over caching.
       - Content Delivery Networks (CDNs): they speed up response times, offload requests from your application’s origin server, and usually lower costs. The combined capacity of the CDN’s strategically placed servers can yield higher capacity and availability than your own network backbone. To use one, you alias your hostname to the CDN’s domain name with a canonical name (CNAME) record in your DNS.
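To make the object-cache vocabulary in the note above concrete, here is a minimal in-process sketch in Python: LRU eviction, filling the cache on a miss, a write-through `set`, and hit-ratio tracking. It is an illustration under stated assumptions, not a production cache: the `LRUCache` class is hypothetical, and the `load`/`store` callables stand in for whatever database or external service sits behind the cache.

```python
from collections import OrderedDict
from collections.abc import Callable, Hashable


class LRUCache:
    """Object cache with LRU eviction, fill-on-miss and write-through."""

    def __init__(self, capacity: int,
                 load: Callable[[Hashable], object],
                 store: Callable[[Hashable, object], None]) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._load = load    # hypothetical backing-store read (e.g. a DB query)
        self._store = store  # hypothetical backing-store write
        self._data: "OrderedDict[Hashable, object]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> object:
        if key in self._data:
            self.hits += 1
            self._data.move_to_end(key)  # now the most recently used entry
            return self._data[key]
        self.misses += 1
        value = self._load(key)          # fill the cache on a miss
        self._put(key, value)
        return value

    def set(self, key: Hashable, value: object) -> None:
        # Write-through: the backing store and the cache are updated together.
        # A write-back cache would instead mark the entry dirty and call
        # self._store only later (e.g. when the entry is evicted).
        self._store(key, value)
        self._put(key, value)

    def _put(self, key: Hashable, value: object) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

    @property
    def hit_ratio(self) -> float:
        lookups = self.hits + self.misses
        return self.hits / lookups if lookups else 0.0


if __name__ == "__main__":
    db = {"a": 1, "b": 2, "c": 3}
    cache = LRUCache(2, load=db.__getitem__, store=db.__setitem__)
    for k in ["a", "a", "b", "c"]:
        cache.get(k)
    print(cache.hit_ratio)  # 1 hit out of 4 lookups -> 0.25
```

`OrderedDict.move_to_end` and `popitem(last=False)` give O(1) recency updates and eviction, which is why it is a common basis for a simple LRU; an MRU policy would pop from the other end instead.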
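The HTTP headers mentioned for reverse proxy caches work through validators: the origin sends `ETag` and `Last-Modified`, and a cache or browser revalidates with `If-None-Match` / `If-Modified-Since`, getting a body-less `304 Not Modified` when nothing changed. Below is a minimal Python sketch of the origin side; the `respond` and `make_etag` functions are hypothetical (not any framework's API), and `last_modified` is assumed to be a UTC-aware `datetime`.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def make_etag(body: bytes) -> str:
    # A strong validator derived from the response body.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, last_modified: datetime,
            request_headers: dict[str, str],
            max_age: int = 60) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body), answering 304 when the client copy is fresh."""
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        # format_datetime(..., usegmt=True) requires a UTC-aware datetime.
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        # Lets shared caches (reverse proxies, CDNs) reuse it for max_age seconds.
        "Cache-Control": f"public, max-age={max_age}",
    }
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since (weak comparison).
        tags = [t.strip().removeprefix("W/") for t in if_none_match.split(",")]
        if "*" in tags or etag in tags:
            return 304, headers, b""
    elif "If-Modified-Since" in request_headers:
        try:
            since = parsedate_to_datetime(request_headers["If-Modified-Since"])
        except (TypeError, ValueError):
            since = None  # unparseable dates are ignored
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution.
            if last_modified.replace(microsecond=0) <= since:
                return 304, headers, b""
    return 200, headers, body
```

A reverse proxy in front of this origin can serve the cached copy for `max-age` seconds and then revalidate cheaply, paying only for a 304 instead of regenerating and transferring the full body.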
  172. shameless plug