all the little pieces
                           distributed systems with PHP

                              Andrei Zmievski ⁜ Digg
                              OSCON 2009 ⁜ San Jose
                                   twitter: @a


Thursday, July 23, 2009
Who is this guy?

                   • Open Source Fellow @ Digg
                   • PHP Core Developer since 1999
                   • Architect of the Unicode/i18n support
                   • Release Manager for PHP 6
                   • Beer lover (and brewer)


Why distributed?



                   • Because Moore’s Law will not save you
                   • Despite what DHH says




Share nothing


                   • Your Mom was wrong
                   • No shared data on application servers
                   • Distribute it to shared systems



distribute…


                   • memory (memcached)
                   • storage (mogilefs)
                   • work (gearman)



Building blocks


                   • GLAMMP - have you heard of it?
                   • Gearman + LAMP + Memcached
                   • Throw in Mogile too



memcached
background

                   • created by Danga Interactive
                   • high-performance, distributed
                          memory object caching system

                   • sustains Digg, Facebook, LiveJournal,
                          Yahoo!, and many others

                   • if you aren’t using it, you are crazy


background


                   • Very fast over the network and very
                          easy to set up
                   • Designed to be transient
                   • You still need a database



architecture

                    [diagram: three clients alongside three memcached servers]




architecture


                             memcached




                          slab #4    4104 bytes
                          slab #3    1368 bytes  1368 bytes
                          slab #2    456 bytes  456 bytes  456 bytes
                          slab #1    152 bytes  152 bytes  152 bytes  152 bytes  152 bytes




memory architecture

                   • memory allocated on startup,
                          released on shutdown
                   • variable sized slabs (30+ by default)
                   • each object is stored in the slab most
                          fitting its size

                   • fragmentation can be problematic


memory architecture


                   • items are deleted:
                          • on set
                          • on get, if it’s expired
                          • if slab is full, then use LRU



applications

                   • object cache
                   • output cache
                   • action flood control / rate limiting
                   • simple queue
                   • and much more


PHP clients


                   • a few private ones (Facebook, Yahoo!,
                          etc)
                   • pecl/memcache
                   • pecl/memcached



pecl/memcached

                   • based on libmemcached
                   • released in January 2009
                   • surface API similarity to pecl/memcache
                   • parity with other languages


API
                   • get         • cas
                   • set         • *_by_key
                   • add         • getMulti
                   • replace     • setMulti
                   • delete      • getDelayed / fetch*
                   • append      • callbacks
                   • prepend


consistent hashing
                   [diagram: hash ring with points IP1, IP2-1, IP2-2 and IP3]
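
The ring above can be sketched in plain PHP. This is only an illustration of the idea, not how libmemcached implements it: the HashRing class, the use of crc32() and the 100 points per server are all assumptions made for the example.

```php
<?php
// Minimal consistent-hash ring (illustrative; crc32 and 100 points are assumptions).
class HashRing
{
    private $ring = array();      // ring position => server name
    private $positions = array(); // sorted ring positions
    private $points;

    public function __construct($pointsPerServer = 100)
    {
        $this->points = $pointsPerServer;
    }

    public function addServer($server)
    {
        // Each server owns many points so load spreads evenly around the ring.
        for ($i = 0; $i < $this->points; $i++) {
            $this->ring[crc32("$server-$i")] = $server;
        }
        $this->rebuild();
    }

    public function removeServer($server)
    {
        $this->ring = array_filter($this->ring, function ($s) use ($server) {
            return $s !== $server;
        });
        $this->rebuild();
    }

    public function lookup($key)
    {
        if (!$this->positions) {
            return null; // no servers configured
        }
        $hash = crc32($key);
        // Linear scan for the first point at or after the key's hash.
        foreach ($this->positions as $pos) {
            if ($pos >= $hash) {
                return $this->ring[$pos];
            }
        }
        return $this->ring[$this->positions[0]]; // wrap around the ring
    }

    private function rebuild()
    {
        $this->positions = array_keys($this->ring);
        sort($this->positions);
    }
}

$ring = new HashRing();
foreach (array('IP1', 'IP2', 'IP3') as $server) {
    $ring->addServer($server);
}
echo "user:42 lives on ", $ring->lookup('user:42'), "\n";
```

Removing a server only remaps the keys that lived on it; keys on the other servers keep their placement, which is the point of the technique.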

compare-and-swap (cas)


                   • “check and set”
                   • no update if object changed
                   • relies on CAS token



compare-and-swap (cas)
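      $m = new Memcached();
      $m->addServer('localhost', 11211);

      do {
          // fetch the current list along with its CAS token
          $ips = $m->get('ip_block', null, $cas);

          if ($m->getResultCode() == Memcached::RES_NOTFOUND) {
              // no list yet: create it, unless someone else beat us to it
              $ips = array($_SERVER['REMOTE_ADDR']);
              $m->add('ip_block', $ips);
          } else {
              // update only if the value is unchanged since our get()
              $ips[] = $_SERVER['REMOTE_ADDR'];
              $m->cas($cas, 'ip_block', $ips);
          }

      // retry until the add or cas goes through
      } while ($m->getResultCode() != Memcached::RES_SUCCESS);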

delayed “lazy” fetching


                   • issue request with getDelayed()
                   • do other work
                   • fetch results with fetch() or
                          fetchAll()




binary protocol

                   • performance
                     • every request is parsed
                     • can happen thousands of times a
                            second
                   • extensibility
                          • support more data in the protocol


callbacks


                   • read-through cache callback
                   • if key is not found, invoke callback,
                          save value to memcache and return it
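
A rough sketch of the read-through flow described above, using an in-memory array instead of a real memcached connection. ArrayCache and the $loadUser callback are hypothetical names for this illustration and are not the pecl/memcached API.

```php
<?php
// In-memory stand-in for a cache; a real setup would talk to memcached instead.
class ArrayCache
{
    private $items = array();

    public function get($key, $loader = null)
    {
        if (array_key_exists($key, $this->items)) {
            return $this->items[$key];           // cache hit
        }
        if ($loader === null) {
            return false;                         // plain miss, no callback given
        }
        $value = call_user_func($loader, $key);   // miss: ask the callback
        $this->items[$key] = $value;              // save it for the next caller
        return $value;
    }
}

$calls = 0;
$loadUser = function ($key) use (&$calls) {
    $calls++;                                     // counts trips to the "database"
    return "profile for $key";
};

$cache = new ArrayCache();
echo $cache->get('user:42', $loadUser), "\n";     // miss, callback runs
echo $cache->get('user:42', $loadUser), "\n";     // hit, callback skipped
echo "callback invocations: $calls\n";
```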




callbacks


                   • result callback
                   • invoked by getDelayed() for every
                          found item

                   • should not call fetch() in this case



buffered writes


                   • queue up write requests
                   • send when a threshold is exceeded or
                          a ‘get’ command is issued




key prefixing


                   • optional prefix prepended to all the
                          keys automatically
                   • allows for namespacing, versioning,
                          etc.




key locality



                   • allows mapping a set of keys to a
                          specific server




multiple serializers


                   • PHP
                   • igbinary
                   • JSON (soon)



future


                   • UDP support
                   • replication
                   • server management (ejection, status
                          callback)




tips & tricks
                   • 32-bit systems with > 4GB memory:
                          memcached -m4096 -p11211
                          memcached -m4096 -p11212
                          memcached -m4096 -p11213




tips & tricks


                   • write-through or write-back cache
                   • Warm up the cache on code push
                   • Version the keys (if necessary)



tips & tricks

                   • Don’t think row-level DB-style
                          caching; think complex objects
                   • Don’t run memcached on your DB
                          server — your DBAs might send you
                          threatening notes

                   • Use multi-get — run things in parallel


delete by namespace
      $ns_key = $memcache->get("foo_namespace_key");
      // if not set, initialize it (and keep the new value for use below)
      if ($ns_key === false) {
          $ns_key = rand(1, 10000);
          $memcache->set("foo_namespace_key", $ns_key);
      }
      // cleverly use the ns_key
      $my_key = "foo_" . $ns_key . "_12345";
      $my_val = $memcache->get($my_key);
      // to clear the namespace:
      $memcache->increment("foo_namespace_key");




storing lists of data

                   • Store items under indexed keys:
                          comment.12, comment.23, etc
                   • Then store the list of item IDs in
                          another key: comments
                   • To retrieve, fetch comments and then
                          multi-get the comment IDs
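
The list pattern above might look like the following. ListStore is a hypothetical array-backed stand-in whose getMulti() only imitates a multi-get; the comment.N and comments key names come from the slide.

```php
<?php
// Array-backed stand-in for the cache (hypothetical; not a memcached client).
class ListStore
{
    private $items = array();

    public function set($key, $value)
    {
        $this->items[$key] = $value;
    }

    public function get($key)
    {
        return isset($this->items[$key]) ? $this->items[$key] : false;
    }

    public function getMulti(array $keys)
    {
        // Return only the keys that are present, like a multi-get would.
        return array_intersect_key($this->items, array_flip($keys));
    }
}

function fetchComments(ListStore $store)
{
    $ids = $store->get('comments');               // 1. fetch the list of IDs
    if ($ids === false) {
        return array();
    }
    $keys = array_map(function ($id) {
        return "comment.$id";
    }, $ids);
    $found = $store->getMulti($keys);             // 2. one round trip for all items

    $comments = array();
    foreach ($ids as $id) {                       // keep list order, skip evicted items
        if (isset($found["comment.$id"])) {
            $comments[$id] = $found["comment.$id"];
        }
    }
    return $comments;
}

$store = new ListStore();
$store->set('comment.12', 'First!');
$store->set('comment.23', 'Nice talk');
$store->set('comments', array(12, 23, 99));       // 99 was never stored (or was evicted)
print_r(fetchComments($store));
```

Items can be evicted independently of the list, so the caller has to tolerate IDs whose item is missing.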



preventing stampeding



                   • embedded probabilistic timeout
                   • gearman unique task trick
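
One possible reading of the "embedded probabilistic timeout" is sketched below: the value carries its own soft expiry, and as that expiry approaches each reader has a growing chance of refreshing early, so that not everyone regenerates at once. The function names, the linear probability ramp and the 60-second window are assumptions for illustration; the slides do not spell out the exact scheme.

```php
<?php
// Wrap a value with a soft expiry stored inside the cached payload.
// The real cache TTL should be set longer than this embedded one.
function wrapWithExpiry($value, $ttl, $now)
{
    return array('value' => $value, 'expires' => $now + $ttl);
}

// Decide whether this reader should regenerate the value.
// $roll is a random number in [0, 1); it is injected to keep the logic testable.
function shouldRefresh(array $entry, $now, $window, $roll)
{
    $remaining = $entry['expires'] - $now;
    if ($remaining <= 0) {
        return true;                 // soft expiry passed: always refresh
    }
    if ($remaining >= $window) {
        return false;                // still fresh: never refresh
    }
    // Inside the window the chance rises linearly from 0 to 1.
    $probability = 1 - $remaining / $window;
    return $roll < $probability;
}

$entry = wrapWithExpiry('rendered page', 300, time());
$roll = mt_rand() / (mt_getrandmax() + 1);
if (shouldRefresh($entry, time(), 60, $roll)) {
    echo "regenerate and re-cache\n";
} else {
    echo "serve cached value\n";
}
```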




optimization


                   • watch stats (eviction rate, fill, etc)
                          • getStats()
                          • telnet + “stats” commands
                          • peep (heap inspector)



slabs

                   • Tune slab sizes to your needs:
                          • -f chunk size growth factor (default
                            1.25)

                          • -n minimum space allocated for
                            key+value+flags (default 48)



slabs
         slab class   1: chunk size    104 perslab 10082
         slab class   2: chunk size    136 perslab  7710
         slab class   3: chunk size    176 perslab  5957
         slab class   4: chunk size    224 perslab  4681
         ...
         slab class  38: chunk size 394840 perslab     2
         slab class  39: chunk size 493552 perslab     2



                                       Default: 38 slabs



slabs
                           Most objects: ∼1-2KB, some larger

         slab class   1: chunk size   1048 perslab  1000
         slab class   2: chunk size   1064 perslab   985
         slab class   3: chunk size   1080 perslab   970
         slab class   4: chunk size   1096 perslab   956
         ...
         slab class 198: chunk size   9224 perslab   113
         slab class 199: chunk size   9320 perslab   112

                            memcached -n 1000 -f 1.01
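
The chunk sizes in the two listings can be reproduced with a few lines of arithmetic: multiply by the growth factor, then round up to an 8-byte boundary. This is a sketch, not memcached's source. The 1 MB page size and 8-byte alignment are assumptions that happen to match the numbers shown, and the first chunk size (104 or 1048) is taken from the listings rather than derived from -n.

```php
<?php
// Sketch of slab-class sizing: grow by the factor, round up to an 8-byte boundary.
function slabClasses($firstChunk, $factor, $count, $pageSize = 1048576, $align = 8)
{
    $classes = array();
    $size = $firstChunk;
    for ($i = 1; $i <= $count; $i++) {
        if ($size % $align !== 0) {
            $size += $align - ($size % $align);       // round up to the boundary
        }
        $classes[$i] = array(
            'chunk_size' => $size,
            'perslab'    => intdiv($pageSize, $size), // whole chunks per page
        );
        $size = (int) ($size * $factor);              // next class grows by the factor
    }
    return $classes;
}

foreach (slabClasses(104, 1.25, 4) as $n => $c) {
    printf("slab class %3d: chunk size %6d perslab %5d\n", $n, $c['chunk_size'], $c['perslab']);
}
```

Calling slabClasses(1048, 1.01, 4) gives the first four rows of the tuned listing; a small factor trades many more classes for less wasted space per item.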


memcached @ digg



ops


                   • memcached on each app server (2GB)
                   • the process is niced to a lower level
                   • separate pool for sessions
                   • 2 servers keep track of cluster health



key prefixes

                   • global key prefix for apc, memcached,
                          etc

                   • each pool has additional, versioned
                          prefix: .sess.2

                   • the key version is incremented on
                          each release

                   • global prefix can invalidate all caches

cache chain

                   • multi-level caching: globals, APC,
                          memcached, etc.
                   • all cache access is through
                          Cache_Chain class
                   • various configurations:
                     • APC ➡ memcached
                     • $GLOBALS ➡ APC
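
The slides do not show how Digg's Cache_Chain works internally, so the following is only a guess at the general shape of a multi-level cache: check the fastest level first, fall through to slower ones, and copy a hit back into the faster levels. ArrayLevel and ChainSketch are hypothetical names, and the backfill behaviour is an assumption.

```php
<?php
// One cache level backed by an array (stand-in for $GLOBALS, APC or memcached).
class ArrayLevel
{
    private $items = array();

    public function get($key)
    {
        return isset($this->items[$key]) ? $this->items[$key] : false;
    }

    public function set($key, $value)
    {
        $this->items[$key] = $value;
    }
}

// Looks through the levels in order, fastest first.
class ChainSketch
{
    private $levels;

    public function __construct(array $levels)
    {
        $this->levels = array_values($levels);
    }

    public function get($key)
    {
        foreach ($this->levels as $i => $level) {
            $value = $level->get($key);
            if ($value !== false) {
                // Backfill the faster levels that missed.
                for ($j = 0; $j < $i; $j++) {
                    $this->levels[$j]->set($key, $value);
                }
                return $value;
            }
        }
        return false; // missed at every level
    }

    public function set($key, $value)
    {
        foreach ($this->levels as $level) {
            $level->set($key, $value);
        }
    }
}

$local  = new ArrayLevel();   // plays the role of APC
$shared = new ArrayLevel();   // plays the role of memcached
$shared->set('story:1', 'cached story');

$chain = new ChainSketch(array($local, $shared));
echo $chain->get('story:1'), "\n";   // found in the shared level
echo $local->get('story:1'), "\n";   // now also present locally
```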

other


                   • large objects (> 1MB)
                   • split on the client side
                   • save the partial keys in a master one
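
Client-side splitting could be sketched as follows, with a plain array standing in for the cache. The function names, the ".part.N" key scheme and the 1,000,000-byte chunk size are assumptions for the example; a real chunk size has to leave room for the key and item overhead under the 1 MB limit.

```php
<?php
// Split a large blob into parts and record the part keys under the master key.
function storeLarge(array &$store, $key, $blob, $chunkSize = 1000000)
{
    $partKeys = array();
    foreach (str_split($blob, $chunkSize) as $i => $part) {
        $partKey = "$key.part.$i";
        $store[$partKey] = $part;
        $partKeys[] = $partKey;
    }
    $store[$key] = $partKeys;   // master key lists the partial keys
}

// Reassemble the blob; a missing part counts as a miss for the whole object.
function fetchLarge(array $store, $key)
{
    if (!isset($store[$key])) {
        return false;
    }
    $blob = '';
    foreach ($store[$key] as $partKey) {
        if (!isset($store[$partKey])) {
            return false;       // one part was evicted
        }
        $blob .= $store[$partKey];
    }
    return $blob;
}

$store = array();
$blob = str_repeat('x', 2500000) . 'end';
storeLarge($store, 'report', $blob);
echo count($store['report']), " parts stored\n";
var_dump(fetchLarge($store, 'report') === $blob);
```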



stats




moxi - memcached proxy


                   • homogenization
                   • multi-get escalation
                   • protocol pipelining
                   • statistics, etc



alternatives


                   • in-memory: Tokyo Tyrant, Scalaris
                   • persistent: Hypertable, Cassandra,
                          MemcacheDB

                   • document-oriented: CouchDB



mogile
background

                   • created by Danga Interactive
                   • application-level distributed
                          filesystem

                   • used at Digg, LiveJournal, etc
                   • a form of “cloud caching”
                   • scales very well

background

                   • automatic file replication with custom
                          policies
                   • no single point of failure
                   • flat namespace
                   • local filesystem agnostic
                   • not meant for speed

architecture
                 [diagram: app, three trackers, a DB, and five storage nodes]

applications


                   • images
                   • document storage
                   • backing store for certain caches



PHP client



                   • File_Mogile in PEAR
                   • MediaWiki one (not maintained)




Example

      $hosts = array('172.10.1.1', '172.10.1.2');
      $m = new File_Mogile($hosts, 'profiles');
      $m->storeFile('user1234', 'image',
                    '/tmp/image1234.jpg');

      ...

      $paths = $m->getPaths('user1234');




mogile @ digg



mogile @ digg


                   • Wrapper around File_Mogile to cache
                          entries in memcache
                   • fairly standard set-up
                   • trackers run on storage nodes



mogile @ digg

                   • not huge (about 3.5 TB of data)
                   • files are replicated 3x
                   • the user profile images are cached on
                          Netscaler (1.5 GB cache)
                   • mogile cluster load is light


gearman
background


                   • created by Danga Interactive
                   • anagram of “manager”
                   • a system for distributing work
                   • a form of RPC mechanism



background


                   • parallel, asynchronous, scales well
                   • fire and forget, decentralized
                   • avoid tying up Apache processes



background

                   • dispatch function calls to machines
                          that are better suited to do work
                   • do work in parallel
                   • load balance lots of function calls
                   • invoke functions in other languages


architecture

              [diagram: three clients, one gearmand, three workers]




applications

                   • thumbnail generation
                   • asynchronous logging
                   • cache warm-up
                   • DB jobs, data migration
                   • sending email


servers



                   • Gearman-Server (Perl)
                   • gearmand (C)




clients

                   • Net_Gearman
                          • simplified, pretty stable
                   • pecl/gearman
                          • more powerful, complex, somewhat
                            unstable (under development)



Concepts


                   • Job
                   • Worker
                   • Task
                   • Client



Net_Gearman

                   • Net_Gearman_Job
                   • Net_Gearman_Worker
                   • Net_Gearman_Task
                   • Net_Gearman_Set
                   • Net_Gearman_Client


Echo Job

      // Echo.php
      class Net_Gearman_Job_Echo extends Net_Gearman_Job_Common
      {
          public function run($arg)
          {
              var_export($arg);
              echo "\n";
          }
      }




Reverse Job
      // Reverse.php
      class Net_Gearman_Job_Reverse extends Net_Gearman_Job_Common
      {
          public function run($arg)
          {
              $result = array();
              $n = count($arg);
              $i = 0;
              // compare against null so falsy values such as 0 do not end the loop early
              while (($value = array_pop($arg)) !== null) {
                  $result[] = $value;
                  $i++;
                  $this->status($i, $n);  // report progress as $i of $n
              }

              return $result;
          }
      }

Worker
      define('NET_GEARMAN_JOB_PATH', './');

      require 'Net/Gearman/Worker.php';

      try {
          $worker = new Net_Gearman_Worker(array('localhost:4730'));
          $worker->addAbility('Reverse');
          $worker->addAbility('Echo');
          $worker->beginWork();
      } catch (Net_Gearman_Exception $e) {
          echo $e->getMessage() . "\n";
          exit;
      }




Client

      require_once 'Net/Gearman/Client.php';

      function complete($job, $handle, $result)
      {
          echo "$job complete, result: " . var_export($result, true) . "\n";
      }

      function status($job, $handle, $n, $d)
      {
          echo "$n/$d\n";
      }
                                                      continued..


Client


      $client = new Net_Gearman_Client(array('lager:4730'));

      $task = new Net_Gearman_Task('Reverse', range(1,5));
      $task->attachCallback("complete",Net_Gearman_Task::TASK_COMPLETE);
      $task->attachCallback("status",Net_Gearman_Task::TASK_STATUS);

                                                                continued..




Client

      $set = new Net_Gearman_Set();
      $set->addTask($task);

      $client->runSet($set);

      $client->Echo('Mmm... beer');




pecl/gearman



                   • More complex API
                   • Jobs aren’t separated into files




Worker
      $gmworker = new gearman_worker();
      $gmworker->add_server();
      $gmworker->add_function("reverse", "reverse_fn");

      while (1)
      {
          $ret = $gmworker->work();
          if ($ret != GEARMAN_SUCCESS)
              break;
      }

      function reverse_fn($job)
      {
          $workload = $job->workload();
          echo "Received job: " . $job->handle() . "\n";
          echo "Workload: $workload\n";
          $result = strrev($workload);
          echo "Result: $result\n";
          return $result;
      }


Client
      $gmclient = new gearman_client();

      $gmclient->add_server('lager');

      echo "Sending job\n";

      list($ret, $result) = $gmclient->do("reverse", "Hello!");

      if ($ret == GEARMAN_SUCCESS)
          echo "Success: $result\n";




gearman @ digg



gearman @ digg

                   • 400,000 jobs a day
                   • Jobs: crawling, DB job, FB sync,
                          memcache manipulation, Twitter
                          post, IDDB migration, etc.
                   • Each application server has its own
                          Gearman daemon + workers



tips and tricks

                   • you can daemonize the workers easily
                          with daemon or supervisord

                   • run workers in different groups, don’t
                          block on job A waiting on job B

                   • Make workers exit after N jobs to free
                          up memory (supervisord will restart
                          them)
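
The "exit after N jobs" tip boils down to a bounded loop. In this sketch $fetchJob and $handleJob are hypothetical callables standing in for the Gearman worker calls, so the loop can be shown without a running gearmand; the limit of 3 is arbitrary.

```php
<?php
// Process at most $maxJobs jobs, then return so the script can exit.
// A supervisor (for example supervisord) is expected to start a fresh process.
function runWorker(callable $fetchJob, callable $handleJob, $maxJobs)
{
    $done = 0;
    while ($done < $maxJobs) {
        $job = $fetchJob();
        if ($job === null) {
            break;              // nothing left to do
        }
        $handleJob($job);
        $done++;
    }
    return $done;
}

$queue = array('resize a.jpg', 'resize b.jpg', 'resize c.jpg', 'resize d.jpg');
$done = runWorker(
    function () use (&$queue) {
        return array_shift($queue);   // null once the queue is empty
    },
    function ($job) {
        echo "working on: $job\n";
    },
    3
);
echo "handled $done jobs, exiting so memory is released\n";
```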


Thrift
background


                   • NOT developed by Danga (Facebook)
                   • cross-language services
                   • RPC-based



background

                   • interface description language
                   • bindings: C++, C#, Cocoa, Erlang,
                          Haskell, Java, OCaml, Perl, PHP,
                          Python, Ruby, Smalltalk
                   • data types: base, structs, constants,
                          services, exceptions



IDL



Demo



Thank You


                          http://gravitonic.com/talks
Powerful Google developer tools for immediate impact! (2023-24 C)
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

All The Little Pieces

  • 1. all the little pieces distributed systems with PHP Andrei Zmievski ⁜ Digg OSCON 2009 ⁜ San Jose twitter: @a Thursday, July 23, 2009
  • 2. Who is this guy? • Open Source Fellow @ Digg • PHP Core Developer since 1999 • Architect of the Unicode/i18n support • Release Manager for PHP 6 • Beer lover (and brewer)
  • 3. Why distributed? • Because Moore’s Law will not save you • Despite what DHH says
  • 4. Share nothing • Your Mom was wrong • No shared data on application servers • Distribute it to shared systems
  • 5. distribute… • memory (memcached) • storage (mogilefs) • work (gearman)
  • 6. Building blocks • GLAMMP - have you heard of it? • Gearman + LAMP + Memcached • Throw in Mogile too
  • 8. background • created by Danga Interactive • high-performance, distributed memory object caching system • sustains Digg, Facebook, LiveJournal, Yahoo!, and many others • if you aren’t using it, you are crazy
  • 9. background • Very fast over the network and very easy to set up • Designed to be transient • You still need a database
  • 10. architecture (diagram: three clients and three memcached servers)
  • 11. architecture (diagram: memcached)
  • 12. memory slabs (diagram): slab #1 holds 152-byte chunks, slab #2 456-byte chunks, slab #3 1368-byte chunks, slab #4 4104-byte chunks
  • 13. memory architecture • memory allocated on startup, released on shutdown • variable sized slabs (30+ by default) • each object is stored in the slab most fitting its size • fragmentation can be problematic
  • 14. memory architecture • items are deleted: • on set • on get, if it’s expired • if slab is full, then use LRU
  • 15. applications • object cache • output cache • action flood control / rate limiting • simple queue • and much more
  • 16. PHP clients • a few private ones (Facebook, Yahoo!, etc) • pecl/memcache • pecl/memcached
  • 17. pecl/memcached • based on libmemcached • released in January 2009 • surface API similarity to pecl/memcache • parity with other languages
  • 18. API • get • set • add • replace • delete • append • prepend • cas • *_by_key • getMulti • setMulti • getDelayed / fetch* • callbacks
  • 19. consistent hashing (diagram: points IP1, IP2-1, IP2-2, IP3)
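
The slide only shows a diagram, so here is a minimal sketch of the general idea, assuming a ring built from md5-derived positions with several points per server (the "IP2-1", "IP2-2" labels suggest multiple points for one server). The class name `HashRingSketch`, the point count, and the hash choice are all illustrative assumptions; libmemcached uses its own hashing and distribution code.

```php
<?php
// Illustrative consistent-hash ring; not the libmemcached implementation.
final class HashRingSketch
{
    /** @var array<int,string> ring position => server, sorted by position */
    private array $ring = [];

    public function __construct(array $servers, int $pointsPerServer = 100)
    {
        foreach ($servers as $server) {
            // Each server gets several points so load spreads more evenly.
            for ($i = 0; $i < $pointsPerServer; $i++) {
                $this->ring[self::position("$server-$i")] = $server;
            }
        }
        ksort($this->ring);
    }

    // Map any string to a 32-bit position on the ring.
    private static function position(string $s): int
    {
        return (int) hexdec(substr(md5($s), 0, 8));
    }

    // A key belongs to the first server point at or after its own position.
    public function serverFor(string $key): string
    {
        if ($this->ring === []) {
            throw new RuntimeException('no servers configured');
        }
        $pos = self::position($key);
        foreach ($this->ring as $point => $server) {
            if ($point >= $pos) {
                return $server;
            }
        }
        return reset($this->ring); // wrap around to the first point
    }
}

$ring = new HashRingSketch(['10.0.0.1', '10.0.0.2', '10.0.0.3']);
echo $ring->serverFor('user:1234'), "\n";
```

The linear scan keeps the sketch short; a real client would more likely binary-search the sorted points. The useful property is that removing one server only remaps the keys that lived on it.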
  • 20. compare-and-swap (cas) • “check and set” • no update if object changed • relies on CAS token
  • 21. compare-and-swap (cas)
      $m = new Memcached();
      $m->addServer('localhost', 11211);
      do {
          // fetch the current list along with its CAS token
          $ips = $m->get('ip_block', null, $cas);
          if ($m->getResultCode() == Memcached::RES_NOTFOUND) {
              // nothing stored yet: add() fails if another client got there first
              $ips = array($_SERVER['REMOTE_ADDR']);
              $m->add('ip_block', $ips);
          } else {
              // cas() fails if the item changed since we read it
              $ips[] = $_SERVER['REMOTE_ADDR'];
              $m->cas($cas, 'ip_block', $ips);
          }
      } while ($m->getResultCode() != Memcached::RES_SUCCESS);
  • 22. delayed “lazy” fetching • issue request with getDelayed() • do other work • fetch results with fetch() or fetchAll()
  • 23. binary protocol • performance • every request is parsed • can happen thousands of times a second • extensibility • support more data in the protocol
  • 24. callbacks • read-through cache callback • if key is not found, invoke callback, save value to memcache and return it
  • 25. callbacks • result callback • invoked by getDelayed() for every found item • should not call fetch() in this case
  • 26. buffered writes • queue up write requests • send when a threshold is exceeded or a ‘get’ command is issued
  • 27. key prefixing • optional prefix prepended to all the keys automatically • allows for namespacing, versioning, etc.
  • 28. key locality • allows mapping a set of keys to a specific server
  • 29. multiple serializers • PHP • igbinary • JSON (soon)
  • 30. future • UDP support • replication • server management (ejection, status callback)
  • 31. tips & tricks • 32-bit systems with > 4GB memory:
      memcached -m4096 -p11211
      memcached -m4096 -p11212
      memcached -m4096 -p11213
  • 32. tips & tricks • write-through or write-back cache • Warm up the cache on code push • Version the keys (if necessary)
  • 33. tips & tricks • Don’t think row-level DB-style caching; think complex objects • Don’t run memcached on your DB server (your DBAs might send you threatening notes) • Use multi-get: run things in parallel
  • 34. delete by namespace
      $ns_key = $memcache->get("foo_namespace_key");
      // if not set, initialize it (and keep the new value in $ns_key)
      if ($ns_key === false) {
          $ns_key = rand(1, 10000);
          $memcache->set("foo_namespace_key", $ns_key);
      }
      // cleverly use the ns_key
      $my_key = "foo_".$ns_key."_12345";
      $my_val = $memcache->get($my_key);
      // to clear the namespace:
      $memcache->increment("foo_namespace_key");
  • 35. storing lists of data • Store items under indexed keys: comment.12, comment.23, etc • Then store the list of item IDs in another key: comments • To retrieve, fetch comments and then multi-get the comment IDs
  • 36. preventing stampeding • embedded probabilistic timeout • gearman unique task trick
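
The slide does not spell out what "embedded probabilistic timeout" means. One possible reading, sketched below under that assumption, is to store a soft expiry time inside the cached entry and let requests regenerate the value early with a probability that grows as the soft expiry approaches, so that a single request usually refreshes it before everyone misses at once. The function name `should_refresh`, the `soft_expiry` field, and the linear ramp are hypothetical choices, not taken from the talk.

```php
<?php
/**
 * Decide whether this request should regenerate a cached value early.
 *
 * $softExpiry  timestamp stored alongside the value (hypothetical field)
 * $window      seconds before $softExpiry in which early refresh may happen
 * $roll        random number in [0, 1); passed in so the logic is testable
 */
function should_refresh(int $now, int $softExpiry, int $window, float $roll): bool
{
    if ($now >= $softExpiry) {
        return true;                  // already stale: always refresh
    }
    $windowStart = $softExpiry - $window;
    if ($now < $windowStart) {
        return false;                 // still comfortably fresh
    }
    // Refresh probability rises linearly from 0 to 1 across the window.
    $probability = ($now - $windowStart) / $window;
    return $roll < $probability;
}

// Usage: embed the soft expiry in the entry that gets cached.
$entry = ['value' => 'rendered page', 'soft_expiry' => time() + 300];
$roll = mt_rand() / (mt_getrandmax() + 1);
if (should_refresh(time(), $entry['soft_expiry'], 60, $roll)) {
    // regenerate the value and store it again with a new soft expiry
}
```

The real memcached TTL would be set somewhat later than the soft expiry, so the item is still present while one request rebuilds it.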
  • 37. optimization • watch stats (eviction rate, fill, etc) • getStats() • telnet + “stats” commands • peep (heap inspector)
  • 38. slabs • Tune slab sizes to your needs: • -f chunk size growth factor (default 1.25) • -n minimum space allocated for key+value+flags (default 48)
  • 39. slabs
      slab class 1: chunk size 104 perslab 10082
      slab class 2: chunk size 136 perslab 7710
      slab class 3: chunk size 176 perslab 5957
      slab class 4: chunk size 224 perslab 4681
      ...
      slab class 38: chunk size 394840 perslab 2
      slab class 39: chunk size 493552 perslab 2
      Default: 38 slabs
  • 40. slabs
      Most objects: ∼1-2KB, some larger
      slab class 1: chunk size 1048 perslab 1000
      slab class 2: chunk size 1064 perslab 985
      slab class 3: chunk size 1080 perslab 970
      slab class 4: chunk size 1096 perslab 956
      ...
      slab class 198: chunk size 9224 perslab 113
      slab class 199: chunk size 9320 perslab 112
      memcached -n 1000 -f 1.01
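
To see how the growth factor shapes the slab classes, the arithmetic behind the two listings above can be approximated in a few lines. This is a sketch, assuming a 1 MB slab page, 8-byte chunk alignment, and a first chunk size read off the listings (104 and 1048); the function name `slab_classes` is made up for illustration, and memcached's own C code remains the authority.

```php
<?php
// Approximate memcached's slab class sizing for a given growth factor.
// Assumptions: 1 MB page, chunks aligned to 8 bytes, first chunk size given.
function slab_classes(int $firstChunk, float $factor, int $count, int $pageSize = 1048576): array
{
    $classes = [];
    $size = $firstChunk;
    for ($i = 1; $i <= $count; $i++) {
        // Round the chunk size up to the next multiple of 8 bytes.
        if ($size % 8 !== 0) {
            $size += 8 - ($size % 8);
        }
        $classes[$i] = [
            'chunk'   => $size,
            'perslab' => intdiv($pageSize, $size), // chunks that fit in one page
        ];
        // Grow by the factor, truncating to a whole number of bytes.
        $size = (int) ($size * $factor);
    }
    return $classes;
}

foreach (slab_classes(1048, 1.01, 4) as $n => $c) {
    echo "slab class $n: chunk size {$c['chunk']} perslab {$c['perslab']}\n";
}
```

With a factor of 1.01 the classes sit only 16 bytes apart, which is why an object of around 1-2KB wastes very little space in its chunk, at the cost of many more slab classes.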
  • 41. memcached @ digg
  • 42. ops • memcached on each app server (2GB) • the process is niced to a lower level • separate pool for sessions • 2 servers keep track of cluster health
  • 43. key prefixes • global key prefix for apc, memcached, etc • each pool has additional, versioned prefix: .sess.2 • the key version is incremented on each release • global prefix can invalidate all caches
  • 44. cache chain • multi-level caching: globals, APC, memcached, etc. • all cache access is through Cache_Chain class • various configurations: • APC ➡ memcached • $GLOBALS ➡ APC
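
Digg's Cache_Chain class is not shown in the talk, so the following is only a sketch of how a multi-level chain of this kind might behave, assuming each level exposes get/set and that a hit in a slower level is copied back into the faster ones. `CacheLayer`, `ArrayLayer`, and `CacheChainSketch` are hypothetical names; the array-backed layer stands in for $GLOBALS, APC, or memcached.

```php
<?php
// Hypothetical layer interface; a miss is signalled by null.
interface CacheLayer
{
    public function get(string $key);
    public function set(string $key, $value): void;
}

// In-memory stand-in for a real backend such as APC or memcached.
final class ArrayLayer implements CacheLayer
{
    private array $data = [];

    public function get(string $key)
    {
        return $this->data[$key] ?? null;
    }

    public function set(string $key, $value): void
    {
        $this->data[$key] = $value;
    }
}

final class CacheChainSketch
{
    /** @var CacheLayer[] ordered fastest first */
    private array $layers;

    public function __construct(CacheLayer ...$layers)
    {
        $this->layers = $layers;
    }

    public function get(string $key)
    {
        foreach ($this->layers as $i => $layer) {
            $value = $layer->get($key);
            if ($value !== null) {
                // Back-fill the faster layers that missed.
                for ($j = 0; $j < $i; $j++) {
                    $this->layers[$j]->set($key, $value);
                }
                return $value;
            }
        }
        return null;
    }

    public function set(string $key, $value): void
    {
        foreach ($this->layers as $layer) {
            $layer->set($key, $value);
        }
    }
}

// Usage: a fast local layer in front of a slower shared one.
$chain = new CacheChainSketch(new ArrayLayer(), new ArrayLayer());
$chain->set('user:1', 'andrei');
```

Routing all access through one class like this is what makes it cheap to swap configurations such as APC ➡ memcached or $GLOBALS ➡ APC.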
  • 45. other • large objects (> 1MB) • split on the client side • save the partial keys in a master one
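
The client-side split described above can be sketched as two small helpers: one that turns a large string into part entries plus a master entry listing the part keys, and one that reassembles it. The function names and the `.part.N` key scheme are assumptions for illustration; the resulting array would be written with something like setMulti() and read back with getMulti().

```php
<?php
// Split a large blob into chunk entries plus a master entry of part keys.
function split_for_cache(string $key, string $blob, int $chunkSize): array
{
    $entries = [];
    $partKeys = [];
    foreach (str_split($blob, $chunkSize) as $i => $chunk) {
        $partKey = "$key.part.$i";     // hypothetical naming scheme
        $partKeys[] = $partKey;
        $entries[$partKey] = $chunk;
    }
    $entries[$key] = $partKeys;        // master key lists the partial keys
    return $entries;
}

// Reassemble the blob; returns null if the master or any part is missing.
function join_from_cache(string $key, array $fetched): ?string
{
    if (!isset($fetched[$key])) {
        return null;
    }
    $blob = '';
    foreach ($fetched[$key] as $partKey) {
        if (!isset($fetched[$partKey])) {
            return null;               // a part was evicted: treat as a miss
        }
        $blob .= $fetched[$partKey];
    }
    return $blob;
}

// Usage with a tiny chunk size so the split is easy to see.
$entries = split_for_cache('page', 'abcdefgh', 3);
var_export(join_from_cache('page', $entries));
```

Because memcached may evict any part independently, a missing part has to be treated as a miss for the whole object.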
  • 47. moxi - memcached proxy • homogenization • multi-get escalation • protocol pipelining • statistics, etc
  • 48. alternatives • in-memory: Tokyo Tyrant, Scalaris • persistent: Hypertable, Cassandra, MemcacheDB • document-oriented: CouchDB
  • 50. background • created by Danga Interactive • application-level distributed filesystem • used at Digg, LiveJournal, etc • a form of “cloud caching” • scales very well
  • 51. background • automatic file replication with custom policies • no single point of failure • flat namespace • local filesystem agnostic • not meant for speed
  • 52. architecture (diagram: app, three trackers, DB, and storage nodes)
  • 53. applications • images • document storage • backing store for certain caches
  • 54. PHP client • File_Mogile in PEAR • MediaWiki one (not maintained)
  • 55. Example
      $hosts = array('172.10.1.1', '172.10.1.2');
      $m = new File_Mogile($hosts, 'profiles');
      $m->storeFile('user1234', 'image', '/tmp/image1234.jpg');
      ...
      $paths = $m->getPaths('user1234');
  • 56. mogile @ digg
  • 57. mogile @ digg • Wrapper around File_Mogile to cache entries in memcache • fairly standard set-up • trackers run on storage nodes
  • 58. mogile @ digg • not huge (about 3.5 TB of data) • files are replicated 3x • the user profile images are cached on Netscaler (1.5 GB cache) • mogile cluster load is light
  • 60. background • created by Danga Interactive • anagram of “manager” • a system for distributing work • a form of RPC mechanism
  • 61. background • parallel, asynchronous, scales well • fire and forget, decentralized • avoid tying up Apache processes
  • 62. background • dispatch function calls to machines that are better suited to do work • do work in parallel • load balance lots of function calls • invoke functions in other languages
  • 63. architecture (diagram: three clients, gearmand, three workers)
  • 64. applications • thumbnail generation • asynchronous logging • cache warm-up • DB jobs, data migration • sending email
  • 65. servers • Gearman-Server (Perl) • gearmand (C)
  • 66. clients • Net_Gearman • simplified, pretty stable • pecl/gearman • more powerful, complex, somewhat unstable (under development)
  • 67. Concepts • Job • Worker • Task • Client
  • 68. Net_Gearman • Net_Gearman_Job • Net_Gearman_Worker • Net_Gearman_Task • Net_Gearman_Set • Net_Gearman_Client
  • 69. Echo Job (Echo.php)
      class Net_Gearman_Job_Echo extends Net_Gearman_Job_Common
      {
          public function run($arg)
          {
              var_export($arg);
              echo "\n";
          }
      }
  • 70. Reverse Job (Reverse.php)
      class Net_Gearman_Job_Reverse extends Net_Gearman_Job_Common
      {
          public function run($arg)
          {
              $result = array();
              $n = count($arg);
              $i = 0;
              // compare against null so falsy values such as 0 do not end the loop early
              while (($value = array_pop($arg)) !== null) {
                  $result[] = $value;
                  $i++;
                  // report progress back to the client
                  $this->status($i, $n);
              }
              return $result;
          }
      }
  • 71. Worker
      define('NET_GEARMAN_JOB_PATH', './');
      require 'Net/Gearman/Worker.php';
      try {
          $worker = new Net_Gearman_Worker(array('localhost:4730'));
          $worker->addAbility('Reverse');
          $worker->addAbility('Echo');
          $worker->beginWork();
      } catch (Net_Gearman_Exception $e) {
          echo $e->getMessage() . "\n";
          exit;
      }
  • 72. Client
      require_once 'Net/Gearman/Client.php';

      function complete($job, $handle, $result) {
          echo "$job complete, result: ".var_export($result, true)."\n";
      }

      function status($job, $handle, $n, $d) {
          echo "$n/$d\n";
      }
      (continued on next slide)
  • 73. Client
      $client = new Net_Gearman_Client(array('lager:4730'));
      $task = new Net_Gearman_Task('Reverse', range(1,5));
      $task->attachCallback("complete", Net_Gearman_Task::TASK_COMPLETE);
      $task->attachCallback("status", Net_Gearman_Task::TASK_STATUS);
      (continued on next slide)
  • 74. Client
      $set = new Net_Gearman_Set();
      $set->addTask($task);
      $client->runSet($set);
      $client->Echo('Mmm... beer');
  • 75. pecl/gearman • More complex API • Jobs aren’t separated into files
  • 76. Worker
      $gmworker = new gearman_worker();
      $gmworker->add_server();
      $gmworker->add_function("reverse", "reverse_fn");
      // process jobs until the worker reports an error
      while (1) {
          $ret = $gmworker->work();
          if ($ret != GEARMAN_SUCCESS)
              break;
      }

      function reverse_fn($job) {
          $workload = $job->workload();
          echo "Received job: " . $job->handle() . "\n";
          echo "Workload: $workload\n";
          $result = strrev($workload);
          echo "Result: $result\n";
          return $result;
      }
  • 77. Client
      $gmclient = new gearman_client();
      $gmclient->add_server('lager');
      echo "Sending job\n";
      list($ret, $result) = $gmclient->do("reverse", "Hello!");
      if ($ret == GEARMAN_SUCCESS)
          echo "Success: $result\n";
  • 78. gearman @ digg
  • 79. gearman @ digg • 400,000 jobs a day • Jobs: crawling, DB job, FB sync, memcache manipulation, Twitter post, IDDB migration, etc. • Each application server has its own Gearman daemon + workers
  • 80. tips and tricks • you can daemonize the workers easily with daemon or supervisord • run workers in different groups, don’t block on job A waiting on job B • Make workers exit after N jobs to free up memory (supervisord will restart them)
  • 82. background • NOT developed by Danga (Facebook) • cross-language services • RPC-based
  • 83. background • interface description language • bindings: C++, C#, Cocoa, Erlang, Haskell, Java, OCaml, Perl, PHP, Python, Ruby, Smalltalk • data types: base, structs, constants, services, exceptions
  • 86. Thank You http://gravitonic.com/talks