Queues can provide parallel processing, cross-language scripting, and more! The talk focused on Gearman, but the principles apply to any alternative.
4. What's a Work Queue Anyway?
"A sequence of stored data or programs awaiting processing."
~ American Heritage Dictionary
"[F]or storing messages as they travel between computers."
~ Amazon SQS site
"[I]t's the nervous system for how distributed processing communicates"
~ Gearman site
9. You might be using them already...
<?php
// filename: do_some_work.php
$conn = mysqli_connect($server, $user, $pass, $database);
$select = "SELECT * FROM things WHERE status = 'not done' ORDER BY timestamp ASC";
$res = $conn->query($select);
while ($work = $res->fetch_assoc()) {
    //
    // do work
    //
    $update = "UPDATE things SET status = 'done' WHERE id = {$work['id']}";
    $conn->query($update);
}
12. It works!
It's simple to understand
Get work -> do work -> done.
Can be implemented in other languages
This job/worker could have been created in Python, Java... whatever!
Can be deployed on a different server
So long as it can talk to the DB
It's persistent
MySQL is pretty good at keeping data
16. What's wrong with that?
Runs at the frequency that cron fires it off
Can only run once a minute
Single threaded
Race condition if you start two overlapping threads
Mitigation strategies
Have workers kill themselves every minute
Use modulus to only do certain jobs in certain threads
Create different DB pools for each worker
Hits the database at the predefined frequency
Wasting DB resources
Hardest tech in most stacks to scale...
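The modulus mitigation above can be sketched as a tiny helper (a hypothetical illustration, not the talk's code: `$worker_num` and `$num_workers` would come from however you launch your workers):

```php
<?php
// Hypothetical sketch of the modulus strategy: each worker only
// claims rows where id % num_workers equals its own worker number,
// so two overlapping workers never fight over the same row.
function is_mine(int $row_id, int $worker_num, int $num_workers): bool
{
    return $row_id % $num_workers === $worker_num;
}

// With two workers, worker 0 takes even ids and worker 1 takes odd ids.
$num_workers = 2;
$mine = is_mine(10, 0, $num_workers); // worker 0 claims row 10
```

In the SQL from the earlier example this becomes an extra `AND id % 2 = 0` (or `= 1`) clause per worker.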
23. A little better?
It's simple to understand
Get work -> do work -> repeat
Can be implemented in other languages
This job/worker could have been created in Python, Java... whatever!
Can be deployed on a different server
So long as it can talk to the DB
It's persistent
MySQL is pretty good at keeping data
It's near real time
Maximum delay of 1 second between jobs
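The long-running approach described above might look roughly like this (a sketch assuming the same `things` table as the earlier cron example):

```php
<?php
// Sketch: a long-running polling worker. Same query as the cron
// version, but looping forever with a 1 second sleep instead of
// being re-launched by cron every minute.
$conn = mysqli_connect($server, $user, $pass, $database);
while (true) {
    $res = $conn->query(
        "SELECT * FROM things WHERE status = 'not done' ORDER BY timestamp ASC"
    );
    while ($work = $res->fetch_assoc()) {
        //
        // do work
        //
        $conn->query("UPDATE things SET status = 'done' WHERE id = {$work['id']}");
    }
    sleep(1); // or usleep() for sub-second intervals
}
```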
28. What's wrong with that?
Runs at the frequency of the sleep() delay
1 second waits are pretty good for offloaded tasks
Can tweak the timing delay to our level of tolerance
usleep() for finer control of the interval, and less than 1 second intervals are possible!
Less likely to hit a race condition
Same mitigation strategies apply
Hits the database even more!
MySQL can cache queries pretty well with MyISAM tables...
... but you're using InnoDB ... right?
31. Gearman - a quick history lesson
Originally developed by Danga Interactive to solve specific issues in the building and hosting of LiveJournal.com. It was originally announced in 2005 and was written in Perl.
It was ported to C by a number of big companies, like Google, and it enjoys wide support from the community; not that much really goes wrong with it.
Incidentally, Danga Interactive also created Memcached.
Danga Interactive exists no more: it was sold to Six Apart, then LiveJournal was sold to SUP, and Six Apart was acquired by SAY Media.
36. How Gearman Work Queues Stack Up
They're simple to understand
Get work -> do work -> repeat
They can work across languages
Jobs are simple strings, so they can pass whatever you want
Additionally, tools exist for things like MySQL to trigger new jobs
They can work across servers
Workers don't need to live where clients do
Persistent
This requires additional configuration
Real Time
Bonus: Can be asynchronous and synchronous!
49. But there are limitations...
Cameras take big images
The iPhone 4 has a 5 megapixel camera; most files are about 2 MB
Entry level DSLR cameras are 14 megapixel with 5 MB files
What about multiple uploads?
Present the form, upload one image, process, repeat?
Provide a block of upload fields and hope it's enough?
Use an HTML5 / Flash plugin to send multiple files?
Hope you have enough CPU & RAM to crunch them in parallel.
Processing may delay the client longer than the default timeout.
What happens to your user's confidence in your product if things are slow?
53. Gearmand runs as another daemon/service in your stack.
Can be a different server or run alongside everything else
Gearman's only job is to facilitate the handling of these messages
It can optionally store them in a persistent store to recover from system reboots, service restarts, etc.
PHP talks to the Gearmand server via an API extension
Just like MySQL, Memcached, PostgreSQL, APC and other tools
Your client and worker code pass messages back and forth to Gearman
You'll use a predefined set of PHP function calls to do this
58. Getting Gearmand running - installers for many systems
Linux: apt-get, yum, etc.
sudo apt-get install gearmand
sudo yum install gearmand
Windows: cygwin ...
OS X: ...
Install a package manager: MacPorts, Homebrew
sudo port install gearmand
brew install gearmand
Add a Virtual Machine that has an installer...
62. Getting Gearmand running - compile your own
Get your dependencies
sudo yum install gpp gcc-c++ boost boost-devel libevent libevent-devel libuuid libuuid-devel
Get the latest stable source from gearman.org (Launchpad)
wget https://launchpad.net/gearmand/trunk/0.34/+download/gearmand-0.34.tar.gz
Unpack, Compile and Install
tar -xzf gearmand-0.34.tar.gz
cd gearmand-0.34
./configure
make
sudo make install
Add appropriate system hooks to start the service on reboot
chkconfig
systemctl
?
67. Configuring Gearmand
Default install is pretty good
Load gearmand as a background service
No persistence
Listens on all available interfaces at port 4730
Persistence
Easily enabled using
MySQL based databases (Drizzle, MySQL, MariaDB, etc.)
PostgreSQL
SQLite
Memcached
Add the appropriate flags to your init script (see the docs for details on each)
Docs: http://gearman.org/index.php?id=manual:job_server#persistent_queues
Example: MySQL
/sbin/gearmand -q libdrizzle --libdrizzle-host=127.0.0.1 --libdrizzle-user=gearman --libdrizzle-password=secret --libdrizzle-db=queue --libdrizzle-table=gearman --libdrizzle-mysql
69. Configuring Gearmand
Defaults to single thread (mostly)
Some versions default to use more than one...
Non-blocking I/O, which works very fast with a single thread
To give each of the internals a dedicated thread use
/sbin/gearmand -d -t 3
Additional threads are then used for client and worker connections
Security
Lock gearmand to a single IP
/sbin/gearmand -d -L 127.0.0.1
Change the port from the default 4730
/sbin/gearmand -d -L 127.0.0.1 -p 7003
HTTP Access
Gearmand supports a pluggable interface architecture and can use HTTP for communication. Requests are sent using GET and POST data.
73. Configuring Gearmand for high availability
Gearmand doesn't replicate data
May be a weak point depending on your use case
Redundancy without a load balancer
In the client logic, add multiple servers and let the driver sort it out
Workers register themselves with each gearmand server
Done.
Easy to load balance
Put the two servers behind a load balancer
Each client connects to the load balancer
Workers register themselves with each gearmand server
Done.
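The no-load-balancer setup can be sketched with the PHP extension's addServer() calls (the hostnames here are hypothetical):

```php
<?php
// Hypothetical sketch: two gearmand servers, no load balancer.
// The client lists both, and the driver picks a live one per job.
$client = new GearmanClient();
$client->addServer('gearman1.example.com', 4730);
$client->addServer('gearman2.example.com', 4730);

// Each worker registers with BOTH servers, so jobs submitted to
// either one get picked up.
$worker = new GearmanWorker();
$worker->addServer('gearman1.example.com', 4730);
$worker->addServer('gearman2.example.com', 4730);
```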
84. Gearman in PHP - The Client
Clients create the jobs and tasks for the workers to do
Create a client object
$gm = new GearmanClient();
Define the server(s) to use
$gm->addServer(); // defaults to localhost:4730
Create the job and wait for a response
$jobdata = "/* any valid string */";
do {
    $res = $gm->do('image_resize', $jobdata);
} while ($gm->returnCode() != GEARMAN_SUCCESS);
Or don't
$jobid = $gm->doBackground('image_resize', $jobdata);
89. Gearman in PHP - The Worker
Workers do the jobs and tasks the clients request
Define the callbacks
function callback_resize($job){ /* do stuff */ }
function callback_watermark($job){ /* do stuff */ }
The callbacks will fetch the actual message data and do the work
function callback_resize($job){
    $data = $job->workload();
    /* do stuff */
}
Create a worker object
$gm = new GearmanWorker();
Define the server(s) to respond to
$gm->addServer(); // defaults to localhost:4730
Tell the server which callbacks to use for each task
$gm->addFunction('resize', 'callback_resize');
$gm->addFunction('watermark', 'callback_watermark');
Alternately, use an anonymous function
$gm->addFunction('resize', function($job){
    $data = $job->workload();
    /* do stuff */
});
Callback functions can post status updates on their progress back to the client
function callback_resize($job){
    /* do stuff */
    $job->sendStatus($numerator, $denominator);
}
Wait for work to do
while($gm->work());
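On the client side, those sendStatus() numbers can be read back with GearmanClient::jobStatus(). A minimal sketch, assuming a background job handle from doBackground():

```php
<?php
// Sketch: poll the server for the progress a worker reported with
// $job->sendStatus(). jobStatus() returns
// array(is_known, is_running, numerator, denominator).
$gm = new GearmanClient();
$gm->addServer();
$handle = $gm->doBackground('resize', $jobdata);

do {
    sleep(1);
    list($known, $running, $num, $den) = $gm->jobStatus($handle);
} while ($known && $running);
```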
98. Example Clients in CodeIgniter
class Photos extends CI_Model {
    public function resize_sync($filename){
        $gm = new GearmanClient();
        $gm->addServer('127.0.0.1', 4730);
        do {
            $res = $gm->do('image_resize', $filename);
            switch($gm->returnCode()){
                case GEARMAN_WORK_FAIL:
                    return FALSE;
                case GEARMAN_SUCCESS:
                    return TRUE;
            }
        } while ($gm->returnCode() != GEARMAN_SUCCESS);
        return TRUE;
    }
    public function resize_async($filename){
        $gm = new GearmanClient();
        $gm->addServer('127.0.0.1', 4730);
        $res = $gm->doBackground('image_resize', $filename);
        if(!$res){ return FALSE; }
        return TRUE;
    }
}
102. Running your workers
#!/bin/bash
SCRIPT="index.php upload worker"   # this is your CI controller/method
WORKDIR=/var/www/html/             # this is your CI app root
MAX_WORKERS=5                      # number of workers
PHP=/usr/bin/php                   # location of PHP on your system
COUNT=0                            # internal use variable
for i in `ps -afe | grep "$SCRIPT" | grep -v grep | awk '{print $2}'`
do
    COUNT=$((COUNT+1))
done
if test $COUNT -lt $MAX_WORKERS
then
    cd $WORKDIR
    $PHP $SCRIPT
else
    echo There are enough workers running already.
fi
105. About the job data
Job data is always a string
You can serialize data to pass more complex objects
$data = array(
    'filename' => $filename,
    'methods' => array('resize', 'watermark'),
    'user_id' => $user_id,
);
$jobdata = json_encode($data);
$gm->doBackground('processor', $jobdata);
And then deserialize it in the worker
$jobdata = $job->workload();
$data = json_decode($jobdata, true);
107. About the job data
In theory, you can pass about 2 GB per message
Limited by the server protocol, which uses a 32 bit integer to define message size.
You can pass binary data by encoding it
$binary_base64 = base64_encode($binary);
But use it sparingly
Any data passed is stored in memory
and your persistent store (if enabled)
and is likely on disk already
Base64 encoded data is also roughly a third larger than the source data
Compressing the binary data first can help
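A minimal sketch of compress-then-encode, using PHP's built-in gzcompress() and base64_encode() (the helper names and payload are made up for illustration):

```php
<?php
// Sketch: shrink binary data with gzcompress() before the base64
// step, then reverse both on the worker side.
function pack_payload(string $binary): string
{
    return base64_encode(gzcompress($binary));
}

function unpack_payload(string $encoded): string
{
    return gzuncompress(base64_decode($encoded));
}

$original = str_repeat('some highly repetitive binary-ish data. ', 100);
$packed   = pack_payload($original);

// The round trip is lossless, and repetitive data packs much smaller
// than plain base64 would be.
assert(unpack_payload($packed) === $original);
assert(strlen($packed) < strlen($original));
```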
111. About the job data
Don't pass 2 GB objects
Keep your message data small; less than 64K is perfect
Amazon's SQS has a hard limit of 64K
Pass pointers instead
Pass paths and filenames instead of files
Pass cache keys instead of complex class objects
that won't deserialize right anyway
Gearman is not a data store!
116. Analytics: Problem
We wanted to provide near realtime analytics to our clients
We liked the data we could scrape out of Google
(screenshots: Networks, Referrers, Location)
so we did... for a while... and it was good!
But Google has API limits for getting data out of Analytics
Only a few thousand requests per day
Batching requests was good - but not ideal
Data was only monthly; no daily or weekly breakdowns
Still ended up taking weeks to gather a full month of data
We already had scraped a significant amount of Google data, so any future collection would need to play nice with the existing data
126. Analytics: Solution
Build our own platform that gathered exactly what we wanted, modeled after Google Analytics
How we did our tracking pixel
The page has a small javascript file included that
captures the current url, user agent, referrer and so on
and embeds it as an image tag with a carefully crafted URL for the 1x1 tracking pixel /statistics/pixel/?ua=Mozilla%2F...
The pixel request is handled by CodeIgniter, which returns the image,
but not before we look up the country, state, and city of the originating IP and pass that data into Gearman for future processing.
The Gearman worker then parses and normalizes the data,
performs a host lookup on the source IP (which is often very slow), all before recording the result in our datastore.
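The pixel flow above might be sketched like this. This is a hypothetical illustration, not the talk's actual code: the class, method, and job names are made up, and the heavy work is deferred to a Gearman background job so the image returns immediately.

```php
<?php
// Hypothetical sketch of the tracking-pixel handler: hand the slow
// lookups to Gearman as a background job, then serve the 1x1 GIF
// right away so the visitor's page is never delayed.
class Statistics extends CI_Controller {
    public function pixel(){
        $hit = array(
            'ua'       => $this->input->get('ua'),
            'url'      => $this->input->get('url'),
            'referrer' => $this->input->get('ref'),
            'ip'       => $this->input->ip_address(),
        );

        $gm = new GearmanClient();
        $gm->addServer();
        // fire and forget: the worker does the host lookup later
        $gm->doBackground('record_hit', json_encode($hit));

        // serve a transparent 1x1 GIF immediately
        $this->output
             ->set_content_type('image/gif')
             ->set_output(base64_decode(
                 'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7'
             ));
    }
}
```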
136. Patterns and Recipes
One to one
Workers can run in optimized environments for specific tasks
Example: Using R, Matlab, etc. to run mathematical analysis and still use CI for the front end
Example: Run code on a different server to avoid CPU/disk/memory contention with Apache etc.
Example: Run tools on Windows platforms, like .NET components for generating Word files.
One to many
A single client can part out jobs to multiple workers
Example: Performing TF*IDF analysis on documents to find keywords
Example: Handling image manipulations in parallel, resizing 2 or 3 new thumbnails at a time.
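The one-to-many pattern maps naturally onto the extension's task API, which queues several jobs and runs them in parallel (the task names and workload format here follow the earlier examples but are assumptions):

```php
<?php
// Sketch: one client parting out several jobs at once. runTasks()
// blocks until all of them complete, and separate workers can
// process them in parallel.
$gm = new GearmanClient();
$gm->addServer();

$gm->addTask('resize', $filename . '|150x150');
$gm->addTask('resize', $filename . '|300x300');
$gm->addTask('watermark', $filename);

$gm->runTasks();
```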
138. Patterns and Recipes
Many to one
Multiple clients utilizing a single worker thread
Share memory across jobs. Arrays are faster than APC, Memcached, MySQL
Example: Database write buffering (for non-critical data only)
Example: Perform database writes across shards. A MySQL UDF inserts a record into Gearman that's then re-written out to the appropriate user shards
Optimize front end displays
Example: Pagination optimization
User requests the first page of data
Kick off a background task to pre-cache the next page before it's requested
Example: Dashboards
User logs into your site; you fire off background tasks to generate the dashboard information that user will need
Workers begin crunching the data and store the results in a cache
Meanwhile, the user is served a page with spaces allocated for each widget, which are then loaded via AJAX
140. Patterns and Recipes
Delayed/Deferred Processing
Use whenever tasks would run long or are potentially unreliable
Preparing images
Preparing video
Remote service calls
Sending content to Twitter, Facebook, LinkedIn...
Triggering other remote services: faxes, emails, etc.
Prefetching data
142. Other Solutions
Most alternative solutions have APIs for PHP
Some interface over other protocols, like HTTP(S) or memcache, which can ease deploying new servers
Alternatives
ActiveMQ
Amazon's SQS
Beanstalkd
Microsoft Message Queuing
RabbitMQ
Others - check for activity before adopting
146. Some helpful tips
Not everything belongs in a work queue
Queue it if...
the process is intensive on any subsystem: CPU, RAM, etc.
the process is slow, taking 1/2 second or longer to run; profile your app with CodeIgniter's built-in profiler.
you want the benefit of running in parallel
Don't use your database server as your work queue
It might work short term, but it's not scalable.
Persistence comes with a cost
Writes to the datastore are never free and will slow down the queue
Adds an additional layer of stuff to maintain