© 2012 Equifax Inc. 
Audience Intel 
J. David Mitchell 
IXI Services, Equifax Inc. 
7927 Jones Branch Drive, Suite 400 | McLean, VA 22102
Overview or Outline
1. IXI Audience Intel application: High-level overview
   1. A graph data store: Neo4j
   2. A key-value data store: Redis
   3. How we query and filter Redis
2. Using ZeroMQ to build a compute cluster
   1. Design patterns
   2. How it works
The Business Need 
Audience Intel 
Background 
Audience Intel (AI) will help our customers monitor and glean 
insights from their online marketing campaigns. 
AI will track click-and-conversion counts for our customers. 
AI will profile the click-and-conversion counts using IXI 
segments. 
 WealthComplete Total Investable Assets 
 WealthComplete Deposits 
 Financial Cohorts 
 Income360 
 Discretionary Spending 
 Ability to Pay 
 Economic Spectrum 
 Aggregated FICO scores 
Background: Use cases 
The end-user application will query a persistent data store to answer their 
business questions. 
 Mr. Jones from Razorfish would like to see clicks for all offers, all publishers and all 
creatives for June 1. 
 Mr. Jones from Razorfish would like to see ATP (Ability to Pay) for clicks for all offers, 
all publishers and all creatives for June 1. 
 Mr. Jones from Razorfish would like to see ATP for clicks for offer 1, goal 1, all 
publishers and creative 79 for June 1. 
A partner will place an IXI empty gif on their page (or in their ad), so that we will 
get an entry in our Web server logs. 
 GET /digi/23CE7C3A-FAE93B9DB863/a.gif?partner=0244&offer=1&goal=1&result=1&source=1&creativeid=1 
Parsers will parse the log files for a time slot (e.g., a one-hour time slot) and do 
counts for each partner and campaign. 
 Clicks and conversions 
 IXI product lookup, e.g., based on the IP address or cookie (zip+4) 
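A minimal sketch of the parsing step, assuming each log entry carries the request line in the form shown above. The function name `parseBeaconRequest` and the returned array layout are hypothetical; only the query parameter names come from the example URL.

```php
<?php
// Hypothetical helper: extract the campaign dimensions from one logged request line.
function parseBeaconRequest(string $requestLine): ?array
{
    // Expect "GET <path>?<query>"; anything else is not treated as a beacon hit.
    if (!preg_match('#^GET\s+(\S+)#', $requestLine, $m)) {
        return null;
    }
    $query = parse_url($m[1], PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }
    parse_str($query, $params);

    // Dimension names taken from the example URL.
    $wanted = array('partner', 'offer', 'goal', 'result', 'source', 'creativeid');
    $hit = array();
    foreach ($wanted as $name) {
        $hit[$name] = isset($params[$name]) ? $params[$name] : null;
    }
    return $hit;
}

$line = 'GET /digi/23CE7C3A-FAE93B9DB863/a.gif?partner=0244&offer=1&goal=1&result=1&source=1&creativeid=1';
$hit = parseBeaconRequest($line);
```

Each parsed hit can then be counted per partner and campaign for its time slot.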
Background 
Query views or clicks or conversions of an audience 
 By time: hour, day, week, month or whole campaign 
 By partner (client or campaign) 
 By offer 
 By goal 
 By result: achieved or not 
 By source: publishers 
 By creative 
Glean insights from IXI products 
WealthComplete Deposits 8 
FinancialCohorts 61 
Income360 11 
Discretionary Spending 9 
EconomicCohorts 71 
Ability to Pay (ATP) 4 
FICO scores 6 
Economic Spectrum 17 
210 
Background: Examples of IXI products 
WealthComplete Deposits (AB)
Tier   WealthComplete Deposits   # HHs         %
1      $250K+                    5,754,096     4.78%
2      $100K - $250K             12,536,735    10.42%
3      $50K - $100K              13,288,411    11.04%
4      $25K - $50K               14,722,193    12.23%
5      $10K - $25K               18,336,190    15.24%
6      $2.5K - $10K              20,246,716    16.82%
7      $0.01 - $2.5K             34,993,949    29.08%
8      $0                        467,498       0.39%
Total                            120,345,788   100.00%

Ability to Pay (ATP) (ED)
Tiers                         Data Labels                            # HHs         %
1, 2, 3, 4, 5, 6, 7           Highest Ability to Pay: Top 20%        22,249,393    18.49%
8, 9, 10, 11, 12              High Ability to Pay                    32,916,824    27.36%
13, 14, 15, 16, 17            Moderate Ability to Pay                40,012,323    33.25%
18, 19, 20, 21, 22, 23, 24    Lowest Ability to Pay: Bottom 20%      25,144,689    20.90%
Total                                                                120,323,229   100.00%
Background 
Rough sketch (or mockup) of the UI for AI. 
Platform Architecture 
Audience Intel 
Technology: High-Level overview 
Logs: Queuing component (Kafka)
 Producer: Stream the log files into Kafka.
 Consumer: Parse the log files in Kafka and do counts.
Summarize: Key-value storage for the counters 
 Scalable with fast lookups. 
Metadata storage: Partner, client, campaign 
 Schema-less 
 Expresses relationships between entities easily 
 Fast lookups. 
Filtering API 
A UI 
Technology: High-Level overview 
Queuing component: Kafka 
 Apache Kafka is a distributed publish-subscribe messaging system written in 
Scala and Java. 
 incubator.apache.org/kafka/ 
Data storage for the counters: Redis 
 Advanced key-value store or data structure server written in C. 
 redis.io 
Metadata storage: Neo4j 
 Java Graph database (neo4j.org) 
 High availability cluster option 
Filtering API: PHP with ZeroMQ 
RDBMS Design
Audience Intel 
RDBMS Design
Use cases 
 Mr. Jones from Razorfish would like to see clicks for all offers, all sources and all 
creatives for Dec. 29. 
 Mr. Jones from Razorfish would like to see ATP for clicks for all offers, all 
sources and all creatives for Dec. 29. 
 Mr. Jones from Razorfish would like to see ATP for clicks for offer 1, goal 1, all 
sources and creative 79 for Dec. 29. 
First case: Get sum of clicks 
 day 2012-12-29, partner 1, goal 1 
 SELECT SUM(count) AS sum FROM ai201212 WHERE partnerId = 1 AND day = 
29 AND goal = 1; 
Second use case: Get ATP
 day 2012-12-29, partner 1, goal 1
 SELECT ti.productTierId, SUM(ti.tierHH) AS tierHH FROM ai201212 ai INNER
JOIN tiers201212 ti ON ai.aiId = ti.aiId WHERE ai.partnerId = 1 AND ai.day = 29
AND ai.goal = 1 AND ti.productId = 1 GROUP BY ti.productTierId;
Third use case: Get ATP
 day 2012-12-29, partner 1, goal 1
 creativeId 79
 SELECT ti.productTierId, SUM(ti.tierHH) AS tierHH, SUM(ai.count) AS sum,
(SUM(ti.tierHH)/SUM(ai.count))*100 AS percent FROM ai201212 ai INNER
JOIN tiers201212 ti ON ai.aiId = ti.aiId WHERE ai.partnerId = 1 AND ai.day =
29 AND ai.goal = 1 AND ti.productId = 1 AND ai.creativeId = 79 GROUP BY
ti.productTierId;
Metadata to Neo4j 
Audience Intel 
Metadata to Neo4j: Users, Firms & Data Sources 
Storing & Retrieving Data 
in Redis 
What is Redis? 
Redis is an in-memory, advanced key-value store. 
 “in-memory”: like a cache (e.g., memcached). 
 “advanced”: stores complex data structures. 
It has a “hash table” data structure, similar to an associative array.
 Key: currentInventoryOfFruit
 Value: ["apples" : 2, "oranges" : 12, "tomatoes" : 6]
 Increment: $redis->hIncrBy('currentInventoryOfFruit', 'kiwi', 2);
 Value: ["apples" : 2, "oranges" : 12, "kiwi" : 2, "tomatoes" : 6]
 Increment: $redis->hIncrBy('currentInventoryOfFruit', 'kiwi', 2);
 Value: ["apples" : 2, "oranges" : 12, "kiwi" : 4, "tomatoes" : 6]
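The increment semantics can be illustrated without a Redis server. This is a sketch that models the hash as a plain PHP array; `hashIncrBy` is a hypothetical stand-in, not a Redis client call. It relies on the fact that a Redis hash increment treats a missing field as 0.

```php
<?php
// In-memory stand-in for a Redis hash increment (hypothetical helper, no Redis involved).
function hashIncrBy(array &$hash, string $field, int $by): int
{
    if (!isset($hash[$field])) {
        $hash[$field] = 0; // a missing field starts at 0
    }
    $hash[$field] += $by;
    return $hash[$field];
}

$inventory = array('apples' => 2, 'oranges' => 12, 'tomatoes' => 6);
$first = hashIncrBy($inventory, 'kiwi', 2);  // creates the field
$second = hashIncrBy($inventory, 'kiwi', 2); // increments the existing field
```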
What is Redis? 
One of the unique aspects of Redis in the world of key-value
caches is that its values can be data structures, including a hash table.
 strings -- binary safe data 
 hashes -- maps between string fields and string values 
 lists -- lists of strings, sorted by insertion order 
 sets -- unordered collection of unique strings 
 sorted sets -- numerically ordered collection of unique strings 
Functionality
 For a list, one can push and pop strings off the list.
 For a hash, one can increment a field, get all fields or get one field.
 For a set, one can sort and perform operations such as intersections and unions.
 For sorted sets, in addition to the things mentioned above, one can use the
numeric score to retrieve a subset.
 Redis also supports transactions, using the MULTI/EXEC commands,
and a PUBLISH and SUBSCRIBE feature set for posting to channels and
listening for messages posted to channels.
How we store our Data in Redis 
Key: STRING
 Verbose key: /ai/partner:1/client:1/campaign:1/time:2012.06.01/offer:1/goal:1/result:1/source:1/creative:1/
 Compact key: /ai/1/1/1/2012.06.01/1/1/1/1/1/
 Value: 326
Need a legend: SSET
 Key: /ai/legend:key/partner:1/client:1/campaign:1
 Value: [{"0":"partner"},{"1":"client"},{"2":"campaign"},{"3":"time"},{"4":"offer"},
{"5":"goal"}, {"6":"resultId"}, {"7":"sourceId"}, {"8":"creativeId"}]
Key: STRING (dog visits)
 Verbose key: /ai/partner:1/client:1/campaign:1/time:2012.06.01/dog:Rusty/breed:23/color:brn/weight:36/height:24/
 Compact key: /ai/1/1/1/2012.06.01/Rusty/23/brn/36/24/
 Value: 3
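A sketch of how the verbose key, the compact key and the legend relate, assuming the compact form simply drops the dimension names (as in the dog-visits example). `buildCounterKeys` and `buildLegend` are hypothetical names.

```php
<?php
// Hypothetical helpers: derive both key forms and the legend from one ordered
// list of dimensions (name => value).
function buildCounterKeys(array $dimensions): array
{
    $verbose = '/ai/';
    $compact = '/ai/';
    foreach ($dimensions as $name => $value) {
        $verbose .= $name . ':' . $value . '/';
        $compact .= $value . '/'; // position alone identifies the dimension
    }
    return array('verbose' => $verbose, 'compact' => $compact);
}

// The legend maps each position in the compact key back to its dimension name.
function buildLegend(array $dimensions): array
{
    return array_keys($dimensions);
}

$dimensions = array(
    'partner' => '1', 'client' => '1', 'campaign' => '1', 'time' => '2012.06.01',
    'offer' => '1', 'goal' => '1', 'result' => '1', 'source' => '1', 'creative' => '1',
);
$keys = buildCounterKeys($dimensions);
$legend = buildLegend($dimensions);
```

The compact form keeps keys short in memory; the legend is what makes them decodable again.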
Legend for /0246/-/c1/ (partner/client/campaign) 
[
{0 : "partner"},
{1 : "client"},
{2 : "campaign"},
{3 : "time"},
{4 : "offer"},
{5 : "goal"},
{6 : "result"},
{7 : "source"},
{8 : "creative"}
]
Querying the Data in Redis 
Created indexes for efficient lookups. 
 We are storing hour keys. 
 We created day indexes (e.g., 2012.07.11) for the hour keys. 
 We are storing day keys. 
 We created another index (e.g., ‘days’) for the day keys 
Hour keys 
 Key: /ai/key/1/1/1/2012.07.11.08/1/1/1/1/1/ 
 Value: 26 
 Index: /ai/idx/1/1/1/2012.07.11/1/1/1/1/1/ 
 Value (SET): [‘/ai/key/1/1/1/2012.07.11.08/1/1/1/1/1/’, 
‘/ai/key/1/1/1/2012.07.11.09/1/1/1/1/1/’, 
‘/ai/key/1/1/1/2012.07.11.10/1/1/1/1/1/’] 
Querying the Data in Redis 
Day keys 
 Key: /ai/key/1/1/1/2012.07.11/1/1/1/1/1/ 
 Value: 322 
 Index: /ai/idx/1/1/1/day/1/1/1/1/1/ 
 Value (SET): [‘/ai/key/1/1/1/2012.07.09/1/1/1/1/1/’, 
‘/ai/key/1/1/1/2012.07.10/1/1/1/1/1/’, ‘/ai/key/1/1/1/2012.07.11/1/1/1/1/1/’] 
Queries are typically for many days and for many dimensions. 
 We do an sUnion on the indexes. 
 Performs the union between N sets (e.g., 1,000 sets) and returns an array of keys. 
Step one: Does the item exist in an index? Get all of the keys in
the indexes.
 We build a list of the indexes that we want to query, and we query the indexes.
– /ai/idx/1/1/1/day/1/1/1/1/1/
– /ai/idx/1/1/1/day/2/1/1/1/1/
– /ai/idx/1/1/1/day/3/1/1/1/1/
 We do an sUnion to get all of the keys in the index sets. We can filter the days
before we get the keys.
 We do an mGet to get all of the key values, and we sum the values to get the
total for a given day (or a given hour).
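The union-then-fetch-then-sum flow can be sketched in memory. The arrays below stand in for Redis sets and string counters, and `sumCounters` is a hypothetical name. The counts are made up for illustration, except 322, which appears above.

```php
<?php
// $indexes:  index key => members of that index set (counter keys)
// $counters: counter key => stored count
function sumCounters(array $indexes, array $counters, array $indexKeys): int
{
    // Stand-in for sUnion: merge the members of every index set, dropping duplicates.
    $keys = array();
    foreach ($indexKeys as $indexKey) {
        if (isset($indexes[$indexKey])) {
            $keys = array_merge($keys, $indexes[$indexKey]);
        }
    }
    $keys = array_values(array_unique($keys));

    // Stand-in for mGet: a missing key yields false, which is skipped when summing.
    $total = 0;
    foreach ($keys as $key) {
        $value = isset($counters[$key]) ? $counters[$key] : false;
        if ($value !== false) {
            $total += (int) $value;
        }
    }
    return $total;
}

$indexes = array(
    '/ai/idx/1/1/1/day/1/1/1/1/1/' => array(
        '/ai/key/1/1/1/2012.07.10/1/1/1/1/1/',
        '/ai/key/1/1/1/2012.07.11/1/1/1/1/1/',
    ),
    '/ai/idx/1/1/1/day/2/1/1/1/1/' => array(
        '/ai/key/1/1/1/2012.07.11/2/1/1/1/1/', // indexed but has no counter below
    ),
);
$counters = array(
    '/ai/key/1/1/1/2012.07.10/1/1/1/1/1/' => '300',
    '/ai/key/1/1/1/2012.07.11/1/1/1/1/1/' => '322',
);
$total = sumCounters($indexes, $counters, array_keys($indexes));
```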
Querying the Data in Redis 
Secret sauce for queries 
 sUnion on the indexes 
 mGet (or getMultiple) on the keys 
mGet
 An mGet gets the values of all the specified keys, and returns an array
of values.
 If one or more keys do not exist, the array will contain FALSE at the
position of each missing key.
Query Params 
[{"name":"partner","value":"0246"}, {"name":"client", "value":"-"}, 
{"name":"campaign", "value":"c1"}, {"name":"source", "options":[{"value":"1"}, 
{"value":"2"}]}, {"name":"time", "options":[{"value":""}, {"value":"2012"}, 
{"value":"201202"}, {"value":"20120201"}, {"value":"20120202"}, 
{"value":"20120203"}, {"value":"20120204"}, {"value":"20120205"}, 
{"value":"20120206"}, {"value":"20120207"}, {"value":"20120208"}, 
{"value":"20120209"}, {"value":"20120210"}, {"value":"20120211"}, 
{"value":"20120212"}, {"value":"20120213"}, {"value":"20120214"}, 
{"value":"20120215"}, {"value":"20120216"}, {"value":"20120217"}, 
{"value":"20120218"}, {"value":"20120219"}, {"value":"20120220"}, 
{"value":"20120221"}, {"value":"20120222"}, {"value":"20120223"}, 
{"value":"20120224"}, {"value":"20120225"}, {"value":"201204"}, 
{"value":"20120402"}, {"value":"20120404"}, {"value":"20120405"}, 
{"value":"201206"}, {"value":"20120602"}, {"value":"20120605"}, 
{"value":"20120621"}, {"value":"201207"}, {"value":"20120702"}]}, 
{"name":"result", "options":[{"value":"0"}, {"value":"1"}]}, {"name":"goal", 
"value":"1"}, {"name":"creative", "value":"1"}, {"name":"offer", 
"options":[{"value":"4"}, {"value":"5"}, {"value":"6"}, {"value":"7"}]}, 
{"name":"level", "value":"day"}] 
Build a list of indexes 
partner = 0246 
campaign = c1 
index = days 
offer = 4, 5, 6, 7 
goal = 1 
result = 0, 1 
source = 1, 2 
creative = 1 
/ai/idx/0246/-/c1/days/4/1/0/1/1/ 
/ai/idx/0246/-/c1/days/5/1/0/1/1/ 
/ai/idx/0246/-/c1/days/6/1/0/1/1/ 
/ai/idx/0246/-/c1/days/7/1/0/1/1/ 
/ai/idx/0246/-/c1/days/4/1/1/1/1/ 
/ai/idx/0246/-/c1/days/5/1/1/1/1/ 
/ai/idx/0246/-/c1/days/6/1/1/1/1/ 
/ai/idx/0246/-/c1/days/7/1/1/1/1/ 
/ai/idx/0246/-/c1/days/4/1/0/2/1/ 
/ai/idx/0246/-/c1/days/5/1/0/2/1/ 
/ai/idx/0246/-/c1/days/6/1/0/2/1/ 
/ai/idx/0246/-/c1/days/7/1/0/2/1/ 
/ai/idx/0246/-/c1/days/4/1/1/2/1/ 
/ai/idx/0246/-/c1/days/5/1/1/2/1/ 
/ai/idx/0246/-/c1/days/6/1/1/2/1/ 
/ai/idx/0246/-/c1/days/7/1/1/2/1/ 
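The sixteen index keys above are the Cartesian product of the multi-valued parameters. A minimal sketch that reproduces the list in the same order follows; `buildIndexKeys` and its argument order are assumptions.

```php
<?php
// Hypothetical helper: expand the query parameters into the list of index keys.
function buildIndexKeys(
    string $partner, string $client, string $campaign, string $level,
    array $offers, array $goals, array $results, array $sources, array $creatives
): array {
    $keys = array();
    // Outer loops change slowest; offers change fastest, matching the listing above.
    foreach ($sources as $source) {
        foreach ($results as $result) {
            foreach ($creatives as $creative) {
                foreach ($goals as $goal) {
                    foreach ($offers as $offer) {
                        $keys[] = "/ai/idx/$partner/$client/$campaign/$level/"
                            . "$offer/$goal/$result/$source/$creative/";
                    }
                }
            }
        }
    }
    return $keys;
}

$indexKeys = buildIndexKeys(
    '0246', '-', 'c1', 'days',
    array(4, 5, 6, 7), array(1), array(0, 1), array(1, 2), array(1)
);
```

The number of index keys is the product of the option counts (here 4 x 1 x 2 x 2 x 1 = 16), which is why queries over many dimensions grow quickly.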
Querying the Data in Redis: One server (2 GB of RAM) 
The results with one server
 Fewer than 100,000 rows: under 1 second
 800,000 rows: about 4 seconds
 9,000,000 rows: about 45 seconds
9 million rows was the maximum with one server.
Removed the 9-million-row limitation by using a
compute cluster
 10 servers 
 Divide and conquer strategy 
Three primary lookups: 
 Basic counts 
 Product counts (percent distribution) 
 Distinct values for our form multi-select list. 
Quick distinct values for our multi-select lists 
Generate form elements: We have HTML select lists.
 We need to do a “select distinct(offer)”.
 We write to a ‘distincts’ key so that we do not have to look up the values and
calculate the distinct values in code.
Distinct values (for the form): SET
 Key: /ai/distincts:filter/partner:1/client:1/campaign:1/time
 Value: ["2012.06.01","2012.06.02","2012.06.03"]
 Key: /ai/distincts:filter/partner:1/client:1/campaign:1/offer
 Value: ["offer1","offer2","offer3","offer4"]
 SETs hold unique values, so duplicates are dropped.
Redis in production 
Redis persists the data to disk so that it can be recovered after a
system failure (or a reboot).
 Two persistence modes: append-only file and snapshots.
 The default is to snapshot your data every N seconds if there are at least M
changes.
– after 900 sec (15 min) if at least 1 key changed
– after 300 sec (5 min) if at least 10 keys changed
– after 60 sec if at least 10000 keys changed
 With the append-only file enabled, the default is to fsync every second.
 With fsync set to every second, performance is still very good.
Supports master-slave replication. 
 We write to the master and read from the slave. 
Redis works best when you re-use open connections. 
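For reference, the snapshot thresholds and the append-only settings described above correspond to the following redis.conf directives. This is a sketch of the relevant lines only, not the configuration actually deployed.

```
# Snapshots: save <seconds> <changes>
save 900 1
save 300 10
save 60 10000

# Append-only file, flushed to disk once per second
appendonly yes
appendfsync everysec
```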
Building a compute cluster 
Building a compute cluster: Outline 
Problem: large memory-hungry queries 
Solution: Shard the query on the biggest dimension 
Employing a message-passing paradigm 
Overview of architecture 
Dealing with long-running processes in PHP 
 pcntl_fork() – creates a child process with a new PID. 
 PHP forker – uses C to encapsulate PHP 
Using ZeroMQ sockets: PUSH-PULL 
Challenges 
Problem: large memory-hungry queries
partner = 0246 
campaign = c1 
index = days 
offer = 4, 5, 6, 7 
goal = 1 
result = 0, 1 
source = 1, 2 
creative = 1 
1. /ai/idx/0246/-/c1/days/4/1/0/1/1/ 
2. /ai/idx/0246/-/c1/days/5/1/0/1/1/ 
3. /ai/idx/0246/-/c1/days/6/1/0/1/1/ 
4. /ai/idx/0246/-/c1/days/7/1/0/1/1/ 
5. /ai/idx/0246/-/c1/days/4/1/1/1/1/ 
6. /ai/idx/0246/-/c1/days/5/1/1/1/1/ 
7. /ai/idx/0246/-/c1/days/6/1/1/1/1/ 
8. /ai/idx/0246/-/c1/days/7/1/1/1/1/ 
9. /ai/idx/0246/-/c1/days/4/1/0/2/1/ 
10. /ai/idx/0246/-/c1/days/5/1/0/2/1/ 
11. /ai/idx/0246/-/c1/days/6/1/0/2/1/ 
12. /ai/idx/0246/-/c1/days/7/1/0/2/1/ 
13. /ai/idx/0246/-/c1/days/4/1/1/2/1/ 
14. /ai/idx/0246/-/c1/days/5/1/1/2/1/ 
15. /ai/idx/0246/-/c1/days/6/1/1/2/1/ 
16. /ai/idx/0246/-/c1/days/7/1/1/2/1/ 
Solution: Shard the query
Solution: Shard the query on the biggest dimension 
function getMaxPosition(array $positionArray)
{
    // Count the entries of each array-valued position.
    $pCount = array();
    foreach (array('position4', 'position5', 'position6') as $position) {
        if (isset($positionArray[$position]) && is_array($positionArray[$position])) {
            $pCount[$position] = count($positionArray[$position]);
        }
    }
    // Find the max array size
    $max = 0;
    $maxPosition = null; // returned as-is when no position holds a non-empty array
    foreach ($pCount as $key => $value) {
        if ($value > $max) {
            $max = $value;
            $maxPosition = $key;
        }
    }
    return $maxPosition;
}

/**
 * Shards an array into smaller pieces.
 * @param array $positionArray The position that needs to be sharded.
 * @return array
 */
function shardPosition($positionArray)
{
    // One value per shard.
    $shardArray = array_chunk($positionArray, 1);
    return $shardArray;
}
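A self-contained sketch of how the two steps combine into work units. It generalises the idea to any array-valued parameter rather than only position4 to position6. `shardQuery` and the parameter names are hypothetical.

```php
<?php
// Hypothetical helper: split a query into one work unit per value of its
// largest array-valued parameter.
function shardQuery(array $params): array
{
    $maxKey = null;
    $maxSize = 0;
    foreach ($params as $key => $value) {
        if (is_array($value) && count($value) > $maxSize) {
            $maxSize = count($value);
            $maxKey = $key;
        }
    }
    if ($maxKey === null) {
        return array($params); // nothing to shard on: a single work unit
    }
    $units = array();
    foreach (array_chunk($params[$maxKey], 1) as $shard) {
        $unit = $params;         // every other parameter is copied unchanged
        $unit[$maxKey] = $shard; // the sharded parameter carries one value
        $units[] = $unit;
    }
    return $units;
}

$params = array(
    'partner' => '0246', 'campaign' => 'c1',
    'offer' => array(4, 5, 6, 7), 'goal' => 1,
    'result' => array(0, 1), 'source' => array(1, 2), 'creative' => 1,
);
$units = shardQuery($params);
```

With the example query, sharding on the four offers yields four work units, each expanding to four of the sixteen index keys.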
Message passing paradigm 
Message passing 
Erlang 
Akka for Scala & Java 
Threading and locking 
 Is the code thread safe when updating data? 
 Locking: creates contention for the lock and waiting. 
Message passing to autonomous processes
 The state of the process can be blocking (synchronous) or non-blocking
(asynchronous).
 In the case of a corrupt state, kill the process and start it again.
Fail early; fail often.
Auto-restart on failure (or timeout…).
Solution: Build a compute cluster to process the queries using ZeroMQ
What is ZeroMQ?
http://zguide.zeromq.org/php:all 
It’s a networking library for message passing. 
It's fast enough to be the fabric for clustered products. 
Its asynchronous I/O model gives you scalable multicore 
applications, built as asynchronous message-processing 
tasks. 
My ZeroMQ sockets:
 REQ - REP (synchronous/blocking with a timeout)
 REQ - ROUTER (asynchronous)
 PUSH - PULL (fan out, fan in)
 PUB - SUB
Overview of architecture 
zeromq.c0.uber = "ash-uhapsyslog01.meshdomain.ixicorp.com" 
zeromq.c0.cbroker = "ash-uhapsyslog01.meshdomain.ixicorp.com" 
zeromq.c0.ping = 5500 
zeromq.c0.frontend = 5550 
zeromq.c0.map = 5600 
zeromq.c0.reduce = 5650 
zeromq.c0.kill = 7800 
zeromq.c0.state = 7500 
zeromq.c1.uber = "ash-uhapsyslog01.meshdomain.ixicorp.com" 
zeromq.c1.cbroker = "ash-uhapsyslog01.meshdomain.ixicorp.com" 
zeromq.c1.ping = 5501 
zeromq.c1.frontend = 5551 
zeromq.c1.map = 5601 
zeromq.c1.reduce = 5651 
zeromq.c1.kill = 7801 
zeromq.c1.state = 7500 
ping: REQ – REP 
frontend: REQ – REP 
map: PUSH - PULL 
reduce: PUSH - PULL 
kill: PUB - SUB 
state: REQ - ROUTER 
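A sketch of turning these settings into ZeroMQ endpoint strings, assuming the configuration is stored as INI text. `clusterEndpoint` is a hypothetical helper, and only three of the settings are repeated here.

```php
<?php
// A subset of the settings shown above, as INI text.
$ini = <<<'INI'
zeromq.c0.cbroker = "ash-uhapsyslog01.meshdomain.ixicorp.com"
zeromq.c0.map = 5600
zeromq.c0.reduce = 5650
INI;

// Hypothetical helper: build the tcp:// endpoint a worker connects to
// for one socket of one virtual cluster.
function clusterEndpoint(array $config, string $clusterId, string $socketName): string
{
    $host = $config["zeromq.$clusterId.cbroker"];
    $port = $config["zeromq.$clusterId.$socketName"];
    return 'tcp://' . $host . ':' . $port;
}

$config = parse_ini_string($ini);
$mapEndpoint = clusterEndpoint($config, 'c0', 'map');
$reduceEndpoint = clusterEndpoint($config, 'c0', 'reduce');
```

Giving each virtual cluster its own port per socket type lets several clusters share the same broker host.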
Supervisor (uber) and Cluster brokers (cbroker) 
root 58690 0.0 0.5 257900 11100 ? Ssl 06:05 0:11 php /opt/aiclusters/uber.php u0 
root 58703 0.1 1.1 336064 22604 ? Ssl 06:05 0:39 php /opt/aiclusters/cbroker.php p0 c0 
root 58714 0.0 1.0 335296 21916 ? Ssl 06:05 0:07 php /opt/aiclusters/cbroker.php p0 c1 
root 58725 0.0 0.5 325544 11384 ? Ssl 06:05 0:04 php /opt/aiclusters/cbroker.php p0 c2 
root 58736 0.0 0.5 325544 11348 ? Ssl 06:05 0:02 php /opt/aiclusters/cbroker.php p0 c3 
root 58750 0.0 0.5 325532 11232 ? Ssl 06:05 0:01 php /opt/aiclusters/cbroker.php p0 c4 
root 58761 0.0 0.5 325544 11288 ? Ssl 06:05 0:07 php /opt/aiclusters/cbroker.php p0 c5 
root 58769 0.0 0.5 325544 11224 ? Ssl 06:05 0:02 php /opt/aiclusters/cbroker.php p0 c6 
root 58777 0.0 0.5 325532 11312 ? Ssl 06:05 0:04 php /opt/aiclusters/cbroker.php p0 c7 
Worker server 1: Server broker and server workers for each virtual cluster 
root 24532 0.0 0.4 284460 9832 ? Ssl 06:07 0:01 php /opt/aiclusters/sbroker.php p0 1 
root 24540 0.8 11.9 514392 244012 ? Ssl 06:07 5:42 php /opt/aiclusters/sworker.php c0 
root 24546 0.0 5.2 381264 106616 ? Ssl 06:07 0:36 php /opt/aiclusters/sworker.php c1 
root 24555 0.0 0.4 284716 9976 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c2 
root 24564 0.0 0.4 284716 9816 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c3 
root 24573 0.0 0.4 284716 9812 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c4 
root 24582 0.0 0.4 284716 9816 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c5 
root 24592 0.0 0.4 284716 9808 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c6 
root 24601 0.0 0.4 284716 9820 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c7 
Worker server 2: Server broker and server workers for each virtual cluster 
root 46997 0.0 0.4 284460 9824 ? Ssl 06:07 0:00 php /opt/aiclusters/sbroker.php p0 2 
root 47005 0.7 25.5 797800 522688 ? Ssl 06:07 5:04 php /opt/aiclusters/sworker.php c0 
root 47011 0.0 5.0 378192 103616 ? Ssl 06:07 0:28 php /opt/aiclusters/sworker.php c1 
root 47021 0.0 0.4 284716 10012 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c2 
root 47029 0.0 0.4 284716 9812 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c3 
root 47041 0.0 0.4 284716 9816 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c4 
root 47048 0.0 0.4 284716 9828 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c5 
root 47059 0.0 0.4 284716 9820 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c6 
root 47068 0.0 0.4 284716 9816 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c7 
Creating long-running processes with PHP
Long running processes in PHP 
pcntl_fork() – creates a child process with a new PID. 
system() or passthru() 
 When you try to run a script in the background, it creates a zombie process. 
PHP forker – uses C to encapsulate php-cli
 https://code.google.com/p/php-forker/
 php_forker daemonizes a php-cli process that runs a console PHP script
 Processes run for weeks and months.
Long running processes 
// Build the php-forker command line for the script and its optional parameters.
$theScript = '/usr/local/sbin/php-forker ' . $thisDirectory . $script;
if (isset($param1)) {
    $theScript .= ' ' . $param1;
}
if (isset($param2)) {
    $theScript .= ' ' . $param2;
}
$escapedScript = escapeshellcmd($theScript);
$logger->info('Executing: ' . $escapedScript);
// exec() returns the last line of the command's output.
$result = exec($escapedScript, $output);
if ($result != 'Ok') {
    $theOutput = var_export($output, true);
    $logger->err('php-forker error launching: ' . $theScript . ' Output: ' . $theOutput);
    echo $theOutput . "\n";
}
Supervisor 
$connectionsEndpoint = 'tcp://*:' . $connectionsPort;
$monitorEndpoint = 'tcp://*:' . $monitorPort;
$statusEndpoint = 'tcp://*:' . $statePort;
$podControlEndpoint = 'tcp://*:' . $podControlPort;
$podRestartEndpoint = 'tcp://*:' . $podRestartPort;
$statusArray = array();
foreach ($connections as $key => $value) {
    $statusArray[$key] = array('availability' => 'active', 'status' => 'started');
}
$context = new ZMQContext();
// Socket for connections output
$mqConnections = new ZMQSocket($context, ZMQ::SOCKET_REP);
$mqConnections->bind($connectionsEndpoint);
// This socket receives 'start' and 'kill' messages from pod control.
$restart = new ZMQSocket($context, ZMQ::SOCKET_ROUTER);
$restart->bind($podRestartEndpoint);
// Socket for status messages.
$status = new ZMQSocket($context, ZMQ::SOCKET_ROUTER);
$status->bind($statusEndpoint);
// This socket publishes 'start' and 'kill' messages to node brokers.
$podControl = new ZMQSocket($context, ZMQ::SOCKET_PUB);
$podControl->bind($podControlEndpoint);
$poll = new ZMQPoll();
$poll->add($mqConnections, ZMQ::POLL_IN);
$poll->add($status, ZMQ::POLL_IN);
$poll->add($restart, ZMQ::POLL_IN);
$read = $write = array();
while (true) {
    // One second timeout
    $events = $poll->poll($read, $write, 1000);
    if ($events) {
        foreach ($read as $socket) {
            if ($socket === $status) {
                $zmsg = new Zmsg($status);
                $zmsg->recv();
                $message = $zmsg->body();
                // array('cluster' => $clusterId, 'status' => $message)
                $statusMessage = Zend_Json::decode($message, true);
                $statusArray = updateStatus($statusArray, $statusMessage);
                $logger->info("Received status message: $message");
                // Publishing status message
                // ($monitor is not created in the excerpt shown on the slides.)
                $monitor->send($message);
                //printf ("Received status message: %s %s", $message, PHP_EOL);
            } elseif ($socket === $mqConnections) {
                $message = $socket->recv();
                $connectionsArray = getConnections($connections, $statusArray);
                $jsonConnections = Zend_Json::encode($connectionsArray);
                $mqConnections->send($jsonConnections);
                $connectionsCount = count($connectionsArray);
                $logger->info("Received $message, Sent $connectionsCount zeroMQ connections");
            } elseif ($socket === $restart) {
Supervisor: Publish to Server Brokers 
/**
 * Publish a 'start' or 'kill' message to the node brokers.
 * @param ZMQSocket $podControl
 * @param string $clusterId
 * @param string $action 'kill' or 'start'
 * @param Zend_Log $logger
 */
function pubMessageToNodeBrokers($podControl, $clusterId, $action, $logger) {
    // Publish a message to sbroker on the pod control port.
    $controlArray = array('cluster' => $clusterId, 'action' => $action);
    $json = Zend_Json::encode($controlArray);
    //$thePayload = '{"cluster":"' . $clusterId . '","action":"' . $action . '"}';
    $podControl->send($json);
    $message = "On the pod control port, the supervisor published to $clusterId the following message: $action";
    $logger->info($message);
}
Cluster Broker 
// Ping 
$context = new ZMQContext(); 
$ping = $context->getSocket(ZMQ::SOCKET_REP); 
$pingEndpoint = 'tcp://*:' . $connections['ping']; 
$ping->bind($pingEndpoint); 
// Receives the query from the frontend API. 
$frontend = $context->getSocket(ZMQ::SOCKET_REP); 
$frontendEndpoint = 'tcp://*:' . $connections['frontend']; 
$frontend->bind($frontendEndpoint); 
// Kill this broker 
$controller = $context->getSocket(ZMQ::SOCKET_SUB); 
$controlEndpoint = 'tcp://' . $connections['cbroker'] . ':' . $connections['kill']; 
$controller->connect($controlEndpoint); 
$controller->setSockOpt(ZMQ::SOCKOPT_SUBSCRIBE, ""); 
// Status messages 
$statusEndpoint = 'tcp://' . $connections['uber'] . ':' . $connections['state']; 
sendStatus($context, $statusEndpoint, $clusterId, 'waiting'); 
// Socket for map 
$map = new ZMQSocket($context, ZMQ::SOCKET_PUSH); 
// Use the $nodeCount 
$map->setSockOpt(ZMQ::SOCKOPT_HWM, $nodeCount); 
$mapEndpoint = 'tcp://*:' . $connections['map']; 
$map->bind($mapEndpoint); 
// Socket for reduce 
$reduce = new ZMQSocket($context, ZMQ::SOCKET_PULL); 
$reduceEndpoint = 'tcp://*:' . $connections['reduce']; 
$reduce->bind($reduceEndpoint); 
$read = $write = array();
while (true) {
    $poll = new ZMQPoll();
    $poll->add($ping, ZMQ::POLL_IN);
    $poll->add($frontend, ZMQ::POLL_IN);
    $poll->add($controller, ZMQ::POLL_IN);
    $poll->add($reduce, ZMQ::POLL_IN);
    $events = $poll->poll($read, $write, 1000); // 1 second interval
    if ($events > 0) {
        foreach ($read as $socket) {
            if ($socket === $ping) {
                $msg = $ping->recv();
                $logger->info($clusterId . ': Sending pong');
                $ping->send('pong');
            } elseif ($socket === $frontend) {
                $shardCount = 0;
                $reduceCount = 0;
                $reduceArray = array();
                $timeArray = array();
                $logger->info($cBrokerId . ': frontend receiving message');
                $payload = $frontend->recv();
                sendStatus($context, $statusEndpoint, $clusterId, 'processing');
                $logger->info($cBrokerId . ': frontend set cluster to processing');
                $paramsArray = Zend_Json::decode($payload, true);
                // get redis from the payload and unset
                $redisConnection = array('redis' => $paramsArray['redis']);
                unset($paramsArray['redis']);
                // Time array. Get rid of time values that are not day values.
                $theTimeArray = array();
                $timeArray = $paramsArray['time'];
                foreach ($timeArray as $timeSlot) {
                    $stringLength = strlen($timeSlot);
                    if ($stringLength == 8) {
                        $theTimeArray[] = $timeSlot;
                    }
                }
                $paramsArray['time'] = $theTimeArray;
                // Shard on the largest array.
                $maxPosition = getMaxPosition($paramsArray);
                if ($maxPosition == null) {
                    // None of the elements are arrays
                    $logger->info($cBrokerId . ': frontend, none of the search criteria are arrays. Cannot be sharded!');
                    $finalArray = array_merge($paramsArray, $redisConnection);
                    $workJson = Zend_Json::encode($finalArray);
                    $shardCount++;
                    $map->send($workJson);
                } else {
                    $maxPositionArray = $paramsArray[$maxPosition];
                    $shardedArray = shardPosition($maxPositionArray);
                    foreach ($shardedArray as $shard) {
                        $workerArray = array();
                        foreach ($paramsArray as $key => $value) {
                            if ($key != $maxPosition) {
                                $workerArray[$key] = $value;
                            } else {
                                $workerArray[$key] = $shard;
                            }
                        }
                        $finalArray = array_merge($workerArray, $redisConnection);
                        $workJson = Zend_Json::encode($finalArray);
                        $shardCount++;
                        $logger->info($cBrokerId . ': Shard count is ' . $shardCount);
                        $map->send($workJson);
                    }
                }
                // unset variables
                unset($paramsArray);
                unset($theTimeArray);
                unset($shardedArray);
                unset($finalArray);
            } elseif ($socket === $reduce) {
                $jsonResult = $reduce->recv();
                $result = Zend_Json::decode($jsonResult, true);
                $reduceCount++;
                $message = $cBrokerId . ': Received reduce. Reduce count ' . $reduceCount . '; shard count ' . $shardCount;
                $logger->info($message);
                if ($reduceCount < $shardCount) {
                    $reduceArray[] = $result;
                } else {
                    $reduceArray[] = $result;
                    // Reduce it
                    $message = $cBrokerId . ': Sending reduce array to reduce action';
                    $logger->info($message);
                    $sendingResults = array();
                    if ($type == 'normal') {
                        $sendingResults = reduceAction($reduceArray);
                    } elseif ($type == 'distincts') {
                        $sendingResults = reduceDistincts($reduceArray);
                    } else {
                        $sendingResults = reduceProduct($reduceArray);
                    }
                    $jsonofied = Zend_Json::encode($sendingResults);
                    unset($sendingResults);
                    $frontend->send($jsonofied);
                    sendStatus($context, $statusEndpoint, $clusterId, 'waiting');
                    gc_collect_cycles();
                }
            } elseif ($socket === $controller) {
function reduceAction(array $reduceArray) {
    // Sum the per-key counts returned by every shard.
    $timeArray = array();
    foreach ($reduceArray as $valuesArray) {
        if (!empty($valuesArray)) {
            foreach ($valuesArray as $key => $value) {
                if (!empty($value)) {
                    if (isset($timeArray[$key])) {
                        $timeArray[$key] = $timeArray[$key] + $value;
                    } else {
                        $timeArray[$key] = $value;
                    }
                }
            }
        }
    }
    if (empty($timeArray)) {
        return 'No data found!';
    }
    unset($valuesArray);
    return $timeArray;
}
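To illustrate the reduce step, here is a compact sketch of the same per-key summation, applied to made-up shard results. `sumShardResults` is a hypothetical name, and unlike the function above it returns an empty array when nothing was found.

```php
<?php
// Hypothetical compact variant: sum the counts that each shard reports per day.
function sumShardResults(array $shardResults): array
{
    $totals = array();
    foreach ($shardResults as $result) {
        foreach ((array) $result as $day => $count) {
            if (empty($count)) {
                continue; // skip missing or zero counts
            }
            $totals[$day] = (isset($totals[$day]) ? $totals[$day] : 0) + $count;
        }
    }
    return $totals;
}

// Made-up results from three shards; the last shard found nothing.
$shardResults = array(
    array('20120201' => 10, '20120202' => 5),
    array('20120201' => 7),
    array(),
);
$totals = sumShardResults($shardResults);
```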
Front-end API 
function _sendPayload($ctx, $endpoint, array $request)
{
    $client = $ctx->getSocket(ZMQ::SOCKET_REQ);
    //$logger->info('sendPayload called: ' . $endpoint);
    $client->connect($endpoint);
    $json = Zend_Json::encode($request);
    $client->send($json);
    // Wait for the reply, but give up after the timeout.
    $poll = new ZMQPoll();
    $poll->add($client, ZMQ::POLL_IN);
    $readable = $writable = array();
    $timeout = 180000; // Three minutes in milliseconds
    $events = $poll->poll($readable, $writable, $timeout);
    //$logger->info('events poll is finished.');
    $response = null;
    if ($events) {
        //$logger->info('There is an event.');
        foreach ($readable as $sock) {
            if ($sock === $client) {
                $response = $client->recv();
            } else {
                $response = null;
            }
        }
    }
    // The slide ends here; returning the response is an assumption.
    return $response;
}
Challenges 
Challenges
Selling message passing as an alternative to threading.
 We were having a memory problem.
 It was not CPU-bound.
Learning ZeroMQ
 Worrying about making mistakes.
 Do I have the right model for the task?
Getting the application into production
 With two pods
Getting dev and test clusters.
Selling the AI application.
Questions? 
Comments? 
Observations? 
J. David Mitchell 
LinkedIn: david@dmitchell.biz 
Twitter: pingdavid 
© 2012 Equifax Inc. 70

Audience Intel presentation 2014

  • 1. © 2012 Equifax Inc. Audience Intel J. David Mitchell IXI Services, Equifax Inc. 7927 Jones Branch Drive, Suite 400 | McLean, VA 22102
  • 2. Overview or Outline 1. IXI Audience Intel application: High-level overview 1. A graph data store: Neo4J 2. A key-value data store: Redis 3. How we query and filter Redis 2. Using ZeroMQ to build a compute cluster 1. Design patterns 2. How it works © 2012 Equifax Inc. 2
  • 3. The Business Need Audience Intel © 2012 Equifax Inc. 3
  • 4. Background Audience Intel (AI) will help our customers monitor and glean insights from their online marketing campaigns. AI will track click-and-conversion counts for our customers. AI will profile the click-and-conversion counts using IXI segments.  WealthComplete Total Investable Assets  WealthComplete Deposits  Financial Cohorts  Income360  Discretionary Spending  Ability to Pay  Economic Spectrum  Aggregated FICO scores © 2012 Equifax Inc. 4
  • 5. Background: Use cases The end-user application will query a persistent data store to answer their business questions.  Mr. Jones from Razorfish would like to see clicks for all offers, all publishers and all creatives for June 1.  Mr. Jones from Razorfish would like to see ATP (Ability to Pay) for clicks for all offers, all publishers and all creatives for June 1.  Mr. Jones from Razorfish would like to see ATP for clicks for offer 1, goal 1, all publishers and creative 79 for June 1. A partner will place an IXI empty gif on their page (or in their ad), so that we will get an entry in our Web server logs.  GET /digi/23CE7C3A-FAE93B9DB863/a.gif?partner=0244&offer=1&goal=1&result=1&source=1&creativeid=1 Parsers will parse the log files for a time slot (e.g., a one-hour time slot) and do counts for each partner and campaign.  Clicks and conversions  IXI product lookup, e.g., based on the IP address or cookie (zip+4) © 2012 Equifax Inc. 5
  • 6. Background Query views or clicks or conversions of an audience  By time: hour, day, week, month or whole campaign  By partner (client or campaign)  By offer  By goal  By result: achieved or not  By source: publishers  By creative Glean insights from IXI products WealthComplete Deposits 8 FinancialCohorts 61 Income360 11 Discretionary Spending 9 EconomicCohorts 71 Ability to Pay (ATP) 4 FICO scores 6 Economic Spectrum 17 210 © 2012 Equifax Inc. 6
  • 7. Background: Examples of IXI products WealthComplete Deposits AB Tiers WealthComplete Deposits # HHs % 1 $250K+ 5,754,096 4.78% 2 $100K - $250K 12,536,735 10.42% 3 $50K - $100K 13,288,411 11.04% 4 $25K - $50K 14,722,193 12.23% 5 $10K - $25K 18,336,190 15.24% Ability to Pay (ATP) ED 6 $2.5K - $10K 20,246,716 16.82% Tiers Data Labels # HHs % 7 $0.01 - $2.5K 34,993,949 29.08% 1, 2, 3, 4, 5, 6, 7 Highest Ability to Pay: Top 20% 22,249,393 18.49% 8 $0 467,498 0.39% 8, 9, 10, 11, 12 High Ability to Pay 32,916,824 27.36% Total 120,345,788 100.00% 13, 14, 15, 16, 17 Moderate Ability to Pay 40,012,323 33.25% 18, 19, 20, 21, 22, 23, 24 Lowest Abillity to Pay: Bottom 20% 25,144,689 20.90% Total 120,323,229 100.00% © 2012 Equifax Inc. 7
  • 8. Background Rough sketch (or mockup) of the UI for AI. © 2012 Equifax Inc. 8
  • 9. Platform Architecture Audience Intel © 2012 Equifax Inc. 9
  • 10. Technology: High-Level overview Logs: Queuing component (Kafka)  Producer: Stream the log files into Kakfa.  Consumer: Parse the log files in Kafka and do counts. Summarize: Key-value storage for the counters  Scalable with fast lookups. Metadata storage: Partner, client, campaign  Schema-less  Expresses relationships between entities easily  Fast lookups. Filtering API A UI © 2012 Equifax Inc. 10
  • 11. Technology: High-Level overview Queuing component: Kafka  Apache Kafka is a distributed publish-subscribe messaging system written in Scala and Java.  incubator.apache.org/kafka/ Data storage for the counters: Redis  Advanced key-value store or data structure server written in C.  redis.io Metadata storage: Neo4j  Java Graph database (neo4j.org)  High availability cluster option Filtering API: PHP with ZeroMQ © 2012 Equifax Inc. 11
  • 12. RDMS Design Audience Intel © 2012 Equifax Inc. 12
  • 13. RDMS Design © 2012 Equifax Inc. 13
  • 14. RDMS Design Use cases  Mr. Jones from Razorfish would like to see clicks for all offers, all sources and all creatives for Dec. 29.  Mr. Jones from Razorfish would like to see ATP for clicks for all offers, all sources and all creatives for Dec. 29.  Mr. Jones from Razorfish would like to see ATP for clicks for offer 1, goal 1, all sources and creative 79 for Dec. 29. First case: Get sum of clicks  day 2012-12-29, partner 1, goal 1  SELECT SUM(count) AS sum FROM ai201212 WHERE partnerId = 1 AND day = 29 AND goal = 1; Second use case: Get ATP  day 2012-12-29, partner 1, goal 1  SELECT ti.productTierId, SUM(ti.tierHH) AS tierHH FROM ai201112 ai INNER JOIN tiers201212 ti ON ai.aiId = ti.aiId WHERE ai.partnerId = 1 AND ai.day = 29 AND ai.goal = 1 AND ti.productId = 1 GROUP BY ti.productTierId; © 2012 Equifax Inc. 14
  • 15. RDMS Design Third use case: Get ATP  day 2012-12-29, partner 1, goal 1  creativeId 79  SELECT ti.productTierId, SUM(ti.tierHH) AS tierHH, SUM(ai.count) AS sum, (SUM(ti.tierHH)/SUM(ai.count))*100 AS percent FROM ai201112 ai INNER JOIN tiers201212 ti ON ai.aiId = ti.aiId WHERE ai.partnerId = 1 AND ai.day = 29 AND ai.goal = 1 AND ti.productId = 1 AND ai.creativeId = 79 GROUP BY ti.productTierId; © 2012 Equifax Inc. 15
  • 16. Metadata to Neo4j Audience Intel © 2012 Equifax Inc. 16
  • 17. Metadata to Neo4j © 2012 Equifax Inc. 17
  • 18. Metadata to Neo4j © 2012 Equifax Inc. 18
  • 19. Metadata to Neo4j © 2012 Equifax Inc. 19
  • 20. Metadata to Neo4j © 2012 Equifax Inc. 20
  • 21. Metadata to Neo4j © 2012 Equifax Inc. 21
  • 22. Metadata to Neo4j © 2012 Equifax Inc. 22
  • 23. Metadata to Neo4j: Users, Firms & Data Sources © 2012 Equifax Inc. 23
  • 24. Metadata to Neo4j © 2012 Equifax Inc. 24
  • 25. Storing & Retrieving Data in Redis © 2012 Equifax Inc. 25
  • 26. What is Redis? Redis is an in-memory, advanced key-value store.  “in-memory”: like a cache (e.g., memcached).  “advanced”: stores complex data structures. It has a “hash table” data structure—array storage.  Key: currentInventoryOfFruit  Value: [“apples” : 2, “oranges” : 12, “tomatoes” : 6]  Increment: $redis->hIncrBy(‘currentInventoryOfFruit’, ‘kiwi’, 2);  Value: [“apples” : 2, “oranges” : 12, “kiwi” : 2, “tomatoes” : 6]  Increment: $redis->hIncrBy(‘currentInventoryOfFruit’, ‘kiwi’, 2);  Value: [“apples” : 2, “oranges” : 12, “kiwi” : 4, “tomatoes” : 6] © 2012 Equifax Inc. 26
  • 27. What is Redis? One of the unique aspects of Redis in the world of key-value caches is that Redis adds a hash-table data structure.  strings -- binary safe data  hashes -- maps between string fields and string values  lists -- lists of strings, sorted by insertion order  sets -- unordered collection of unique strings  sorted sets -- numerically ordered collection of unique strings Functionality  For a list, one can push and pop strings off a list.  For a hash, one can sort, increment, get all fields or get one field.  For a set, one can sort and perform operations such as intersections and unions.  For sorted sets, in addition to the things mentioned above, one can use the numeric score to retrieve a subset.  Redis also supports performing parallel queries, using MULTI/EXEC commands, and a PUBLISH and SUBSCRIBE feature set for posting to channels and listening for messages posted to channels. © 2012 Equifax Inc. 27
  • 28. How we store our Data in Redis Key: STRING  Verbose key: /ai/partner:1/client:1/campaign:1/time:2012.06.01/offer:1/goal:1/result:1/source:1/ creative:1/  Compact key: /ai/1/1/1/time:2012.06.01/1/1/1/1/1/  Value: 326 Need a legend: SSET  Key: /ai/legend:key/partner:1/client:1/campaign:1  Value: [{“0”:”partner”},{“1”:”client”},{“2”:”campaign”},{“3”:”time”},{“4”:”offer”}, {“5”:”goal”}, {“6”:”resultId”}, {“7”:”sourceId”}, {“8”:”creativeId”}] Key: STRING (dog visits)  Verbose key: /ai/partner:1/client:1/campaign:1/time:2012.06.01/dog:Rusty/breed:23/color:brn/ weight:36/height:24/  Compact key: /ai/1/1/1/2012.06.01/Rusty/23/brn/36/24/  Value: 3 © 2012 Equifax Inc. 28
  • 29. Legend for /0246/-/c1/ (partner/client/campaign) [{ 0 : "partner" }, { 1 : "client" }, { 2 : "campaign" }, { 3 : "time" }, { 4 : "offer" }, { 5 : "goal" }, { 6 : "result" }, { 7 : "source" }, { 8 : "creative" } ] © 2012 Equifax Inc. 29
  • 30. Querying the Data in Redis Created indexes for efficient lookups.  We are storing hour keys.  We created day indexes (e.g., 2012.07.11) for the hour keys.  We are storing day keys.  We created another index (e.g., ‘days’) for the day keys Hour keys  Key: /ai/key/1/1/1/2012.07.11.08/1/1/1/1/1/  Value: 26  Index: /ai/idx/1/1/1/2012.07.11/1/1/1/1/1/  Value (SET): [‘/ai/key/1/1/1/2012.07.11.08/1/1/1/1/1/’, ‘/ai/key/1/1/1/2012.07.11.09/1/1/1/1/1/’, ‘/ai/key/1/1/1/2012.07.11.10/1/1/1/1/1/’] © 2012 Equifax Inc. 30
  • 31. Querying the Data in Redis Day keys  Key: /ai/key/1/1/1/2012.07.11/1/1/1/1/1/  Value: 322  Index: /ai/idx/1/1/1/day/1/1/1/1/1/  Value (SET): [‘/ai/key/1/1/1/2012.07.09/1/1/1/1/1/’, ‘/ai/key/1/1/1/2012.07.10/1/1/1/1/1/’, ‘/ai/key/1/1/1/2012.07.11/1/1/1/1/1/’] Queries are typically for many days and for many dimensions.  We do an sUnion on the indexes.  Performs the union between N sets (e.g., 1,000 sets) and returns an array of keys. Step one: Does the item exist in an index? Get all of the keys in the indexes.  We build a list of the indexes that we want to query, and we query the indexes. – /ai/idx/1/1/1/day/1/1/1/1/1/ – /ai/idx/1/1/1/day/2/1/1/1/1/ – /ai/idx/1/1/1/day/3/1/1/1/1/  We do an sUnion to get all of the keys in the index sets. We can filter the days before get the keys.  We do an mGet to get all of the key values, and we sum up all of the keys for the total for a given day (or a given hour). © 2012 Equifax Inc. 31
  • 32. Querying the Data in Redis Secret sauce for queries  sUnion on the indexes  mGet (or getMultiple) on the keys mGet  An mGet gets the values of all the specified keys, and returns an array of values.  If one or more keys does not exist, the array will contain FALSE at the position of the key. © 2012 Equifax Inc. 32
  • 33. Query Params [{"name":"partner","value":"0246"}, {"name":"client", "value":"-"}, {"name":"campaign", "value":“c1"}, {"name":"source", "options":[{"value":"1"}, {"value":"2"}]}, {"name":"time", "options":[{"value":""}, {"value":"2012"}, {"value":"201202"}, {"value":"20120201"}, {"value":"20120202"}, {"value":"20120203"}, {"value":"20120204"}, {"value":"20120205"}, {"value":"20120206"}, {"value":"20120207"}, {"value":"20120208"}, {"value":"20120209"}, {"value":"20120210"}, {"value":"20120211"}, {"value":"20120212"}, {"value":"20120213"}, {"value":"20120214"}, {"value":"20120215"}, {"value":"20120216"}, {"value":"20120217"}, {"value":"20120218"}, {"value":"20120219"}, {"value":"20120220"}, {"value":"20120221"}, {"value":"20120222"}, {"value":"20120223"}, {"value":"20120224"}, {"value":"20120225"}, {"value":"201204"}, {"value":"20120402"}, {"value":"20120404"}, {"value":"20120405"}, {"value":"201206"}, {"value":"20120602"}, {"value":"20120605"}, {"value":"20120621"}, {"value":"201207"}, {"value":"20120702"}]}, {"name":"result", "options":[{"value":"0"}, {"value":"1"}]}, {"name":"goal", "value":"1"}, {"name":"creative", "value":"1"}, {"name":"offer", "options":[{"value":"4"}, {"value":"5"}, {"value":"6"}, {"value":"7"}]}, {"name":"level", "value":"day"}] © 2012 Equifax Inc. 33
  • 34. Build a list of indexes partner = 0246 campaign = c1 index = days offer = 4, 5, 6, 7 goal = 1 result = 0, 1 source = 1, 2 creative = 1 /ai/idx/0246/-/c1/days/4/1/0/1/1/ /ai/idx/0246/-/c1/days/5/1/0/1/1/ /ai/idx/0246/-/c1/days/6/1/0/1/1/ /ai/idx/0246/-/c1/days/7/1/0/1/1/ /ai/idx/0246/-/c1/days/4/1/1/1/1/ /ai/idx/0246/-/c1/days/5/1/1/1/1/ /ai/idx/0246/-/c1/days/6/1/1/1/1/ /ai/idx/0246/-/c1/days/7/1/1/1/1/ /ai/idx/0246/-/c1/days/4/1/0/2/1/ /ai/idx/0246/-/c1/days/5/1/0/2/1/ /ai/idx/0246/-/c1/days/6/1/0/2/1/ /ai/idx/0246/-/c1/days/7/1/0/2/1/ /ai/idx/0246/-/c1/days/4/1/1/2/1/ /ai/idx/0246/-/c1/days/5/1/1/2/1/ /ai/idx/0246/-/c1/days/6/1/1/2/1/ /ai/idx/0246/-/c1/days/7/1/1/2/1/ © 2012 Equifax Inc. 34
  • 35. Querying the Data in Redis: One server (2 GB of RAM) The results with one server  Less than 100,000 rows: <1second  800,000 rows: about 4 seconds  9,000,000 rows about 45 seconds 9 million max rows with one server Removed the 9-million-row limitation by using a compute cluster  10 servers  Divide and conquer strategy Three primary lookups:  Basic counts  Product counts (percent distribution)  Distinct values for our form multi-select list. © 2012 Equifax Inc. 35
  • 36. Quick distinct values for our multi-select lists Generate form elements: We have HTML select lists.  We need to do a “select distinct(offer)”.  We write to a ‘distincts’ key so that we do not have look up the values and calculate the distinct values in code. Distinct values (for the form): SET  Key: /ai/distincts:filter/partner:1/client:1/campaign:1/time  Value: [“2012.06.01”,”2012.06.02”,”2012.06.03”]  Key: /ai/distincts:filter/partner:1/client:1/campaign:1/offer  Value: [“offer1”,”offer2”,”offer3”,”offer4”]  SETs are unique values so duplicates are dropped. © 2012 Equifax Inc. 36
  • 37. Redis in production
      Redis persists the data to disk so that it survives a system failure (or a reboot).
        ◦ Two persistence modes: append-only file and snapshots.
        ◦ The default is to snapshot your data every N seconds if there are at least M changes:
            - after 900 sec (15 min) if at least 1 key changed
            - after 300 sec (5 min) if at least 10 keys changed
            - after 60 sec if at least 10000 keys changed
        ◦ When the append-only file is enabled, the default is to fsync every second.
        ◦ With fsync set to every second, performance is still very good.
      Supports master-slave replication.
        ◦ We write to the master and read from the slave.
      Redis works best when you re-use open connections.
  • 38. Building a compute cluster
  • 39. Building a compute cluster: Outline
      Problem: large memory-hungry queries
      Solution: Shard the query on the biggest dimension
      Employing a message-passing paradigm
      Overview of architecture
      Dealing with long-running processes in PHP
        ◦ pcntl_fork(): creates a child process with a new PID.
        ◦ PHP forker: uses C to encapsulate PHP
      Using ZeroMQ sockets: PUSH-PULL
      Challenges
  • 40. Problem: large memory-hungry queries
  • 41. Problem: large memory-hungry queries
      partner = 0246
      campaign = c1
      index = days
      offer = 4, 5, 6, 7
      goal = 1
      result = 0, 1
      source = 1, 2
      creative = 1

       1. /ai/idx/0246/-/c1/days/4/1/0/1/1/
       2. /ai/idx/0246/-/c1/days/5/1/0/1/1/
       3. /ai/idx/0246/-/c1/days/6/1/0/1/1/
       4. /ai/idx/0246/-/c1/days/7/1/0/1/1/
       5. /ai/idx/0246/-/c1/days/4/1/1/1/1/
       6. /ai/idx/0246/-/c1/days/5/1/1/1/1/
       7. /ai/idx/0246/-/c1/days/6/1/1/1/1/
       8. /ai/idx/0246/-/c1/days/7/1/1/1/1/
       9. /ai/idx/0246/-/c1/days/4/1/0/2/1/
      10. /ai/idx/0246/-/c1/days/5/1/0/2/1/
      11. /ai/idx/0246/-/c1/days/6/1/0/2/1/
      12. /ai/idx/0246/-/c1/days/7/1/0/2/1/
      13. /ai/idx/0246/-/c1/days/4/1/1/2/1/
      14. /ai/idx/0246/-/c1/days/5/1/1/2/1/
      15. /ai/idx/0246/-/c1/days/6/1/1/2/1/
      16. /ai/idx/0246/-/c1/days/7/1/1/2/1/
  • 42. Solution: Shard the query
  • 43. Solution: Shard the query on the biggest dimension
      /**
       * Finds the position (4, 5 or 6) that holds the largest array.
       * @param array $positionArray The query parameters keyed by position.
       * @return string|null The key of the largest array, or null if none is an array.
       */
      function getMaxPosition(array $positionArray) {
          $pCount = array();
          if (isset($positionArray['position4'])) {
              $position4 = $positionArray['position4'];
              if (is_array($position4)) {
                  $pCount['position4'] = count($position4);
              }
          }
          if (isset($positionArray['position5'])) {
              $position5 = $positionArray['position5'];
              if (is_array($position5)) {
                  $pCount['position5'] = count($position5);
              }
          }
          if (isset($positionArray['position6'])) {
              $position6 = $positionArray['position6'];
              if (is_array($position6)) {
                  $pCount['position6'] = count($position6);
              }
          }
          // Find the max array size
          $max = 0;
          // Stays null when none of the positions is an array.
          $maxPosition = null;
          foreach ($pCount as $key => $value) {
              if ($value > $max) {
                  $max = $value;
                  $maxPosition = $key;
              }
          }
          return $maxPosition;
      }

      /**
       * Shards an array into smaller pieces.
       * @param array $positionArray The position that needs to be sharded.
       * @return array
       */
      function shardPosition($positionArray) {
          // A chunk size of 1 yields one shard per value.
          $shardArray = array_chunk($positionArray, 1);
          return $shardArray;
      }
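To see the effect of sharding on the biggest dimension, here is a compact sketch that combines both steps. It is a generalization made for illustration: `shardQuery()` is a hypothetical helper that considers every array-valued parameter, whereas the slide code only inspects `position4` to `position6`. The parameter names in the example are also illustrative.

```php
<?php
/**
 * Split a query into one work item per value of its largest array-valued parameter.
 * Hypothetical helper, not the code from the slide.
 */
function shardQuery(array $params): array
{
    $maxKey = null;
    $maxCount = 0;
    foreach ($params as $key => $value) {
        if (is_array($value) && count($value) > $maxCount) {
            $maxCount = count($value);
            $maxKey = $key;
        }
    }
    if ($maxKey === null) {
        return array($params); // nothing to shard on: a single work item
    }

    $shards = array();
    foreach (array_chunk($params[$maxKey], 1) as $chunk) {
        $work = $params;          // copy the other criteria unchanged
        $work[$maxKey] = $chunk;  // narrow the biggest dimension to one value
        $shards[] = $work;
    }
    return $shards;
}

$shards = shardQuery(array(
    'partner' => '0246',
    'offer'   => array('4', '5', '6', '7'),
    'result'  => array('0', '1'),
    'time'    => array('20120201', '20120202', '20120203', '20120204', '20120205', '20120206'),
));
echo count($shards), PHP_EOL; // one work item per day, since 'time' is the largest array
```

Each work item keeps the full offer and result lists, so a worker only expands a fraction of the index keys and its memory use shrinks accordingly.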
  • 44. Message passing paradigm
  • 45. Message passing
      Erlang
      Akka for Scala & Java
      Threading and locking
        ◦ Is the code thread safe when updating data?
        ◦ Locking creates contention for the lock, and waiting.
      Message passing to autonomous processes
        ◦ The state of the process can be blocking (synchronous) or non-blocking (asynchronous).
        ◦ In the case of a corrupt state, kill the process and start it again.
      Fail early; fail often.
      Auto-restart on failure (or timeout…).
  • 46. Solution: Build a compute cluster to process the queries using ZeroMQ
  • 47. What is ZeroMQ?
      http://zguide.zeromq.org/php:all
      It's a networking library for message passing.
      It's fast enough to be the fabric for clustered products.
      Its asynchronous I/O model gives you scalable multicore applications, built as asynchronous message-processing tasks.
      My ZeroMQ sockets:
        ◦ REQ - REP (synchronous/blocking with a timeout)
        ◦ REQ - ROUTER (asynchronous)
        ◦ PUSH - PULL (fan out, fan in)
        ◦ PUB - SUB
  • 48. Overview of architecture
  • 49. Overview of architecture
      zeromq.c0.uber = "ash-uhapsyslog01.meshdomain.ixicorp.com"
      zeromq.c0.cbroker = "ash-uhapsyslog01.meshdomain.ixicorp.com"
      zeromq.c0.ping = 5500
      zeromq.c0.frontend = 5550
      zeromq.c0.map = 5600
      zeromq.c0.reduce = 5650
      zeromq.c0.kill = 7800
      zeromq.c0.state = 7500

      zeromq.c1.uber = "ash-uhapsyslog01.meshdomain.ixicorp.com"
      zeromq.c1.cbroker = "ash-uhapsyslog01.meshdomain.ixicorp.com"
      zeromq.c1.ping = 5501
      zeromq.c1.frontend = 5551
      zeromq.c1.map = 5601
      zeromq.c1.reduce = 5651
      zeromq.c1.kill = 7801
      zeromq.c1.state = 7500

      Socket type per port:
        ◦ ping: REQ - REP
        ◦ frontend: REQ - REP
        ◦ map: PUSH - PULL
        ◦ reduce: PUSH - PULL
        ◦ kill: PUB - SUB
        ◦ state: REQ - ROUTER
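One way to turn these flat settings into connect endpoints is sketched below. `clusterEndpoints()` is a hypothetical helper, the host name is a placeholder rather than the real server, and the choice of host per port (cluster broker for ping, frontend, map and reduce; supervisor for state) follows the endpoint strings built in the cluster broker code later in the deck.

```php
<?php
/**
 * Turn the flat "zeromq.<cluster>.<name>" settings into connect endpoints.
 * Hypothetical helper, written for illustration only.
 */
function clusterEndpoints(array $config, string $clusterId): array
{
    // Collect the settings that belong to this cluster, stripping the prefix.
    $prefix = 'zeromq.' . $clusterId . '.';
    $settings = array();
    foreach ($config as $key => $value) {
        if (strpos((string) $key, $prefix) === 0) {
            $settings[substr((string) $key, strlen($prefix))] = $value;
        }
    }

    $endpoints = array();
    foreach (array('ping', 'frontend', 'map', 'reduce') as $name) {
        $endpoints[$name] = 'tcp://' . $settings['cbroker'] . ':' . $settings[$name];
    }
    // Status messages go to the supervisor (uber) host.
    $endpoints['state'] = 'tcp://' . $settings['uber'] . ':' . $settings['state'];
    return $endpoints;
}

// Placeholder host name (an assumption); the ports are the c0 values from the slide.
$ini = <<<'INI'
zeromq.c0.uber = "broker.example.internal"
zeromq.c0.cbroker = "broker.example.internal"
zeromq.c0.ping = 5500
zeromq.c0.frontend = 5550
zeromq.c0.map = 5600
zeromq.c0.reduce = 5650
zeromq.c0.state = 7500
INI;

$endpoints = clusterEndpoints(parse_ini_string($ini), 'c0');
echo $endpoints['frontend'], PHP_EOL;
```

Giving each virtual cluster its own port block (5500 and 5501, 5550 and 5551, and so on) lets several clusters share the same physical hosts.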
  • 50. Overview of architecture
  • 51. Overview of architecture
      Supervisor (uber) and Cluster brokers (cbroker)
        root 58690 0.0  0.5 257900  11100 ? Ssl 06:05 0:11 php /opt/aiclusters/uber.php u0
        root 58703 0.1  1.1 336064  22604 ? Ssl 06:05 0:39 php /opt/aiclusters/cbroker.php p0 c0
        root 58714 0.0  1.0 335296  21916 ? Ssl 06:05 0:07 php /opt/aiclusters/cbroker.php p0 c1
        root 58725 0.0  0.5 325544  11384 ? Ssl 06:05 0:04 php /opt/aiclusters/cbroker.php p0 c2
        root 58736 0.0  0.5 325544  11348 ? Ssl 06:05 0:02 php /opt/aiclusters/cbroker.php p0 c3
        root 58750 0.0  0.5 325532  11232 ? Ssl 06:05 0:01 php /opt/aiclusters/cbroker.php p0 c4
        root 58761 0.0  0.5 325544  11288 ? Ssl 06:05 0:07 php /opt/aiclusters/cbroker.php p0 c5
        root 58769 0.0  0.5 325544  11224 ? Ssl 06:05 0:02 php /opt/aiclusters/cbroker.php p0 c6
        root 58777 0.0  0.5 325532  11312 ? Ssl 06:05 0:04 php /opt/aiclusters/cbroker.php p0 c7
      Worker server 1: Server broker and server workers for each virtual cluster
        root 24532 0.0  0.4 284460   9832 ? Ssl 06:07 0:01 php /opt/aiclusters/sbroker.php p0 1
        root 24540 0.8 11.9 514392 244012 ? Ssl 06:07 5:42 php /opt/aiclusters/sworker.php c0
        root 24546 0.0  5.2 381264 106616 ? Ssl 06:07 0:36 php /opt/aiclusters/sworker.php c1
        root 24555 0.0  0.4 284716   9976 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c2
        root 24564 0.0  0.4 284716   9816 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c3
        root 24573 0.0  0.4 284716   9812 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c4
        root 24582 0.0  0.4 284716   9816 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c5
        root 24592 0.0  0.4 284716   9808 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c6
        root 24601 0.0  0.4 284716   9820 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c7
      Worker server 2: Server broker and server workers for each virtual cluster
        root 46997 0.0  0.4 284460   9824 ? Ssl 06:07 0:00 php /opt/aiclusters/sbroker.php p0 2
        root 47005 0.7 25.5 797800 522688 ? Ssl 06:07 5:04 php /opt/aiclusters/sworker.php c0
        root 47011 0.0  5.0 378192 103616 ? Ssl 06:07 0:28 php /opt/aiclusters/sworker.php c1
        root 47021 0.0  0.4 284716  10012 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c2
        root 47029 0.0  0.4 284716   9812 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c3
        root 47041 0.0  0.4 284716   9816 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c4
        root 47048 0.0  0.4 284716   9828 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c5
        root 47059 0.0  0.4 284716   9820 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c6
        root 47068 0.0  0.4 284716   9816 ? Ssl 06:07 0:00 php /opt/aiclusters/sworker.php c7
  • 52. Creating long running processes with PHP
  • 53. Long running processes in PHP
      pcntl_fork(): creates a child process with a new PID.
      system() or passthru()
        ◦ When you try to run a script in the background, it creates a zombie process.
      PHP forker: uses C to encapsulate php-cli
        ◦ https://code.google.com/p/php-forker/
        ◦ php_forker daemonizes a php-cli process that runs a console PHP script.
        ◦ Processes run for weeks and months.
  • 54. Long running processes
      // Build the php-forker command line: script path plus up to two optional parameters.
      $theScript = '/usr/local/sbin/php-forker ' . $thisDirectory . $script;
      if (isset($param1)) {
          $theScript .= ' ' . $param1;
      }
      if (isset($param2)) {
          $theScript .= ' ' . $param2;
      }
      $escapedScript = escapeshellcmd($theScript);
      $logger->info('Executing: ' . $escapedScript);
      // exec() returns the last line of the command's output.
      $result = exec($escapedScript, $output);
      if ($result != 'Ok') {
          $theOutput = var_export($output, true);
          $logger->err('php-forker error launching: ' . $theScript . ' Output: ' . $theOutput);
          echo $theOutput . "\n";
      }
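A possible variation on the launch code, shown only as a sketch: `escapeshellcmd()` escapes shell metacharacters in the whole string, while `escapeshellarg()` quotes each argument individually, which also protects parameters that contain spaces. `buildForkerCommand()` is a hypothetical helper; the forker path matches the slide and the script path follows the process listing shown earlier.

```php
<?php
/**
 * Build the php-forker command line, quoting each argument separately.
 * Hypothetical helper, not part of the original application.
 */
function buildForkerCommand(string $forker, string $script, array $params = array()): string
{
    $parts = array(escapeshellcmd($forker), escapeshellarg($script));
    foreach ($params as $param) {
        $parts[] = escapeshellarg($param); // each parameter becomes exactly one shell word
    }
    return implode(' ', $parts);
}

$command = buildForkerCommand(
    '/usr/local/sbin/php-forker',
    '/opt/aiclusters/cbroker.php',
    array('p0', 'c0')
);
echo $command, PHP_EOL;
```

The resulting string can be passed to `exec()` in the same way as `$escapedScript` on the slide.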
  • 55. Supervisor
  • 56. Supervisor
      $connectionsEndpoint = 'tcp://*:' . $connectionsPort;
      $monitorEndpoint = 'tcp://*:' . $monitorPort;
      $statusEndpoint = 'tcp://*:' . $statePort;
      $podControlEndpoint = 'tcp://*:' . $podControlPort;
      $podRestartEndpoint = 'tcp://*:' . $podRestartPort;

      $statusArray = array();
      foreach ($connections as $key => $value) {
          $statusArray[$key] = array('availability' => 'active', 'status' => 'started');
      }

      $context = new ZMQContext();

      // Socket for connections output
      $mqConnections = new ZMQSocket($context, ZMQ::SOCKET_REP);
      $mqConnections->bind($connectionsEndpoint);

      // This socket receives 'start' and 'kill' messages from pod control.
      $restart = new ZMQSocket($context, ZMQ::SOCKET_ROUTER);
      $restart->bind($podRestartEndpoint);

      // Socket for status messages.
      $status = new ZMQSocket($context, ZMQ::SOCKET_ROUTER);
      $status->bind($statusEndpoint);

      // This socket publishes 'start' and 'kill' messages to node brokers.
      $podControl = new ZMQSocket($context, ZMQ::SOCKET_PUB);
      $podControl->bind($podControlEndpoint);
  • 57. Supervisor
      $poll = new ZMQPoll();
      $poll->add($mqConnections, ZMQ::POLL_IN);
      $poll->add($status, ZMQ::POLL_IN);
      $poll->add($restart, ZMQ::POLL_IN);
      $read = $write = array();

      while (true) {
          // One second timeout
          $events = $poll->poll($read, $write, 1000);
          if ($events) {
              foreach ($read as $socket) {
                  if ($socket === $status) {
                      $zmsg = new Zmsg($status);
                      $zmsg->recv();
                      $message = $zmsg->body();
                      // array('cluster' => $clusterId, 'status' => $message)
                      $statusMessage = Zend_Json::decode($message, true);
                      $statusArray = updateStatus($statusArray, $statusMessage);
                      $logger->info("Received status message: $message");
                      // Publishing status message
                      $monitor->send($message);
                      //printf ("Received status message: %s %s", $message, PHP_EOL);
                  } elseif ($socket === $mqConnections) {
                      $message = $socket->recv();
                      $connectionsArray = getConnections($connections, $statusArray);
                      $jsonConnections = Zend_Json::encode($connectionsArray);
                      $mqConnections->send($jsonConnections);
                      $connectionsCount = count($connectionsArray);
                      $logger->info("Received $message, Sent $connectionsCount zeroMQ connections");
                  } elseif ($socket === $restart) {
  • 58. Supervisor: Publish to Server Brokers
      /**
       * Publish a 'start' or 'kill' message to the node brokers.
       * @param ZMQSocket $podControl
       * @param string $clusterId
       * @param string $action 'kill' or 'start'
       * @param Zend_Log $logger
       */
      function pubMessageToNodeBrokers($podControl, $clusterId, $action, $logger) {
          // Publish a message to sbroker on the pod control port.
          $controlArray = array('cluster' => $clusterId, 'action' => $action);
          $json = Zend_Json::encode($controlArray);
          //$thePayload = '{"cluster":"' . $clusterId . '","action":"' . $action . '"}';
          $podControl->send($json);
          $message = "On the pod control port, the supervisor published to $clusterId the following message: $action";
          $logger->info($message);
      }
  • 59. Cluster Broker
  • 60. Cluster broker
      // Ping
      $context = new ZMQContext();
      $ping = $context->getSocket(ZMQ::SOCKET_REP);
      $pingEndpoint = 'tcp://*:' . $connections['ping'];
      $ping->bind($pingEndpoint);

      // Receives the query from the frontend API.
      $frontend = $context->getSocket(ZMQ::SOCKET_REP);
      $frontendEndpoint = 'tcp://*:' . $connections['frontend'];
      $frontend->bind($frontendEndpoint);

      // Kill this broker
      $controller = $context->getSocket(ZMQ::SOCKET_SUB);
      $controlEndpoint = 'tcp://' . $connections['cbroker'] . ':' . $connections['kill'];
      $controller->connect($controlEndpoint);
      $controller->setSockOpt(ZMQ::SOCKOPT_SUBSCRIBE, "");

      // Status messages
      $statusEndpoint = 'tcp://' . $connections['uber'] . ':' . $connections['state'];
      sendStatus($context, $statusEndpoint, $clusterId, 'waiting');

      // Socket for map
      $map = new ZMQSocket($context, ZMQ::SOCKET_PUSH);
      // Use the $nodeCount
      $map->setSockOpt(ZMQ::SOCKOPT_HWM, $nodeCount);
      $mapEndpoint = 'tcp://*:' . $connections['map'];
      $map->bind($mapEndpoint);

      // Socket for reduce
      $reduce = new ZMQSocket($context, ZMQ::SOCKET_PULL);
      $reduceEndpoint = 'tcp://*:' . $connections['reduce'];
      $reduce->bind($reduceEndpoint);
  • 61. Cluster broker
      $read = $write = array();
      while (true) {
          $poll = new ZMQPoll();
          $poll->add($ping, ZMQ::POLL_IN);
          $poll->add($frontend, ZMQ::POLL_IN);
          $poll->add($controller, ZMQ::POLL_IN);
          $poll->add($reduce, ZMQ::POLL_IN);
          $events = $poll->poll($read, $write, 1000); // 1 second interval
          if ($events > 0) {
              foreach ($read as $socket) {
                  if ($socket === $ping) {
                      $msg = $ping->recv();
                      $logger->info($clusterId . ': Sending pong');
                      $ping->send('pong');
                  } elseif ($socket === $frontend) {
  • 62. Cluster broker
                  } elseif ($socket === $frontend) {
                      $shardCount = 0;
                      $reduceCount = 0;
                      $reduceArray = array();
                      $timeArray = array();
                      $logger->info($cBrokerId . ': frontend receiving message');
                      $payload = $frontend->recv();
                      sendStatus($context, $statusEndpoint, $clusterId, 'processing');
                      $logger->info($cBrokerId . ': frontend set cluster to processing');
                      $paramsArray = Zend_Json::decode($payload, true);

                      // get redis from the payload and unset
                      $redisConnection = array('redis' => $paramsArray['redis']);
                      unset($paramsArray['redis']);

                      // Time array. Get rid of time values that are not day values.
                      $theTimeArray = array();
                      $timeArray = $paramsArray['time'];
                      foreach ($timeArray as $timeSlot) {
                          $stringLength = strlen($timeSlot);
                          if ($stringLength == 8) {
                              $theTimeArray[] = $timeSlot;
                          }
                      }
                      $paramsArray['time'] = $theTimeArray;

                      // Shard on the largest array.
                      $maxPosition = getMaxPosition($paramsArray);
                      if ($maxPosition == null) {
                          // None of the elements are arrays
                          $logger->info($cBrokerId . ': frontend, none of the search criteria are arrays. Cannot be sharded!');
                          $finalArray = array_merge($paramsArray, $redisConnection);
                          $workJson = Zend_Json::encode($finalArray);
                          $shardCount++;
                          $map->send($workJson);
  • 63. Cluster broker
                      } else {
                          $maxPositionArray = $paramsArray[$maxPosition];
                          $shardedArray = shardPosition($maxPositionArray);
                          foreach ($shardedArray as $shard) {
                              $workerArray = array();
                              foreach ($paramsArray as $key => $value) {
                                  if ($key != $maxPosition) {
                                      $workerArray[$key] = $value;
                                  } else {
                                      $workerArray[$key] = $shard;
                                  }
                              }
                              $finalArray = array_merge($workerArray, $redisConnection);
                              $workJson = Zend_Json::encode($finalArray);
                              $shardCount++;
                              $logger->info($cBrokerId . ': Shard count is ' . $shardCount);
                              $map->send($workJson);
                          }
                      }
                      // unset variables
                      unset($paramsArray);
                      unset($theTimeArray);
                      unset($shardedArray);
                      unset($finalArray);
  • 64. Cluster broker
                  } elseif ($socket === $reduce) {
                      $jsonResult = $reduce->recv();
                      $result = Zend_Json::decode($jsonResult, true);
                      $reduceCount++;
                      $message = $cBrokerId . ': Received reduce. Reduce count ' . $reduceCount . '; shard count ' . $shardCount;
                      $logger->info($message);
                      if ($reduceCount < $shardCount) {
                          $reduceArray[] = $result;
                      } else {
                          $reduceArray[] = $result;
                          // Reduce it
                          $message = $cBrokerId . ': Sending reduce array to reduce action';
                          $logger->info($message);
                          $sendingResults = array();
                          if ($type == 'normal') {
                              $sendingResults = reduceAction($reduceArray);
                          } elseif ($type == 'distincts') {
                              $sendingResults = reduceDistincts($reduceArray);
                          } else {
                              $sendingResults = reduceProduct($reduceArray);
                          }
                          $jsonofied = Zend_Json::encode($sendingResults);
                          unset($sendingResults);
                          $frontend->send($jsonofied);
                          sendStatus($context, $statusEndpoint, $clusterId, 'waiting');
                          gc_collect_cycles();
                      }
                  } elseif ($socket === $controller) {
  • 65. Cluster broker
      function reduceAction(array $reduceArray) {
          $timeArray = array();
          foreach ($reduceArray as $valuesArray) {
              if (!empty($valuesArray)) {
                  foreach ($valuesArray as $key => $value) {
                      if (!empty($value)) {
                          if (isset($timeArray[$key])) {
                              $timeArray[$key] = $timeArray[$key] + $value;
                          } else {
                              $timeArray[$key] = $value;
                          }
                      }
                  }
              }
          }
          if (empty($timeArray)) {
              return 'No data found!';
          }
          unset($valuesArray);
          return $timeArray;
      }
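The reduce step amounts to summing the partial counts that come back from the workers. The sketch below is a compact stand-in written for illustration, not the slide's function: `mergeCounts()` is a hypothetical name, the sample counts are made up, and unlike `reduceAction()` it returns an empty array rather than a message string when nothing is found.

```php
<?php
/**
 * Sum per-day counts returned by the workers into one result.
 * Hypothetical helper; names and data are illustrative.
 */
function mergeCounts(array $partials): array
{
    $totals = array();
    foreach ($partials as $partial) {
        if (!is_array($partial)) {
            continue; // skip non-array worker results
        }
        foreach ($partial as $day => $count) {
            $totals[$day] = ($totals[$day] ?? 0) + $count;
        }
    }
    ksort($totals); // present the days in order
    return $totals;
}

$totals = mergeCounts(array(
    array('20120201' => 10, '20120202' => 4), // result from one shard
    array('20120201' => 5),                   // another shard hit the same day
    array(),                                  // a shard with no data
    array('20120203' => 7),
));
print_r($totals);
```

Because addition is associative, the order in which the shards report back does not affect the final counts.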
  • 66. Front-end API
  • 67. Front-end API
      function _sendPayload($ctx, $endpoint, array $request) {
          $client = $ctx->getSocket(ZMQ::SOCKET_REQ);
          //$logger->info('sendPayload called: ' . $endpoint);
          $client->connect($endpoint);
          $json = Zend_Json::encode($request);
          $client->send($json);

          // Wait for the reply, but give up after the timeout.
          $poll = new ZMQPoll();
          $poll->add($client, ZMQ::POLL_IN);
          $readable = $writable = array();
          $timeout = 180000; // Three minutes in milliseconds
          $events = $poll->poll($readable, $writable, $timeout);
          //$logger->info('events poll is finished.');
          $response = null;
          if ($events) {
              //$logger->info('There is an event.');
              foreach ($readable as $sock) {
                  if ($sock === $client) {
                      $response = $client->recv();
                  } else {
                      $response = null;
                  }
              }
          }
  • 68. Challenges
  • 69. Challenges
      Selling message passing as an alternative to threading.
        ◦ We were having a memory problem.
        ◦ It was not CPU bound.
      Learning ZeroMQ
        ◦ Worrying about making mistakes.
        ◦ Do I have the right model for the task?
      Getting the application into production
        ◦ With two pods
      Getting dev and test clusters.
      Selling the AI application.
  • 70. Questions? Comments? Observations?
      J. David Mitchell
      LinkedIn: david@dmitchell.biz
      Twitter: pingdavid