Retail Profitability, Improved
Small Blobs and Big Logs
Different use cases for MongoDB at ShopperTrak
About Us
■Largest retail foot traffic counting and analytics company in the world.
Some Stats
■20-year-old company
■50,000 plus retail locations
■100+ countries
■20 billion people counted every year
Who Is This Guy
■Former co-founder/CTO of a video analytics company acquired by ShopperTrak in late 2012
■Using MongoDB since 2009 (before auto-sharding even)
■Currently working across product and technology to bring new ideas to life at ShopperTrak, primarily in video and Bluetooth Low Energy
Hash Check -- Guy talking should look somewhat like this
Our Technology Stack
■Java/Groovy with a bit of Node.js
■Increasingly, JavaScript single-page apps in Angular and React
■Data Layer
▼Oracle, and lots of it still
▼MySQL from acquisitions (AWS RDS)
▼Hazelcast for in-memory storage
▼S3 for video and file storage
▼Redshift for data warehousing
▼And of course MongoDB (6 nodes, 2 shards)
You’re welcome, Larry
4 Distinct Uses at ShopperTrak
Application Uses (major)
■Small Blobs
▼Small number of documents (rows)
▼Each document has a rich object structure
■Big Logs
▼Lots (billions) of documents
▼Each document is very narrow
Generic Cross Application Uses (minor)
■Distributed Queue
▼Many applications sharing a work queue
■Distributed Lock
▼Many applications being able to lock a resource globally
Cross Application Uses – Distributed Queue
■Sounds a bit crazy and “off label”, but it works well and we’re not alone. [1]
■What
▼Tasks/Events are generated into a global distributed log
▼Isolated applications can take jobs for processing
[Diagram: App 1, App 2, and Event App connected to MongoDB, with arrows labeled "append" and "findAndModify"]
[1] https://blog.serverdensity.com/queueing-mongodb-using-mongodb/
    https://github.com/gaillard/mongo-queue-java
    https://blog.serverdensity.com/replacing-rabbitmq-with-mongodb/
    http://www.ibm.com/developerworks/library/os-mongodb-work-queues/
Cross Application Uses – Distributed Queue
■Why
▼findAndModify allows for atomic task ‘check-out’, like SELECT … FOR UPDATE in SQL but without the transactional overhead
▼Durable structure
▼Infrastructure already exists and is well understood
■How
▼Query by queue name, timestamp, and completed flag
▼Ensure the size of the document doesn’t change on update (so it can be updated in place rather than moved)
Cross Application Uses – Distributed Queue
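collection.findAndModify(
    // Query: a task on this queue whose lease has expired and that is not completed
    [
        queue: name,
        'meta.ttl': ['$lt': now],
        'meta.completed': 0L
    ],
    // Update: claim the task for this worker and extend its lease
    [
        '$set': [
            'meta.worker': worker,
            'meta.ttl': now + timeToLive
        ]
    ]
)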
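The check-out call above covers only one step of a task's life. The following is a minimal sketch in plain JavaScript of the other steps, not ShopperTrak's actual code: it only builds the documents that would be handed to MongoDB (for example to `db.collection.findAndModify` in the mongo shell). The helper names, the `payload` field, and the fixed-width worker padding are all assumptions, added to illustrate one way of keeping the document size constant across updates.

```javascript
// Hypothetical fixed width for worker ids, so that updating meta.worker
// never changes the stored size of the document.
const WORKER_WIDTH = 16;

function padWorker(worker) {
  if (worker.length > WORKER_WIDTH) {
    throw new Error("worker id longer than " + WORKER_WIDTH + " characters");
  }
  return worker.padEnd(WORKER_WIDTH, " ");
}

// Document appended to the queue: every meta field is present from the start,
// so later updates only overwrite values of the same width.
function newTask(queue, payload) {
  return {
    queue: queue,
    payload: payload, // hypothetical field holding the task body
    meta: { worker: padWorker(""), ttl: 0, completed: 0 },
  };
}

// Arguments for an atomic check-out, mirroring the findAndModify call above.
function checkOut(queue, worker, now, timeToLive) {
  return {
    query: { queue: queue, "meta.ttl": { $lt: now }, "meta.completed": 0 },
    update: { $set: { "meta.worker": padWorker(worker), "meta.ttl": now + timeToLive } },
  };
}

// Arguments for marking a task done; only the worker holding it may do so.
function complete(taskId, worker, now) {
  return {
    query: { _id: taskId, "meta.worker": padWorker(worker), "meta.completed": 0 },
    update: { $set: { "meta.completed": now } },
  };
}
```

A task whose worker dies is never marked completed, so once `meta.ttl` falls behind the current time the same check-out query picks it up again for another worker.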
Cross Application Uses – Distributed Locks
■Needed a persistent, globally distributed lock
■What
▼Distributed, isolated systems atomically lock resources
▼Moving toward fronting this system with an in-memory Hazelcast cluster
■Why
▼Mongo provides us an atomic operation that is also persisted
▼Performance is fast as long as the lock lookup is indexed
■How
▼findAndModify on lock name, lock time < expiration
▼TTL on lock to automatically remove it if processes die
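The "How" bullets can be illustrated with a small in-memory model. This is only a sketch of the semantics, assuming a lock document of the form `{ owner, expiresAt }` keyed by lock name; the field names and function names are hypothetical, and the real system would perform the same check-and-set as a single findAndModify against MongoDB.

```javascript
// In-memory model of the lock collection: lock name -> { owner, expiresAt }.
// Against MongoDB the acquire step could be one findAndModify with upsert:
//   query:  { _id: name, expiresAt: { $lt: now } }
//   update: { $set: { owner: owner, expiresAt: now + leaseMs } }
// where a duplicate-key error on the upsert means the lock is still held.
// (Assumption: with a TTL index, expiresAt would be stored as a Date.)
function tryAcquire(locks, name, owner, now, leaseMs) {
  const current = locks.get(name);
  if (current && current.expiresAt >= now) {
    return false; // still held and not yet expired
  }
  locks.set(name, { owner: owner, expiresAt: now + leaseMs });
  return true;
}

// Only the current owner may release the lock.
function release(locks, name, owner) {
  const current = locks.get(name);
  if (current && current.owner === owner) {
    locks.delete(name);
    return true;
  }
  return false;
}
```

In this model an expired lock is simply overwritten by the next caller. The TTL on the lock documents plays a complementary role: it removes locks whose holders died, so stale entries do not accumulate.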
Small Blobs
~85,000 documents
[Diagram: entity hierarchy starting at TCE (org) and TCE (site), with a PHYSICAL DEVICES branch and a REPORTING branch]
■TCE (org)
■TCE (site): A site, this is the starting point for both equipment and reporting
■PHYSICAL DEVICES
▼THUB (camera hub): Mainly a remnant of mall setups. New orbits do not have a physical hub.
▼TPPL_CNTR (people counting device): Physical people counting device (e.g. orbit)
■REPORTING
▼TZN (zone): A zone that collects together a logical collection of traffic monitoring points
▼TTMP (monitoring point): A door can have 1 or more orbits monitoring it through a traffic monitoring point
Small Blobs
{
  "_id" : NumberLong(81086),
  "external" : {
    "tz" : "America/New_York",
    "loc" : [-75.160977, 39.948711],
    "level" : "street",
    "format_addr" : "1206 Walnut Street …"
  },
  "siteId" : NumberLong(81086),
  "name" : "81086 OB Walnut",
  "timezone" : "America/New_York",
  …
  "type" : "Retailer",
  "org" : {
    "id" : NumberLong(5521),
    "name" : "Orange Branded LLC - ATT"
  },
  "search" : "5521 ORANGE BRANDED …",
  "addrs" : [{
    "type" : "Main",
    "addr1" : "1206 Walnut St",
    …
  }],
  "hubs" : [{
    "id" : NumberLong(110240),
    "devs" : [{
      "id" : NumberLong(398884),
      "endDt" : ISODate("9999-12-31T06:00:00Z")
    }]
  }],
  …
  "zones" : [{
    "id" : NumberLong(362337),
    "type" : "TotalProp",
    "tmps" : [{
      "id" : NumberLong(404269),
      "name" : "Standard",
      "devs" : [NumberLong(398884), NumberLong(398885)]
    }]
  }],
  "updateDate" : ISODate("2014-10-30T21:38:33.33Z")
}
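Because the whole site hierarchy lives in one document, a single read is enough to answer questions that would otherwise need several joins. As a rough illustration in plain JavaScript, the sketch below walks a trimmed-down copy of the document above and lists the device ids behind each zone. The function name `devicesByZone` is hypothetical, and the `NumberLong` values are written as plain numbers.

```javascript
// Trimmed-down site document; NumberLong values are shown as plain numbers.
const site = {
  _id: 81086,
  hubs: [{ id: 110240, devs: [{ id: 398884 }] }],
  zones: [{
    id: 362337,
    type: "TotalProp",
    tmps: [{ id: 404269, name: "Standard", devs: [398884, 398885] }],
  }],
};

// Map each zone id to the device ids reachable through its monitoring points.
function devicesByZone(doc) {
  const result = {};
  for (const zone of doc.zones) {
    const ids = new Set();
    for (const tmp of zone.tmps) {
      for (const dev of tmp.devs) ids.add(dev);
    }
    result[zone.id] = Array.from(ids);
  }
  return result;
}

console.log(devicesByZone(site));
```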
http://stackoverflow.com/q/17268770/311525
Set Operations Before and After
db.colors.insert({
    _id: 1,
    left : ['red', 'green'],
    right : ['green', 'blue']
});
Without Set Operations
db.colors.aggregate([
    {'$unwind' : "$left"},
    {'$unwind' : "$right"},
    {'$project': {
        value: "$left",
        same: {$cond: [{$eq: ["$left", "$right"]}, 1, 0]}
      }
    },
    {'$group' : {
        _id: {id: '$_id', val: '$value'},
        doesMatch: {$max: "$same"}
      }
    },
    {'$match': {doesMatch: 1}},
]);
Union and set difference are even longer. One example is 46 lines long just to prepare for union.
With Set Operations
db.colors.aggregate([
    {'$project': {
        int: {$setIntersection: ["$left", "$right"]}
      }
    }
]);
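To make the comparison concrete for union and set difference as well, here is a small sketch. The pipeline uses the `$setIntersection`, `$setUnion`, and `$setDifference` aggregation operators; the output field names (`both`, `either`, `leftOnly`) and the plain-JavaScript helper functions are assumptions added only to show what each operator computes for the sample document above.

```javascript
const doc = { _id: 1, left: ["red", "green"], right: ["green", "blue"] };

// Pipeline using the set operators (would be passed to db.colors.aggregate).
const pipeline = [
  { $project: {
      both: { $setIntersection: ["$left", "$right"] },
      either: { $setUnion: ["$left", "$right"] },
      leftOnly: { $setDifference: ["$left", "$right"] },
  } },
];

// Plain-JavaScript equivalents, to show what each operator computes.
// Results are de-duplicated; MongoDB does not guarantee element order.
const intersection = (a, b) => Array.from(new Set(a.filter((x) => b.includes(x))));
const union = (a, b) => Array.from(new Set(a.concat(b)));
const difference = (a, b) => Array.from(new Set(a.filter((x) => !b.includes(x))));

console.log(intersection(doc.left, doc.right));
console.log(union(doc.left, doc.right));
console.log(difference(doc.left, doc.right));
```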
Big Logs
■Many very small documents
■Field names are 1 character long because MongoDB stores the field names in every document.
> 7 billion documents/yr
{
  "s" : 81086,
  "i" : 3,
  "o" : 4,
  "t" : ISODate("2014-12-31T06:00:00Z"),
  "d" : 300
}
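The effect of field-name length can be estimated from the BSON layout, in which every element stores its name as a NUL-terminated string. The sketch below is a rough calculation, not a measurement: it assumes the four numeric fields are stored as 32-bit integers, ignores `_id`, indexes, and storage padding, and the longer field names are purely hypothetical stand-ins (the slides do not say what `s`, `i`, `o`, `t`, and `d` abbreviate).

```javascript
// Bytes used by the value of each BSON type in this sketch.
const VALUE_BYTES = { int32: 4, int64: 8, double: 8, date: 8 };

// Size of a flat BSON document: a 4-byte length prefix, then for each element
// a type byte, the field name as a NUL-terminated string and the value,
// then one trailing NUL byte.
function bsonSize(fields) {
  let size = 4 + 1;
  for (const [name, type] of fields) {
    size += 1 + Buffer.byteLength(name, "utf8") + 1 + VALUE_BYTES[type];
  }
  return size;
}

const shortNames = [["s", "int32"], ["i", "int32"], ["o", "int32"], ["t", "date"], ["d", "int32"]];
// Hypothetical descriptive names, chosen only for the comparison.
const longNames = [["siteId", "int32"], ["enters", "int32"], ["exits", "int32"],
                   ["timestamp", "date"], ["duration", "int32"]];

const savedPerDocument = bsonSize(longNames) - bsonSize(shortNames);
const savedPerYear = savedPerDocument * 7e9; // at 7 billion documents per year

console.log(bsonSize(shortNames), bsonSize(longNames), savedPerDocument, savedPerYear);
```

Under these assumptions the short names save 29 bytes per document (44 bytes instead of 73), which at 7 billion documents a year is on the order of 200 GB of raw document data.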
Map Reduce vs Aggregation
db.salescanonicaldata.mapReduce(
    function () {
        var key = {rawFileId: this.rawFileId,
                   internalSiteId: this.internalSiteId,
                   siteZoneId: this.siteZoneId,
                   categoryId: this.categoryId,
                   hourEndTS: this.hourEndTS};
        var value = {transactionId: this.transactionId,
                     totalTransactions: this.totalTransactions,
                     totalItems: this.totalItems,
                     totalSalesAmount: this.totalSalesAmount};
        emit(key, value);
    },
    function (key, values) {
        var result = {totalTransactions: 0, totalItems: 0,
                      totalSalesAmount: 0.0};
        var uniqueTransactionIdMap = [];
        for (var i = 0; i < values.length; i++) {
            if (uniqueTransactionIdMap.indexOf(values[i].transactionId) == -1) {
                // Count Distinct TransactionIds only once
                uniqueTransactionIdMap.push(values[i].transactionId);
                result.totalTransactions += values[i].totalTransactions;
            }
            result.totalItems += values[i].totalItems;
            result.totalSalesAmount += values[i].totalSalesAmount;
        }
        result.createdTS = new Date();
        return result;
    },
    {out: 'mr_out'}
)
db.runCommand({
    aggregate: 'salescanonicaldata',
    allowDiskUse: true,
    cursor: { },
    pipeline: [
        {$group: {_id: {key: {rawFileId: '$rawFileId',
                              internalSiteId: '$internalSiteId',
                              siteZoneId: '$siteZoneId',
                              categoryId: '$categoryId',
                              hourEndTS: '$hourEndTS'},
                        transactionId: '$transactionId'},
                  totalTransactions: {$first: '$totalTransactions'},
                  totalItems: {$sum: '$totalItems'},
                  totalSalesAmount: {$sum: '$totalSalesAmount'},
        }},
        {$group: {_id: '$_id.key',
                  totalTransactions: {$sum: '$totalTransactions'},
                  totalItems: {$sum: '$totalItems'},
                  totalSalesAmount: {$sum: '$totalSalesAmount'},
        }},
        {$project: {_id: 1,
                    totalTransactions: 1,
                    totalItems: 1,
                    totalSalesAmount: 1,
                    createdTS: {$literal: ISODate()},
        }},
        {$out: 'pa_out'},
    ],
})
> 20x faster
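The two `$group` stages replace the "count each transaction id once" loop of the reduce function. As an illustration of that logic only, the sketch below emulates both stages in plain JavaScript on a few made-up rows; the function name `twoStageGroup` and the sample values are assumptions, not data from the talk.

```javascript
const KEY_FIELDS = ["rawFileId", "internalSiteId", "siteZoneId", "categoryId", "hourEndTS"];

function twoStageGroup(rows) {
  // Stage 1: one bucket per (key, transactionId). totalTransactions is taken
  // from the first row only (like $first); items and sales are summed.
  const perTransaction = new Map();
  for (const row of rows) {
    const key = JSON.stringify(KEY_FIELDS.map((f) => row[f]));
    const bucketId = key + "#" + JSON.stringify(row.transactionId);
    let bucket = perTransaction.get(bucketId);
    if (!bucket) {
      bucket = { key: key, totalTransactions: row.totalTransactions,
                 totalItems: 0, totalSalesAmount: 0 };
      perTransaction.set(bucketId, bucket);
    }
    bucket.totalItems += row.totalItems;
    bucket.totalSalesAmount += row.totalSalesAmount;
  }
  // Stage 2: sum the per-transaction buckets for each key.
  const perKey = new Map();
  for (const bucket of perTransaction.values()) {
    let total = perKey.get(bucket.key);
    if (!total) {
      total = { totalTransactions: 0, totalItems: 0, totalSalesAmount: 0 };
      perKey.set(bucket.key, total);
    }
    total.totalTransactions += bucket.totalTransactions;
    total.totalItems += bucket.totalItems;
    total.totalSalesAmount += bucket.totalSalesAmount;
  }
  return Array.from(perKey.values());
}

// Made-up rows: transaction "T1" spans two rows, "T2" one row, same key.
const base = { rawFileId: 1, internalSiteId: 81086, siteZoneId: 362337,
               categoryId: 7, hourEndTS: "2014-12-31T06:00:00Z" };
const rows = [
  Object.assign({}, base, { transactionId: "T1", totalTransactions: 1, totalItems: 2, totalSalesAmount: 10 }),
  Object.assign({}, base, { transactionId: "T1", totalTransactions: 1, totalItems: 3, totalSalesAmount: 15 }),
  Object.assign({}, base, { transactionId: "T2", totalTransactions: 1, totalItems: 1, totalSalesAmount: 5 }),
];
const totals = twoStageGroup(rows);
console.log(totals);
```

On these rows the transaction count is 2 rather than 3, because the two "T1" rows collapse into one bucket in the first stage, while items and sales are still summed over all three rows.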
Experience Moving Data
Issues
■Organizationally, many people are tied to Oracle
■Lots of business logic is coded in PL/SQL
Solutions
■Move toward a microservice architecture
■Schema in NoSQL still matters; get the model right
■Process in parallel to ensure business consistency and prove value at scale
QUESTIONS?

How ShopperTrak Is Using MongoDB
