Retail Reference Architecture
with MongoDB
Antoine Girbal
Principal Solutions Engineer, MongoDB Inc.
@antoinegirbal
Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization
During this session we will cover the best practices for implementing the insight component with MongoDB. This includes efficiently ingesting and managing a large volume of user activity logs, such as clickstreams, views, likes and sales. We'll dive into how you can derive user statistics, product maps and trends using different analytics tools like the aggregation framework, map/reduce or the Hadoop connector. We will also cover operational considerations, including low-latency data ingestion and seamless aggregation queries.

Published in: Technology

    1. Retail Reference Architecture with MongoDB
        Antoine Girbal, Principal Solutions Engineer, MongoDB Inc.
        @antoinegirbal
    2. Introduction
    3. MongoDB Overview
    4. MongoDB Strategic Advantages
        Advantages for the application:
        • Horizontally scalable: sharding
        • Agile and flexible
        • High performance and strong consistency
        • Highly available: replica sets
        Example document:
            { customer: "roger", date: new Date(), comment: "Spirited Away", tags: ["Tezuka", "Manga"] }
    5. Documents let you build your data to fit your application
        MongoDB:
            { customer_id : 1,
              name : "Mark Smith",
              city : "San Francisco",
              orders: [
                { order_number : 13,
                  store_id : 10,
                  date: "2014-01-03",
                  products: [
                    { SKU: 24578234, Qty: 3, Unit_price: 350 },
                    { SKU: 98762345, Qty: 1, Unit_price: 110 } ] },
                { <...> } ] }
        Relational, customers table:
            CustomerID  First Name  Last Name  City
            0           John        Doe        New York
            1           Mark        Smith      San Francisco
            2           Jay         Black      Newark
            3           Meagan      White      London
            4           Edward      Danields   Boston
        Relational, orders table:
            Order Number  Store ID  Product       Customer ID
            10            100       Tablet        0
            11            101       Smartphone    0
            12            101       Dishwasher    0
            13            200       Sofa          1
            14            200       Coffee table  1
            15            201       Suit          2
    6. Notions
            RDBMS     MongoDB
            Database  Database
            Table     Collection
            Row       Document
            Column    Field
    7. Architecture Overview
    8. Architecture Overview
        • Customer: Channels (Amazon, Ebay, …), Stores (POS, Kiosk, …), Mobile (Smartphone, Tablet), Website, Contact Center
        • API, Data and Service Integration
        • Information Management: Merchandising, Content, Inventory, Customer, Channel, Sales & Fulfillment, Insight, Social
        • Social: Facebook, Twitter, …
        • Data Warehouse, Analytics
        • Supply Chain Management System, Suppliers (3rd Party, In Network)
        • Web Servers, Application Servers
    9. Commerce Functional Components
        • Information Layer: Look & Feel, Navigation, Customization, Personalization, Branding, Promotions, Chat, Ads
        • Customer's Perspective: Research, Browse, Search, Select, Shopping Cart, Purchase, Checkout, Receive, Track, Use, Feedback, Maintain
        • Dialog: Assist
        • Market / Offer: Guide, Offer, Semantic Search, Recommend, Rule-based Decisions, Pricing, Coupons
        • Sell / Fulfill: Orders, Payments, Fraud Detection, Fulfillment, Business Rules
        • Insight: Session Capture, Activity Monitoring, Customer, Enterprise
        • Information Management: Merchandising, Content, Inventory, Customer, Channel, Sales & Fulfillment, Insight, Social
    10. Merchandising
    11. Merchandising (MongoDB)
        Product Variation, Product Hierarchy, Pricing, Promotions, Ratings & Reviews, Calendar, Semantic Search, Product Definition, Localization
    12. Merchandising - principles
        • Single view of a product: a single scalable catalog service used by all services and channels
        • Read volume is high and sustained
        • Write volume spikes during catalog updates, but real-time updating of a product is also supported
        • Advanced indexing and querying is a requirement: find a product by SKU, category, color, etc.
        • Geographical distribution and low latency are achieved through replication
        • Scaling is achieved through sharding
    13. Merchandising - requirements
        • Requirement: single view of product
            – Example challenge: blended description and hierarchy of product to ensure availability on all channels
            – MongoDB: flexible document-oriented storage
        • Requirement: high sustained read volume with low latency
            – Example challenge: constant querying from online users and sales associates, requiring immediate response
            – MongoDB: fast indexed querying, replication allows a local copy of the catalog, sharding for scaling
        • Requirement: spiky and real-time write volume
            – Example challenge: bulk update of the full catalog without impacting production, real-time touch update
            – MongoDB: fast in-place updating, real-time indexing, sharding for scaling
        • Requirement: advanced querying
            – Example challenge: find a product based on color, size, description
            – MongoDB: ad-hoc querying on any field, advanced secondary and compound indexing
    14. Merchandising - Product Page
        Page areas: Product images, General Information, List of Variations, External Information, Localized Description
    15. Merchandising - Product Definition
            > db.definitions.findOne()
            { productId: "301671", // main product id
              department: "Shoes",
              category: "Shoes/Women/Pumps",
              brand: "Guess",
              thumbnail: "http://cdn…/pump.jpg",
              image: "http://cdn…/pump1.jpg", // larger version of thumbnail
              title: "Evening Platform Pumps",
              description: "Those evening platform pumps put the perfect finishing touches on your most glamorous night-on-the-town outfit",
              shortDescription: "Evening Platform Pumps",
              style: "Designer",
              type: "Platform",
              rating: 4.5, // user rating
              lastUpdated: Date("2014/04/01"), // last update time
              … }
    16. Merchandising - Product Definition
        • Get item from Product Id
            db.definitions.findOne( { productId: "301671" } )
        • Get items from Product Ids (find, since several documents may match)
            db.definitions.find( { productId: { $in: [ "301671", "301672" ] } } )
        • Get items by department
            db.definitions.find( { department: "Shoes" } )
        • Get items by category prefix (the slash is escaped inside the regex literal)
            db.definitions.find( { category: /^Shoes\/Women/ } )
        • Indices: productId, department, category, lastUpdated
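The category-prefix lookup depends on an anchored regular expression, which MongoDB can serve from an index on `category`. When the prefix comes from user navigation rather than a literal, it has to be escaped first. A minimal sketch in plain JavaScript, assuming a hypothetical helper named `categoryPrefixFilter` that is not part of the deck:

```javascript
// Hypothetical helper: build a filter matching every category under a prefix.
// Regex metacharacters are escaped so that the prefix is matched literally.
function categoryPrefixFilter(prefix) {
  const escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // "^" anchors the match at the start of the category path.
  return { category: new RegExp("^" + escaped) };
}

const filter = categoryPrefixFilter("Shoes/Women");
console.log(filter.category.test("Shoes/Women/Pumps")); // true
console.log(filter.category.test("Bags/Shoes/Women"));  // false
// In the mongo shell the filter would be passed to db.definitions.find(filter).
```

Building the pattern with `new RegExp` also avoids having to escape the slashes, which is only needed in a regex literal.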
    17. Merchandising - Product Variation
            > db.variations.findOne()
            { _id: "730223104376", // the sku
              productId: "301671", // references product id
              thumbnail: "http://cdn…/pump-red.jpg",
              image: "http://cdn…/pump-red.jpg", // larger version of thumbnail
              size: 6.0,
              color: "Red",
              width: "B",
              heelHeight: 5.0,
              lastUpdated: Date("2014/04/01"), // last update time
              … }
    18. Merchandising - Product Variation
        • Get variation from SKU
            db.variations.find( { _id: "730223104376" } )
        • Get all variations for a product, sorted by SKU
            db.variations.find( { productId: "301671" } ).sort( { _id: 1 } )
        • Indices: productId, lastUpdated
    19. Merchandising – Pricing
            Price: { _id: "sku730223104376_store123",
                     currency: "USD",
                     price: 89.95,
                     lastUpdated: Date("2014/04/01"), // last update time
                     … }
        • _id: concatenation of item and store
        • Store: can be a store group or a store id
        • Item: can be an item id or a sku
        • Indices: lastUpdated
    20. Merchandising - Pricing
        • Get all prices for a given item
            db.prices.find( { _id: /^p301671_/ } )
        • Get all prices for a given sku (price could be at item level)
            db.prices.find( { _id: { $in: [ /^sku730223104376_/, /^p301671_/ ] } } )
        • Get minimum and maximum prices for a sku
            db.prices.aggregate( [ { $match: { … } }, { $group: { _id: 1, min: { $min: "$price" }, max: { $max: "$price" } } } ] )
        • Get price for a sku and store id (returns up to 4 prices)
            db.prices.find( { _id: { $in: [ "sku730223104376_store1234", "sku730223104376_sgroup0", "p301671_store1234", "p301671_sgroup0" ] } }, { price: 1 } )
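The last lookup returns up to four price documents, so the application still has to choose one. The sketch below builds the four candidate `_id` values and picks the most specific match. The function names are hypothetical, and the precedence order (sku before item, store before store group) is an assumption rather than something the slides state.

```javascript
// Candidate price _ids for a sku at a store, most specific first.
// The "sku" / "p" prefixes and the "_" separator follow the _id examples above.
function priceIdCandidates(sku, productId, storeId, storeGroupId) {
  return [
    "sku" + sku + "_" + storeId,
    "sku" + sku + "_" + storeGroupId,
    "p" + productId + "_" + storeId,
    "p" + productId + "_" + storeGroupId,
  ];
}

// Return the first candidate that has a price document, or null if none does.
// ASSUMPTION: earlier candidates take precedence over later ones.
function resolvePrice(candidates, priceDocs) {
  const byId = new Map(priceDocs.map((doc) => [doc._id, doc]));
  for (const id of candidates) {
    if (byId.has(id)) return byId.get(id);
  }
  return null;
}

const candidates = priceIdCandidates("730223104376", "301671", "store1234", "sgroup0");
// priceDocs stands in for the result of db.prices.find({ _id: { $in: candidates } }).
const priceDocs = [
  { _id: "p301671_sgroup0", price: 99.95 },
  { _id: "sku730223104376_sgroup0", price: 89.95 },
];
console.log(resolvePrice(candidates, priceDocs).price); // 89.95
```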
    21. Merchandising – Product Hierarchy
        The hierarchy of items typically follows:
        • Company
            – Division
                • Department: Women's shoe store
                    – Class: Pumps
                        » Item: Guess classic pump
                            • Variation: size 6 black
    22. Merchandising – Browse and Search products
        • Browse by category
        • Special lists
        • Filter by attributes
        • Lists hundreds of item summaries
        Ideally a single query is issued to the database to obtain all items and metadata to display.
    23. Merchandising – Browse and Search products
        The previous page presents many challenges:
        • A response is needed within milliseconds for hundreds of items
        • Faceted search on many attributes of an item: department, brand, category, etc.
        • Attributes to match may be at the variation level (color, size, etc.), in which case the variation should be shown
        • One item may have thousands of variations. Only one item should be displayed even if many variations match
        • Efficient sorting on several attributes: price, popularity
        • Pagination, which requires deterministic ordering
    24. Merchandising – Browse and Search products
        One item can come in hundreds of sizes and dozens of colors: a single item may have thousands of variations.
    25. Merchandising – Browse and Search products
        Page areas: Hierarchy, Sort parameter, Faceted Search. Images of the matching variations are displayed.
    26. Merchandising – Traditional Architecture
        • Relational DB: system of record
        • Full-text search engine, fed by indexing
        • Cache: objects pre-joined
        • Application: #1 obtain search result IDs, #2 obtain objects by ID
    27. Merchandising – Traditional Architecture
        The traditional architecture presents issues:
        • 3 different systems to maintain: RDBMS, search engine, caching layer
        • A search returns a list of IDs, which are then looked up in the cache as a batch or one by one. This significantly increases response latency
        • The RDBMS schema is complex and static
        • The search index needs to be refreshed at intervals
        • The setup does not allow efficient pagination
    28. Merchandising - Architecture
        MongoDB Data Store: Product Summaries, Product Definitions, Pricing, Promotions, Product Variations, Ratings & Reviews
        #1 Obtain results
    29. Merchandising – Product Summaries
        The product index relies on the following parameters:
        • The department (required): the main component of category, e.g. "Shoes"
        • An indexed attribute (optional)
            – Category path, e.g. "Shoes/Women/Pumps"
            – Price range (based on online prices)
            – List of Item Attributes, e.g. Brand = Guess
            – List of Variation Attributes, e.g. Color = red
        • A non-indexed attribute (optional)
            – List of Item Secondary Attributes, e.g. Style = Designer
            – List of Variation Secondary Attributes, e.g. heel height = 5.0
        • As well as sorting, e.g. Price Low to High
    30. Merchandising – Product Summaries
            > db.summaries.findOne()
            { "_id": "p39",
              "title": "Evening Platform Pumps 39",
              "department": "Shoes",
              "category": "Shoes/Women/Pumps",
              "thumbnail": "http://cdn…/pump-small-39.jpg",
              "image": "http://cdn…/pump-39.jpg",
              "price": 145.99,
              "rating": 0.95,
              "attrs": [ { "brand" : "Guess" }, … ],
              "sattrs": [ { "style" : "Designer" }, { "type" : "Platform" }, … ],
              "vars": [
                { "sku": "sku2441",
                  "thumbnail": "http://cdn…/pump-small-39.jpg.Blue",
                  "image": "http://cdn…/pump-39.jpg.Blue",
                  "attrs": [ { "size": 6.0 }, { "color": "Blue" }, … ],
                  "sattrs": [ { "width" : "B" }, { "heelHeight" : 5.0 }, … ] },
                … Many more skus … ] }
        Indices: vars.sku, department + attrs + category, department + vars.attrs + category, department + category, department + price, department + rating
    31. Merchandising - Product Summaries
        • Get summary from item id
            db.summaries.find( { _id: "p301671" } )
        • Get summary's specific variation from SKU
            db.summaries.find( { "vars.sku": "730223104376" }, { "vars.$": 1 } )
        • Get summary by department, sorted by rating
            db.summaries.find( { department: "Shoes" } ).sort( { rating: 1 } )
        • Get summary with mix of parameters
            db.summaries.find( { department: "Shoes", "vars.attrs": { "color": "Gray" }, "category": /^Shoes\/Women/, "price": { "$gte": 65.99, "$lte": 180.99 } } )
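A browse page typically assembles the "mix of parameters" query from whichever facets the shopper has selected. The sketch below shows one way to do that in plain JavaScript. `buildSummaryQuery` and its parameter names are hypothetical; only the document field names (`department`, `category`, `price`, `attrs`, `vars.attrs`) come from the summary document shown in the deck.

```javascript
// Hypothetical helper: turn selected facets into a filter for db.summaries.find().
function buildSummaryQuery(params) {
  // The department is the one required parameter of the product index.
  if (!params.department) throw new Error("department is required");
  const query = { department: params.department };

  if (params.categoryPrefix) {
    // Anchored prefix match on the category path, with metacharacters escaped.
    const escaped = params.categoryPrefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    query.category = new RegExp("^" + escaped);
  }
  if (params.minPrice !== undefined || params.maxPrice !== undefined) {
    query.price = {};
    if (params.minPrice !== undefined) query.price.$gte = params.minPrice;
    if (params.maxPrice !== undefined) query.price.$lte = params.maxPrice;
  }
  // Item-level and variation-level attributes are arrays of one-key documents.
  if (params.itemAttr) query.attrs = params.itemAttr;
  if (params.variationAttr) query["vars.attrs"] = params.variationAttr;
  return query;
}

const query = buildSummaryQuery({
  department: "Shoes",
  categoryPrefix: "Shoes/Women",
  minPrice: 65.99,
  maxPrice: 180.99,
  variationAttr: { color: "Gray" },
});
console.log(Object.keys(query).join(", ")); // department, category, price, vars.attrs
```

Keeping the department mandatory mirrors the compound indexes listed for the summaries, which all start with `department`.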
    32. Merchandising – Query stats
            Department  Category  Price  Primary attribute  Avg time (ms)  90th (ms)  95th (ms)
            1           0         0      0                  2              3          3
            1           1         0      0                  1              2          2
            1           0         1      0                  1              2          3
            1           1         1      0                  1              2          2
            1           0         0      1                  0              1          2
            1           1         0      1                  0              1          1
            1           0         1      1                  1              2          2
            1           1         1      1                  0              1          1
            1           0         0      2                  1              3          3
            1           1         0      2                  0              2          2
            1           0         1      2                  10             20         35
            1           1         1      2                  0              1          1
    33. Content
    34. Content (MongoDB)
        Metadata, Asset Repository, Digital Right Mgt, Access Control, Processing / Encoding
    35. Inventory
    36. Inventory (MongoDB)
        External Inventory, Internal Inventory, Regional Inventory, Purchase Orders, Fulfillment, Promotions
    37. Demonstration Document Model
        • Product
            – Definitions: id: p0
            – Variations: id: sku0, pId: p0
            – Summary: id: p0, vars: [sku0, sku1, …]
        • Stores: id: s1, Loc: [22, 33]
        • Inventory: store: s1, pId: p0, vars: [{sku: sku0, q: 3}, {sku: sku2, q: 2}]
    38. Inventory - Stores
            > db.stores.findOne()
            { "_id" : ObjectId("53549fd3e4b0aaf5d6d07f35"),
              "className" : "catalog.Store",
              "storeId" : "store0",
              "name" : "Bessemer store",
              "address" : { "addr1" : "1st Main St", "city" : "Bessemer", "state" : "AL", "zip" : "12345", "country" : "US" },
              "location" : [ -86.95444, 33.40178 ]
              … }
    39. Inventory - Stores
        • Get a store by storeId
            db.stores.find( { storeId: "store0" } )
        • Get nearby stores sorted by distance
            db.runCommand( { "geoNear" : "stores", "near" : [ -82.800672, 40.090844 ], "maxDistance" : 10.0, "spherical" : true } )
    40. Inventory - Quantities
            > db.inventory.findOne()
            { "_id": "5354869f300487d20b2b011d",
              "storeId": "store0",
              "location": [ -86.95444, 33.40178 ],
              "productId": "p0",
              "vars": [
                { "sku": "sku1", "q": 14 },
                { "sku": "sku3", "q": 7 },
                { "sku": "sku7", "q": 32 },
                { "sku": "sku14", "q": 65 },
                ... ] }
    41. Inventory - Stores
        • Get all items in a store
            db.inventory.find( { storeId: "store100" } )
        • Get quantity for an item at a store
            db.inventory.find( { storeId: "store100", productId: "p200" } )
        • Get quantity for a sku at a store
            db.inventory.find( { storeId: "store100", productId: "p200", "vars.sku": "sku11736" }, { "vars.$": 1 } )
        • Increment / decrement inventory for an item at a store
            db.inventory.update( { storeId: "store100", productId: "p200", "vars.sku": "sku11736" }, { $inc: { "vars.$.q": 20 } } )
        • Indices: productId, storeId + productId, location (geo) + productId
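When the increment is negative (a sale or a reservation), a common refinement is to make the filter also require enough stock, so that the quantity cannot drop below zero. This is a sketch under that assumption, not something shown in the deck, and `reserveStockSpec` is a hypothetical helper name.

```javascript
// Hypothetical helper: build the filter and update for reserving `qty` units.
function reserveStockSpec(storeId, productId, sku, qty) {
  return {
    filter: {
      storeId: storeId,
      productId: productId,
      // $elemMatch requires one array element to hold both the sku and enough stock.
      vars: { $elemMatch: { sku: sku, q: { $gte: qty } } },
    },
    // The positional "$" targets the array element matched by the filter.
    update: { $inc: { "vars.$.q": -qty } },
  };
}

const spec = reserveStockSpec("store100", "p200", "sku11736", 2);
console.log(JSON.stringify(spec.update)); // {"$inc":{"vars.$.q":-2}}
// In the mongo shell: db.inventory.update(spec.filter, spec.update)
// If no document is modified, the store did not have enough stock for that sku.
```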
    42. Inventory - Stores
        • Aggregate total quantity for an item (summing the q field of each unwound variation)
            db.inventory.aggregate( [ { $match: { productId: "p200" } }, { $unwind: "$vars" }, { $group: { _id: "result", count: { $sum: "$vars.q" } } } ] )
            { "_id" : "result", "count" : 101752 }
        • Aggregate total quantity for a store
            db.inventory.aggregate( [ { $match: { storeId: "store100" } }, { $unwind: "$vars" }, { $group: { _id: "result", count: { $sum: "$vars.q" } } } ] )
            { "_id" : "result", "count" : 29347 }
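To make the three pipeline stages concrete, here is the same computation over in-memory documents. In a `$group` stage, `$sum: 1` counts the unwound variation entries, whereas `$sum: "$vars.q"` adds up their quantities; the sketch computes both. The function names and sample data are made up for illustration.

```javascript
// Plain-JavaScript equivalent of $match -> $unwind -> $group over inventory documents.
function unwoundVariations(inventoryDocs, match) {
  return inventoryDocs
    // $match: keep documents whose fields equal every key in `match`.
    .filter((doc) => Object.keys(match).every((key) => doc[key] === match[key]))
    // $unwind: one entry per element of the vars array.
    .flatMap((doc) => doc.vars);
}

// Equivalent of { $sum: "$vars.q" }: total units in stock.
function totalQuantity(inventoryDocs, match) {
  return unwoundVariations(inventoryDocs, match).reduce((sum, v) => sum + v.q, 0);
}

// Equivalent of { $sum: 1 }: number of (store, sku) entries.
function entryCount(inventoryDocs, match) {
  return unwoundVariations(inventoryDocs, match).length;
}

const inventory = [
  { storeId: "store0", productId: "p200", vars: [{ sku: "sku1", q: 14 }, { sku: "sku3", q: 7 }] },
  { storeId: "store1", productId: "p200", vars: [{ sku: "sku1", q: 32 }] },
  { storeId: "store0", productId: "p201", vars: [{ sku: "sku9", q: 5 }] },
];
console.log(totalQuantity(inventory, { productId: "p200" })); // 53
console.log(entryCount(inventory, { productId: "p200" }));    // 3
```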
    43. Inventory - Stores
        • Get inventory for an item near a point
            db.runCommand( { "geoNear" : "inventory", "near" : [ -82.800672, 40.090844 ], "maxDistance" : 10.0, "spherical" : true, limit: 10, query: { productId: "p200", "vars.sku": "sku11736" } } )
        • Get closest store with available sku
            db.runCommand( { "geoNear" : "inventory", "near" : [ -82.800672, 40.090844 ], "maxDistance" : 10.0, "spherical" : true, limit: 10, query: { productId: "p200", vars: { $elemMatch: { "sku": "sku11736", q: { $gt: 0 } } } } } )
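The "closest store with available sku" lookup combines a distance sort with an element filter on `vars`. The sketch below reproduces that logic in memory with the haversine formula, purely to illustrate what the server-side query returns. The function names and sample stores are hypothetical. Note that with legacy `[longitude, latitude]` pairs and `spherical: true`, the `geoNear` command expresses distances in radians, while this sketch reports kilometers.

```javascript
const EARTH_RADIUS_KM = 6371;

// Great-circle distance between two [longitude, latitude] points, in kilometers.
function haversineKm([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Stores holding at least one unit of `sku`, nearest first.
function nearestWithStock(inventoryDocs, point, productId, sku) {
  return inventoryDocs
    .filter(
      (doc) =>
        doc.productId === productId &&
        // Same element must match the sku and have stock, like $elemMatch.
        doc.vars.some((v) => v.sku === sku && v.q > 0)
    )
    .map((doc) => ({ storeId: doc.storeId, km: haversineKm(point, doc.location) }))
    .sort((a, b) => a.km - b.km);
}

const docs = [
  { storeId: "far", productId: "p200", location: [-86.95444, 33.40178], vars: [{ sku: "sku11736", q: 4 }] },
  { storeId: "near", productId: "p200", location: [-82.9, 40.1], vars: [{ sku: "sku11736", q: 1 }] },
  { storeId: "empty", productId: "p200", location: [-82.8, 40.09], vars: [{ sku: "sku11736", q: 0 }] },
];
const ranked = nearestWithStock(docs, [-82.800672, 40.090844], "p200", "sku11736");
console.log(ranked.map((r) => r.storeId)); // [ 'near', 'far' ]
```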
    44. Customer
    45. Customer (MongoDB)
        Profile, Market Segment, Demographics, Wish List, Preference, Inbox, Sales / Support, Chat, Content Subscription
    46. Channels
    47. Channels (MongoDB)
        Location, Store, Assortment, Point of Sale, Channel Definition, Planogram
    48. Sales & Fulfillment
    49. Sales & Fulfillment (MongoDB)
        Sales Transaction, Shipping, Tracking, Return & Exchange, Business Rule, Audit, Shopping Cart
    50. Insight
    51. Insight (MongoDB)
        Advertising metrics, Clickstream, Recommendations, Session Capture, Activity Logging, Geo Tracking, Product Analytics, Customer Insight, Application Logs
    52. Activity Logging – Data of interest
        • Many user activities can be of interest:
            – Search
            – Product view, like or wish
            – Shopping cart add / remove
            – Sharing on social network
            – Ad impression, Clickstream
        • Those will be used to compute:
            – Product Map (relationships, etc.)
            – User Preferences
            – Recommendations
            – Trends
    53. Activity logging - Architecture
        • Apps send activity through the HVDF API: all user activity is recorded
        • MongoDB: Activity Logging, User History
        • Internal Analytics: Aggregation, MR
        • External Analytics: Hadoop, Spark, Storm, … (via the MongoDB – Hadoop Connector)
        • Computed data: User Preferences, Recommendations, Trends, Product Map
        • Personalization
    54. Activity Logging
    55. Activity Logging – Problem statement
        • You need to store and manage an incoming stream of data samples (views, impressions, orders, …)
            – High arrival rate of data from many sources
            – Variable schema of arriving data
            – You need to control the retention period of data
        • You need to compute derivative data sets based on these samples
            – Aggregations and statistics based on data
            – Roll-up data into pre-computed reports and summaries
        • You need low latency access to up-to-date data (user history)
            – Flexible indexing of raw and derived data sets
            – Rich querying based on time + meta-data fields in samples
    56. Activity logging - Requirements
        • Requirement: ingestion of 100ks of writes / sec
            – MongoDB: fast C++ process, multi-threads, multi-locks. Horizontal scaling via sharding. Sequential IO via time partitioning.
        • Requirement: flexible schema
            – MongoDB: dynamic schema, each document is independent. Data is stored in the same format and size as it is inserted.
        • Requirement: fast querying on varied fields, sorting
            – MongoDB: secondary B-tree indexes can look up and sort the data in milliseconds.
        • Requirement: easy clean-up of old data
            – MongoDB: deletes are typically as expensive as inserts. Time partitioning gives free deletes.
    57. Activity Logging using HVDF
        HVDF (High Volume Data Feed):
        • Open source reference implementation of high volume writing with MongoDB
        • REST API server written in Java with the most popular libraries
        • Public project, issues can be logged
        • Can be run as-is, or customized as needed
    58. High volume data feed architecture
        • Feed
        • Channel: the sequence of data samples that a sensor sends into the platform
        • Sources send samples into the Channel
        • Processors generate derivative Channels from other Channel data: inline processing, batch processing, stream processing
    59. HVDF – Reference implementation
        HVDF: High Volume Data Feed engine
        • REST Service API
            – Incoming sample stream: POST /feed/channel/data
            – Real-time queries: GET /feed/channel/data?time=XXX&range=YYY
        • Processor plugins: inline, batch, stream
        • Channel data storage: raw channel data, aggregated rollup T1, aggregated rollup T2
        • Query processor
        • Streaming spout: custom stream processing logic
    60. User Activity - Model
            { _id: ObjectId(),
              geoCode: 1, // used to localize write operations
              sessionId: "2373BB…",
              device: { id: "1234", type: "mobile/iphone", userAgent: "Chrome/34.0.1847.131" },
              type: "VIEW|CART_ADD|CART_REMOVE|ORDER|…", // type of activity
              itemId: "301671",
              sku: "730223104376",
              order: { id: "12520185", … },
              location: [ -86.95444, 33.40178 ],
              tags: [ "smartphone", "iphone", … ], // associated tags
              timeStamp: Date("2014/04/01 …") }
    61. Dynamic schema for sample data
        Each sample in a Channel can have variable fields.
        Sample 1:
            { deviceId: XXXX, time: Date(…), type: "VIEW", … }
        Sample 2:
            { deviceId: XXXX, time: Date(…), type: "CART_ADD", cartId: 123, … }
        Sample 3:
            { deviceId: XXXX, time: Date(…), type: "FB_LIKE" }
    62. Channels are sharded
        Shard key: customer_id
        Sample:
            { customer_id: XXXX, time: Date(…), type: "VIEW" }
        • You choose how to partition samples
        • Samples can have dynamic schema
        • Scale horizontally by adding shards
        • Each shard is highly available
    63. Channels are time partitioned
        Partitions: Partition 1 (-2 days), Partition 2 (-1 day), …, Partition N (today)
        • Each partition is a separate collection
        • Partitioning keeps indexes manageable
        • The current partition is where all of the writes happen
        • Older partitions are read only for best possible concurrency
        • Queries are routed only to the needed partitions
        • Efficient, space-reclaiming purging of old data
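Because each partition is its own collection, routing a write and purging old data both reduce to computing collection names. The sketch below assumes one collection per UTC day, named like `activity_20140401`. That naming scheme and both function names are assumptions for illustration, not HVDF's actual layout.

```javascript
// ASSUMPTION: one collection per UTC day, named "<channel>_<YYYYMMDD>".
function partitionName(channel, date) {
  const day = date.toISOString().slice(0, 10).replace(/-/g, "");
  return channel + "_" + day;
}

// Names of the partitions older than the retention window.
// Zero-padded dates sort lexicographically, so a string comparison is enough.
function expiredPartitions(collectionNames, channel, now, retentionDays) {
  const cutoffDate = new Date(now.getTime() - retentionDays * 24 * 60 * 60 * 1000);
  const cutoff = partitionName(channel, cutoffDate);
  return collectionNames.filter(
    (name) => name.startsWith(channel + "_") && name < cutoff
  );
}

const now = new Date("2014-04-01T12:00:00Z");
console.log(partitionName("activity", now)); // activity_20140401
const names = ["activity_20140329", "activity_20140330", "activity_20140331", "activity_20140401"];
console.log(expiredPartitions(names, "activity", now, 2)); // [ 'activity_20140329' ]
// In the mongo shell each expired partition could then be removed with
// db.getCollection(name).drop(), which is far cheaper than deleting documents one by one.
```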
    64. Dynamic queries on Channels
        Apps access Channel samples through indexes, queries, pipelines and map-reduce:
        • Create custom indexes on Channels
        • Use the full MongoDB query language to access samples
        • Use MongoDB aggregation pipelines to access samples
        • Use MongoDB inline map-reduce to access samples
        • Full access to field, text, and geo indexing
    65. Geographically distributed system
        Regions: North America - West, North America - East, Europe
        • Geo shards per location
        • Clients write to local nodes
        • Single view of the channel available globally
    66. Insight
    67. Insight – Useful Data
        Useful data for better shopping:
        • User history (e.g. recently seen products)
        • User statistics (e.g. total purchases, visits)
        • User interests (e.g. likes videogames and SciFi)
        • User social network
        • Cross-selling: people who bought this item tended to buy those other items (e.g. bought an iPhone, then an iPhone case)
        • Up-selling: people who looked at this item eventually bought those items (an alternative product that may be better)
    68. User Activity – Computing User Stats
        Example of real-time aggregation with the Aggregation Framework
    69. User Activity – Computing User Stats
        Example of real-time aggregation with the Aggregation Framework
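The pipeline shown on these two slides is not preserved in the transcript. As a stand-in, here is a minimal sketch of what computing user statistics can look like: counting one user's activities per type. The `userId` and `type` fields follow the simplified activity records used later in the deck; the function name, the sample data and the `activities` collection name in the comment are assumptions.

```javascript
// In-memory equivalent of a hypothetical pipeline such as:
//   db.activities.aggregate([
//     { $match: { userId: 123 } },
//     { $group: { _id: "$type", count: { $sum: 1 } } }
//   ])
function userStats(activities, userId) {
  const stats = {};
  for (const activity of activities) {
    if (activity.userId !== userId) continue;               // $match
    stats[activity.type] = (stats[activity.type] || 0) + 1; // $group with $sum: 1
  }
  return stats;
}

const activities = [
  { userId: 123, type: "VIEW", itemId: 2 },
  { userId: 123, type: "VIEW", itemId: 3 },
  { userId: 123, type: "ORDER", itemId: 3 },
  { userId: 234, type: "VIEW", itemId: 7 },
];
console.log(userStats(activities, 123)); // { VIEW: 2, ORDER: 1 }
```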
    70. User Activity – Items frequently bought together
        Let's simplify each recorded activity as follows:
            { userId: 123, type: order, itemId: 2, time }
            { userId: 123, type: order, itemId: 3, time }
            { userId: 234, type: order, itemId: 7, time }
        To calculate the items bought by a user over a period of time, let's use MongoDB's map-reduce:
        - Match activities of type "order" for the past 2 weeks
        - map: emit the document by userId
        - reduce: push all itemIds into a list
        - Output looks like { _id: userId, items: [2, 3, 8] }
    71. User Activity – Items frequently bought together
        Then run a second map-reduce job that, for each of the previous results:
        - map: emits every combination of 2 items, starting with the lowest itemId
        - reduce: sums up the total
        - Output looks like { _id: { a: 2, b: 3 }, count: 36 }
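The two passes can be followed more easily on in-memory data. This sketch mirrors the map and reduce steps in plain JavaScript rather than calling MongoDB's map-reduce. The function names are hypothetical, the two-week time filter is left out, and de-duplicating a user's items before pairing is an assumption.

```javascript
// Pass 1: group ordered itemIds by user, like { _id: userId, items: [...] }.
function itemsByUser(activities) {
  const byUser = new Map();
  for (const activity of activities) {
    if (activity.type !== "order") continue; // keep orders only
    if (!byUser.has(activity.userId)) byUser.set(activity.userId, new Set());
    byUser.get(activity.userId).add(activity.itemId);
  }
  return [...byUser].map(([userId, items]) => ({
    _id: userId,
    items: [...items].sort((x, y) => x - y), // sorted so pairs start with the lowest itemId
  }));
}

// Pass 2: count every pair of items appearing in the same user's list.
function pairCounts(userBaskets) {
  const counts = new Map();
  for (const { items } of userBaskets) {
    for (let i = 0; i < items.length; i++) {
      for (let j = i + 1; j < items.length; j++) {
        const key = items[i] + ":" + items[j]; // "map": emit the pair
        counts.set(key, (counts.get(key) || 0) + 1); // "reduce": sum up
      }
    }
  }
  return [...counts].map(([key, count]) => {
    const [a, b] = key.split(":").map(Number);
    return { _id: { a, b }, count };
  });
}

const sample = [
  { userId: 1, type: "order", itemId: 2 },
  { userId: 1, type: "order", itemId: 3 },
  { userId: 2, type: "order", itemId: 3 },
  { userId: 2, type: "order", itemId: 2 },
  { userId: 2, type: "view", itemId: 9 },
];
console.log(JSON.stringify(pairCounts(itemsByUser(sample))));
// [{"_id":{"a":2,"b":3},"count":2}]
```

Emitting each pair with the lowest itemId first means (2, 3) and (3, 2) land on the same key, so no pair is counted under two different ids.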
    72. User Activity – Items frequently bought together
        The output collection can then be queried per item id, sorted by count, and cut off at a threshold.
        Indexes are needed on { _id.a, count } and { _id.b, count }.
        You then obtain an affiliation collection with docs like:
            { itemId: 2, affil: [ { id: 3, weight: 36 }, { id: 8, weight: 23 } ] }
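Turning the pair counts into affiliation documents is a matter of listing each pair under both of its items, dropping weak pairs, and sorting by weight. A minimal in-memory sketch, where `buildAffiliations` and the `minCount` threshold parameter are hypothetical names:

```javascript
// Build { itemId, affil: [{ id, weight }, ...] } documents from pair counts.
function buildAffiliations(pairs, minCount) {
  const byItem = new Map();
  const add = (itemId, otherId, weight) => {
    if (!byItem.has(itemId)) byItem.set(itemId, []);
    byItem.get(itemId).push({ id: otherId, weight: weight });
  };
  for (const { _id, count } of pairs) {
    if (count < minCount) continue; // cut off at the threshold
    add(_id.a, _id.b, count);       // the pair is relevant to both items
    add(_id.b, _id.a, count);
  }
  return [...byItem].map(([itemId, affil]) => ({
    itemId: itemId,
    affil: affil.sort((x, y) => y.weight - x.weight), // strongest first
  }));
}

const pairs = [
  { _id: { a: 2, b: 8 }, count: 23 },
  { _id: { a: 2, b: 3 }, count: 36 },
  { _id: { a: 3, b: 8 }, count: 1 },
];
const affiliations = buildAffiliations(pairs, 5);
console.log(JSON.stringify(affiliations.find((doc) => doc.itemId === 2)));
// {"itemId":2,"affil":[{"id":3,"weight":36},{"id":8,"weight":23}]}
```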
    73. User Activity – Hadoop integration
        Example of Hadoop integration
    74. Social
    75. Social (MongoDB)
        Social Channels, User Network, Activity, Chat, Social Profiles, Community Mgt, Rewards / Gamification
    76. Conclusion
    77. Appendix
    78. Single View of Product Cluster Topology
        Diagram: West DC, Center DC and East DC, with shards “West”, “Center” and “East” and three primaries.
    79. Single View of Product Cluster Topology
        The primary node replicates data to all secondaries in the shard as fast as possible.
    80. Single View of Product Cluster Topology
        The Center shard contains all the data for stores in the Center region.
    81. Single View of Product Cluster Topology
        The Center shard contains all the data for stores in the Center region. Local writes enable very high throughput of updates.
    82. Single View of Product Cluster Topology
        Each region is able to see the data of all stores from its “local” DC.
    83. Single View of Product Cluster Topology
        Two nodes in each DC for painless maintenance with zero downtime.
    84. Single View of Product Cluster Topology
        Even if a DC goes out, the database remains fully available thanks to automated failover.
    85. Single View of Product Cluster Topology
        The data set can grow and shards can be added without any rewrite of the application code.
    86. Thank You!
        Antoine Girbal, Senior Solutions Engineer, MongoDB Inc.
        @antoinegirbal