Retail Reference Architecture: Product Catalog

One Catalog Service
to rule them all
Antoine Girbal
Principal Solutions Engineer, MongoDB Inc.
@antoinegirbal

The many catalogs problem
1. One department in charge of master product works hard at fitting
data into SQL tables
2. Resulting data sits in a SQL server with a couple replicas. It's
forbidden to hit it more than 100 times / sec
3. Other departments need to access the data way more often for
their own services
4. Other departments need more information that is not available
since it did not fit in that long devised rigid SQL schema
5. ETLs and Message Buses are put in place for other teams to try
to figure it out themselves…
6. Data becomes inconsistent, fragmented, not up-to-date…
Problem visible both internally and by customers!

Search – Using Solr
How many Catalogs and
Catalog Caches do you have?

The many catalogs problem
Dozens of catalogs!
Product Department Master Catalog
Online Store Catalog
Marketing Catalog
Department 1 Catalog
Department 3 Catalog
Department 4 Catalog
Department 5 Catalog
Message Bus
ETLs

Goal: Single View of Product
• Single view of a product, one central catalog
service
• Flexible schema containing all useful data
• Read volume high and sustained, 100k reads / s
• Can seamlessly take write spikes during catalog
update
• Advanced indexing and querying
• Geographical distribution for HA and low latency

Agenda
1. MongoDB Overview
2. Catalog Service Architecture
3. Data Store Models
4. Product Search

MongoDB is a great fit
• Holds complex JSON structures
• Dynamic Schema for Agility
• complex querying and in-place updating
• Secondary, compound and geo indexing
• full consistency, durability, atomic operations
• HA and geo-distributed via Replication
• Near linear scaling via Sharding
• Overall, MongoDB is a unique fit!

MongoDB Strategic Advantages
Application
Horizontally Scalable - Sharding
Agile
Flexible
High Performance & Strong Consistency
Highly Available - Replica Sets
{ customer: "roger",
  date: new Date(),
  comment: "Spirited Away",
  tags: ["Tezuka", "Manga"] }

build your data to fit your application
Relational:
CustomerID  First Name  Last Name  City
0           John        Doe        New York
1           Mark        Smith      San Francisco
2           Jay         Black      Newark
3           Meagan      White      London
4           Edward      Danields   Boston
Order Number  Store ID  Product       Customer ID
10            100       Tablet        0
11            101       Smartphone    0
12            101       Dishwasher    0
13            200       Sofa          1
14            200       Coffee table  1
15            201       Suit          2
MongoDB:
{ customer_id: 1,
  name: "Mark Smith",
  city: "San Francisco",
  orders: [ {
      order_number: 13,
      store_id: 10,
      date: "2014-01-03",
      products: [
        { SKU: 24578234,
          Qty: 3,
          Unit_price: 350 },
        { SKU: 98762345,
          Qty: 1,
          Unit_price: 110 }
      ]
    },
    { <...> }
  ]
}

Notions
RDBMS MongoDB
Database Database
Table Collection
Row Document
Column Field

Architecture Overview
Information Management: Merchandising, Content, Inventory, Customer, Channel, Sales & Fulfillment, Insight, Social
Customer Channels
Amazon, Ebay, …
Stores: POS, Kiosk, …
Mobile: Smartphone, Tablet
Website
Contact Center
Social: Facebook, Twitter, …
Application Servers
API
Data and Service Integration
Suppliers
Supply Chain Management System
Data Warehouse
Analytics
3rd Party
In Network
Web Servers

Commerce Functional Components
Customer Enterprise
Information Layer
Look & Feel: Navigation, Customization, Personalization, Branding, Promotions, Chat, Ads
Customer's Perspective: Research, Browse, Search, Select, Shopping Cart, Purchase, Checkout, Receive, Track, Use, Feedback, Maintain, Dialog, Assist
Market / Offer: Guide, Offer, Semantic Search, Recommend, Rule-based Decisions, Pricing, Coupons
Sell / Fulfill: Orders, Payments, Fraud Detection, Fulfillment, Business Rules
Insight: Session Capture, Activity Monitoring
Information Management: Merchandising, Content, Inventory, Customer, Channel, Sales & Fulfillment, Insight, Social

Merchandising Components
Merchandising
MongoDB
Item
Variant
Hierarchy
Localization
Pricing
Promotions
Ratings & Reviews
Calendar
Semantic Search

Merchandising - Architecture
MongoDB Data Store
Items, Pricing, Promotions, Variants, Ratings & Reviews
Search Engine
…
Product Service API
Online Store, Marketing, Inventory, SCMS, Public API, …

Models - Product Page
Product images
General Information
List of Variants
External Information
Localized Description

Models - Overview
• Item: the overall product info (e.g. Levi’s 501)
• Variant: a specific variant of an item (e.g. in black size 6)
which typically has a specific SKU / UPC
• Price: price information may vary based on the store, the
variant, etc
• Hierarchy: the item taxonomy
• Facet: facets to search products by
• Vendors: a given sku may be available through several
vendors if the site is a marketplace
> Don't try to fit all in the same document!

Models – Overview
One Item
Hundreds of sizes
Dozens of colors

Models - Overview
• A single item may have thousands of variants
• Each variant can have hundreds of attributes
• Altogether a single item can represent many MBs
worth of JSON text
• Don't try to fit everything into the same
document!
• Use a schema that is natural and fits the API

Models - Item Model
{ "_id": "054VA72303012P", // the item id
  "desc": [ // item descriptions
    { "lang": "en", "val": "Give your dressy look a lift with ..." }, ...
  ],
  "name": "Women's Kate Ivory Peep-Toe Stiletto Heel",
  "category": "/84700/80009/1282094266/1200003270", // hierarchy
  "brand": { "id": "2483510", "img": "http://...", "name": "Metaphor" },
  "assets": { // references to all assets
    "imgs": [
      { "img": { "width": 1900, "height": 1900, "src": "http://..." } }, ...
    ]
  },
  "shipping": { ... }, // shipping specs
  "specs": { ... }, // item specs
  "attrs": [ // list of item attributes (facets)
    { "name": "Heel Height", "value": "High (2-1/2 to 4 in.)" },
    { "name": "Toe", "value": "Open toe" }, ...
  ],
  "variants": { // quick info on the variants
    "cnt": 9,
    "attrs": [
      { "dispType": "DROPDOWN", "name": "Color" },
      { "dispType": "DROPDOWN", "name": "Shoe Size" }, ...
    ]
  },
  "lastUpdated": 1400877254787 // keep track of updates
}

Models - Item Model
• Get item by id
db.definition.findOne( { _id: "301671" } )
• Get items from list of ids
db.definition.find( { _id: { $in: [ "301671", "301672" ] } } )
• Get items by department
db.definition.find( { category: { $regex: "^/84700/" } } )
• Get items by category prefix
db.definition.find( { category: { $regex: "^/84700/80009/" } } )
• Secondary Indices
name, category, lastUpdated
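A minimal sketch of how the category prefix filter above could be built from hierarchy node ids. The helper names (escapeRegex, categoryPrefixFilter) are hypothetical and not part of the original deck; only the "category" field and the "^/84700/80009/" pattern come from the slides.

```javascript
// Hypothetical helpers, for illustration only.
function escapeRegex(text) {
  // Escape characters that have a special meaning in a regular expression.
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function categoryPrefixFilter(nodeIds) {
  // ["84700", "80009"] becomes the anchored prefix "^/84700/80009/"
  const prefix = "/" + nodeIds.join("/") + "/";
  return { category: { $regex: "^" + escapeRegex(prefix) } };
}

// The returned object is meant to be passed to db.definition.find(...)
const filter = categoryPrefixFilter(["84700", "80009"]);
console.log(JSON.stringify(filter));
```

Keeping the pattern anchored with "^" and case-sensitive matters here: that is the form of regex for which the index on category can be used as a prefix scan.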

Models – Variant Model
{ "_id": "05458452563", // the sku
  "name": "Width:Medium,Color:Ivory,Shoe Size:6.5",
  "itemId": "054VA72303012P", // reference to the item id
  "altIds": { "upc": "632576103580" },
  "assets": { // list of assets specific to variant
    "imgs": [
      { "width": 1900, "height": 1900, "src": "http://..." },
      { "width": 1900, "height": 1900, "src": "http://..." }, ...
    ]
  },
  "attrs": [ // list of attributes specific to variant
    { "name": "Width", "value": "Medium" },
    { "name": "Color", "family": "White", "value": "Ivory" },
    { "name": "Size", "value": "6.5" }, ...
  ],
  "lastUpdated": 1400877254787 // keep track of updates
}

Models – Variant Model
• Get variant from SKU
db.variant.find( { _id: "05458452563" } )
• Get all variants for a product, sorted by SKU
db.variant.find( { itemId: "054VA72303012P" } ).sort( { _id: 1 } )
• Indices
itemId, lastUpdated

Models - Hierarchy
{
"_id": "1200003270", // the node id
"name": "Women's Heels & Pumps",
"count": 22305, // how many items in this category
"parents": [ // list of parents
"1282094266"
],
"facets": [ // facets that exist for this category
"Heel Height",
"Toe",
"Upper Material",
"Width",
"Shoe Size",
"Color"
]
}

Models – Hierarchy
• Get hierarchy node by id
db.hierarchy.find( { _id: "1200003270" } )
• Get hierarchy node from parent id
db.hierarchy.find( { parents: "1282094266" } )
• Get departments (no parent)
db.hierarchy.find( { parents: null } )
• Secondary Indices
parents
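A possible way to turn an item's category path into a breadcrumb, sketched under assumptions: the helper names are hypothetical, and the node name "Department" in the sample data is made up for illustration. Since $in does not guarantee result order, the nodes are re-ordered to follow the path.

```javascript
// Hypothetical helpers, for illustration only.
function breadcrumbIds(categoryPath) {
  // "/84700/80009" becomes ["84700", "80009"]
  return categoryPath.split("/").filter((part) => part.length > 0);
}

function orderByPath(ids, nodes) {
  // Re-order fetched hierarchy nodes to follow the path, skipping missing ones.
  const byId = new Map(nodes.map((n) => [n._id, n]));
  return ids.filter((id) => byId.has(id)).map((id) => byId.get(id));
}

const ids = breadcrumbIds("/84700/80009/1282094266/1200003270");
const query = { _id: { $in: ids } }; // would be passed to db.hierarchy.find(...)

// Stand-in for the query result, deliberately out of order.
const nodes = [
  { _id: "1200003270", name: "Women's Heels & Pumps" },
  { _id: "84700", name: "Department" },
];
const crumbs = orderByPath(ids, nodes).map((n) => n.name);
console.log(crumbs.join(" > "));
```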

Models – per Store Pricing
Per store pricing could result in billions of
documents…unless it is built in a modular way:
_id: concatenation of item and store.
Item: can be an item id or variant id (sku)
Store: can be a store group (online) or store id.
{ "_id": "skuSPM8824542513_1234/store123",
"price": 69.99,
"sale": {
"salePrice": 42.72,
"saleEndDate": "2050-12-31 23:59:59"
},
"lastUpdated": 1374647707394 }

Models – per store Pricing
• Get all prices for a given item
db.prices.find( { _id: /^item301671/ } )
• Get all prices for a given sku (price could be at item level)
db.prices.find( { _id: { $in: [ /^sku730223104376/, /^item301671/ ] } } )
• Get minimum and maximum prices for a sku
db.prices.aggregate( [ { $match: { _id: { $in: [ /^sku730223104376/, /^item301671/ ] } } },
{ $group: { _id: 1, min: { $min: "$price" }, max: { $max: "$price" } } } ] )
• Get price for a sku and store id (returns up to 4 prices)
db.prices.find( { _id: { $in: [ "sku730223104376/store1234",
"sku730223104376/sgroup0",
"item301671/store1234",
"item301671/sgroup0" ] } }, { price: 1 } )

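The last price lookup returns up to four documents, so the application still has to pick one. A sketch of one way to do that follows. The precedence order (sku before item, store before store group) is an assumption and is not stated in the deck; the function names are hypothetical, and the sample prices are made up.

```javascript
// Hypothetical helpers; the _id layout follows the examples in the slides.
function priceIdCandidates(sku, itemId, storeId, storeGroup) {
  // Ordered from most specific to least specific (assumed precedence).
  return [
    `sku${sku}/store${storeId}`,
    `sku${sku}/sgroup${storeGroup}`,
    `item${itemId}/store${storeId}`,
    `item${itemId}/sgroup${storeGroup}`,
  ];
}

function pickPrice(candidates, docs) {
  // docs: result of find({ _id: { $in: candidates } }), in any order.
  const byId = new Map(docs.map((d) => [d._id, d]));
  for (const id of candidates) {
    if (byId.has(id)) return byId.get(id).price;
  }
  return null; // no price defined at any level
}

const candidates = priceIdCandidates("730223104376", "301671", "1234", "0");
// Stand-in for the query result: only two of the four levels have a price.
const found = [
  { _id: "item301671/sgroup0", price: 69.99 },
  { _id: "sku730223104376/sgroup0", price: 59.99 },
];
console.log(pickPrice(candidates, found));
```

With this layout a price only needs to be stored at the level where it differs, which is what keeps the collection far below one document per sku and store.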
Search – Browse and Search products
Browse by category
Special Lists
Filter by attributes
Lists hundreds of item summaries
By far the toughest page to get right and fast …

Search – Browse and Search products
The previous page presents many challenges:
• Response within milliseconds for hundreds of items
• Faceted search on many attributes: category, brand, …
• Efficient sorting on several attributes: price, popularity
• Pagination feature which requires deterministic ordering
> Search engines are built for this purpose!

Search – Traditional Architecture
Product Data Store, Product Search
Indexing
#1 obtain search results IDs
#2 obtain objects by ID from cache or DB
Cache, Application
Pre-joined into objects

Search – Traditional Architecture
The traditional architecture issues:
• 3 different systems to maintain: RDBMS,
Search engine, Caching layer
• RDBMS schema is complex and static
• Applications need to talk many languages

Search – Architecture with MongoDB
Product Data Store, Product Search
Indexing
#1 obtain search results IDs
#2 obtain objects by list of IDs
Applications
MongoDB: ready-to-use product documents
Search Engine
Product API
Application issues single query

Search - Mongo-Connector
MongoDB
Oplog
Mongo Connector
Translation, filtering
Search Engine
#1 Initial dump of the collections
#2 Updates streaming via Oplog
Indexing

Search - Mongo-Connector
• Open-source Project at
https://github.com/10gen-labs/mongo-connector
• Python app that reads from MongoDB's oplog
and publishes to target of choice
• Supports initial sync by dumping the data
• Default connectors for Solr, Elastic Search,
other MongoDB cluster
• Easily extensible to update other systems like
SQL
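To make the oplog streaming idea concrete, here is a simplified sketch. It is not the Mongo-Connector code. It mimics the classic shape of oplog entries (op "i", "u", "d" with ns, o and o2) and uses a Map as a stand-in for the target system. Only $set-style updates are handled, which is an assumption made to keep the example short.

```javascript
// Sketch only: a Map stands in for the search engine index.
function applyOplogEntry(index, entry, namespace) {
  if (entry.ns !== namespace) return; // filtering: ignore other collections
  switch (entry.op) {
    case "i": // insert: o holds the full document
      index.set(entry.o._id, entry.o);
      break;
    case "u": { // update: o2 holds the _id, o holds the change
      const current = index.get(entry.o2._id) || { _id: entry.o2._id };
      const changes = entry.o.$set || {}; // only $set is handled in this sketch
      index.set(entry.o2._id, { ...current, ...changes });
      break;
    }
    case "d": // delete: o holds the _id
      index.delete(entry.o._id);
      break;
  }
}

const index = new Map();
const ns = "catalog.summary";
applyOplogEntry(index, { op: "i", ns, o: { _id: "A1", name: "Heel" } }, ns);
applyOplogEntry(index, { op: "u", ns, o2: { _id: "A1" }, o: { $set: { name: "Pump" } } }, ns);
applyOplogEntry(index, { op: "i", ns: "catalog.prices", o: { _id: "P1" } }, ns);
console.log(index.get("A1").name, index.size);
```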

Search – Mongo-Connector
What is the data to index?

Search – More Searching
Images of the matching variants are displayed
Price and Rating
Facets for variants

Search – More Searching
… more challenges:
• Attributes at the variant level: color, size, etc
• Attributes from other docs: pricing, ratings, etc
• Display the matching variant's image and details
• Thousands of matching variants for an item, still
need to display a single item
• Challenge to properly index the data
> Need for a single summary document per item

Search - Architecture
MongoDB Data Store
Items, Summaries, Pricing, Ratings & Reviews, Variants, Promotions

Search – Summary Model
{ "_id": "3ZZVA46759401P", // the item id
  "name": "Women's Chic - Black Velvet Suede",
  "dep": "84700", // useful as standalone for indexing
  "cat": "/84700/80009/1282094266/1200003270",
  "desc": { "lang": "en", "val": "This pointy toe slingback ..." },
  "img": { "width": 450, "height": 330, "src": "http://..." },
  "attrs": [ // global attributes, easily indexable by SE
    "heel height=mid (1-3/4 to 2-1/4 in.)",
    "brand=metaphor",
    "shoe size=6",
    "shoe size=6.5", ...
  ],
  "sattrs": [ // global attributes, not to be indexed
    "upper material=synthetic",
    "toe=open toe", ...
  ],
  "vars": [
    { "id": "05497884001",
      "img": [ ... ], // images
      "attrs": [ ... ], // list of variant attributes to index
      "sattrs": [ ... ] }, … // list of variant attributes not to index
  ]
}
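A rough sketch of how the flat "name=value" strings of a summary could be derived from the item and variant documents. The function names are hypothetical, and the split between indexed attrs and non-indexed sattrs is ignored here to keep the example short.

```javascript
// Hypothetical helpers, for illustration only.
function toIndexableAttrs(attrs) {
  // { name: "Toe", value: "Open toe" } becomes "toe=open toe"; duplicates are dropped.
  const seen = new Set();
  for (const { name, value } of attrs) {
    seen.add(`${name}=${value}`.toLowerCase());
  }
  return [...seen];
}

function buildSummaryAttrs(item, variants) {
  // Merge item-level attributes with those of every variant.
  const all = [...item.attrs];
  for (const v of variants) all.push(...v.attrs);
  return toIndexableAttrs(all);
}

const item = { attrs: [{ name: "Toe", value: "Open toe" }] };
const variants = [
  { attrs: [{ name: "Shoe Size", value: "6" }] },
  { attrs: [{ name: "Shoe Size", value: "6.5" }, { name: "Toe", value: "Open toe" }] },
];
const summaryAttrs = buildSummaryAttrs(item, variants);
console.log(summaryAttrs);
```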

Let's use Solr …

Search - Using Solr
Defining the schema in schema.xml
<fields>

<field name="_id" type="string" indexed="true" stored="true" />
<field name="name" type="text_general" indexed="true" stored="true" />
<field name="cat" type="string" indexed="true" stored="true" />
<field name="price" type="float" indexed="true" stored="true"/>

<field name="desc.0.val" type="text_general" indexed="true" stored="true"/>

<dynamicField name="attrs.*" type="string" indexed="true" stored="true"/>
<!-- some Solr specific fields -->
<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW"
multiValued="false"/>
<dynamicField name="*" type="ignored" multiValued="true"/>
</fields>

Search - Using Solr
Starting up the connector
> mongo-connector
-m ec2-54-80-63-229.compute-1.amazonaws.com:27017 // the mongo
-t http://localhost:8983/solr // the solr
-d mongo_connector/doc_managers/solr_doc_manager.py
-n "catalog.summary" // target summary collection
--auto-commit-interval=60 // commit every 1 min
…
> Keep it running, it will just stream the Oplog

Document in Solr looks like:
{ "desc.0.val": "Our classic \"Flying Duck\" styled as a ...",
  "name": "Drake Waterfowl Duck Label SS T-Shirt Army Green",
  "attrs.1": "brand=Drake Waterfowl",
  "attrs.0": "style=t-shirts",
  "cat": "/84700/1200000239/1282094207/1200000817",
  "_id": "SPM10823491916",
  "_version_": 1479173524477182000,
  "timestamp": "2014-09-13T23:09:59.782Z"
}
Lists are flattened, which is difficult to use
> Must use named fields to implement facets
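The dotted keys such as "desc.0.val" and "attrs.1" come from flattening nested documents and lists. A small sketch of that kind of flattening is shown here; it illustrates the idea and is not the connector's actual implementation.

```javascript
// Illustrative flattening of nested objects and arrays into dotted keys.
function flatten(value, prefix = "", out = {}) {
  if (value !== null && typeof value === "object") {
    // Arrays and objects both contribute their keys (array keys are indexes).
    for (const [key, child] of Object.entries(value)) {
      flatten(child, prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    out[prefix] = value;
  }
  return out;
}

const flat = flatten({
  name: "T-Shirt",
  desc: [{ lang: "en", val: "Our classic ..." }],
  attrs: ["style=t-shirts", "brand=Drake Waterfowl"],
});
console.log(Object.keys(flat));
```

Each list element ends up in its own field, which is why a facet over "attrs" cannot be expressed against a single field name.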

Search – Using Elastic Search
Let's use Elastic Search…

Search - Using Elastic Search

ElasticSearch understands the whole document right off the bat
Just need to tell ES not to tokenize the facets
("string" is the name of the default mapping type):
$ curl -XPOST localhost:9200/largecat3.summary -d '{
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "string" : {
      "properties" : {
        "attrs" : { "type" : "string", "index" : "not_analyzed" }
      }
    } } }'
> Everything else is indexed auto-magically!

Starting up the connector
> mongo-connector
-m ec2-54-80-63-229.compute-1.amazonaws.com:27017 // the mongo
-t http://localhost:9200 // the ES
-d mongo_connector/doc_managers/elastic_doc_manager.py
-n "catalog.summary" // target summary collection
--auto-commit-interval=60 // commit every 1 min
…
> Keep it running, it will just stream the Oplog

Querying for documents, with Facet info… works well 
$ curl -X POST "http://localhost:9200/largecat3.summary/_search?pretty=true" -d '
{ "query" : { "query_string" : {"query" : "Ipad"} },
"facets" : { "tags" : { "terms" : {"field" : "attrs"} } } }'
{ "took" : 6,
"hits" : {
"total" : 151,
"max_score" : 0.5892989,
"hits" : [ {
"_index" : "largecat3.summary",
"_type" : "string",
"_id" : "000000000000000012730000000000QAU-QR2442P",
"_score" : 0.5892989,
"_source": { // original JSON from MongoDB }, ... ]
},
"facets" : {
"tags" : {
"_type" : "terms",
"total" : 1577,
"terms" : [ { "term" : "ring size=9", "count" : 120 },
{ "term" : "ring size=8", "count" : 120 },
{ "term" : "metal=sterling silver", "count" : 112 }, ... ]
} } }
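The facet terms come back as flat "name=value" strings, so the application would typically regroup them per attribute before rendering the filter sidebar. A sketch with a hypothetical function name:

```javascript
// Hypothetical helper: groups "name=value" facet terms by attribute name.
function groupFacetTerms(terms) {
  const grouped = {};
  for (const { term, count } of terms) {
    const sep = term.indexOf("=");
    if (sep === -1) continue; // skip terms without the "name=value" convention
    const name = term.slice(0, sep);
    const value = term.slice(sep + 1);
    (grouped[name] = grouped[name] || []).push({ value, count });
  }
  return grouped;
}

// Terms taken from the sample response above.
const grouped = groupFacetTerms([
  { term: "ring size=9", count: 120 },
  { term: "ring size=8", count: 120 },
  { term: "metal=sterling silver", count: 112 },
]);
console.log(JSON.stringify(grouped));
```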

Search – Using MongoDB Indexing
How about MongoDB's indexes and
Full-Text-Search?

Search – Using MongoDB indexing
The summary contains:
• department e.g. "Shoes"
• Fields to index
– Category path, e.g. "Shoes/Women/Pumps"
– Price
– List of Item Attributes, e.g. Brand = Guess
– List of Variant Attributes, e.g. Color = red
• Fields not to index
– List of Item Secondary Attributes, e.g. Style = Designer
– List of Variant Secondary Attributes, e.g. heel height = 4.0

Search - Using MongoDB indexing
• Get summary from item id
db.variation.find( { _id: "p301671" } )
• Get summary's specific variation from SKU
db.variation.find( { "vars.sku": "730223104376" }, { "vars.$": 1 } )
• Get summary by department, sorted by rating
db.variation.find( { department: "Shoes" } ).sort( { rating: 1 } )
• Get summary with mix of parameters
db.variation.find( { "department": "Shoes",
"vars.attrs": { "color": "Gray" },
"category": /^\/Shoes\/Women\//,
"price": { "$gte": 65.99, "$lte": 180.99 } } )

• The following indices are used:
– department + attr + category + _id
– department + vars.attrs + category + _id
– department + category + _id
– department + price + _id
– department + rating + _id
• _id used for pagination
• Can take advantage of index intersection
• With several attributes specified (e.g. color=red
and size=6), which one is looked up?
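One reading of "_id used for pagination" is keyset pagination, where _id breaks ties on the sort field so the ordering is deterministic. The sketch below assumes that reading; the function name and the choice of price as sort field are illustrative only.

```javascript
// Hypothetical helper: builds the filter for the page following `last`.
function nextPageFilter(baseFilter, sortField, last) {
  if (!last) return baseFilter; // first page: no lower bound yet
  return {
    $and: [
      baseFilter,
      {
        $or: [
          // strictly after the last value seen...
          { [sortField]: { $gt: last[sortField] } },
          // ...or same value, with _id as the tiebreaker
          { [sortField]: last[sortField], _id: { $gt: last._id } },
        ],
      },
    ],
  };
}

const first = nextPageFilter({ department: "Shoes" }, "price", null);
const next = nextPageFilter({ department: "Shoes" }, "price", { _id: "p301671", price: 65.99 });
console.log(JSON.stringify(next));
```

The result would be used with a matching sort and a limit, for example sort( { price: 1, _id: 1 } ), so that the department + price + _id index can serve both the filter and the ordering.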

Facet samples:
{ "_id" : "Accessory Type=Hosiery" , "count" : 14}
{ "_id" : "Ladder Material=Steel" , "count" : 2}
{ "_id" : "Gold Karat=14k" , "count" : 10138}
{ "_id" : "Stone Color=Clear" , "count" : 1648}
{ "_id" : "Metal=White gold" , "count" : 10852}
Single operation to insert / update (upsert):
db.facet.update( { _id: "Accessory Type=Hosiery" },
{ $inc: { count: 1 } }, true, false )
The facet with lowest count is the most restrictive…
It should come first in the $all query!
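A sketch of the ordering rule, using counts from the facet samples above. The function name is hypothetical, and the "attrs" field in the resulting filter is an assumption based on the summary model.

```javascript
// Hypothetical helper: most restrictive (lowest count) attribute first.
function orderAttrsBySelectivity(requested, facetCounts) {
  // Unknown facets get count 0, so they sort first (they match nothing).
  const countOf = (attr) => facetCounts.get(attr) ?? 0;
  return [...requested].sort((a, b) => countOf(a) - countOf(b));
}

// Counts as they would be read from the facet collection.
const counts = new Map([
  ["Gold Karat=14k", 10138],
  ["Stone Color=Clear", 1648],
  ["Metal=White gold", 10852],
]);
const ordered = orderAttrsBySelectivity(
  ["Metal=White gold", "Stone Color=Clear", "Gold Karat=14k"],
  counts
);
const allFilter = { attrs: { $all: ordered } };
console.log(JSON.stringify(allFilter));
```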

Search – Comparing Solutions
• Search Engine advantages:
– Index size (~ 10x smaller than MongoDB's)
– Indexing speed
– Read speed, integrated cache
– Support for all languages
– Built-in faceted search, which includes facet counts
• MongoDB's Indexing advantages:
– Built into the data store, no additional server / software needed
– Single query to get the results
– Can filter down the variant entry and save computing
> Winner here is Elastic Search

Search – Benchmarking
Department  Category  Price  Primary attribute  Time average (ms)  90th (ms)  95th (ms)
1           0         0      0                  2                  3          3
1           1         0      0                  1                  2          2
1           0         1      0                  1                  2          3
1           1         1      0                  1                  2          2
1           0         0      1                  0                  1          2
1           1         0      1                  0                  1          1
1           0         1      1                  1                  2          2
1           1         1      1                  0                  1          1
1           0         0      2                  1                  3          3
1           1         0      2                  0                  2          2
1           0         1      2                  10                 20         35
1           1         1      2                  0                  1          1

Thank You!
Antoine Girbal
Principal Solutions Engineer, MongoDB Inc.
@antoinegirbal
