1. Preparing for Peak Holiday Season:
A Seamless Customer Experience!
Antoine Girbal
Principal Solutions Engineer, MongoDB
@antoinegirbal
Rebecca Bucnis
Global Business Architect & Strategist, MongoDB
@rebeccabucnis
2. 3 Questions for this Session?
1. How is Peak Season shaping up this year?
2. How does MongoDB scale to support your business?
3. How do you capture the holiday Digital Customer
Experience with MongoDB?
3. MongoDB Speakers
About Rebecca:
Rebecca Bucnis
Global Business Architect
- Business Strategy
- Using data for business value
- Former Retailer
Washington, DC
rebecca.bucnis@mongodb.com
@rebeccabucnis
About Antoine:
Antoine Girbal
Principal Solutions Engineer
- Original team of MongoDB
- Engineer
- Solution Designer
Palo Alto, CA
Antoine.girbal@mongodb.com
@antoinegirbal
4. What to expect - Holiday Season 2014
• Consumers more
positive
• Increased spending
(+25%*)
• Extended holiday
buying window (with
fewer days) starts 6pm
* From Accenture Holiday Survey Oct 2014 Study on US Consumer Holiday Spending Plans
5. What to expect - Holiday Season 2014
• Cyber Monday bigger
than “Black Friday”
• Amazon has opened
“stores” for returns
• 58%* of shoppers will
shop with on-line
retailers only:
* From Accenture Holiday Survey Oct 2014 Study on US Consumer Holiday Spending Plans
6. The Opportunity - Holiday Season 2014
• Consumers want the
message right (*43% will
defect when irrelevant)
• Price, Convenience,
relevance & entertainment
• Collect immediate & longer
term shopping behavior for
action
* From Gigya Personalization Study 2014 State of Consumer Privacy & Personalization
7. The System of Engagement for
Retail
• A document model (holds mixed, variant data)
• Ability to add new & different data (agility)
• Ability to ask real-time questions based on right-now
updates (complex queries & in-place updates)
• Geo-Location built-in
• Power of traditional databases (full consistency,
durability, atomic operations)
• Near linear expansion (scaling via sharding)
• MongoDB is a unique fit for frictionless retail
8. Use Cases: Modern, Seamless Retail
“Global Product 360”
Themes: Up to date product
details – with minimal down
time; Images, reviews;
Vendor and order management;
9. Use Cases: Modern, Seamless Retail
Consolidated Customer
View & Insight
Themes: Single View of Customer,
Consumer 360; Activity Capture;
Profiles for personalization
10. Technical Deep Dive
1. Detailed Product Information:
- Single View of Product Information – Catalog
2. Real-time Inventory and Fulfillment
- Real-time Inventory
- Shopping Carts / Orders
3. Detailed Customer Views:
- User Activity Logging
- Integrating Customer Insights
4. Monitoring and Scaling
- What to watch for and how to scale
11. Architecture Overview
[Diagram. Customer channels: Amazon, Ebay, …; stores (POS, kiosk, …); mobile (smartphone, tablet); website; contact center; social (Facebook, Twitter, …). Middle tier: web servers, application servers, API. Information management: merchandising, content, inventory, customer, channel, sales & fulfillment, insight, social. Data and service integration: suppliers, supply chain management system, data warehouse, analytics, 3rd party, in network.]
15. The many catalogs problem
1. One department in charge of master product works hard at fitting
data into SQL tables
2. Resulting data sits in a SQL server with a couple replicas. It's
forbidden to hit it more than 100 times / sec
3. Other departments need to access the data way more often for
their own services
4. Other departments need more information that is not available
since it did not fit in that long devised rigid SQL schema
5. ETLs and Message Buses are put in place for other teams to try
to figure it out themselves…
6. Data becomes inconsistent, fragmented, not up-to-date…
Problem visible both internally and by customers!
16. The many catalogs problem
[Diagram: the Product Department's master catalog feeds the online store, marketing, and department 1, 3, 4 and 5 catalogs through a message bus and ETLs. Dozens of catalogs!]
17. Too many catalogs problem
How many Catalogs do you have?
Catalog Caches?
Message Buses and ETLs for them?
18. Goal: Single View of Product
• Single view of a product, one central service
• Flexible schema containing all useful data
• Read volume high and sustained, 100k reads / s
• Can seamlessly take write spikes during catalog
update
• Advanced indexing and querying
• Geographical distribution for HA and low latency
19. Merchandising - Architecture
[Diagram: a Product Service API sits on top of a MongoDB data store (items, pricing, promotions, variants, ratings & reviews, …) and a search engine. It serves the online store, marketing, inventory, SCMS, public API, …]
20. Models - Overview
• Item: the overall product info (e.g. Levi’s 501)
• Variant: a specific variant of an item (e.g. in black size 6)
which typically has a specific SKU / UPC
• Price: price information may vary based on the store, the
variant, etc.
• Hierarchy: the item taxonomy
• Facet: facets to search products by
• Vendors: a given SKU may be available through several
vendors if the site is a marketplace
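A minimal sketch of a facet lookup, assuming facets are stored as an array of name / value pairs on the item document (the "attrs" field of the item model). The helper name, the second item and the "items" collection name are hypothetical. It runs in memory and mirrors what an $elemMatch query would match:

```javascript
// Hypothetical helper: in-memory equivalent of an $elemMatch on "attrs".
// Assumed shell query (the collection name "items" is an assumption):
//   db.items.find({ attrs: { $elemMatch: { name: "Toe", value: "Open toe" } } })
// A compound index on { "attrs.name": 1, "attrs.value": 1 } is one option
// to support it (assumption, not taken from the slides).
function hasFacet(item, name, value) {
  // Name and value must match on the SAME array element.
  return (item.attrs || []).some(a => a.name === name && a.value === value);
}

const items = [
  { _id: "054VA72303012P", attrs: [
      { name: "Heel Height", value: "High (2-1/2 to 4 in.)" },
      { name: "Toe", value: "Open toe" } ] },
  { _id: "HYPOTHETICAL-2", attrs: [
      { name: "Heel Height", value: "Open toe" }, // same value, wrong facet name
      { name: "Toe", value: "Closed toe" } ] }
];

const openToe = items.filter(i => hasFacet(i, "Toe", "Open toe")).map(i => i._id);
console.log(openToe);
```

Matching name and value on the same element is the point: a query on the two fields independently would also return the second item.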
21. Models - Item Model
{
  "_id": "054VA72303012P", // the item id
  "desc": [ // item descriptions
    { "lang": "en", "val": "Give your dressy look a lift with ..." }, ...
  ],
  "name": "Women's Kate Ivory Peep-Toe Stiletto Heel",
  "category": "/84700/80009/1282094266/1200003270", // hierarchy
  "brand": { "id": "2483510", "img": "http://...", "name": "Metaphor" },
  "assets": { // references to all assets
    "imgs": [
      { "img": { "width": 1900, "height": 1900, "src": "http://..." }, ... }
    ]
  },
  "shipping": { /* shipping specs */ },
  "specs": { /* item specs */ },
  "attrs": [ // list of item attributes (facets)
    { "name": "Heel Height", "value": "High (2-1/2 to 4 in.)" },
    { "name": "Toe", "value": "Open toe" }, ...
  ],
  "variants": { // quick info on the variants
    "cnt": 9,
    "attrs": [
      { "dispType": "DROPDOWN", "name": "Color" },
      { "dispType": "DROPDOWN", "name": "Shoe Size" }, ...
    ]
  },
  "lastUpdated": 1400877254787 // keep track of updates
}
22. Product Search – Traditional Architecture
[Diagram: the product data store is indexed into product search. The application #1 obtains search result IDs, then #2 obtains objects by ID from the cache or DB. The cache holds data pre-joined into objects.]
23. Product Search – New Architecture
[Diagram: the product data store is indexed into the search engine. Applications issue a single query to the Product API, which #1 obtains search result IDs from the search engine and #2 obtains objects by list of IDs from MongoDB, which holds ready-to-use product documents.]
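A minimal sketch of step #2, fetching documents by a list of IDs. The store below is an in-memory stand-in and all names are hypothetical. The one detail worth showing is that a lookup by list of IDs does not return documents in search-rank order, so the Product API has to restore it:

```javascript
// Step #1 (assumed): the search engine returns item IDs in rank order.
const rankedIds = ["B", "C", "A"];

// Stand-in for the MongoDB collection. In the shell, step #2 would be roughly:
//   db.items.find({ _id: { $in: rankedIds } })
// $in does not guarantee result order, so the caller re-applies the ranking.
const store = [
  { _id: "A", name: "item A" },
  { _id: "B", name: "item B" },
  { _id: "C", name: "item C" },
  { _id: "D", name: "item D" }
];

function findByIds(ids) {
  const wanted = new Set(ids);
  return store.filter(doc => wanted.has(doc._id)); // storage order, not rank order
}

function inSearchOrder(ids, docs) {
  const byId = new Map(docs.map(d => [d._id, d]));
  // Skip IDs the store no longer has (e.g. a stale search index).
  return ids.filter(id => byId.has(id)).map(id => byId.get(id));
}

const results = inSearchOrder(rankedIds, findByIds(rankedIds));
console.log(results.map(d => d._id));
```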
26. Less than Real-Time Inventory
1. The Inventory system is centralized in a single SQL server
2. Latency to Inventory is too high, not accessible from individual
stores or distribution centers
3. Stores / DCs need to manage their own local inventory, then
ship the result once a day to the central system
4. Central inventory has no view of intra-day quantities. It
forecasts and replenishes with up to 24h delay
5. Opportunities are lost due to overstock / shortage
6. Sometimes products are sold based on quantities listed in a
distant inventory. The product turns out not to be available and
customers are upset
27. Inventory – Traditional Architecture
[Diagram: a relational DB is the system of record and feeds analytics, aggregations and reports. Field inventory keeps a local view only and syncs once a day. Internal & external apps read through a caching layer and get a stale view, which leads to suboptimal logic.]
28. Goal: Real-Time Inventory
• Single view of the inventory, one central service
• Used by most services and channels
• Read dominated workload
• Local, real-time writes
• Bulk writes for refresh
• Geographically distributed
• Horizontally scalable
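A minimal sketch of a local, real-time inventory write, assuming one document per store and SKU with a "qty" field (the document shape and all names are assumptions, not the deck's inventory model). It runs in memory and mirrors a single-document conditional update:

```javascript
// In-memory stand-in for an inventory collection keyed by store + SKU.
const inventory = new Map([
  ["store1:sku42", { storeId: "store1", sku: "sku42", qty: 2 }]
]);

// Mirrors a conditional update on one document, roughly:
//   db.inventory.update({ _id: id, qty: { $gte: n } }, { $inc: { qty: -n } })
// In MongoDB the filter and the decrement apply atomically to the document,
// so two concurrent buyers cannot both take the last unit.
function reserve(id, n) {
  const doc = inventory.get(id);
  if (!doc || doc.qty < n) return false; // filter did not match, nothing changed
  doc.qty -= n;
  return true;
}

const first = reserve("store1:sku42", 2);  // takes the last two units
const second = reserve("store1:sku42", 1); // nothing left to reserve
console.log(first, second, inventory.get("store1:sku42").qty);
```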
29. Inventory – Target Architecture
[Diagram: MongoDB holds the relevant dataset (inventory, assortments, shipments, audits). Stores, orders, field inventory and internal & external apps run real-time checks and updates against it and get a real-time view. The relational DB system of record, with analytics, aggregations and reports, is connected through nightly, point-in-time loads.]
35. Inventory Updates – Availability
How to keep reads / writes local with low latency?
How to stay available during network partition?
36. Inventory Updates – Availability
[Diagram: three data centers (West DC, Central DC, East DC), each hosting one shard primary (Shard West, Shard Central, Shard East) and local apps. Basic setup: writes go everywhere.]
37. Inventory Updates – Availability
• Basic shard key
– { _id: 1 } // built as group key + store
• Shard key for "Geo-sharding"
– { geoCode: 1, _id: 1}
• Alternative "Geo-sharding", more granular
– { storeId: 1, _id: 1 }
38. Inventory Updates – Availability
[Diagram: three data centers (West DC, Central DC, East DC), each hosting one shard primary (Shard West, Shard Central, Shard East) and local apps. Using tag-aware sharding: mostly local writes.]
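A minimal sketch of how tag ranges over the { geoCode: 1, _id: 1 } shard key keep writes local. The geo codes, tag names, shard names and namespace are assumptions for illustration. The routing is simulated in memory, with the corresponding shell helpers shown as comments:

```javascript
// Shell setup of that era would look roughly like (names are hypothetical):
//   sh.shardCollection("retail.inventory", { geoCode: 1, _id: 1 })
//   sh.addShardTag("shardWest", "WEST")
//   sh.addTagRange("retail.inventory",
//     { geoCode: "W", _id: MinKey }, { geoCode: "W", _id: MaxKey }, "WEST")
const tagRanges = [
  { geoCode: "C", tag: "CENTRAL" },
  { geoCode: "E", tag: "EAST" },
  { geoCode: "W", tag: "WEST" }
];

// geoCode is the shard key prefix, so it alone decides which tag range,
// and therefore which data center's shard, owns a document.
function tagFor(doc) {
  const range = tagRanges.find(r => r.geoCode === doc.geoCode);
  return range ? range.tag : null; // null: no tag range, any shard may own it
}

const docs = [
  { _id: "sku42:store1", geoCode: "W" },
  { _id: "sku42:store7", geoCode: "E" },
  { _id: "sku99:store1", geoCode: "W" }
];

const placement = docs.map(tagFor);
console.log(placement);
```

With the more granular { storeId: 1, _id: 1 } key the same idea applies, with one tag range per store or group of stores.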
41. Shopping Carts – Availability
[Diagram: three data centers (West DC, Central DC, East DC), each hosting one shard primary (Shard West, Shard Central, Shard East), with replication between data centers. 1. User shops in West, cart written locally. The user travels. 2. User shops in East, same cart read locally.]
42. Shopping Carts – Topology
• Each shard has 1 replica in every DC
• Primary servers are distributed among DCs
• Local Cart insert / update:
– Tag-aware Sharding using the geoCode field
• Local Cart lookup:
– Tag-aware Sharding using the geoCode field
• Local Cart lookup for all regions:
– Nearest Read Preference (closest replica)
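A minimal sketch of what the nearest read preference does, assuming a replica in each data center and a 15 ms latency window (an assumption mirroring the usual driver default). Host names and ping times are hypothetical. Drivers typically enable this with the readPreference=nearest connection string option:

```javascript
// Hypothetical members of one shard's replica set, as seen from a West app,
// with measured round-trip times in milliseconds.
const members = [
  { host: "west-1", pingMs: 2 },
  { host: "central-1", pingMs: 35 },
  { host: "east-1", pingMs: 70 }
];

const LOCAL_THRESHOLD_MS = 15; // assumed latency window

// "nearest" ignores primary / secondary roles: any member whose latency is
// within the window of the fastest one is eligible to serve the read.
function nearestCandidates(all, thresholdMs) {
  const best = Math.min(...all.map(m => m.pingMs));
  return all.filter(m => m.pingMs <= best + thresholdMs).map(m => m.host);
}

const candidates = nearestCandidates(members, LOCAL_THRESHOLD_MS);
console.log(candidates);
```

Reads served this way may come from a secondary, so a cart read right after a remote write can be slightly stale.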
45. Insights – Data of interest
Many user activities can be of interest:
• Search terms
• Products viewed, liked or wished for
• Shopping cart add / remove
• Orders submitted
• Sharing on social network
• Ad impression, Clickstream
46. Insights – Data of interest
Data will be used to compute:
• User / Product History
• Product Map (relationships, etc)
• User Preferences
• Recommendations
• Trends
> This is the basis for Personalization
47. Insights – Today's Limitations
1. Originally, the system records little user activity, since it is
too voluminous. It ends up forgotten in log files.
2. Attempts are made to store it in SQL, but adequate write
performance is expensive to achieve. Reporting across large data sets
(TB+) does not work.
3. Activity is recorded to a Data Warehouse system, which provides
good reporting but is too expensive to scale.
4. Using technologies like Hadoop, good scaling and powerful
reporting are achieved.
5. Still, there is no scalable front-end Data Store for real-time
queries and aggregations from applications.
48. Insights – Traditional Architecture
[Diagram: apps write activity logs. A log processor moves them, with delays moving logs and delays processing, into external analytics (Hadoop, Greenplum, Teradata, …) and a SQL data store. Output is limited by schema and the SQL data store has limited read capacity.]
49. Goal: Scalable and Powerful Insights
• Store and manage large stream of data samples
– High arrival rate from many sources
– Variable schema
– Control retention period of data
• Compute aggregations and derivative data sets
– Aggregations and statistics based on data
– Roll-up data into pre-computed reports and summaries
• Low latency access to up-to-date data
– Flexible indexing of raw and derived data sets
– Rich querying based on time + meta-data fields
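One way to control the retention period, sketched under assumptions: a TTL index on the activity timestamp with a 30-day window. The window, the collection name and the field names are assumptions, and the slides do not say which retention mechanism is used. The check below simulates in memory which samples such an index would keep:

```javascript
// Shell equivalent (TTL index on a Date field, collection name assumed):
//   db.activities.createIndex({ time: 1 }, { expireAfterSeconds: 30 * 24 * 3600 })
const RETENTION_SECONDS = 30 * 24 * 3600; // assumed 30-day retention

// A document expires once its timestamp plus the retention window has passed.
function isExpired(doc, now) {
  return doc.time.getTime() + RETENTION_SECONDS * 1000 < now.getTime();
}

const now = new Date("2014-11-20T00:00:00Z");
const samples = [
  { userId: "u123", type: "VIEW", time: new Date("2014-11-19T12:00:00Z") },
  { userId: "u123", type: "VIEW", time: new Date("2014-10-01T12:00:00Z") }
];

const kept = samples.filter(d => !isExpired(d, now));
console.log(kept.length);
```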
50. Insights – MongoDB Architecture
[Diagram: apps record all user activity through the HVDF API into MongoDB (activity logging, user history). Internal analytics (aggregation, MR) and external analytics (Hadoop, Spark, Storm, …, via the MongoDB – Hadoop Connector) derive the product map, user preferences, recommendations and trends, which drive personalization.]
54. Insight – User History
• Recent activity for a user:
db.activities.find({ userId: "u123" })
  .sort({ time: -1 }).limit(100)
• Recent activity for a product:
db.activities.find({ itemId: "301671" })
  .sort({ time: -1 }).limit(100)
• Indices:
– userId + time, itemId + time, time
• All queries should be time bound for performance!
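A minimal sketch of a time-bound version of the user history query. The sample documents are hypothetical and the lookup is simulated in memory. The userId + time index can serve both the filter and the sort:

```javascript
// Shell equivalent, with an explicit lower bound on time:
//   db.activities.find({ userId: "u123", time: { $gt: since } })
//     .sort({ time: -1 }).limit(100)
const activities = [
  { userId: "u123", itemId: "301671", type: "VIEW",  time: new Date("2014-11-19T10:00:00Z") },
  { userId: "u123", itemId: "301671", type: "ORDER", time: new Date("2014-11-19T11:00:00Z") },
  { userId: "u123", itemId: "777",    type: "VIEW",  time: new Date("2014-09-01T09:00:00Z") },
  { userId: "u999", itemId: "301671", type: "VIEW",  time: new Date("2014-11-19T12:00:00Z") }
];

function recentForUser(userId, since, limit) {
  return activities
    .filter(a => a.userId === userId && a.time > since) // time-bound match
    .sort((a, b) => b.time - a.time)                    // newest first
    .slice(0, limit);
}

const since = new Date("2014-11-01T00:00:00Z");
const recent = recentForUser("u123", since, 100);
console.log(recent.map(a => a.type));
```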
55. Insight – User Stats
• Recent number of views, purchases, etc. for a user
db.activities.aggregate([
  { $match: { userId: "u123", time: { $gt: DATE }}},
  { $group: { _id: "$type", count: { $sum: 1 }}}])
• Recent total sales for a user
db.activities.aggregate([
  { $match: { userId: "u123", time: { $gt: DATE }, type: "ORDER" }},
  { $group: { _id: "result", total: { $sum: "$total" }}}])
• Recent number of views, purchases, etc. for an item
db.activities.aggregate([
  { $match: { itemId: "301671", time: { $gt: DATE }}},
  { $group: { _id: "$type", count: { $sum: 1 }}}])
> These aggregations are very fast, real-time
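A minimal sketch of what the $match + $group pipeline computes, simulated in memory on hypothetical activity documents. Field names (userId, type, time) are assumed to follow the activity documents used for the user history queries:

```javascript
const activities = [
  { userId: "u123", type: "VIEW",  time: new Date("2014-11-19T10:00:00Z") },
  { userId: "u123", type: "VIEW",  time: new Date("2014-11-19T10:05:00Z") },
  { userId: "u123", type: "ORDER", time: new Date("2014-11-19T10:10:00Z"), total: 59.9 },
  { userId: "u123", type: "VIEW",  time: new Date("2014-08-01T10:00:00Z") },
  { userId: "u999", type: "VIEW",  time: new Date("2014-11-19T11:00:00Z") }
];

function countByType(docs, match, since) {
  const counts = {};
  for (const d of docs) {
    // $match stage: time bound plus equality on every field in "match"
    if (d.time <= since) continue;
    if (Object.keys(match).some(k => d[k] !== match[k])) continue;
    // $group stage: { _id: "$type", count: { $sum: 1 } }
    counts[d.type] = (counts[d.type] || 0) + 1;
  }
  return counts;
}

const stats = countByType(activities, { userId: "u123" }, new Date("2014-11-01T00:00:00Z"));
console.log(stats);
```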
56. Insight – User Stats
• Map Reduce calculation of unique visitors:
var map = function() { emit(this.userId, 1); };
var reduce = function(key, values) { return Array.sum(values); };
db.activities.mapReduce(map, reduce, {
  // only look at the last hour of activity
  query: { time: { $gt: new Date(Date.now() - 60 * 60 * 1000) } },
  out: { replace: "lastHourUniques", sharded: true } });
// number of activities for a user (output documents are keyed by _id)
db.lastHourUniques.find({ _id: "u123" })
// total uniques, immediate result
db.lastHourUniques.count()
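A minimal in-memory sketch of the map / reduce steps, to show why counting the output documents gives the number of unique visitors. The runner below is a hypothetical stand-in, not MongoDB's implementation, and emit is passed as an argument rather than being a global as in the shell:

```javascript
// Tiny map-reduce runner: groups emitted values by key, then reduces each group.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach(doc => map.call(doc, emit)); // "this" is the document, as in the shell
  return [...groups].map(([key, values]) => ({
    _id: key,
    // A key with a single emitted value is passed through without reducing.
    value: values.length === 1 ? values[0] : reduce(key, values)
  }));
}

const lastHour = [
  { userId: "u123" }, { userId: "u123" }, { userId: "u999" }, { userId: "u123" }
];

const map = function (emit) { emit(this.userId, 1); };
const reduce = (key, values) => values.reduce((a, b) => a + b, 0);

const lastHourUniques = runMapReduce(lastHour, map, reduce);
console.log(lastHourUniques.length, lastHourUniques.find(d => d._id === "u123").value);
```

One output document per userId is what makes the count of the output collection equal to the number of unique visitors.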
61. Monitoring Tips – Tools
The following monitoring tools are useful:
• MongoDB Monitoring Service (MMS)
• mongostat – console based
• mongotop – activity of each namespace
• iostat – disk activity
• Plugins for most popular frameworks (Munin,
Nagios, Cacti, SNMP …)
> Without monitoring, it is impossible to quickly
troubleshoot and recover from downtime!
63. Monitoring Tips – Metrics
Metrics to watch for:
• Data Size vs Disk Size
• Active Set Size vs RAM Size
• Disk IO
• Write Lock
> Account and test for highest possible traffic!
> MongoDB's support team is there to help!
64. Replication Tips
Add replicas to:
• Reduce latency to users
• Add read capacity (data potentially stale)
• Increase data safety
> Adding / removing a replica is seamless
66. Sharding Tips
If you are not sharding yet …
It may be time to shard
Switch to sharding with no downtime …
Just make sure you pick the right shard key!
MongoDB Support is there to help
67. Sharding Tips
Add shards to:
• Increase read / write IO capacity
• Increase Storage space
• Increase RAM space
• Bring a primary closer to users
> Shard add / remove takes time and capacity
> Scales mostly linearly but broadcast queries
are sub-linear
72. 3 Answers for this Season
1. How is Peak Season shaping up this year?
– Spending & confidence are back! Act fast!
2. How do you scale your business with MongoDB?
– Create single view services and scale using
sharding
3. How do you capture the holiday Digital Customer
Experience with MongoDB?
– High volume activity logging capture for
now & rest of the season for “insight”
73. What’s Next?
1. Assess your data and determine your monitoring gaps
2. Join us and Engage:
• MongoDB Days – London – November 19
• MongoDB Days – San Francisco – December 3
• MongoDB Meet-ups, MUG, Office Hours
3. Start one step at a time - with “prototype” capabilities