The document discusses driving personalized experiences using customer profiles. It outlines the benefits of personalization and the high-level personalization process, which includes creating customer profiles, enriching them with public data, capturing customer activity, performing clustering analysis to define customer personas, tagging customer profiles with personas, and personalizing interactions. MongoDB is recommended for personalization due to its document model, high performance, scalability, rich querying capabilities, and ability to integrate with Hadoop/Spark. Examples are provided of how customer data like interests, visits, and purchases can be collected and stored in profiles to enable real-time personalization based on persona matching.
2. 2
Big Data Analytics Track
1. Driving Personalized Experiences Using Customer Profiles
2. Leveraging Customer Behavior to Enhance Relevancy in Personalization
3. Machine Learning to Engage the Customer, with Apache Spark, IBM Watson, and MongoDB
3. 3
Agenda For This Session
1. Benefits of Personalization
2. High-level process
3. Data capture steps
4. Data analysis steps
5. Real-time personalization
6. Summary
7. Q&A
4. 4
You Notice When Content is Personalized
When it looks like this outside
Left: from www.johnbyronkuhner.com via Google Images
Right: from www.steinmart.com via Google Images
Is this the best ad to show you?
5. 5
Or Better This
When it looks like this outside
Left: from www.johnbyronkuhner.com via Google Images
Right: www.linkedin.com/pulse/20140729161519-34678510-take-note-time-to-move-beyond-personalization-to-contextualization
More relevant
8. 8
High Level Personalization Process
1. Profile created
2. Enrich with public data
3. Capture activity
4. Clustering analysis
5. Define Personas
6. Tag with personas
7. Personalize interactions
Batch analytics
Public data
Common technologies
• R
• Hadoop
• Spark
• Python
• Java
• Many other options
4 & 5 performed much less often than tagging
9. 9
Why MongoDB for Personalization?
• Document model => customer profiles are rich structures that map naturally to documents
• High throughput => profiles are read/written on every page view, so high performance is critical
• High scalability => high performance must scale easily for any data size & request volume
• Rich querying & indexes => often only portions of the profile are queried, and ad hoc marketing in particular requires rich querying capabilities. Geospatial indexes are critical for mobile
• Real-time analytics => can analyze directly on MongoDB or prepare aggregated results for external analysis with the aggregation framework
• Strong consistency => want profile changes & tracking to take effect immediately
• Hadoop/Spark integration => can run distributed analytics on data in MongoDB, or copy it to HDFS and run there, both with the MongoDB Hadoop Connector
• Low TCO => low-cost enterprise software license, commodity hardware, & management
10. 10
Customer Example: Scratchpad
• Records all activity in researched trips
• Needed
– Document model
– Dynamic schema
– Rich querying
– Easy scaling
11. 11
And Many Other Customers Personalizing with MongoDB
• Sailthru
• Sitecore
• Adobe (AEM)
• Expedia
• ADP
• Foursquare
• Otto
• Chico’s
and 100s more…
13. 13
Anonymous user
Might just start with this if no cookie
{
"ipAddress" : "216.58.219.238",
"referrer" : "google.com"
}
Pretty useless, right?
14. 14
More Than Just What You Collect
IP Address
Referrer
Information Broker
Location
Company
Weather
Avg Income
Interests
Possible Interests
e.g. Kay Jewelers, Dick’s Sporting Goods
Budget Indication
e.g. Barney’s
Search term
15. 15
Often User Creates a Profile
{
"_id" : ObjectId("553ea57b588ac9ef066428e1"),
"ipAddress" : "216.58.219.238",
"referrer" : "kay.com",
"firstName" : "John",
"lastName" : "Doe",
"email" : "johndoe@gmail.com"
}
25. 25
Clustering Overview
• Think of each of your customers or users of your site as a data point
• How can we group similar users into sets that can be treated alike for marketing, cross-sell, etc.?
• K-means is a common algorithm for clustering
Image from: http://pypr.sourceforge.net/kmeans.html
[Figure: original unclustered data vs. clustered data]
26. 26
Clustering Process for Personalization
Customer Profile Documents
Map to Vectors: [1, 3, 0, …]
Clustering Algo (iterate on inputs)
Clusters of customers
Define Personas
Tag Profiles with Personas (update profiles with persona)
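The "Map to Vectors" step above can be sketched in plain JavaScript. This is only an illustration under stated assumptions: the category order in CATEGORIES and the sample profiles are hypothetical, and the field names reuse the "vc" (visited counts) keys that appear in this deck's aggregation example.

```javascript
// Hypothetical category order: each position in the vector is one category.
// Field names follow the "vc" (visited counts) keys used in this deck.
const CATEGORIES = ["mShirts", "mPants", "mShoes", "mTies", "mSunglass", "mWatch", "mBags"];

// Convert one profile document into a fixed-length numeric vector.
// Missing counts become 0, so every vector has the same length.
function profileToVector(profile) {
  const vc = profile.vc || {};
  return CATEGORIES.map((category) => Number(vc[category]) || 0);
}

// Hypothetical sample profiles (not taken from the deck).
const profiles = [
  { _id: 1, vc: { mShirts: 11, mShoes: 10, mWatch: 3 } },
  { _id: 2, vc: { mPants: 5, mShoes: 10, mTies: 3 } },
  { _id: 3 }, // no activity captured yet
];

const vectors = profiles.map(profileToVector);
console.log(vectors);
```

Keeping the category order in one shared constant matters because the clustering step and any later persona matching must agree on what each vector position means.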
28. 28
Aggregation Framework for Filtering Profiles
// Adds up the visited counts (vc) and weighted purchases, then keeps only profiles with a total of at least 20
db.profiles.aggregate([
  {$project: {
    vc: "$vc",
    purchases: "$purchases",
    total: {$add: [
      {$ifNull: ["$vc.mShirts", 0]},
      {$ifNull: ["$vc.mPants", 0]},
      {$ifNull: ["$vc.mShoes", 0]},
      {$ifNull: ["$vc.mTies", 0]},
      {$ifNull: ["$vc.mSunglass", 0]},
      {$ifNull: ["$vc.mWatch", 0]},
      {$ifNull: ["$vc.mBags", 0]},
      // Each purchase counts as 10; a missing purchases array is treated as empty so $size does not fail
      {$multiply: [{$size: {$ifNull: ["$purchases", []]}}, 10]}
    ]}
  }},
  {$match: {total: {$gte: 20}}}
])
29. 29
Input/Output for K-Means Algo
Inputs
Vectors: [
[11, 0, 10, 0, 1, 3, ...],
[ 0, 5, 10, 3, 0, 0, ...],
...
]
K = # of clusters (driven by marketing effort or data analysis)
N = # of iterations
Clustering Algo (iterate on inputs)
Output: clusters of customers
{
Centers: [
{name: C1, vector: [..]},
{name: C2, vector: [..]},
...
],
Clusters: [
{C1: [[11, 0, 10, 0, 1, 3, ...], ...]},
{C2: [[ 0, 5, 0, 0, 10, 0, ...], ...]},
...
]
}
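To make the inputs and outputs above concrete, here is a minimal k-means sketch in plain JavaScript. It is not the implementation used in the talk: seeding the centers with the first K vectors is an assumption chosen to keep the example deterministic (libraries typically use random or k-means++ initialization), and the sample vectors are made up.

```javascript
// Squared Euclidean distance between two equal-length vectors.
function squaredDistance(a, b) {
  return a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0);
}

// Index of the center closest to the given vector.
function nearestCenter(vector, centers) {
  let best = 0;
  for (let i = 1; i < centers.length; i++) {
    if (squaredDistance(vector, centers[i]) < squaredDistance(vector, centers[best])) best = i;
  }
  return best;
}

// Plain k-means with k clusters and n iterations.
// Assumption: the first k vectors seed the centers.
function kMeans(vectors, k, n) {
  let centers = vectors.slice(0, k).map((v) => v.slice());
  let clusters = [];
  for (let iter = 0; iter < n; iter++) {
    // Assignment step: put each vector in the cluster of its nearest center.
    clusters = centers.map(() => []);
    for (const v of vectors) clusters[nearestCenter(v, centers)].push(v);
    // Update step: move each center to the mean of its cluster.
    centers = centers.map((center, i) => {
      const members = clusters[i];
      if (members.length === 0) return center; // keep the center of an empty cluster
      return center.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  return { centers, clusters };
}

// Hypothetical 2-dimensional vectors: three light users and three heavy users.
const sampleVectors = [[1, 0], [10, 10], [0, 1], [11, 9], [1, 1], [10, 11]];
const result = kMeans(sampleVectors, 2, 5);
console.log(result.centers);
console.log(result.clusters);
```

The returned object mirrors the Centers/Clusters shape on the slide. In practice the same step would more likely run in a batch tool such as Spark, R, or a Python library.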
30. 30
Choosing Personas
• Each cluster would usually map to one persona you can identify, name, and target
• It is common to give personas memorable names, e.g. shoe fanatic, bargain hunter, researcher, etc.
[Figure: original unclustered data vs. clustered data, with clusters C1, C2, and C3 (Shoe Fanatic?)]
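One way to get naming hints for a cluster is to look at where its center differs most from the other centers. The sketch below is only an illustration: the category names, the center values, and the "largest excess over the average center" heuristic are all assumptions, not something prescribed by the talk.

```javascript
// Hypothetical vector positions and cluster centers.
const CATEGORY_NAMES = ["shirts", "pants", "shoes"];
const clusterCenters = [
  [9, 8, 9],  // C1
  [1, 2, 12], // C2
  [1, 1, 1],  // C3
];

// For one center, return the category where it exceeds the average
// of all centers by the largest margin (a simple naming hint).
function distinguishingCategory(center, allCenters) {
  const average = center.map(
    (_, d) => allCenters.reduce((sum, c) => sum + c[d], 0) / allCenters.length
  );
  let best = 0;
  for (let d = 1; d < center.length; d++) {
    if (center[d] - average[d] > center[best] - average[best]) best = d;
  }
  return CATEGORY_NAMES[best];
}

const hints = clusterCenters.map((c) => distinguishingCategory(c, clusterCenters));
console.log(hints);
```

For a cluster like C2 this points at shoes, which would support a "shoe fanatic" label. For a cluster that is low on every dimension, such as C3, the hint is not meaningful and a human still has to choose the name.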
35. 35
Many Personalization Techniques to Mix & Match
• Related content
• Content history
• Next best offer
• Trigger-based
• Threshold
• Last behavior
• Time & event
• Offer matching
• Filter-based
• Crowd-sourcing
• Voice of customer
• User-directed
• Persona matching
Source: http://semphonic.blogs.com/semangel/2014/03/strategies-for-personalization-delivering-an-extra-unexpected-treat-.html
36. 36
Alternatives Give Fewer Capabilities
Option - separate weblogs
Application
Customer Profiles (no activity)
Activity Logs
Better option
Application
Customer Profiles with Activity Tracking
Tag with Persona
Marketing
Clustering & Analytics
Can market:
• On activity today
• With rich & specific queries
37. 37
Better Option Enables Real-time Persona Matching
1. Profile created
2. Enrich with public data
3. Capture activity
4. Clustering analysis
5. Define Personas
6. Tag with personas
7. Personalize interactions
Batch analytics
Public data
Can even match customer to a persona while customer is engaged
Logic is to calculate the distance to each cluster center and tag with the closest one’s persona
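The matching logic described above (compute the distance to each cluster center, then tag with the closest one) can be sketched as follows. The persona names come from the examples in this deck; the center values, the three-category vector layout, and the function names are hypothetical.

```javascript
// Hypothetical output of a batch clustering run: one center per persona.
// Vector positions are [shirts, pants, shoes] visit counts (an assumption).
const personas = [
  { name: "shoe fanatic", center: [1, 1, 12] },
  { name: "researcher", center: [8, 8, 8] },
  { name: "bargain hunter", center: [2, 2, 1] },
];

// Euclidean distance between two equal-length vectors.
function distance(a, b) {
  return Math.sqrt(a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0));
}

// Return the name of the persona whose center is closest to the vector.
function matchPersona(vector, candidates) {
  let closest = candidates[0];
  for (const persona of candidates) {
    if (distance(vector, persona.center) < distance(vector, closest.center)) closest = persona;
  }
  return closest.name;
}

// A customer who has mostly browsed shoes in the current session.
console.log(matchPersona([0, 2, 10], personas)); // prints "shoe fanatic"
```

Because this only compares one vector against a handful of centers, it is cheap enough to run while the customer is still browsing, and the resulting name can be written back to the profile as the persona tag.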
40. 40
High Level Personalization Process
1. Profile created
2. Enrich with public data
3. Capture activity
4. Clustering analysis
5. Define Personas
6. Tag with personas
7. Personalize interactions
Batch analytics
Public data
Common technologies
• R
• Hadoop
• Spark
• Python
• Java
• Many other options
4 & 5 performed much less often than tagging
41. 41
Big Data Analytics Track
1. Driving Personalized Experiences Using Customer Profiles
2. Leveraging Customer Behavior to Enhance Relevancy in Personalization
3. Machine Learning to Engage the Customer, with Apache Spark, IBM Watson, and MongoDB
Editor's Notes
P2P10 Driving Personalized Experiences Using Customer Profiles
This session covers the end-to-end process of personalization and demonstrates a great example of combining operational data for an application in MongoDB with the ability to analyze that data and operationalize the results. We will discuss storing rich customer profiles in MongoDB, using clustering to develop a customer segmentation, and leveraging that as a filter for valuable personalization of your application. You'll walk away with a good idea of how to drive targeted experiences to customers for more relevant engagement and how personalization is accessible to companies large and small.
This session is a broad end-to-end overview; the next 2 sessions go deeper.
Goal is for everyone here to believe personalization is achievable to build into your applications
Explain who did the survey and who was asked questions.
Actually easy to get value incrementally from starting small and adding more complex personalization
Mention other parts of track will cover the technologies used for batch analytics
70% of marketers said user preferences give high ROI
68% said user behavior
Point out schema design might differ depending on requirements and how the profile info is used
Probably have a separate collection for order info but relevant info stored with profile
2 dimensions might be how many shoes bought vs. how many tops (forgetting the axes). In reality can be many more dimensions
Might filter out any counts lower than 20 or some number, only run on customers with enough information (frequent customers)
Could have a different part of the vector for purchasing.
There is a choice of what vectors to send. Might just choose counts larger than e.g. 5 or only for those customers with at least 20 counts because you judge you have enough samples
Marketing might decide they want to focus on 5 personas to start, or through data analysis, you find one cluster really exhibits very different behavior within it and you want to break it up
(could mention the technology products you can use for clustering, e.g. Spark, ML, language libraries)
Explain how k-means works at high level, iteratively moving the centers to define the nearby clusters
e.g. if the two axes were shoes vs. clothes, then green might be high frequency buyer of everything, red is high shoe buyers, and blue is little of everything
Might name it by the cluster center, especially focusing on how it is different from other cluster centers
Over time, you would learn whether these personas are stable or not or change frequently, in which case you might not focus on those, e.g. patterns in the month before Christmas (buying patterns very different).
A lot of work just for that little tag, but that tag represents a fast way to characterize that person and add to personalization rules
Even counts and therefore persona very helpful.
A good problem to have is too much information to personalize with – start simple, measure, and add
Great juxtaposition of two approaches. Even though I’m looking at a woman’s dress, it uses Featured Products to market to me personally. Other sections cover items related to this dress, so best of both worlds
Featured products could be items commonly bought for my persona, or trending today by persona
More advanced
Could track products selling by persona today
Figuring out whether things are gifts (e.g. clothes for women and I’m a man)
Many of these are useful by themselves but many made better when you add a persona
Beyond just personalizing from customer profiles, rules-based
Suggest based on what already in the cart
what page visited for a while
weather in the area this weekend
responds to discounts
Can do ad hoc marketing & promotions, e.g.
Who looks at the swimwear or shoe category a lot
Who shopped last year on Black Friday
Who shopped a lot right before spring last year
Who bought a suit and bought or looked at ties
Most importantly can identify a persona while the person is shopping (once browsing enough) instead of waiting until next time they come to your app
Mention other parts of track will cover the technologies used for batch analytics
If time, can ask people what algorithms they are using for personalization