Big Data Analytics 2: Leveraging Customer Behavior to Enhance Relevancy in Personalization

This session covers how to capture and analyze customer behavior to create more relevant contexts for customers. We will cover how to use your current BI features and, more importantly, how newer technologies approach the challenge. You will walk away with a good idea of how to build and drive even more contextually relevant experiences for customers and even more successful engagements.

1. Leveraging Customer Data to Enhance Relevancy in Personalization
   “Using Apache Data Processing Projects on top of MongoDB”
   Marc Schwering, Sr. Solution Architect – EMEA
   marc@mongodb.com | @m4rcsch
2. Big Data Analytics Track
   1. Driving Personalized Experiences Using Customer Profiles
   2. Leveraging Data to Enhance Relevancy in Personalization
   3. Machine Learning to Engage the Customer, with Apache Spark, IBM Watson, and MongoDB
3. Agenda For This Session
   • Personalization Process Review
   • The Life of an Application
   • Separation of Concerns / Real World Architecture
   • Apache Spark and Flink Data Processing Projects
   • Clustering with Apache Flink
   • Next Steps
4. High Level Personalization Process
   1. Profile created
   2. Enrich with public data
   3. Capture activity
   4. Clustering analysis
   5. Define personas
   6. Tag with personas
   7. Personalize interactions
   Common technologies for the batch analytics step: R, Hadoop, Spark, Python, Java, and many other options.
   Personas change much less often than tagging.
5. Evolution of a Profile (1)
   {
     "_id" : ObjectId("553ea57b588ac9ef066428e1"),
     "ipAddress" : "216.58.219.238",
     "referrer" : "kay.com",
     "firstName" : "John",
     "lastName" : "Doe",
     "email" : "johndoe@gmail.com"
   }
6. Evolution of a Profile (n+1)
   {
     "_id" : ObjectId("553e7dca588ac9ef066428e0"),
     "firstName" : "John",
     "lastName" : "Doe",
     "address" : "229 W. 43rd St.",
     "city" : "New York",
     "state" : "NY",
     "zipCode" : "10036",
     "age" : 30,
     "email" : "john.doe@mongodb.com",
     "twitterHandle" : "johndoe",
     "gender" : "male",
     "interests" : [ "electronics", "basketball", "weightlifting", "ultimate frisbee", "traveling", "technology" ],
     "visitedCounts" : { "watches" : 3, "shirts" : 1, "sunglasses" : 1, "bags" : 2 },
     "purchases" : [
       { "id" : 1, "desc" : "Power Oxford Dress Shoe", "category" : "Mens shoes" },
       { "id" : 2, "desc" : "Striped Sportshirt", "category" : "Mens shirts" }
     ],
     "persona" : "shoe-fanatic"
   }
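
How does a profile get from version (1) to version (n+1)? A minimal sketch of the "capture activity" step from slide 4, assuming PyMongo and a hypothetical shop.profiles collection; field names mirror the sample documents above, and none of this is the talk's actual application code:

    from pymongo import MongoClient

    # Hypothetical deployment details; adjust URI, database and collection names.
    profiles = MongoClient("mongodb://localhost:27017")["shop"]["profiles"]

    def record_view(profile_id, category):
        """Fold a single page view into the profile document."""
        profiles.update_one(
            {"_id": profile_id},
            {
                "$inc": {"visitedCounts." + category: 1},  # bump the per-category counter
                "$addToSet": {"interests": category},      # keep interests deduplicated
            },
        )

Each event is one atomic update, so the document simply accumulates state until the batch analytics step reads it.
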
7. One size/document fits all?
   • Profile Data
     – Preferences
     – Personal information
       • Contact information
       • DOB, gender, ZIP, ...
   • Customer Data
     – Purchase History
     – Marketing History
   • „Session Data“
     – View History
     – Shopping Cart Data
     – Information Broker Data
   • Personalisation Data
     – Persona Vectors
     – Product and Category recommendations
   (Diagram labels: Application, Batch analytics)
8. Separation of Concerns
   Same data as the previous slide, now grouped by owner: a frontend-system layer with dedicated Profile, Customer, Session, and Persona services, and a batch analytics layer alongside.
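
As a concrete illustration of the split (a hypothetical collection layout, not the deck's schema), the single profile document from slide 6 could be broken into one document per owning service:

    # One document per service; field names mirror slide 6, collection names and
    # the profileId link are assumptions for this sketch.
    profile_doc = {   # Profile Service: preferences and personal information
        "_id": "553e7dca588ac9ef066428e0",
        "firstName": "John", "lastName": "Doe",
        "gender": "male", "zipCode": "10036", "email": "john.doe@mongodb.com",
    }
    customer_doc = {  # Customer Service: purchase and marketing history
        "profileId": "553e7dca588ac9ef066428e0",
        "purchases": [{"id": 1, "desc": "Power Oxford Dress Shoe", "category": "Mens shoes"}],
    }
    session_doc = {   # Session Service: view history and shopping cart
        "profileId": "553e7dca588ac9ef066428e0",
        "visitedCounts": {"watches": 3, "shirts": 1, "sunglasses": 1, "bags": 2},
        "cart": [],
    }
    persona_doc = {   # Persona Service: persona tags, vectors and recommendations
        "profileId": "553e7dca588ac9ef066428e0",
        "persona": "shoe-fanatic",
    }

Each service owns exactly one of these shapes, which is what makes the benefits on the next slides (focused code, independent teams) possible.
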
9. Benefits
   • Code does less; document and code stay focused
   • Split ability
     – Different Teams
     – New Languages
     – Defined Dependencies
10. Result
    • Code does less; document and code stay focused
    • Split ability
      – Different Teams
      – New Languages
      – Defined Dependencies
    KISS => Keep it simple and save! => Clean Code <=
    • Robert C. Martin: https://cleancoders.com/
    • M. Fowler / B. Meyer et al.: Command Query Separation
11. Analytics and Personalization – From Query to Clustering
12. Separation of Concerns (same data/service breakdown as slide 8)
13. Separation of Concerns (same data/service breakdown as slide 8)
14. Architecture revised
    (Diagram: Profile Service, Customer Service, Session Service, and Persona Service sit between the frontend system, the backend systems, and data processing.)
15. Advice for Developers
    • OWN YOUR DATA! (but only relevant data)
    • Say no! (to direct data access, i.e. raw DB access)
16. Data Processing
17. Hadoop in a Nutshell
    • An open-source framework for distributed storage and distributed, batch-oriented processing
    • Hadoop Distributed File System (HDFS) to store data on commodity hardware
    • YARN as the resource management platform
    • MapReduce as the programming model working on top of HDFS
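
To make the MapReduce programming model concrete, here is a minimal Hadoop Streaming sketch: the mapper and reducer are plain Python reading stdin, counting profiles per persona. Hadoop Streaming itself, the one-JSON-document-per-line input layout, and the script name are assumptions for illustration:

    #!/usr/bin/env python
    # persona_count.py -- run with "mapper" or "reducer" as the argument.
    import json
    import sys

    def mapper():
        for line in sys.stdin:                    # one JSON profile document per line
            doc = json.loads(line)
            print("%s\t1" % doc.get("persona", "unknown"))

    def reducer():
        current, total = None, 0
        for line in sys.stdin:                    # Hadoop sorts by key between phases
            key, value = line.rstrip("\n").split("\t")
            if key != current and current is not None:
                print("%s\t%d" % (current, total))
                total = 0
            current = key
            total += int(value)
        if current is not None:
            print("%s\t%d" % (current, total))

    if __name__ == "__main__":
        mapper() if sys.argv[1] == "mapper" else reducer()
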
18. Spark in a Nutshell
    • Spark is a top-level Apache project
    • Can be run on top of YARN and can read any Hadoop API data, including HDFS or MongoDB
    • Fast and general engine for large-scale data processing and analytics
    • Advanced DAG execution engine with support for data locality and in-memory computing
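
A sketch of the "can read any Hadoop API data, including MongoDB" bullet, using the mongo-hadoop connector from PySpark. The connector JAR on the classpath, the connection URI, and the shop.profiles namespace are assumptions; see the mongo-hadoop links on the last slide:

    from pyspark import SparkContext

    sc = SparkContext(appName="persona-counts")

    # Read a MongoDB collection through the Hadoop InputFormat shipped with mongo-hadoop.
    profiles = sc.newAPIHadoopRDD(
        inputFormatClass="com.mongodb.hadoop.MongoInputFormat",
        keyClass="org.apache.hadoop.io.Text",
        valueClass="org.apache.hadoop.io.MapWritable",
        conf={"mongo.input.uri": "mongodb://localhost:27017/shop.profiles"},
    )

    # Each record arrives as an (ObjectId, document) pair; count profiles per persona.
    per_persona = (profiles
                   .map(lambda kv: (kv[1].get("persona"), 1))
                   .reduceByKey(lambda a, b: a + b))
    print(per_persona.collect())
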
19. Flink in a Nutshell
    • Flink is a top-level Apache project
    • Can be run on top of YARN and can read any Hadoop API data, including HDFS or MongoDB
    • A distributed streaming dataflow engine
    • Streaming and batch
    • Iterative in-memory execution and handling
    • Cost-based optimizer
20. Latency of query operations
    (Chart: latency over time for Query, Aggregation, MapReduce, and Cluster Algorithms, compared across MongoDB, Hadoop, and Spark/Flink.)
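
At the low-latency end of that chart, many questions never need MapReduce at all. A sketch of answering one directly inside MongoDB with the aggregation framework, reusing the hypothetical shop.profiles collection from the earlier examples:

    from pymongo import MongoClient

    profiles = MongoClient("mongodb://localhost:27017")["shop"]["profiles"]

    # How many profiles carry each persona tag, most common first?
    pipeline = [
        {"$group": {"_id": "$persona", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
    ]
    for row in profiles.aggregate(pipeline):
        print(row["_id"], row["count"])
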
21. Iterative Algorithms / Clustering
22. K-Means in Pictures
    • Source: Wikipedia, K-Means
23. K-Means as a Process
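
The process on this slide is the classic assign-then-update loop. A minimal, self-contained sketch in plain Python (illustration only; not the demo code and not how Flink distributes the work):

    import random

    def kmeans(points, k, iterations=10):
        """Naive k-means on 2-D points: assign each point to its nearest
        centroid, then move every centroid to the mean of its cluster."""
        centroids = random.sample(points, k)
        for _ in range(iterations):
            clusters = [[] for _ in range(k)]
            for x, y in points:                              # assignment step
                d = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
                clusters[d.index(min(d))].append((x, y))
            for i, cluster in enumerate(clusters):           # update step
                if cluster:
                    centroids[i] = (sum(p[0] for p in cluster) / len(cluster),
                                    sum(p[1] for p in cluster) / len(cluster))
        return centroids

    print(kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2))

Every pass needs the full data set again, which is exactly the access pattern the next two slides compare: Hadoop and Spark schedule the work for each iteration, while Flink's dedicated iteration operators keep the tasks running.
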
24. Iterations in Hadoop and Spark
25. Iterations in Flink
    • Dedicated iteration operators
    • Tasks keep running for the iterations, not redeployed for each step
    • Caching and optimizations done automatically
26. Demo
27. Result
28. More…?
29. Takeaways
    • Stay focused => start and stay small
      – Evaluate with big documents, but do a PoC focused on the topic
    • Extending functionality is easy
      – Aggregation, MapReduce
      – The Hadoop connector opens up a new variety of use cases
    • Extending functionality can be challenging
      – Evolution is outpacing help channels
      – A lot of options (Spark, Flink, Storm, Hadoop, ...)
      – More than just a binary
30. Next Steps
    • Next session => hands-on Spark and Watson content!
      – “Machine Learning to Engage the Customer, with Apache Spark, IBM Watson, and MongoDB”
      – RDD examples
    • Try out Spark and Flink
      – http://bit.ly/MongoDB_Hadoop_Spark_Webinar
      – http://flink.apache.org/
      – https://github.com/mongodb/mongo-hadoop
      – https://github.com/m4rcsch/flink-mongodb-example
    • Participate and ask questions!
      – @m4rcsch
      – marc@mongodb.com
31. Thank you!
    Marc Schwering
    Sr. Solutions Architect – EMEA
    marc@mongodb.com
    @m4rcsch
