Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Customer Behaviour Analytics:
Billions of Events to
one Customer-Product Graph
Strata London, 12th November 2013
Presented...
About Paul Lam
Joined uSwitch.com as first Data Scientist in 2010
• developed internal data products
• built distributed da...
What is it
Customer-Product Graph
{:name “Bob”}
{:product “iPhone”}
{:name “Tom”}

{:name “Emily”}
{:product “American Express”}
{:na...
Question: Who bought an iPhone?
{:name “Bob”}
{:product “iPhone”}
{:name “Tom”}

{:name “Emily”}
{:product “American Expre...
Query: Who bought an iPhone?
{:name “Bob”}

X
{:product “iPhone”}

{:name “Tom”}

{:name “Emily”}
START

x=node:node_auto_...
Question: What else did they buy?
{:name “Bob”}

X
{:product “iPhone”}

{:name “Tom”}

{:name “Emily”}

Y
{:product “Ameri...
Query: What else did they buy?
{:name “Bob”}

X
{:product “iPhone”}

{:name “Tom”}

START x=node:node_auto_index(product='...
Hypothesis: People that buy X has interest in Y
{:name “Bob”}

X
{:product “iPhone”}

{:name “Tom”}

{:name “Emily”}

Y
{:...
Query: Who to recommend Y
{:name “Bob”}

X

{:product “iPhone”}
START
{:name “Tom”}
x=node:node_auto_index(product='iPhone...
Product Recommendation by Reasoning Example

Interactive demo at http://bit.ly/customer_graph

1. Start with an idea
2. Tr...
Challenge: Event Data to Graph Data

User ID

Product ID Action

Bob

iPhone

Bought

Tom

iPhone

Bought

Emily

iPhone

...
Why should you care
Customer Journey

14
Purchase Funnel

Interest

Desire

Action
15
Understanding Customer Intents

Interest

Desire

Action

16
Customer Experience as a Graph

17
http://insideintercom.io/criticism-and-two-way-streets/
18
A Feedback System to Drive Profit

19
Minimise effort between Q & A

Question
Answer

Data Interrogation
One Approach: Make data querying easier

Query = Function(Data) [1]
~ Function(Data Structure)

[1] Figure 1.3 from Big Da...
Data Structure: Relations versus Relations
a Edges
ak
Sale ID

User ID

Product ID Profit

1

1

1

£100

2

1

2

£50

{:p...
Using the right database for the right task

RDBMS

Graph DB

Data

Attributes

Entities and relations

Model

Record-base...
How does it work
User actions as time-stamped records

time

HDFS

Paul Ingles, “User as Data”, Euroclojure 2012
Our User Event to Graph Data Pipeline

time

HDFS

Reshape

Graph Database
From Input to Output with Hadoop, aka ETL step
1. interface
3. output data

2. input data

4... The 80%
HDFS

Reshape

Gra...
Hadoop interface to Neo4J
•
•
•
•

Cascading-Neo4j tap [1]
Faunus Hadoop binaries [2]
CSV files*
etc.
[1] http://github.com...
Input data stored on HDFS

time

1

2

3

4

User

1
2

Timestamp

Viewed Page

Referrer

Paul

2013-11-01 13:00

/homepag...
Nodes and Edges CSVs to go into a property graph
Node ID

Properties

1

{:name “Paul”, email: “paul.lam@uswitch.com”}

2
...
Records to Graph in 3 Steps

^ impo
rtable C

SV

1. Design graph
2. Extract Nodes
3. Build Relations

31
Step 1: Designing your graph
{:name “Paul”
:email “paul.lam@uswitch.com}
:viewed {:timestamp ... }
{:page “homepage”}

:bo...
Step 2: Extract list of entity nodes
User

Timestamp

Viewed Page

Referrer

Paul

2013-11-01 13:00

/homepage/

google.co...
Step 3: Building node-to-node relations
User

Timestamp

Viewed Page

Referrer

Paul

2013-11-01 13:00

/homepage/

google...
Do this across all customers and products
Use your data processing tool of choice:
Apache Hive
• Apache Pig
• Cascading
an...
Cascalog code to build user nodes
145 lines of Cascalog code in production
• a couple hundred lines more of utility functi...
Cascalog code to build user nodes
145 lines of Cascalog code in production
• a couple hundred lines more of utility functi...
Code to build user to product click relations
160 lines of Cascalog code in production
• + utility functions
• build direc...
Code to build user to product click relations
160 lines of Cascalog code in production
• + utility functions
• build direc...
Our User Event to Graph Data Pipeline

time

HDFS

Reshape

Graph Database
Summary

HDFS

Reshape

Graph
Contact
Paul Lam, data scientist at uSwitch.com
Email: paul@quantisan.com
Twitter: @Quantisan
Upcoming SlideShare
Loading in …5
×

Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph

3,839 views

Published on

Presented at #strataconf London

Published in: Technology, News & Politics

Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph

  1. 1. Customer Behaviour Analytics: Billions of Events to one Customer-Product Graph Strata London, 12th November 2013 Presented by Paul Lam
  2. 2. About Paul Lam Joined uSwitch.com as first Data Scientist in 2010 • developed internal data products • built distributed data architecture • team of 3 with a developer and a statistician Code contributor to various open source tools • Cascalog, a big data processing library built on top of Cascading (comparable to Apache Pig) • Incanter, a statistical computing platform in Clojure Author of Web Usage Mining: Data Mining Visitor Patterns From Web Server Logs* to be published in late 2014 * tentative title
  3. 3. What is it
  4. 4. Customer-Product Graph {:name “Bob”} {:product “iPhone”} {:name “Tom”} {:name “Emily”} {:product “American Express”} {:name “Lily”} [:BUY {:timestamp 2013-11-01 13:00:00}] bought viewed
  5. 5. Question: Who bought an iPhone? {:name “Bob”} {:product “iPhone”} {:name “Tom”} {:name “Emily”} {:product “American Express”} {:name “Lily”} bought viewed
  6. 6. Query: Who bought an iPhone? {:name “Bob”} X {:product “iPhone”} {:name “Tom”} {:name “Emily”} START x=node:node_auto_index(product='iPhone')Express”} {:product “American {:name “Lily”} MATCH (person)-[:BUY]->(x) RETURN person
  7. 7. Question: What else did they buy? {:name “Bob”} X {:product “iPhone”} {:name “Tom”} {:name “Emily”} Y {:product “American Express”} {:name “Lily”} bought viewed
  8. 8. Query: What else did they buy? {:name “Bob”} X {:product “iPhone”} {:name “Tom”} START x=node:node_auto_index(product='iPhone') {:name MATCH “Emily”} {:product “American Express”} (x)<-[:BUY]-(person)-[:BUY]->(y) {:name “Lily”} RETURN y
  9. 9. Hypothesis: People that buy X has interest in Y {:name “Bob”} X {:product “iPhone”} {:name “Tom”} {:name “Emily”} Y {:product “American Express”} {:name “Lily”} bought viewed
  10. 10. Query: Who to recommend Y {:name “Bob”} X {:product “iPhone”} START {:name “Tom”} x=node:node_auto_index(product='iPhone'), y=node:node_auto_index(product='American Express') {:name “Emily”} Y Looked {:product “American Express”} MATCH (p)-[:BUY]->(x), at AE (p)-[:VIEW]->(y) WHERE NOT (p)-[:BUY]->(y) {:name “Lily”} RETURN p Haven’t bought AE
  11. 11. Product Recommendation by Reasoning Example Interactive demo at http://bit.ly/customer_graph 1. Start with an idea 2. Trace to connected nodes 3. Identify patterns from viewpoint of those nodes 4. Repeat from #1 until discovering actionable item 5. Apply pattern
  12. 12. Challenge: Event Data to Graph Data User ID Product ID Action Bob iPhone Bought Tom iPhone Bought Emily iPhone Bought Bob AE Bought Emily AE Viewed Lily AE Bought
  13. 13. Why should you care
  14. 14. Customer Journey 14
  15. 15. Purchase Funnel Interest Desire Action 15
  16. 16. Understanding Customer Intents Interest Desire Action 16
  17. 17. Customer Experience as a Graph 17
  18. 18. http://insideintercom.io/criticism-and-two-way-streets/ 18
  19. 19. A Feedback System to Drive Profit 19
  20. 20. Minimise effort between Q & A Question Answer Data Interrogation
  21. 21. One Approach: Make data querying easier Query = Function(Data) [1] ~ Function(Data Structure) [1] Figure 1.3 from Big Data (preview v11) by Nathan Marz and James Warren
  22. 22. Data Structure: Relations versus Relations a Edges ak Sale ID User ID Product ID Profit 1 1 1 £100 2 1 2 £50 {:profit £100} {:profit £50} User ID Name Product ID Name 1 Bob 1 iPhone 2 Emily 2 A.E.
  23. 23. Using the right database for the right task RDBMS Graph DB Data Attributes Entities and relations Model Record-based Associative Relation By-product of normalisation First class citizen Example use Reporting Reasoning
  24. 24. How does it work
  25. 25. User actions as time-stamped records time HDFS Paul Ingles, “User as Data”, Euroclojure 2012
  26. 26. Our User Event to Graph Data Pipeline time HDFS Reshape Graph Database
  27. 27. From Input to Output with Hadoop, aka ETL step 1. interface 3. output data 2. input data 4... The 80% HDFS Reshape Graph Database
  28. 28. Hadoop interface to Neo4J • • • • Cascading-Neo4j tap [1] Faunus Hadoop binaries [2] CSV files* etc. [1] http://github.com/pingles/cascading.neo4j [2] http://thinkaurelius.github.io/faunus/ HDFS Reshape Graph Database
  29. 29. Input data stored on HDFS time 1 2 3 4 User 1 2 Timestamp Viewed Page Referrer Paul 2013-11-01 13:00 /homepage/ google.com Paul 2013-11-01 13:01 /blog/ /homepage/ User 4 Viewed Product Price Referrer Paul 2013-11-01 13:04 iPhone £500 /blog/ User 3 Timestamp Timestamp Purchased Paid Attrib. Paul 2013-11-01 13:05 iPhone £500 google.com User Landed Referral Email Paul 2013-11-01 13:00 google.com paul.lam@uswitch.com
  30. 30. Nodes and Edges CSVs to go into a property graph Node ID Properties 1 {:name “Paul”, email: “paul.lam@uswitch.com”} 2 {:domain “google.com”} 3 {:page “/homepage/”} ... ... 5 {:product “iPhone”} From To Type Properties 1 2 :SOURCE {:timestamp “2013-11-01 13:00”} 1 3 :VIEWED {:timestamp “2013-11-01 13:00”} ... ... ... ... 1 5 :BOUGHT {:timestamp “2013-11-01 13:05”}
  31. 31. Records to Graph in 3 Steps ^ impo rtable C SV 1. Design graph 2. Extract Nodes 3. Build Relations 31
  32. 32. Step 1: Designing your graph {:name “Paul” :email “paul.lam@uswitch.com} :viewed {:timestamp ... } {:page “homepage”} :bought {:timestamp ... } :viewed { ... } {:product “iPhone”} :viewed { ... } {:page “blog”} :source {:timestamp ... } {:domain “google.com”} 32
  33. 33. Step 2: Extract list of entity nodes User Timestamp Viewed Page Referrer Paul 2013-11-01 13:00 /homepage/ google.com Paul 2013-11-01 13:01 /blog/ /homepage/ User Timestamp Viewed Product Price Referrer Paul 2013-11-01 13:04 iPhone £500 /blog/ User Timestamp Purchased Paid Attrib. Paul 2013-11-01 13:05 iPhone £500 google.com User Landed Referral Email Paul 2013-11-01 13:00 google.com paul.lam@uswitch.com
  34. 34. Step 3: Building node-to-node relations User Timestamp Viewed Page Referrer Paul 2013-11-01 13:00 /homepage/ google.com Paul 2013-11-01 13:01 /blog/ /homepage/ User Timestamp Viewed Product Price Referrer Paul 2013-11-01 13:04 iPhone £500 /blog/ User Timestamp Purchased Paid Attrib. Paul 2013-11-01 13:05 iPhone £500 google.com User Landed Referral Email Paul 2013-11-01 13:00 google.com paul.lam@uswitch.com
  35. 35. Do this across all customers and products Use your data processing tool of choice: Apache Hive • Apache Pig • Cascading and more ... • Scalding • Cascalog • Spark • your favourite programming language • Paco Nathan, “The Workflow Abstraction”, Strata SC, 2013.
  36. 36. Cascalog code to build user nodes 145 lines of Cascalog code in production • a couple hundred lines more of utility functions • build entity nodes and meta nodes • sink data into database with Cascading-Neo4j Tap • 36
  37. 37. Cascalog code to build user nodes 145 lines of Cascalog code in production • a couple hundred lines more of utility functions • build entity nodes and meta nodes • sink data into database with Cascading-Neo4j Tap • 37
  38. 38. Code to build user to product click relations 160 lines of Cascalog code in production • + utility functions • build direct and categoric relations • sink data with Cascading-Neo4j Tap • 38
  39. 39. Code to build user to product click relations 160 lines of Cascalog code in production • + utility functions • build direct and categoric relations • sink data with Cascading-Neo4j Tap • 39
  40. 40. Our User Event to Graph Data Pipeline time HDFS Reshape Graph Database
  41. 41. Summary HDFS Reshape Graph
  42. 42. Contact Paul Lam, data scientist at uSwitch.com Email: paul@quantisan.com Twitter: @Quantisan

×