PrefixSpan with Spark
Frank Wolf & Dr. Gundula Meckenhäuser
07.08.2015
Predictive Behavioral Targeting with Akanoo
Identify relevant situations
Interact in real time
Engage in a smart way
Agenda
1. PrefixSpan for Sequential Pattern Mining
2. A PrefixSpan Implementation with Spark
Motivation
[Figure: clickstream involving the cart]
54% buyers
46% non-buyers
What is going on within the last 5 page impressions?
Translating clickstream data into patterns
pagetypes: home, overview, product, sale, account, cart, checkout, search, about
cart changes: add, remove, no
              5th last PI  4th last PI  3rd last PI  2nd last PI  last PI
pagetype      overview     overview     overview     product      overview
cart change   add          no           no           no           no
pattern, identified by SessionId:
< ( overview, add ), ( overview, no ), ( overview, no ), ( product, no ), ( overview, no ) >
Each pair such as ( overview, add ) is an itemset; each of its entries is an item.
Problem definition
pattern 1  < (overview, no), (overview, no), (overview, no), (product, no), (overview, no) >
pattern 2  < (home, no), (product, no), (product, no), (product, no), (overview, no) >
...
pattern n  < (overview, no), (product, add), (product, no), (cart, no), (checkout, no) >
Example:
< (overview, no), (product, no) > is a subpattern of patterns 1 and n
< (overview, no), (product, remove) > is not a subpattern of any pattern shown
A pattern has support k if it is a subpattern of k database patterns; it is frequent if k reaches the minimum support.
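The definition above can be sketched in a few lines of plain Scala. This is only an illustration under assumptions: items are modelled as strings, an itemset as a `Set`, and the names `SubpatternSketch`, `pi`, `isSubpattern` and `support` are hypothetical helpers, not part of the Akanoo code.

```scala
object SubpatternSketch {
  // Assumed simplified model: an item is a String, an itemset a Set of items,
  // a pattern an ordered sequence of itemsets.
  type ItemSet = Set[String]
  type Pattern = Seq[ItemSet]

  // Hypothetical helper: one page impression as (pagetype, cart change).
  def pi(pageType: String, cartChange: String): ItemSet = Set(pageType, cartChange)

  // Greedy left-to-right scan: each itemset of `sub` must be contained in an
  // itemset of `full` that lies strictly after the previously matched one.
  def isSubpattern(sub: Pattern, full: Pattern): Boolean =
    sub.foldLeft(Option(0)) { (nextStart, needed) =>
      nextStart.flatMap { start =>
        val idx = full.indexWhere(itemSet => needed.subsetOf(itemSet), start)
        if (idx >= 0) Some(idx + 1) else None
      }
    }.isDefined

  // Support = number of database patterns that contain the candidate.
  def support(candidate: Pattern, database: Seq[Pattern]): Int =
    database.count(full => isSubpattern(candidate, full))

  val pattern1: Pattern = Seq(pi("overview", "no"), pi("overview", "no"),
    pi("overview", "no"), pi("product", "no"), pi("overview", "no"))
  val pattern2: Pattern = Seq(pi("home", "no"), pi("product", "no"),
    pi("product", "no"), pi("product", "no"), pi("overview", "no"))
  val patternN: Pattern = Seq(pi("overview", "no"), pi("product", "add"),
    pi("product", "no"), pi("cart", "no"), pi("checkout", "no"))
  val database: Seq[Pattern] = Seq(pattern1, pattern2, patternN)

  val candidateA: Pattern = Seq(pi("overview", "no"), pi("product", "no"))
  val candidateB: Pattern = Seq(pi("overview", "no"), pi("product", "remove"))
}
```

With the three patterns shown, `support(candidateA, database)` counts patterns 1 and n, while `candidateB` is contained in none of them.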
PrefixSpan
Sequential Pattern Mining by Pattern Growth
PrefixSpan with a toy example: frequent patterns of length 1
Database
ID  pattern
1   < a (a b c) (a c) >
2   < (a d) c >
3   < (e f) (a b) >
Step 1: Occurrences of base letters
ID       < a >                 < b >   < c >          < d >   < e >   < f >
1        (0,0), (1,0), (2,0)   (1,1)   (1,2), (2,1)   --      --      --
2        (0,0)                 --      (1,0)          (0,1)   --      --
3        (1,0)                 (1,1)   --             --      (0,0)   (0,1)
support  3                     2       2              1       1       1
min support: 2
frequent patterns of length 1 are: < a >, < b >, < c >
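Step 1 can be reproduced with ordinary Scala collections. The sketch below is an assumption-laden illustration: letters are `Char`s, the database is a local `Map` instead of an RDD, and the object and function names are made up for this example. Each occurrence is a pair (itemset index, item index), as in the table.

```scala
object BaseLetterOccurrences {
  // Simplified stand-ins for the types used in the talk (assumption).
  type Pattern = Seq[Seq[Char]]   // a sequence of itemsets
  type Occ = Seq[(Int, Int)]      // (itemset index, item index)

  val database: Map[Int, Pattern] = Map(
    1 -> Seq(Seq('a'), Seq('a', 'b', 'c'), Seq('a', 'c')),
    2 -> Seq(Seq('a', 'd'), Seq('c')),
    3 -> Seq(Seq('e', 'f'), Seq('a', 'b'))
  )

  // All positions at which `letter` occurs within one pattern.
  def occurrence(pattern: Pattern, letter: Char): Occ =
    pattern.zipWithIndex.collect {
      case (itemSet, itemSetIndex) if itemSet.contains(letter) =>
        (itemSetIndex, itemSet.indexOf(letter))
    }

  // Support = number of database patterns in which the letter occurs at least once.
  def support(letter: Char): Int =
    database.values.count(pattern => occurrence(pattern, letter).nonEmpty)

  val minSupport = 2
  val frequentLetters: Seq[Char] =
    Seq('a', 'b', 'c', 'd', 'e', 'f').filter(letter => support(letter) >= minSupport)
}
```

Here `occurrence(database(1), 'a')` yields the first cell of the table, and `frequentLetters` keeps exactly the letters whose support reaches the minimum.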
PrefixSpan with a toy example: frequent patterns of length 2
Occurrences of frequent patterns of length 1 (identical to the occurrences of frequent base letters)
ID  < a >                 < b >   < c >
1   (0,0), (1,0), (2,0)   (1,1)   (1,2), (2,1)
2   (0,0)                 --      (1,0)
3   (1,0)                 (1,1)   --
Step 2: Occurrences of patterns of length 2
ID  <a a>          <a b>   <a c>          <b a>   <b b>   <b c>   <c a>   <c b>   <c c>
1   (1,0), (2,0)   (1,1)   (1,2), (2,1)   (2,0)   --      (2,1)   (2,0)   --      (2,1)
2   --             --      (1,0)          --      --      --      --      --      --
3   --             --      --             --      --      --      --      --      --
frequent patterns of length 2 are:
< a c >
PrefixSpan with a toy example: frequent patterns of length 3
Occurrences of frequent base letters
ID  < a >                 < b >   < c >
1   (0,0), (1,0), (2,0)   (1,1)   (1,2), (2,1)
2   (0,0)                 --      (1,0)
3   (1,0)                 (1,1)   --
Occurrences of frequent patterns of length 2
ID  <a c>
1   (1,2), (2,1)
2   (1,0)
3   --
Step 3: Occurrences of patterns of length 3
ID  <a c a>   <a c b>   <a c c>
1   (2,0)     --        (2,1)
2   --        --        --
3   --        --        --
no frequent patterns of length 3
PrefixSpan with a toy example: all frequent patterns
Database
ID  pattern
1   < a (a b c) (a c) >
2   < (a d) c >
3   < (e f) (a b) >
Results: frequent patterns (min support: 2)
pattern   support
< a >     3
< b >     2
< c >     2
< a c >   2
PrefixSpan with a toy example: all frequent patterns
Results: frequent patterns of the toy database (min support: 2)
pattern     support
< a >       3
< b >       2
< c >       2
< a c >     2
< (a b) >   2   ?
Main class, main function
Main class:
  class SequenceRDDFunctions(
    database: RDD[(SessionId, Pattern)]
  )
Main function:
  def mineFrequentPatterns(
    patternGenerator: PatternGenerator,
    minSupFraction: Double = 0.05
  ): List[PatternWithOccRddAndSupport]
Pattern composition
Pattern:
  case class Pattern(elements: Seq[ItemSet])
  val p = Pattern(elements = Seq(a, d, c))
ItemSet:
  case class ItemSet(items: Seq[Item])
  val a = ItemSet(items = Seq(item1))
Item:
  case class Item(letter: Letter)
  val item1 = Item(letter = pageTypeOverviewLetter)
Letter:
  type Letter = Int
  val pageTypeOverviewLetter: Letter = 1
From Session to Pattern (1 / 2)
Session (json):
  {
    "SessionID":"49624d7e",
    ...
  },
  {
    "pageType":"overview",
    ...
  },
  {
    "pageType":"product",
    ...
  },
  {
    "pageType":"checkout",
    ...
  },
Visit:
  case class Visit(
    session: Session,
    viewList: List[View]
    …
  )
  case class View(
    viewId: String,
    time: Long,
    pageType: PageType,
    …
  )
Views:
  val a: View = View(
    viewId = "view-id",
    pageType = PageType.OVERVIEW,
    ...
  )
  val d: View = View(
    viewId = "view-id",
    pageType = PageType.PRODUCT,
    ...
  )
  val c: View = View(
    viewId = "view-id",
    pageType = PageType.CHECKOUT,
    ...
  )
From Session to Pattern (2 / 2)
Creating a pattern from a visit
  // maps the page type of a view to its letter
  def pageTypeLetterForView(view: View): Letter =
    view.pageType match {
      case PageType.OVERVIEW => pageTypeOverviewLetter
      case PageType.PRODUCT => pageTypeProductLetter
      case PageType.CHECKOUT => pageTypeCheckoutLetter
    }
  // one single-item itemset per view, in the order of the visit
  def generatePattern(visit: Visit): Pattern = {
    val itemSets = visit.viewList.map { view =>
      ItemSet(Seq(Item(pageTypeLetterForView(view))))
    }
    Pattern(itemSets)
  }
Toy alphabet: pageTypeAlphabet
  // letters
  val pageTypeOverviewLetter: Letter = 1
  val pageTypeCheckoutLetter: Letter = 3
  val pageTypeProductLetter: Letter = 4
  // alphabet
  val pageTypeAlphabet: Alphabet = List(
    pageTypeOverviewLetter,
    pageTypeProductLetter,
    pageTypeCheckoutLetter
  )
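The mapping from a visit to a pattern can be tried out without Spark. The following is a minimal sketch under assumptions: `PageType`, `View` and `Visit` are reduced, hypothetical stand-ins for the real classes (only the fields needed here), and the letter values are the toy ones from the slide.

```scala
object SessionToPattern {
  type Letter = Int
  case class Item(letter: Letter)
  case class ItemSet(items: Seq[Item])
  case class Pattern(elements: Seq[ItemSet])

  // Hypothetical, reduced stand-ins for PageType, View and Visit.
  sealed trait PageType
  case object Overview extends PageType
  case object Product extends PageType
  case object Checkout extends PageType
  case class View(viewId: String, pageType: PageType)
  case class Visit(sessionId: String, viewList: List[View])

  val pageTypeOverviewLetter: Letter = 1
  val pageTypeCheckoutLetter: Letter = 3
  val pageTypeProductLetter: Letter = 4

  def pageTypeLetterForView(view: View): Letter = view.pageType match {
    case Overview => pageTypeOverviewLetter
    case Product  => pageTypeProductLetter
    case Checkout => pageTypeCheckoutLetter
  }

  // One single-item itemset per view, in the order of the visit.
  def generatePattern(visit: Visit): Pattern =
    Pattern(visit.viewList.map(view => ItemSet(Seq(Item(pageTypeLetterForView(view))))))

  val visit: Visit = Visit(
    "49624d7e",
    List(View("view-1", Overview), View("view-2", Product), View("view-3", Checkout))
  )
  val pattern: Pattern = generatePattern(visit)
}
```

For the example visit (overview, product, checkout) the resulting pattern consists of three single-item itemsets carrying the letters 1, 4 and 3.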
Towards a database of patterns
Creating the database, which is an RDD[(SessionId, Pattern)], by applying generatePattern to every visit:
  // visits is assumed to be an RDD[Visit] here, so that map yields an RDD
  def mapToPattern(visits: RDD[Visit]): RDD[(SessionId, Pattern)] = {
    visits.map(visit =>
      (visit.session.sessionId, generatePattern(visit)))
  }
Main class, main function
Input params of mineFrequentPatterns:
patternGenerator: PatternGenerator
  a function that lets us grab the alphabet
minSupFraction: Double = 0.05
  a pattern is frequent if it occurs in at least 5% of all database patterns
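The slides do not show how the fraction is turned into the absolute threshold `minSup` used later. One plausible conversion, stated here purely as an assumption, rounds up so that "at least 5%" is never undershot; `minSupportCount` is a hypothetical helper name.

```scala
object MinSupport {
  // Assumed conversion from a fraction to an absolute count (rounded up).
  def minSupportCount(minSupFraction: Double, databaseSize: Long): Long =
    math.ceil(minSupFraction * databaseSize).toLong
}
```

In a Spark setting, `databaseSize` would typically come from `database.count()`.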
Main class, main function
Output of mineFrequentPatterns: List[PatternWithOccRddAndSupport], with one list item for each frequent pattern
  case class PatternWithOccRddAndSupport(
    pattern: Pattern,
    occRdd: RDD[(SessionId, Occ)],
    support: Long
  )
  type Occ = Seq[(ItemSetIndex, ItemIndex)]
Example: pattern 1 (a)
example                SessionId   Occurrence
< a (a b c) (a c) >    1           (0,0), (1,0), (2,0)
< (a d) c >            2           (0,0)
< (e f) (a b) >        3           (1,0)
Support                3
How frequent patterns are found (1/6)
1. Calculate the occurrence table of the baseLetters
  def occurrencesOfBaseLetters(baseLetter: Letter): RDD[(SessionId, Occ)] = {
    database.mapValues(sessionPattern => {
      sessionPattern.occurrence(Item(baseLetter))
    })
  }
  case class Pattern(elements: Seq[ItemSet]) {
    // positions (itemset index, item index) at which the item occurs
    def occurrence(item: Item): Occ = {
      elements.zipWithIndex.filter({
        case (itemSet, itemSetIndex) => itemSet.items.contains(item)
      })
      .map({ case (itemSet, itemSetIndex) =>
        (itemSetIndex, itemSet.items.indexOf(item))
      })
    }
  }
Reminder:
  type Occ = Seq[(ItemSetIndex, ItemIndex)]
How frequent patterns are found (2/6)
1. Calculate the occurrence table of the baseLetters
One instance of PatternWithOccRddAndSupport for each letter (letter 1, letter 2, ...), for example:
SessionId   Occurrence
1           (1,2), (2,1)
2           (1,0)
3           --
Support     2
Each instance corresponds to one column of the letter occurrence table in the theory part:
ID  < a >                 < b >   < c >
1   (0,0), (1,0), (2,0)   (1,1)   (1,2), (2,1)
2   (0,0)                 --      (1,0)
3   (1,0)                 (1,1)   --
How frequent patterns are found (3/6)
2. Calculate the occurrence tables of (n+1)-Patterns from n-Patterns
  // each iteration grows the patterns of the previous level by one letter; stops at the first empty level
  val frequentPatterns = Stream.iterate(frequentBaseLettersAndOcccurences)(previousPatterns => {
    previousPatterns.par.flatMap((previousPattern: PatternWithOccRddAndSupport) => {
      // keep the prefix occurrences cached while they are joined with every base letter
      previousPattern.occurrences.persist()
      // creates n+1 patterns with enough support
      val nextPatterns = getNextFrequentPatterns(previousPattern.pattern, previousPattern.occurrences)
      previousPattern.occurrences.unpersist()
      nextPatterns
    }).toList
  }).takeWhile(_.nonEmpty).flatten.toList
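The level-wise loop can be imitated on the toy database with local collections. The sketch below is an illustration under assumptions, not the Akanoo implementation: `Map`s replace RDDs, `LazyList` (Scala 2.13+) replaces `Stream`, the names `ToyMiner`, `Found`, `letterOccs` and `append` are hypothetical, and only appending a letter is covered, so itemset patterns such as < (a b) > are not found.

```scala
object ToyMiner {
  type Occ = Seq[(Int, Int)]   // (itemset index, item index) of the last letter
  // A pattern together with its non-empty occurrences per session id.
  case class Found(pattern: Seq[Char], occs: Map[Int, Occ]) {
    def support: Int = occs.size
  }

  val database: Map[Int, Seq[Seq[Char]]] = Map(
    1 -> Seq(Seq('a'), Seq('a', 'b', 'c'), Seq('a', 'c')),
    2 -> Seq(Seq('a', 'd'), Seq('c')),
    3 -> Seq(Seq('e', 'f'), Seq('a', 'b'))
  )
  val minSup = 2

  // Occurrences of one letter, keeping only sessions in which it occurs.
  def letterOccs(letter: Char): Map[Int, Occ] =
    database.toSeq.map { case (id, itemSets) =>
      (id, itemSets.zipWithIndex.collect {
        case (itemSet, i) if itemSet.contains(letter) => (i, itemSet.indexOf(letter))
      })
    }.filter { case (_, occ) => occ.nonEmpty }.toMap

  val frequentLetters: List[Found] =
    List('a', 'b', 'c', 'd', 'e', 'f')
      .map(letter => Found(Seq(letter), letterOccs(letter)))
      .filter(_.support >= minSup)

  // Keeps the letter occurrences that lie in an itemset strictly after the
  // earliest end of the prefix within the same session.
  def append(prefix: Found, letter: Found): Found = {
    val occs = prefix.occs.toSeq.flatMap { case (id, prefixOcc) =>
      val firstPrefixEnd = prefixOcc.map(_._1).min
      val kept = letter.occs.getOrElse(id, Seq.empty).filter(_._1 > firstPrefixEnd)
      if (kept.nonEmpty) Some((id, kept)) else None
    }.toMap
    Found(prefix.pattern ++ letter.pattern, occs)
  }

  // Level n+1 is built from level n until a level turns out empty.
  val allFrequent: List[Found] =
    LazyList.iterate(frequentLetters) { previous =>
      for {
        prefix <- previous
        letter <- frequentLetters
        next = append(prefix, letter)
        if next.support >= minSup
      } yield next
    }.takeWhile(_.nonEmpty).flatten.toList
}
```

On the toy database this produces < a >, < b >, < c > and < a c >, matching the result table of the theory part.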
How frequent patterns are found (4/6)
2. Calculate the occurrence tables of (n+1)-Patterns from n-Patterns
  // based on pattern of length n, identifies frequent patterns of length n+1 by appending and assembling frequent letters
  def getNextFrequentPatterns(
    previousPattern: Pattern,
    previousPatternOccurrences: RDD[(SessionId, Occ)]
  ): List[PatternWithOccRddAndSupport] = {
    val appendedFreqPatterns = frequentBaseLettersAndOcccurences.par.map(baseLetterAndOccs => {
      // make <a b> + <b> => <a b b>
      val nextPattern = previousPattern ++ baseLetterAndOccs.pattern
      // joins occurrences of previous pattern and any letter
      val joinedOccurrences: RDD[(SessionId, (Occ, Occ))] =
        previousPatternOccurrences.join(baseLetterAndOccs.occurrences)
      // returns occurrences of new pattern
      val nextPatternOccurrences: RDD[(SessionId, Occ)] = joinedOccurrences.mapAppendedOccPairToOcc()
      PatternWithOccRddAndSupport(nextPattern, nextPatternOccurrences, nextPatternOccurrences.countSupport())
    })
      // only leaves patterns with enough support
      .filter(_.support >= minSup).toList
    appendedFreqPatterns
  }
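The join step pairs, per session id, the occurrences of the prefix with those of the letter to append; a join on pair RDDs keeps only keys present on both sides. A local sketch of that behaviour, with `Map`s standing in for RDDs and `join` being a hypothetical helper written for this example:

```scala
object JoinSketch {
  type Occ = Seq[(Int, Int)]

  // Inner join on the session id: sessions missing on either side are dropped.
  def join(prefix: Map[Int, Occ], suffix: Map[Int, Occ]): Map[Int, (Occ, Occ)] =
    prefix.toSeq.flatMap { case (id, prefixOcc) =>
      suffix.get(id).map(suffixOcc => (id, (prefixOcc, suffixOcc)))
    }.toMap

  // Occurrences of <a c> and <a> from the toy example (empty entries left out).
  val occAC: Map[Int, Occ] = Map(1 -> Seq((1, 2), (2, 1)), 2 -> Seq((1, 0)))
  val occA: Map[Int, Occ] =
    Map(1 -> Seq((0, 0), (1, 0), (2, 0)), 2 -> Seq((0, 0)), 3 -> Seq((1, 0)))

  val joined: Map[Int, (Occ, Occ)] = join(occAC, occA)
}
```

Session 3 disappears from `joined` because < a c > has no occurrence there, so it can never contribute to the support of a longer pattern.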
How frequent patterns are found (5/6)
2. Calculate the occurrence tables of (n+1)-Patterns from n-Patterns
  // based on pattern of length n, identifies frequent patterns of length n+1 by appending and assembling frequent letters
  def getNextFrequentPatterns(
    previousPattern: Pattern,
    previousPatternOccurrences: RDD[(SessionId, Occ)]
  ): List[PatternWithOccRddAndSupport] = {
    …
    // prefix OCC, suffix OCC
    val joinedOccurrences: RDD[(SessionId, (Occ, Occ))] =
      previousPatternOccurrences.join(baseLetterAndOccs.occurrences)
    // returns occurrences of new pattern
    val nextPatternOccurrences: RDD[(SessionId, Occ)] = joinedOccurrences.mapAppendedOccPairToOcc()
    ...
  def mapAppendedOccPairToOcc(): RDD[(SessionId, Occ)] = {
    self.mapValues((occPair: (Occ, Occ)) => {
      // occP = occ(<a b>) , occS = occ(<c>)
      val (occPrefix, occSuffix) = occPair
      PseudoProjection.pseudoProjectionAppend(occPrefix, occSuffix)
    })
  }
joinedOccurrences example entry (session 1):
<a c> append <a>
<a c> - prefixOcc: [(1,2), (2,1)]
<a> - suffixOcc: [(0,0), (1,0), (2,0)]
-> (2,0)
How frequent patterns are found (6/6)
2. Calculate the occurrence tables of (n+1)-Patterns from n-Patterns
  def pseudoProjectionAppend(prefixOcc: Occ, suffixOcc: Occ): Occ =
    suffixOcc.filter({ case (suffixItemSetIndex, suffixItemIndex) =>
      // suffix occurrence after first occurrence of prefix
      suffixItemSetIndex > prefixOcc.map(
        { case (prefixItemSetIndex, _) => prefixItemSetIndex }).min
    })
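The append rule can be checked on the example entry with a stand-alone version. This sketch makes one assumption that the slides do not show: it returns an empty result for an empty prefix occurrence list, since calling `min` on an empty sequence would throw. The object name is hypothetical.

```scala
object PseudoProjectionSketch {
  type Occ = Seq[(Int, Int)]   // (itemset index, item index)

  def pseudoProjectionAppend(prefixOcc: Occ, suffixOcc: Occ): Occ =
    if (prefixOcc.isEmpty) Seq.empty // guard added for this sketch (assumption)
    else {
      // itemset index at which the prefix is completed for the first time
      val firstPrefixEnd = prefixOcc.map { case (itemSetIndex, _) => itemSetIndex }.min
      // keep suffix occurrences in strictly later itemsets
      suffixOcc.filter { case (suffixItemSetIndex, _) => suffixItemSetIndex > firstPrefixEnd }
    }

  // <a c> append <a> in session 1 of the toy example
  val prefixOcc: Occ = Seq((1, 2), (2, 1))
  val suffixOcc: Occ = Seq((0, 0), (1, 0), (2, 0))
  val result: Occ = pseudoProjectionAppend(prefixOcc, suffixOcc)
}
```

Only the occurrence of `a` in the last itemset survives, because < a c > is first completed in itemset 1.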
Results: Mining frequent patterns of conversions and abandonments
[Chart: frequency of patterns for buyers and purchase abandoners]
Recommended action: interact with potential abandoners after they have repeatedly visited two overview pages (cart reminder, incentives geared towards completing the purchase)
Thank you for your attention!
Any further questions?
Find more information on akanoo.com or
email us at hi@akanoo.com!

PrefixSpan With Spark at Akanoo

  • 1. PrefixSpan with Spark Frank Wolf & Dr. Gundula Meckenhäuser 07.08.2015
  • 2. Predictive Behavioral Targeting with Akanoo 22 Identify relevant situations Interact in real time Engage in a smart way
  • 3. Agenda 1. PrefixSpan for Sequential Pattern Mining 2. A PrefixSpan Implementation with Spark 3
  • 4. Agenda 1. PrefixSpan for Sequential Pattern Mining 2. A PrefixSpan Implementation with Spark 4
  • 5. cart . . . Motivation 54% buyers 46% non-buyers What is going on within the last 5 page impressions? 5
  • 6. Translating clickstream data into patterns pagetypes: home, overview, product, sale, account, cart, checkout, search, about cart . . . overview overview overview product overview cart . . . 5th last PI 4th last PI 3rd last PI 2nd last PI last PI pattern: identified by SessionId: < ( overview, add ), ( overview, no ), ( overview, no ), ( product, no ), ( overview, no ) > cart . . . add no no no no cart changes: add, remove, no item itemset 6
  • 7. Problem definition 7 pattern 1 < (overview, no) , ( overview, no ), ( overview, no ), ( product, no ), ( overview, no ) > pattern 2 < (home, no), ( product, no ), ( product, no ), ( product, no ), ( overview, no ) > ... pattern n < (overview, no) , ( product, add ), ( product, no ), ( cart, no ), ( checkout, no ) > Example < (overview, no), (product, no)> is a subpattern of patterns 1 and n < (overview, no), (product, remove) > is no subpattern A pattern is frequent with support n if it is n times a subpattern of the database patterns
  • 9. PrefixSpan with a toy example: frequent patterns of length 1 9 ID pattern 1 < a (a b c) (a c) > 2 < (a d) c > 3 < (e f) (a b) > Database ID < a > < b > < c > < d > < e > < f > 1 (0,0), (1,0), (2,0) (1,1) (1,2), (2,1) -- -- -- 2 (0,0) -- (1,0) (0,1) -- -- 3 (1,0) (1,1) -- -- (0,0) (0,1) Step1: Occurrences of base letters ID < a > < b > < c > < d > < e > < f > 1 (0,0), (1,0), (2,0) (1,1) (1,2), (2,1) -- -- -- 2 (0,0) -- (1,0) (0,1) -- -- 3 (1,0) (1,1) -- -- (0,0) (0,1) support 3 2 2 1 1 1min support: 2 frequent patterns of length 1 are: < a >, < b >, < c >
  • 10. PrefixSpan with a toy example: frequent patterns of length 2
      Occurrences of frequent base letters (frequent patterns of length 1):
        ID  < a >                < b >   < c >
        1   (0,0), (1,0), (2,0)  (1,1)   (1,2), (2,1)
        2   (0,0)                --      (1,0)
        3   (1,0)                (1,1)   --
      Step 2: Occurrences of patterns of length 2
        ID  <a a>         <a b>  <a c>         <b a>  <b b>  <b c>  <c a>  <c b>  <c c>
        1   (1,0), (2,0)  (1,1)  (1,2), (2,1)  (2,0)  --     (2,1)  (2,0)  --     (2,1)
        2   --            --     (1,0)         --     --     --     --     --     --
        3   --            --     --            --     --     --     --     --     --
      Frequent patterns of length 2 are: < a c >
  • 11. PrefixSpan with a toy example: frequent patterns of length 3
      Occurrences of frequent base letters:
        ID  < a >                < b >   < c >
        1   (0,0), (1,0), (2,0)  (1,1)   (1,2), (2,1)
        2   (0,0)                --      (1,0)
        3   (1,0)                (1,1)   --
      Occurrences of frequent patterns of length 2:
        ID  <a c>
        1   (1,2), (2,1)
        2   (1,0)
        3   --
      Step 3: Occurrences of patterns of length 3
        ID  <a c a>  <a c b>  <a c c>
        1   (2,0)    --       --
        2   --       --       --
        3   --       --       --
      No frequent patterns of length 3
  • 12. PrefixSpan with a toy example: all frequent patterns
      Database:
        ID  pattern
        1   < a (a b c) (a c) >
        2   < (a d) c >
        3   < (e f) (a b) >
      min support: 2
      Results: frequent patterns
        pattern  support
        < a >    3
        < b >    2
        < c >    2
        < a c >  2
  • 13. Agenda: 1. PrefixSpan for Sequential Pattern Mining; 2. A PrefixSpan Implementation with Spark
  • 14. PrefixSpan with a toy example: all frequent patterns
      Database:
        ID  pattern
        1   < a (a b c) (a c) >
        2   < (a d) c >
        3   < (e f) (a b) >
      min support: 2
      Results: frequent patterns
        pattern    support
        < a >      3
        < b >      2
        < c >      2
        < a c >    2
        < (a b) >  2  ?
  • 15. Main class, main function: pattern composition
      Main class:
        class SequenceRDDFunctions( database: RDD[(SessionId, Pattern)] )
      Main function:
        def mineFrequentPatterns( patternGenerator: PatternGenerator, minSupFraction: Double = 0.05 ): List[PatternWithOccRddAndSupport]
      Pattern:
        case class Pattern( elements: Seq[ItemSet] )
        val p = Pattern( elements = Seq(a, d, c) )
      ItemSet:
        case class ItemSet( items: Seq[Item] )
        val a = ItemSet( items = Seq(item1) )
      Item:
        case class Item( letter: Letter )
        val item1 = Item( letter = pageTypeOverviewLetter )
      Letter:
        type Letter = Int
        val pageTypeOverviewLetter: Letter = 1
  • 16. From Session to Pattern (1 / 2)
      Session (json):
        { "SessionID":"49624d7e", ... },
        { "pageType":"overview", ... },
        { "pageType":"product", ... },
        { "pageType":"checkout", ... },
      Visit:
        case class Visit( session: Session, viewList: List[View] … )
        case class View( viewId: String, time: Long, pageType: PageType, … )
      Views:
        val a: View = View( viewId = "view-id", pageType = PageType.OVERVIEW, ... )
        val d: View = View( viewId = "view-id", pageType = PageType.PRODUCT, ... )
        val c: View = View( viewId = "view-id", pageType = PageType.CHECKOUT, ... )
  • 17. From Session to Pattern (2 / 2)
      Toy alphabet: pageTypeAlphabet
        // letters
        val pageTypeOverviewLetter: Letter = 1
        val pageTypeCheckoutLetter: Letter = 3
        val pageTypeProductLetter: Letter = 4
        // alphabet
        val pageTypeAlphabet: Alphabet = List( pageTypeOverviewLetter, pageTypeProductLetter, pageTypeCheckoutLetter )
      Creating a pattern from a visit:
        // maps the page type of a view to its letter
        def pageTypeLetterForView(view: View): Letter = view.pageType match {
          case PageType.OVERVIEW => pageTypeOverviewLetter
          case PageType.PRODUCT => pageTypeProductLetter
          case PageType.CHECKOUT => pageTypeCheckoutLetter
        }
        // builds one single-item itemset per view of the visit
        def generatePattern(visit: Visit): Pattern = {
          val itemSets = visit.viewList.map { view =>
            ItemSet(Seq(Item(pageTypeLetterForView(view))))
          }
          Pattern(itemSets)
        }
  • 18. Towards a database of patterns
      Creating a pattern from a visit:
        // builds one single-item itemset per view of the visit
        def generatePattern(visit: Visit): Pattern = {
          val itemSets = visit.viewList.map { view =>
            ItemSet(Seq(Item(pageTypeLetterForView(view))))
          }
          Pattern(itemSets)
        }
      Creating the database, which is an RDD[(SessionId, Pattern)]:
        def mapToPattern(visits: RDD[Visit]): RDD[(SessionId, Pattern)] = {
          visits.map(visit => (visit.session.sessionId, generatePattern(visit)))
        }
  • 19. Main class, main function: input params
      Main class:
        class SequenceRDDFunctions( database: RDD[(SessionId, Pattern)] )
      Main function:
        def mineFrequentPatterns( patternGenerator: PatternGenerator, minSupFraction: Double = 0.05 ): List[PatternWithOccRddAndSupport]
      patternGenerator: a function that lets us grab the alphabet
      minSupFraction: Double = 0.05: a pattern is frequent if it occurs in at least 5% of all database patterns
  • 20. Main class, main function: output
      Main class:
        class SequenceRDDFunctions( database: RDD[(SessionId, Pattern)] )
      Main function:
        def mineFrequentPatterns( patternGenerator: PatternGenerator, minSupFraction: Double = 0.05 ): List[PatternWithOccRddAndSupport]
      Output: List[PatternWithOccRddAndSupport], with one list item for each frequent pattern
        case class PatternWithOccRddAndSupport(pattern: Pattern, occRdd: RDD[(SessionId, Occ)], support: Long)
        type Occ = Seq[(ItemSetIndex, ItemIndex)]
      Example for the length-1 pattern < a >:
        pattern              SessionId  Occurrence
        < a (a b c) (a c) >  1          (0,0), (1,0), (2,0)
        < (a d) c >          2          (0,0)
        < (e f) (a b) >      3          (1,0)
        Support: 3
  • 21. How frequent patterns are found (1/6)
      1. Calculate the occurrence table of the baseLetters
        def occurrencesOfBaseLetters(baseLetter: Letter): RDD[(SessionId, Occ)] = {
          database.mapValues(sessionPattern => {
            sessionPattern.occurrence(Item(baseLetter))
          })
        }
        case class Pattern(elements: Seq[ItemSet]) {
          // all positions (itemSetIndex, itemIndex) at which the item occurs
          def occurrence(item: Item): Occ = {
            elements.zipWithIndex
              .filter({ case (itemSet, itemSetIndex) => itemSet.items.contains(item) })
              .map({ case (itemSet, itemSetIndex) => (itemSetIndex, itemSet.items.indexOf(item)) })
          }
        }
      Reminder: type Occ = Seq[(ItemSetIndex, ItemIndex)]
  • 22. How frequent patterns are found (2/6)
      1. Calculate the occurrence table of the baseLetters
      One instance of PatternWithOccRddAndSupport for each letter:
        letter 1                     letter 2
        SessionId  Occurrence        SessionId  Occurrence
        1          (1,2), (2,1)      1          (1,2), (2,1)
        2          (1,0)             2          (1,0)
        3          --                3          --
        Support: 2                   Support: 2
        ...
      … which corresponds to the letter occurrence table in the theory part:
        ID  < a >                < b >   < c >
        1   (0,0), (1,0), (2,0)  (1,1)   (1,2), (2,1)
        2   (0,0)                --      (1,0)
        3   (1,0)                (1,1)   --
  • 23. How frequent patterns are found (3/6)
      2. Calculate the occurrence tables of (n+1)-Patterns from n-Patterns
        val frequentPatterns = Stream.iterate(frequentBaseLettersAndOcccurences)(previousPatterns => {
          previousPatterns.par.flatMap((previousPattern: PatternWithOccRddAndSupport) => {
            previousPattern.occurrences.persist()
            // creates n+1 patterns with enough support
            val nextPatterns = getNextFrequentPatterns(previousPattern.pattern, previousPattern.occurrences)
            previousPattern.occurrences.unpersist()
            nextPatterns
          }).toList
        }).takeWhile(_.nonEmpty).flatten.toList
  • 24. How frequent patterns are found (4/6)
      2. Calculate the occurrence tables of (n+1)-Patterns from n-Patterns
        // based on a pattern of length n, identifies frequent patterns of length n+1
        // by appending and assembling frequent letters
        def getNextFrequentPatterns(previousPattern: Pattern,
                                    previousPatternOccurrences: RDD[(SessionId, Occ)]): List[PatternWithOccRddAndSupport] = {
          val appendedFreqPatterns = frequentBaseLettersAndOcccurences.par.map(baseLetterAndOccs => {
            // make <a b> + <b> => <a b b>
            val nextPattern = previousPattern ++ baseLetterAndOccs.pattern
            // joins occurrences of previous pattern and any letter
            val joinedOccurrences: RDD[(SessionId, (Occ, Occ))] =
              previousPatternOccurrences.join(baseLetterAndOccs.occurrences)
            // returns occurrences of new pattern
            val nextPatternOccurrences: RDD[(SessionId, Occ)] = joinedOccurrences.mapAppendedOccPairToOcc()
            PatternWithOccRddAndSupport(nextPattern, nextPatternOccurrences, nextPatternOccurrences.countSupport())
          })
            // only leaves patterns with enough support
            .filter(_.support >= minSup).toList
          appendedFreqPatterns
        }
  • 25. How frequent patterns are found (5/6)
      2. Calculate the occurrence tables of (n+1)-Patterns from n-Patterns
        // based on a pattern of length n, identifies frequent patterns of length n+1
        // by appending and assembling frequent letters
        def getNextFrequentPatterns(previousPattern: Pattern,
                                    previousPatternOccurrences: RDD[(SessionId, Occ)]): List[PatternWithOccRddAndSupport] = {
          …
          // prefix Occ, suffix Occ
          val joinedOccurrences: RDD[(SessionId, (Occ, Occ))] =
            previousPatternOccurrences.join(baseLetterAndOccs.occurrences)
          // returns occurrences of new pattern
          val nextPatternOccurrences: RDD[(SessionId, Occ)] = joinedOccurrences.mapAppendedOccPairToOcc()
          ...
        def mapAppendedOccPairToOcc(): RDD[(SessionId, Occ)] = {
          self.mapValues((occPair: (Occ, Occ)) => {
            // occP = occ(<a b>) , occS = occ(<c>)
            val (occPrefix, occSuffix) = occPair
            PseudoProjection.pseudoProjectionAppend(occPrefix, occSuffix)
          })
        }
      joinedOccurrences example entry: <a c> append <a>
        <a c> - prefixOcc: [(1,2), (2,1)]
        <a>   - suffixOcc: [(0,0), (1,0), (2,0)]
        -> (2,0)
  • 26. How frequent patterns are found (6/6)
      2. Calculate the occurrence tables of (n+1)-Patterns from n-Patterns
        def pseudoProjectionAppend(prefixOcc: Occ, suffixOcc: Occ): Occ =
          suffixOcc.filter({ case (suffixItemSetIndex, suffixItemIndex) =>
            // suffix occurrence after first occurrence of prefix
            suffixItemSetIndex > prefixOcc.map({ case (prefixItemSetIndex, _) => prefixItemSetIndex }).min
          })
        def mapAppendedOccPairToOcc(): RDD[(SessionId, Occ)] = {
          self.mapValues((occPair: (Occ, Occ)) => {
            // occP = occ(<a b>) , occS = occ(<c>)
            val (occPrefix, occSuffix) = occPair
            PseudoProjection.pseudoProjectionAppend(occPrefix, occSuffix)
          })
        }
      joinedOccurrences example entry: <a c> append <a>
        <a c> - prefixOcc: [(1,2), (2,1)]
        <a>   - suffixOcc: [(0,0), (1,0), (2,0)]
        -> (2,0)
  • 27. Results: Mining frequent patterns of conversions and abandonments
      Chart labels: buyers, purchase abandoners, frequency
      Recommended action: interact with potential abandoners after repeated visits to two overview pages (cart reminder, purchase-oriented incentives)
  • 28. Thank you for your attention! Any further questions? Find more information on akanoo.com or email us at hi@akanoo.com!
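
The toy example and the functions from slides 21 to 26 can be replayed on plain Scala collections. The following is a minimal sketch under stated assumptions, not the authors' implementation: letters are `Char`s, every RDD is replaced by a `Map` keyed by session id, and only the append step is covered (no itemset patterns such as < (a b) >). The names `ToyPrefixSpan`, `support`, `grow` and `mine` are hypothetical; `occurrence` and `pseudoProjectionAppend` mirror the slide functions, with an added guard for empty prefix occurrences.

```scala
// Spark-free sketch of the toy example; names and types are illustrative assumptions.
object ToyPrefixSpan {
  type Letter = Char
  type Occ = Seq[(Int, Int)]         // (itemSetIndex, itemIndex)
  type Sequence = Seq[Seq[Letter]]   // a database pattern as a sequence of itemsets

  // Toy database from slide 9, keyed by session id.
  val database: Map[Int, Sequence] = Map(
    1 -> Seq(Seq('a'), Seq('a', 'b', 'c'), Seq('a', 'c')),
    2 -> Seq(Seq('a', 'd'), Seq('c')),
    3 -> Seq(Seq('e', 'f'), Seq('a', 'b'))
  )

  // Step 1: all positions of a single letter within one sequence.
  def occurrence(sequence: Sequence, letter: Letter): Occ =
    sequence.zipWithIndex.collect {
      case (itemSet, i) if itemSet.contains(letter) => (i, itemSet.indexOf(letter))
    }

  // Keeps the suffix occurrences that lie strictly after the first prefix occurrence.
  // The empty-prefix guard is an addition of this sketch (min of an empty Seq would throw).
  def pseudoProjectionAppend(prefixOcc: Occ, suffixOcc: Occ): Occ =
    if (prefixOcc.isEmpty) Seq.empty
    else {
      val firstPrefixItemSet = prefixOcc.map(_._1).min
      suffixOcc.filter { case (itemSetIndex, _) => itemSetIndex > firstPrefixItemSet }
    }

  // Support = number of sessions with at least one occurrence.
  def support(occs: Map[Int, Occ]): Int = occs.count { case (_, occ) => occ.nonEmpty }

  // Returns every frequent pattern (letters separated by spaces) with its support.
  def mine(minSup: Int): Map[String, Int] = {
    val letters = database.values.flatten.flatten.toSeq.distinct.sorted
    // Frequent base letters together with their occurrence tables.
    val base: Seq[(String, Map[Int, Occ])] = letters.map { l =>
      l.toString -> database.map { case (id, s) => id -> occurrence(s, l) }
    }.filter { case (_, occs) => support(occs) >= minSup }

    // Extends each frequent n-pattern by each frequent base letter until nothing is frequent.
    def grow(current: Seq[(String, Map[Int, Occ])]): Seq[(String, Map[Int, Occ])] =
      if (current.isEmpty) Seq.empty
      else {
        val next = for {
          (prefix, prefixOccs) <- current
          (letter, letterOccs) <- base
          occs = prefixOccs.map { case (id, pOcc) =>
            id -> pseudoProjectionAppend(pOcc, letterOccs(id))
          }
          if support(occs) >= minSup
        } yield (prefix + " " + letter, occs)
        current ++ grow(next)
      }

    grow(base).map { case (pattern, occs) => pattern -> support(occs) }.toMap
  }
}
```

Under these assumptions, `ToyPrefixSpan.mine(2)` yields < a > with support 3 and < b >, < c > and < a c > with support 2, which matches the result table on slide 12.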

Editor's Notes

  1. A motivation for mining frequent patterns in e-commerce: at one of our biggest customers we have seen that 10% of the users check their cart, but only 54% of them buy. To understand the behavior during the last 5 page impressions, we wanted to mine frequent patterns of conversions and abandonments. Another example: customers typically rent "Star Wars", then "Empire Strikes Back", and then "Return of the Jedi". Note that these rentals need not be consecutive. Customers who rent some other videos in between also support this sequential pattern: < "Star Wars", "Empire Strikes Back", "Return of the Jedi" >
  2. Therefore, we have to translate clickstream data into patterns. From each page impression we can extract the pagetype and also cart changes, such as adding an item to the cart or removing one from it. By doing so, we translate a clickstream into a pattern in which each itemset corresponds to a page impression.
  3. The algorithm starts by finding frequent base patterns of length 1. In the following steps, frequent patterns are identified by extending the patterns mined in the previous step with frequent base patterns, and we search for occurrences only in the relevant part of the initial database. In more detail: PrefixSpan [5] is the most promising of the pattern-growth methods and is based on recursively constructing the patterns, as shown in figure 5. Its great advantage is the use of projected databases. An α-projected database is the set of subsequences in the database that are suffixes of the sequences that have prefix α. In each step, the algorithm looks for the frequent sequences with prefix α in the corresponding projected database. In this way the search space is reduced in each step, allowing for better performance in the presence of small support thresholds.
  4. In the next slides I will explain the algorithm with an example. The first step is to calculate the occurrences of the base patterns. Base patterns are: … What is the occurrence of pattern < a > in pattern 1? If we count zero-based, then a occurs in itemset 0 at index 0. It also occurs in itemset 1 at index 0, and in itemset 2 at index 0. And in pattern 3? It appears in itemset 1 at index 0. We call < a > a prefix; each column is a realisation of the < a > pseudo-projected database (of the suffixes). We will see that in the next steps the pseudo-projected database shrinks.
  5. We keep the table of the frequent base letters. First, we build candidates by extending the frequent patterns of length 1 with frequent base patterns (pruning). Then we calculate the occurrences: what is the occurrence of pattern < a c > in pattern 1? c occurs in itemset 1, and a appears in pattern 1 in itemset 0, before itemset 1. Thus, < a c > occurs at (1,2). Pruning: every subpattern of a frequent pattern has to be frequent itself.
  6. We keep the table of the frequent base letters and the table of occurrences of patterns of length 2. First, we build candidates by extending the frequent patterns of length 2 with frequent base patterns (pruning). Then we calculate the occurrences: what is the occurrence of pattern < a c a > in pattern 1? a occurs in itemset 0, but < a c > cannot appear before it, so we do not take this occurrence. a occurs in itemset 1, but again < a c > cannot appear before it, so we do not take this occurrence either. a occurs in itemset 2 and, indeed, < a c > occurs at (1,2), so we copy this occurrence. There are no frequent patterns of length 3.
  7. The support is defined as a fraction of the patternCount of the database.
  8. Indeed, the pattern overview, overview, overview, overview, overview can also occur earlier in a buyer's session; however, we can separate buyers from non-buyers with the help of the purchase probability.
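
The notes state that the items of a sequential pattern need not be consecutive and that support is measured as a fraction of the database. A minimal sketch of both ideas on plain Scala collections follows. The names `SupportSketch`, `isSubpattern`, `supportFraction`, `trilogy` and `rentals` are hypothetical, the rental histories are made-up illustration data, and the greedy left-to-right matching is a standard way to test subsequence containment rather than the presenters' code.

```scala
// Sketch of subpattern containment and fractional support; all names are illustrative.
object SupportSketch {
  type ItemSet = Set[String]
  type Sequence = Seq[ItemSet]

  // A candidate is a subpattern if its itemsets can be matched, in order,
  // to itemsets of the sequence that contain them; gaps in between are allowed.
  def isSubpattern(candidate: Sequence, sequence: Sequence): Boolean =
    candidate match {
      case Seq() => true
      case head +: tail =>
        // Greedy: match the first candidate itemset as early as possible.
        val idx = sequence.indexWhere(itemSet => head.subsetOf(itemSet))
        idx >= 0 && isSubpattern(tail, sequence.drop(idx + 1))
    }

  // Fraction of database sequences that contain the candidate as a subpattern.
  def supportFraction(candidate: Sequence, database: Seq[Sequence]): Double =
    if (database.isEmpty) 0.0
    else database.count(isSubpattern(candidate, _)).toDouble / database.size

  // The rental pattern from note 1.
  val trilogy: Sequence =
    Seq(Set("Star Wars"), Set("Empire Strikes Back"), Set("Return of the Jedi"))

  // Made-up rental histories, one sequence per customer.
  val rentals: Seq[Sequence] = Seq(
    Seq(Set("Star Wars"), Set("Alien"), Set("Empire Strikes Back"), Set("Return of the Jedi")),
    Seq(Set("Star Wars"), Set("Empire Strikes Back"), Set("Return of the Jedi")),
    Seq(Set("Empire Strikes Back"), Set("Star Wars")),
    Seq(Set("Alien"))
  )
}
```

With this data, `SupportSketch.supportFraction(SupportSketch.trilogy, SupportSketch.rentals)` evaluates to 0.5: the first customer supports the pattern despite the rental in between, while the third does not because the order is reversed. A threshold such as `minSupFraction = 0.05` from slide 19 would then be compared against this fraction.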