PrefixSpan With Spark at Akanoo

PrefixSpan with Spark
Frank Wolf & Dr. Gundula Meckenhäuser
07.08.2015

Predictive Behavioral Targeting with Akanoo
22
Identify relevant situations
Interact in real time
Engage in a smart way

Agenda
1. PrefixSpan for Sequential Pattern Mining
2. A PrefixSpan Implementation with Spark
3

Agenda
4

cart . . .
Motivation
54% buyers
46% non-buyers
What is going on within the last 5 page impressions?
5

Translating clickstream data into patterns
pagetypes: home, overview, product, sale, account, cart, checkout, search, about
cart . . . overview overview overview product overview
cart . . . 5th last PI 4th last PI 3rd last PI 2nd last PI last PI
pattern: identified by SessionId:
< ( overview, add ), ( overview, no ), ( overview, no ), ( product, no ), ( overview, no ) >
cart . . . add no no no no
cart changes: add, remove, no
item
itemset
6

Problem definition
7
pattern 1 < (overview, no) , ( overview, no ), ( overview, no ), ( product, no ), ( overview, no ) >
pattern 2 < (home, no), ( product, no ), ( product, no ), ( product, no ), ( overview, no ) >
...
pattern n < (overview, no) , ( product, add ), ( product, no ), ( cart, no ), ( checkout, no ) >
Example < (overview, no), (product, no)> is a subpattern of patterns 1 and n
< (overview, no), (product, remove) > is no subpattern
A pattern is frequent with support n if it is n times a subpattern of the database patterns

PrefixSpan
8
Sequential Pattern Mining by Pattern Growth

PrefixSpan with a toy example: frequent patterns of length 1
9
ID pattern
1 < a (a b c) (a c) >
2 < (a d) c >
3 < (e f) (a b) >
Database
ID < a > < c > < d > < e > < f >
1 (0,0), (1,0), (2,0) (1,1) (1,2), (2,1) -- -- --
2 (0,0) -- (1,0) (0,1) -- --
3 (1,0) (1,1) -- -- (0,0) (0,1)
Step1: Occurrences of base letters
ID < a > < c > < d > < e > < f >
1 (0,0), (1,0), (2,0) (1,1) (1,2), (2,1) -- -- --
2 (0,0) -- (1,0) (0,1) -- --
3 (1,0) (1,1) -- -- (0,0) (0,1)
support 3 2 2 1 1 1min support: 2
frequent patterns of length 1 are: < a >, , < c >

10
Occurrences of frequent patterns of length 1
ID < a > < c >
1 (0,0), (1,0), (2,0) (1,1) (1,2), (2,1)
2 (0,0) -- (1,0)
3 (1,0) (1,1) --
ID <a a> <a b> <a c> <c a> <c b> <c c>
1 (1,0), (2,0) (1,1) (1,2), (2,1) (2,0) -- (2,1) (2,0) (2,0) (2,1)
2 -- -- (1,0) -- -- -- -- -- --
3 -- -- -- -- -- -- -- --
Step 2: Occurrences of patterns of length 2
frequent patterns of length 2 are:
< a c >
ID < a > < c >
1 (0,0), (1,0), (2,0) (1,1) (1,2), (2,1)
2 (0,0) -- (1,0)
3 (1,0) (1,1) --
Occurrences of frequent base letters

11
Occurrences of frequent base letters
ID < a > < c >
1 (0,0), (1,0), (2,0) (1,1) (1,2), (2,1)
2 (0,0) -- (1,0)
3 (1,0) (1,1) --
ID <a c>
1 (1,2), (2,1)
2 (1,0)
3 --
ID <a c a> <a c b> <a c c>
1 (2,0) -- --
2 -- -- --
3 -- -- --
Occurrences of frequent patterns of length 2
Step 3: Occurrences of patterns of length 3
no frequent patterns of length 3

PrefixSpan with a toy example: all frequent patterns
12
ID pattern
1 < a (a b c) (a c) >
2 < (a d) c >
3 < (e f) (a b) >
Database Results: frequent patterns
pattern support
< a > 3
 2
< c > 2
< a c > 2
min Support: 2

Agenda
13

PrefixSpan with a toy example: all frequent patterns
14
ID pattern
1 < a (a b c) (a c) >
2 < (a d) c >
3 < (e f) (a b) >
Database
Results: frequent patterns
pattern support
< a > 3
 2
< c > 2
< a c > 2
< (a b) > 2min Support: 2
?

Main class, main function
15
Pattern composition
Main class:
class SequenceRDDFunctions(
database: RDD[(SessionId, Pattern)]
)
Main function:
def mineFrequentPatterns(
patternGenerator: PatternGenerator,
minSupFraction: Double = 0.05
): List[PatternWithOccRddAndSupport]
Pattern
case class Pattern(
elements: Seq
[ItemSet]
)
val p = Pattern(
elements: Seq(a, d, c)
)
ItemSet
case class ItemSet(
items: Seq[Item])
val a = ItemSet(items =
Seq(item1)
)
Item
case class Item(
letter: Letter)
val item1 = Item(
letter =
pageTypeOverviewLetter
)
Letter type Letter = Int
val pageTypeOverviewLetter:
Letter = 1

From Session to Pattern (1 / 2)
16
{
"SessionID":"49624d7e",
...
},
{
"pageType":"overview",
...
},
{
"pageType":"product",
...
},
{
"pageType":"checkout",
...
},
Session
(json)
Visit
case class Visit(
session: Session,
viewList: List[View]
…
)
case class View (
viewId: String,
time: Long,
pageType: PageType,
…)
val a: View = View(
viewId= "view-id",
pageType = PageType.OVERVIEW,
...
val d: View = View(
viewId= "view-id",
pageType = PageType.PRODUCT,
...
)
val c: View = View(
viewId= "view-id",
pageType = PageType.CHECKOUT,
...
)
)
Views

From Session to Pattern (2 / 2)
Creating a pattern from a visit
def pageTypeLetterForView(view: View): Letter =
visits.viewList.view.pageType match {
case PageType.OVERVIEW => pageTypeOverviewLetter
case PageType.PRODUCT => pageTypeProductLetter
case PageType.ACCOUNT => pageTypeCheckoutLetter
}
)
def generatePattern(visit: Visit): Pattern = {
val itemSets = visit.map { (view) =>
val itemSet = pageTypeLetterForView(view)
.map(Item)
.map(ItemSet)
}
Pattern(itemSets)
}
Toy alphabet: pageTypeAlphabet
// letters
val pageTypeOverviewLetter: Letter = 1
val pageTypeCheckoutLetter: Letter = 3
val pageTypeProductLetter: Letter = 4
// alphabet
val pageTypeAlphabet: Alphabet = List(
pageTypeOverviewLetter,
pageTypeProductLetter,
pageTypeCheckoutLetter
)

Towards a database of patterns
Creating a pattern from a visit
def generatePattern(visit: Visit):
Pattern = {
val itemSets = visit.map { (view) =>
val itemSet =
pageTypeLetterForView(view)
.map(Item)
.map(ItemSet)
}
Pattern(itemSets)
}
Creating databse ...
… which is an RDD[(SessionId, Pattern)
def mapToPattern(visits: Seq(Visit)): RDD[(SessionId,
Pattern)] = {
visits.map(
visit =>
(visit.session.sessionId,generatePattern(visit)))
}

19
Input params
Main class:
)
Main function:
patternGenerator
a function that lets us grab
the alphabet
minSupFraction: Double
= 0.05
a pattern is frequent if it
occurrs in at least 5% of all
database patterns

20
Main class:
)
Main function:
case class PatternWithOccRddAndSupport(pattern:
Pattern, occRdd: RDD[(SessionId, Occ)], support: Long)
type Occ = Seq[(ItemSetIndex, ItemIndex)]
pattern 1 (a)
example SessionId Occurrence
< a (a b c) (a c) > 1 (0,0), (1,0), (2,0)
< (a d) c > 2 (0,0)
< (e f) (a b) > 3 (1,0)
Support 3
Output
List[PatternWithOccRddAndSupport]
One list item for each length 1 of
frequent patterns

How frequent patterns are found (1/6)
1. Calculate the occurrence table of the baseLetters
def occurrencesOfBaseLetters(baseLetter: Letter): RDD[(SessionId, Occ)] = {
database.mapValues(sessionPattern => {
sessionPattern.occurrence(Item(baseLetter))
})
}
case class Pattern(elements: Seq[ItemSet]) {
def occurrence(item: Item): Occ = {
elements.zipWithIndex.filter({
case (itemSet, itemSetIndex) => itemSet.contains(item)
})
.map({ case (itemSet, itemSetIndex) =>
(itemSetIndex, itemSet.indexOf(item))
})}
}
Reminder:
type Occ = Seq[(ItemSetIndex, ItemIndex)]

22
One instance of PatternWithOccRddAndSupport for
any letter
letter 1
SessionId Occurrence
1 (1,2), (2,1)
2 (1,0)
3 --
Support 2
letter 2
SessionId Occurrence
1 (1,2), (2,1)
2 (1,0)
3 --
Support 2
ID < a > < c >
1 (0,0), (1,0), (2,0) (1,1) (1,2), (2,1)
2 (0,0) -- (1,0)
3 (1,0) (1,1) --
...
… corresponds to this table of the
letter occurrence table in the theory
part
1. Calculate the occurrence table of the baseLetters

2. Calculate the occurrence tables of (n+1)-Patterns from n-Patterns
val frequentPatterns = Stream.iterate(frequentBaseLettersAndOcccurences)(previousPatterns => {
previousPatterns.par.flatMap((previousPattern: PatternWithOccRddAndSupport) => {
previousPattern.occurrences.persist()
// creates n+1 patterns with enough support
val nextPatterns = getNextFrequentPatterns(previousPattern.pattern, previousPattern.occurrences)
previousPattern.occurrences.unpersist()
nextPatterns
}).toList
}).takeWhile(_.nonEmpty).flatten.toList

// based on pattern of length n, identifies frequent patterns of length n+1 by appending and assembling frequent letters
def getNextFrequentPatterns(previousPattern: Pattern, previousPatternOccurrences: RDD[(SessionId, Occ)]): List
[PatternWithOccRddAndSupport] = {
val appendedFreqPatterns = frequentBaseLettersAndOcccurences.par.map(baseLetterAndOccs => {
//make <a b> + => <a b b>
val nextPattern = previousPattern ++ baseLetterAndOccs.pattern
// joins occurrences of previous pattern and any letter
val joinedOccurrences: RDD[(SessionId, (Occ, Occ))] = previousPatternOccurrences.join(baseLetterAndOccs.occurrences)
// returns occurrences of new pattern
val nextPatternOccurrences: RDD[(SessionId, Occ)] = joinedOccurrences.mapAppendedOccPairToOcc()
PatternWithOccRddAndSupport(nextPattern, nextPatternOccurrences, nextPatternOccurrences.countSupport())
}) // only leaves pattern with enough support
.filter(_.support >= minSup).toList
appendedFreqPatterns
}

// based on pattern of length n, identifies frequent patterns of length n+1 by appending and assembling frequent letters
def getNextFrequentPatterns(previousPattern: Pattern, previousPatternOccurrences: RDD[(SessionId, Occ)]): List
[PatternWithOccRddAndSupport] = {
…
// prefix OCC, suffix OCC
val joinedOccurrences: RDD[(SessionId, (Occ, Occ))] = previousPatternOccurrences.join(baseLetterAndOccs.occurrences)
// returns occurrences of new pattern
val nextPatternOccurrences: RDD[(SessionId, Occ)] = joinedOccurrences.mapAppendedOccPairToOcc()
...
def mapAppendedOccPairToOcc(): RDD[(SessionId, Occ)] = {
self.mapValues((occPair: (Occ, Occ)) => {
// occP = occ(<a b>) , occS = occ(<c>)
val (occPrefix, occSuffix) = occPair
PseudoProjection.pseudoProjectionAppend(occPrefix, occSuffix)
})
}
joinedOccurrences example entry:
<ac> append <a>
<ac> - prefixOcc: [(1,0), (2,1)]
<a> - suffixOcc: [(0,0), (1,0), (2,0)]
-> (2,0)

def pseudoProjectionAppend(prefixOcc: Occ, suffixOcc: Occ): Occ =
suffixOcc.filter({ case (suffixItemSetIndex, suffixItemIndex) =>
// suffix occurrence after first occurrence of prefix
suffixItemSetIndex > prefixOcc.map(
{ case (prefixItemSetIndex, _) => prefixItemSetIndex }).min
})
def mapAppendedOccPairToOcc(): RDD[(SessionId, Occ)] = {
self.mapValues((occPair: (Occ, Occ)) => {
// occP = occ(<a b>) , occS = occ(<c>)
val (occPrefix, occSuffix) = occPair
PseudoProjection.pseudoProjectionAppend(occPrefix, occSuffix)
})
}
joinedOccurrences example entry:
<ac> append <a>
<ac> - prefixOcc: [(1,0), (2,1)]
<a> - suffixOcc: [(0,0), (1,0), (2,0)]
-> (2,0)

Results: Mining frequent patterns of conversions and abandonments
KäuferKaufabbrecher
Häufigkeit
Handlungsempfehlung: Mit potentiellen Abbrechern nach mehrfachen Besuch von zwei Übersichtsseiten
interagieren (Erinnerung an den Warenkorb, Abschluss-orientierte Kaufanreize)

Thank you for your attention!
Any further questions?
Find more information on akanoo.com or
write us a mail to hi@akanoo.com!

PrefixSpan With Spark at Akanoo

Recommended

Recommended

More Related Content

What's hot

What's hot (8)

Similar to PrefixSpan With Spark at Akanoo

Similar to PrefixSpan With Spark at Akanoo (20)

Recently uploaded

Recently uploaded (20)

PrefixSpan With Spark at Akanoo

Editor's Notes