Gunjan insight student conference v2

A Sequence-based and Context Modelling
Framework for Recommendation
Gunjan Kumar, Houssem Jerbi, and Michael P. O’Mahony
Insight Centre for Data Analytics
University College Dublin
Insight Student Conference 2017, Cork
Sep 8, 2017

Ubiquitous Sensors
Wearable Camera
Phone sensors & apps
Wearable Devices

Rich User Activity Data
• Sequential nature of user activities
• Activities have associated features/context, e.g.
location, time, weather, etc.

Rich User Activity Data
For Recommender Systems
Facilitates real time recommendations for a given user and context
(e.g. time, location, weather, etc.)
Our Work
A framework for sequence- and context-based recommendation:
• Recommending the next activity to perform.
- Lifelogging
- Transportation
• Recommending the next sequence of activities to perform.
- Tourism
Insight Centre for Data Analytics RecTour 2017 Slide 4

Related Work
Capturing
Sequence
Capturing
Context
Recommending
Sequences

Related Work
Capturing
Sequence
Capturing
Context
Recommending
Sequences
Hierarchical-graph-based models
[Li et al., 2008; Zheng et al., 2009; Yoon et al., 2010]
All-kth -order Markov models
[Bohnenberger and Jameson, 2001; Deshpande and
Karypis, 2004; Shani et al., 2005]
Tensor and matrix factorization models
[Zheng et al., 2010, 2012; Wang et al., 2010; Symeonidis et al., 2011;
Adomavicius et al., 2011; Braunhofer et al., 2013]
Music Playlists
[Baccigalupo and Plaza, 2006; Chen et al., 2012]
POI/Itinerary
[Tai et al., 2008; Yoon et al., 2012]
Activity Recommendation Framework
[Kumar et al., 2017]
Stochastic Modelling
[Sun et al., 2016]
Activity Recommendation Framework
[Kumar et al., 2014, 2016]

Our Contribution
• A generic activity recommendation framework to recommend
the next sequence of activities to users based on past activity
patterns and context.
• A ML approach to learn optimal subsequence length for
matching current and past subsequences of user activity
patterns.
• Application of the proposed approach in lifelogging [iiWAS 2014],
transportation [KDD UrbComp 2016] and tourism [RecSys RecTour
2017] domains.

Framework Overview
Ranking
User Data TimelinesData Modelling
Timeline Matching
Append top ranked activity
to timeline/RS
Similarity Assesment
Recommended Sequence (RS)

Data Model
Activity Object
A single occurrence of an activity and consists of a set of features
describing the activity or the context.
Activity Timeline
A chronological sequence of n activity objects performed by the
user during a time interval δ:
T =< ao1, ao2, ..., aon >
Museum, 10:15, (53.343N,-6.317W), 11015
Theatre, 16:50, (53.340,-6.263W), 2008
Italian-Food, 12:31, (53.339N,-6.277W), 128
ao1 ao2 ao3

Recommendation Algorithm
• Extracting candidate timelines from user timelines.
• ML: learning the optimal matching unit based on regularity
(sample entropy) and repetition (k-grams) in user activity
patterns in timelines.
• Timeline matching approaches:
• N-count matching
• N-hours matching
• Daywise matching
• Similarity Assessment: Two-level Edit Distance
• Ranking
• Predicting the next sequence of activities

Recommendation Algorithm: Overview
Target Activity Seq
< aot1
, aot2
, aot3
>
Current
Activity (aoc)
? ? ?
Current Timeline
Candidate Timeline #2
Candidate Timeline #j
...
124 3
124 3
124 3
124 3
j
Ranked
1
Score(aoj
rec) = 1 −
d(Tj, Tc) − min
Tp∈T
d(Tp, Tc)
max
Tp∈T
d(Tp, Tc) − min
Tp∈T
d(Tp, Tc)

Datasets
1. Lifelogging: 5 users, 41.4k images
2. Geolife Modes of transport: 18 users, 334 trips
3. Gowalla Checkins: 9.2k users, 6 million checkins
• median # checkins per user per day: 10 - 270
• median # of distinct categories per user per day : 2 - 76
0
500
1,000
1,500
2,000
2,500
3,000
3,500
0 25 50 75 100
Median #checkins per user per day
Numofusers
0
1000
2000
3000
4000
5000
0 10 20 30 40 50 60 70 80
Median #distinct categories per user per day
Numofusers

Methodology
User Timeline
Time
00 hrs 00 hrs 00 hrs Target Activity Seq
< aot1
, aot2
, aot3
>
Current
Activity
(aoc)
? ? ?
Recommendation time
(RT)
• Leave-one-out evaluation: Each user’s complete timeline is
split into training and test timelines by time.
• Agreement @ k (k = 1, 2, 3): % of RTs for a user where the
first k categories in the recommended sequence and the actual
sequence are an exact match.
• Recommendation algorithms:
• N-count recommendation algorithm (SeqNCSeqRec)
• Bi-gram-based sequence recommender (BiGramSeqRec)
• Popularity-based sequence recommender (PopSeqRec)

Recommendation Performance (Level 2)
Figure: Mean percentage agreements for recommended sequences for
SeqNCSeqRec (top 10% neighbours) and baseline algorithms using
timelines constructed from categories at level 2 in the hierarchy.

Recommendation Performance
• Some level 2 activities are semantically closer than others.
• ‘true’ performance between those at level 2 and level 1.
Food
Outdoors
Shopping
Nightlife
Entertainment
Community
Mexican
Travel
Asian
Italian
Coﬀee Shop
South American/Latin
Fish & Chips
Starbucks
Dunkin Donuts... ...
Level 1 Level 2 Level 3
(7) (134) (151)

Recommendation Performance
• Some level 2 activities are semantically closer than others.
• ‘true’ performance between those at level 2 and level 1.
Food
Outdoors
Shopping
Nightlife
Entertainment
Community
Live Music
Travel
Casino
Art Museum
Ice Skating
Theatre
Zoo
...
Level 1 Level 2 Level 3
(7) (134) (151)

Recommendation Performance (Level 1)
Figure: Mean percentage agreements for recommended sequences for
SeqNCSeqRec (top 10% neighbours) and baseline algorithms using
timelines constructed from categories at level 1 in the hierarchy.

Conclusion
• A generic activity recommendation framework to recommend
the next sequence of activities to users based on past activity
patterns and context [Kumar et al., 2014, 2016, 2017].
• Experiments demonstrate the efficacy of our approach in
recommending sequences given a diverse variety of activities
and user activity patterns.

Future Work
• Consider alternative approaches to suggest sequences of
activities (for example, using RNNs).
• Introduce new evaluation metrics for evaluating sequence
recommendation.
• Investigate the recommendation of context (for example,
where, when, with whom etc.) associated with each of the
suggested sequence of activities.
• Consider socio-economic characteristics, user demographics,
and travel variables.

Thank You
gunjan.kumar@insight-centre.org
@gunjanthesystem

References I
G. Dong and J. Pei. Sequence Data Mining (Advances in Database Systems).
Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2007. ISBN 0387699368.
G. Kumar, H. Jerbi, C. Gurrin, and M. P. O’Mahony. Towards activity
recommendation from lifelogs. In Proceedings of the 16th International Conference
on Information Integration and Web-based Applications & Services, iiWAS ’14,
pages 87–96, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-3001-5. doi:
10.1145/2684200.2684298. URL http://doi.acm.org/10.1145/2684200.2684298.
G. Kumar, H. Jerbi, and M. P. O’Mahony. Personalised recommendations for modes
of transport: A sequence-based approach. The 5th ACM SIGKDD International
Workshop on Urban Computing (UrbComp 2016), 2016.
G. Kumar, H. Jerbi, and M. P. O’Mahony. Towards the recommendation of
personalised activity sequences in the tourism domain. The 2nd ACM RecSys
Workshop on Recommenders in Tourism (RecTour 2017), 2017.
Z. Xing, J. Pei, and E. Keogh. A brief survey on sequence classiﬁcation. SIGKDD
Explor. Newsl., 12(1):40–48, Nov. 2010. ISSN 1931-0145. doi:
10.1145/1882471.1882478. URL http://doi.acm.org/10.1145/1882471.1882478.

Some Extra Bits

User Timeline
Time
00 hrs 00 hrs 00 hrs

User Timeline
Time
00 hrs 00 hrs 00 hrs Target Activity Seq
< aot1
, aot2
, aot3
>
Current
Activity
(aoc)
? ? ?

00 hrs Target Activity Seq
< aot1
, aot2
, aot3
>
Current
Activity
(aoc)
? ? ?
Current Timeline

00 hrs Target Activity Seq
< aot1
, aot2
, aot3
>
Current
Activity
(aoc)
? ? ?
Current Timeline
...

Target Activity Seq
< aot1
, aot2
, aot3
>
Current
Activity (aoc)
? ? ?
Current Timeline
... N-count matching (N = 4)
124 3
124 3
124 3
124 3
Matching unit
Determines the length of the subsequences to be compared.

Similarity Assessment
Target Activity Seq
< aot1
, aot2
, aot3
>
Current
Activity (aoc)
? ? ?
Current Timeline
...
124 3
124 3
124 3
124 3
Two-level Edit Distance

Ranking
Target Activity Seq
< aot1
, aot2
, aot3
>
Current
Activity (aoc)
? ? ?
Current Timeline
...
124 3
124 3
124 3
124 3
j
Ranked
1
Score(aoj
rec) = 1 −
d(Tj, Tc) − min
Tp∈T
d(Tp, Tc)
max
Tp∈T
d(Tp, Tc) − min
Tp∈T
d(Tp, Tc)

Recommending
Target Activity Seq
< aot1
, aot2
, aot3
>
Current
Activity (aoc)
? ?
Current Timeline
...
124 3
124 3
124 3
124 3
j
Ranked
1
aorec1

Recommending
Target Activity Seq
< aot2 , aot3>
Current
Activity (aoc)
? ?
Current Timeline
...
124 3
124 3
124 3
124 3
j
Ranked
1
aorec1

Recommending
Target Activity Seq
< aot3>
Current
Activity (aoc)
? ?
Current Timeline
124 3
aorec2
aorec1
( SeqNCSeqRec)

Recommending
?
Recommended Sequence
aorec2
( SeqNCSeqRec)
aorec1
aorec3

Two-level Distance
• Inspired by edit distance
• Adapted for sequence of objects
09:00 10:05 10:35 10:50 11:3211:25

Two-level Distance
09:00 10:05 10:35 10:50 11:32
Step 1
Align activities
11:25
( (
dactivity (T1, T2) =
r
i=1
wobj × cins +
s
j=1
wobj × cdel +
t
k=1
wobj × csub

Two-level Distance
09:00 10:05 10:35 10:50 11:32
Step 1
Step 2
Align activities
Align features
11:25
(
(
(
(
d(T1, T2) = dactivity (T1, T2) +
n
i=1
dfeature(ao1
i , ao2
i ) .

Methodology
Learning Optimal Matching Unit range
• Wrapper attribute selection: C4.5 algorithm, greedy
backward search and area under ROC curve as evaluation
measure.
• Classification: pruned attribute vectors for each user fed into
a C4.5 induction algorithm to predict optimal matching unit
range.

Attribute Extraction: Timeline Decomposition
• Each user represented by an attribute vector.
• For attribute extraction:
timelines are decomposed into features-sequence :
User Timeline
Time

User Timeline
Time
dist-travel
start-geo
start-time

User Timeline
Time
dist-travel
start-geo
dist-travel
start-geo
start-time
start-time

Timeline Attributes
Timeline Attributes
Regularity
Repetition

Timeline Attributes
Regularity Attributes: Sample Entropy
1. SampEnp
z : sample entropy of a feature sequence Sz for epoch
length p,
2. µSampEnp
T : mean sample entropy over all feature sequences
Sz , z = 1, 2, ..., m of timeline T for epoch length p,
3. σSampEnp
T : standard deviation of sample entropy over all
feature sequences Sz , z = 1, 2, ..., m of timeline T for epoch
length p.

Timeline Attributes
Repetition Attributes: k-gram attributes
Previously used for sequence classification, biological sequence
analysis and text classification [Xing et al., 2010; Dong and Pei, 2007].
1. ηk
z : total number of distinct k-grams in feature sequence Sz ,
normalised by total number of k-grams occurring in Sz ,
2. µf k
z : mean frequency of occurrence of distinct k-grams in
feature sequence Sz , normalised by total number of k-grams
occurring in Sz ,
3. σf k
z : standard deviation of frequency of occurrence of distinct
k-grams in feature sequence Sz , normalised by length of Sz .

Trend across matching-unit N
5
6
7
8
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 3333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333 4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444 6666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666 8888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888
N−count
Mean%agreement
Group1[0]
Group2[1,4]
Group3[5+)
Figure: MRR versus matching unit for three user groups for ﬁrst
sequence index.

Gunjan insight student conference v2

Recommended

Recommended

More Related Content

Similar to Gunjan insight student conference v2

Similar to Gunjan insight student conference v2 (20)

Recently uploaded

Recently uploaded (20)

Gunjan insight student conference v2