Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Gunjan insight student conference v2
1. A Sequence-based and Context Modelling
Framework for Recommendation
Gunjan Kumar, Houssem Jerbi, and Michael P. O’Mahony
Insight Centre for Data Analytics
University College Dublin
Insight Student Conference 2017, Cork
Sep 8, 2017
4. Rich User Activity Data
• Sequential nature of user activities
• Activities have associated features/context, e.g.
location, time, weather, etc.
5. Rich User Activity Data
For Recommender Systems
Facilitates real time recommendations for a given user and context
(e.g. time, location, weather, etc.)
Our Work
A framework for sequence- and context-based recommendation:
• Recommending the next activity to perform.
- Lifelogging
- Transportation
• Recommending the next sequence of activities to perform.
- Tourism
Insight Centre for Data Analytics RecTour 2017 Slide 4
7. Related Work
Capturing
Sequence
Capturing
Context
Recommending
Sequences
Hierarchical-graph-based models
[Li et al., 2008; Zheng et al., 2009; Yoon et al., 2010]
All-kth -order Markov models
[Bohnenberger and Jameson, 2001; Deshpande and
Karypis, 2004; Shani et al., 2005]
Tensor and matrix factorization models
[Zheng et al., 2010, 2012; Wang et al., 2010; Symeonidis et al., 2011;
Adomavicius et al., 2011; Braunhofer et al., 2013]
Music Playlists
[Baccigalupo and Plaza, 2006; Chen et al., 2012]
POI/Itinerary
[Tai et al., 2008; Yoon et al., 2012]
Activity Recommendation Framework
[Kumar et al., 2017]
Stochastic Modelling
[Sun et al., 2016]
Activity Recommendation Framework
[Kumar et al., 2014, 2016]
8. Our Contribution
• A generic activity recommendation framework to recommend
the next sequence of activities to users based on past activity
patterns and context.
• A ML approach to learn optimal subsequence length for
matching current and past subsequences of user activity
patterns.
• Application of the proposed approach in lifelogging [iiWAS 2014],
transportation [KDD UrbComp 2016] and tourism [RecSys RecTour
2017] domains.
Insight Centre for Data Analytics RecTour 2017 Slide 7
9. Framework Overview
Ranking
User Data TimelinesData Modelling
Timeline Matching
Append top ranked activity
to timeline/RS
Similarity Assesment
Recommended Sequence (RS)
10. Data Model
Activity Object
A single occurrence of an activity and consists of a set of features
describing the activity or the context.
Activity Timeline
A chronological sequence of n activity objects performed by the
user during a time interval δ:
T =< ao1, ao2, ..., aon >
Museum, 10:15, (53.343N,-6.317W), 11015
Theatre, 16:50, (53.340,-6.263W), 2008
Italian-Food, 12:31, (53.339N,-6.277W), 128
ao1 ao2 ao3
11. Recommendation Algorithm
• Extracting candidate timelines from user timelines.
• ML: learning the optimal matching unit based on regularity
(sample entropy) and repetition (k-grams) in user activity
patterns in timelines.
• Timeline matching approaches:
• N-count matching
• N-hours matching
• Daywise matching
• Similarity Assessment: Two-level Edit Distance
• Ranking
• Predicting the next sequence of activities
Insight Centre for Data Analytics RecTour 2017 Slide 10
12. Recommendation Algorithm: Overview
Target Activity Seq
< aot1
, aot2
, aot3
>
Current
Activity (aoc)
? ? ?
Current Timeline
Candidate Timeline #2
Candidate Timeline #1
Candidate Timeline #j
...
124 3
124 3
124 3
124 3
j
Ranked
1
Score(aoj
rec) = 1 −
d(Tj, Tc) − min
Tp∈T
d(Tp, Tc)
max
Tp∈T
d(Tp, Tc) − min
Tp∈T
d(Tp, Tc)
Insight Centre for Data Analytics RecTour 2017 Slide 11
13. Datasets
1. Lifelogging: 5 users, 41.4k images
2. Geolife Modes of transport: 18 users, 334 trips
3. Gowalla Checkins: 9.2k users, 6 million checkins
• median # checkins per user per day: 10 - 270
• median # of distinct categories per user per day : 2 - 76
0
500
1,000
1,500
2,000
2,500
3,000
3,500
0 25 50 75 100
Median #checkins per user per day
Numofusers
0
1000
2000
3000
4000
5000
0 10 20 30 40 50 60 70 80
Median #distinct categories per user per day
Numofusers
14. Methodology
User Timeline
Time
00 hrs 00 hrs 00 hrs Target Activity Seq
< aot1
, aot2
, aot3
>
Current
Activity
(aoc)
? ? ?
Recommendation time
(RT)
• Leave-one-out evaluation: Each user’s complete timeline is
split into training and test timelines by time.
• Agreement @ k (k = 1, 2, 3): % of RTs for a user where the
first k categories in the recommended sequence and the actual
sequence are an exact match.
• Recommendation algorithms:
• N-count recommendation algorithm (SeqNCSeqRec)
• Bi-gram-based sequence recommender (BiGramSeqRec)
• Popularity-based sequence recommender (PopSeqRec)
Insight Centre for Data Analytics RecTour 2017 Slide 13
15. Recommendation Performance (Level 2)
Figure: Mean percentage agreements for recommended sequences for
SeqNCSeqRec (top 10% neighbours) and baseline algorithms using
timelines constructed from categories at level 2 in the hierarchy.
Insight Centre for Data Analytics RecTour 2017 Slide 14
16. Recommendation Performance
• Some level 2 activities are semantically closer than others.
• ‘true’ performance between those at level 2 and level 1.
Food
Outdoors
Shopping
Nightlife
Entertainment
Community
Mexican
Travel
Asian
Italian
Coffee Shop
South American/Latin
Fish & Chips
Starbucks
Dunkin Donuts... ...
Level 1 Level 2 Level 3
(7) (134) (151)
Insight Centre for Data Analytics RecTour 2017 Slide 15
17. Recommendation Performance
• Some level 2 activities are semantically closer than others.
• ‘true’ performance between those at level 2 and level 1.
Food
Outdoors
Shopping
Nightlife
Entertainment
Community
Live Music
Travel
Casino
Art Museum
Ice Skating
Theatre
Zoo
...
Level 1 Level 2 Level 3
(7) (134) (151)
Insight Centre for Data Analytics RecTour 2017 Slide 16
18. Recommendation Performance (Level 1)
Figure: Mean percentage agreements for recommended sequences for
SeqNCSeqRec (top 10% neighbours) and baseline algorithms using
timelines constructed from categories at level 1 in the hierarchy.
Insight Centre for Data Analytics RecTour 2017 Slide 17
19. Conclusion
• A generic activity recommendation framework to recommend
the next sequence of activities to users based on past activity
patterns and context [Kumar et al., 2014, 2016, 2017].
• Experiments demonstrate the efficacy of our approach in
recommending sequences given a diverse variety of activities
and user activity patterns.
Insight Centre for Data Analytics RecTour 2017 Slide 18
20. Future Work
• Consider alternative approaches to suggest sequences of
activities (for example, using RNNs).
• Introduce new evaluation metrics for evaluating sequence
recommendation.
• Investigate the recommendation of context (for example,
where, when, with whom etc.) associated with each of the
suggested sequence of activities.
• Consider socio-economic characteristics, user demographics,
and travel variables.
Insight Centre for Data Analytics RecTour 2017 Slide 19
22. References I
G. Dong and J. Pei. Sequence Data Mining (Advances in Database Systems).
Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2007. ISBN 0387699368.
G. Kumar, H. Jerbi, C. Gurrin, and M. P. O’Mahony. Towards activity
recommendation from lifelogs. In Proceedings of the 16th International Conference
on Information Integration and Web-based Applications & Services, iiWAS ’14,
pages 87–96, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-3001-5. doi:
10.1145/2684200.2684298. URL http://doi.acm.org/10.1145/2684200.2684298.
G. Kumar, H. Jerbi, and M. P. O’Mahony. Personalised recommendations for modes
of transport: A sequence-based approach. The 5th ACM SIGKDD International
Workshop on Urban Computing (UrbComp 2016), 2016.
G. Kumar, H. Jerbi, and M. P. O’Mahony. Towards the recommendation of
personalised activity sequences in the tourism domain. The 2nd ACM RecSys
Workshop on Recommenders in Tourism (RecTour 2017), 2017.
Z. Xing, J. Pei, and E. Keogh. A brief survey on sequence classification. SIGKDD
Explor. Newsl., 12(1):40–48, Nov. 2010. ISSN 1931-0145. doi:
10.1145/1882471.1882478. URL http://doi.acm.org/10.1145/1882471.1882478.
36. Two-level Distance
• Inspired by edit distance
• Adapted for sequence of objects
09:00 10:05 10:35 10:50 11:32
Step 1
Align activities
11:25
( (
dactivity (T1, T2) =
r
i=1
wobj × cins +
s
j=1
wobj × cdel +
t
k=1
wobj × csub
37. Two-level Distance
• Inspired by edit distance
• Adapted for sequence of objects
09:00 10:05 10:35 10:50 11:32
Step 1
Step 2
Align activities
Align features
11:25
(
(
(
(
d(T1, T2) = dactivity (T1, T2) +
n
i=1
dfeature(ao1
i , ao2
i ) .
38. Methodology
Learning Optimal Matching Unit range
• Wrapper attribute selection: C4.5 algorithm, greedy
backward search and area under ROC curve as evaluation
measure.
• Classification: pruned attribute vectors for each user fed into
a C4.5 induction algorithm to predict optimal matching unit
range.
Insight Centre for Data Analytics RecTour 2017 Slide 8
39. Attribute Extraction: Timeline Decomposition
• Each user represented by an attribute vector.
• For attribute extraction:
timelines are decomposed into features-sequence :
User Timeline
Time
00 hrs 00 hrs 00 hrs
40. Attribute Extraction: Timeline Decomposition
• Each user represented by an attribute vector.
• For attribute extraction:
timelines are decomposed into features-sequence :
User Timeline
Time
00 hrs 00 hrs 00 hrs
dist-travel
start-geo
start-time
41. Attribute Extraction: Timeline Decomposition
• Each user represented by an attribute vector.
• For attribute extraction:
timelines are decomposed into features-sequence :
User Timeline
Time
00 hrs 00 hrs 00 hrs
dist-travel
start-geo
dist-travel
start-geo
start-time
start-time
43. Timeline Attributes
Regularity Attributes: Sample Entropy
1. SampEnp
z : sample entropy of a feature sequence Sz for epoch
length p,
2. µSampEnp
T : mean sample entropy over all feature sequences
Sz , z = 1, 2, ..., m of timeline T for epoch length p,
3. σSampEnp
T : standard deviation of sample entropy over all
feature sequences Sz , z = 1, 2, ..., m of timeline T for epoch
length p.
Insight Centre for Data Analytics RecTour 2017 Slide 11
44. Timeline Attributes
Repetition Attributes: k-gram attributes
Previously used for sequence classification, biological sequence
analysis and text classification [Xing et al., 2010; Dong and Pei, 2007].
1. ηk
z : total number of distinct k-grams in feature sequence Sz ,
normalised by total number of k-grams occurring in Sz ,
2. µf k
z : mean frequency of occurrence of distinct k-grams in
feature sequence Sz , normalised by total number of k-grams
occurring in Sz ,
3. σf k
z : standard deviation of frequency of occurrence of distinct
k-grams in feature sequence Sz , normalised by length of Sz .
Insight Centre for Data Analytics RecTour 2017 Slide 12
45. Trend across matching-unit N
5
6
7
8
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 3333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333 4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444 6666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666 8888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888
N−count
Mean%agreement
Group1[0]
Group2[1,4]
Group3[5+)
Figure: MRR versus matching unit for three user groups for first
sequence index.