SlideShare a Scribd company logo
1 of 45
Download to read offline
A Sequence-based and Context Modelling
Framework for Recommendation
Gunjan Kumar, Houssem Jerbi, and Michael P. O’Mahony
Insight Centre for Data Analytics
University College Dublin
Insight Student Conference 2017, Cork
Sep 8, 2017
Ubiquitous Sensors
Ubiquitous Sensors
Wearable Camera
Phone sensors & apps
Wearable Devices
Rich User Activity Data
• Sequential nature of user activities
• Activities have associated features/context, e.g.
location, time, weather, etc.
Rich User Activity Data
For Recommender Systems
Facilitates real time recommendations for a given user and context
(e.g. time, location, weather, etc.)
Our Work
A framework for sequence- and context-based recommendation:
• Recommending the next activity to perform.
- Lifelogging
- Transportation
• Recommending the next sequence of activities to perform.
- Tourism
Insight Centre for Data Analytics RecTour 2017 Slide 4
Related Work
Capturing
Sequence
Capturing
Context
Recommending
Sequences
Related Work
Capturing
Sequence
Capturing
Context
Recommending
Sequences
Hierarchical-graph-based models
[Li et al., 2008; Zheng et al., 2009; Yoon et al., 2010]
All-kth -order Markov models
[Bohnenberger and Jameson, 2001; Deshpande and
Karypis, 2004; Shani et al., 2005]
Tensor and matrix factorization models
[Zheng et al., 2010, 2012; Wang et al., 2010; Symeonidis et al., 2011;
Adomavicius et al., 2011; Braunhofer et al., 2013]
Music Playlists
[Baccigalupo and Plaza, 2006; Chen et al., 2012]
POI/Itinerary
[Tai et al., 2008; Yoon et al., 2012]
Activity Recommendation Framework
[Kumar et al., 2017]
Stochastic Modelling
[Sun et al., 2016]
Activity Recommendation Framework
[Kumar et al., 2014, 2016]
Our Contribution
• A generic activity recommendation framework to recommend
the next sequence of activities to users based on past activity
patterns and context.
• A ML approach to learn optimal subsequence length for
matching current and past subsequences of user activity
patterns.
• Application of the proposed approach in lifelogging [iiWAS 2014],
transportation [KDD UrbComp 2016] and tourism [RecSys RecTour
2017] domains.
Insight Centre for Data Analytics RecTour 2017 Slide 7
Framework Overview
Ranking
User Data TimelinesData Modelling
Timeline Matching
Append top ranked activity
to timeline/RS
Similarity Assesment
Recommended Sequence (RS)
Data Model
Activity Object
A single occurrence of an activity and consists of a set of features
describing the activity or the context.
Activity Timeline
A chronological sequence of n activity objects performed by the
user during a time interval δ:
T =< ao1, ao2, ..., aon >
Museum, 10:15, (53.343N,-6.317W), 11015
Theatre, 16:50, (53.340,-6.263W), 2008
Italian-Food, 12:31, (53.339N,-6.277W), 128
ao1 ao2 ao3
Recommendation Algorithm
• Extracting candidate timelines from user timelines.
• ML: learning the optimal matching unit based on regularity
(sample entropy) and repetition (k-grams) in user activity
patterns in timelines.
• Timeline matching approaches:
• N-count matching
• N-hours matching
• Daywise matching
• Similarity Assessment: Two-level Edit Distance
• Ranking
• Predicting the next sequence of activities
Insight Centre for Data Analytics RecTour 2017 Slide 10
Recommendation Algorithm: Overview
Target Activity Seq
< aot1
, aot2
, aot3
>
Current
Activity (aoc)
? ? ?
Current Timeline
Candidate Timeline #2
Candidate Timeline #1
Candidate Timeline #j
...
124 3
124 3
124 3
124 3
j
Ranked
1
Score(aoj
rec) = 1 −
d(Tj, Tc) − min
Tp∈T
d(Tp, Tc)
max
Tp∈T
d(Tp, Tc) − min
Tp∈T
d(Tp, Tc)
Insight Centre for Data Analytics RecTour 2017 Slide 11
Datasets
1. Lifelogging: 5 users, 41.4k images
2. Geolife Modes of transport: 18 users, 334 trips
3. Gowalla Checkins: 9.2k users, 6 million checkins
• median # checkins per user per day: 10 - 270
• median # of distinct categories per user per day : 2 - 76
0
500
1,000
1,500
2,000
2,500
3,000
3,500
0 25 50 75 100
Median #checkins per user per day
Numofusers
0
1000
2000
3000
4000
5000
0 10 20 30 40 50 60 70 80
Median #distinct categories per user per day
Numofusers
Methodology
User Timeline
Time
00 hrs 00 hrs 00 hrs Target Activity Seq
< aot1
, aot2
, aot3
>
Current
Activity
(aoc)
? ? ?
Recommendation time
(RT)
• Leave-one-out evaluation: Each user’s complete timeline is
split into training and test timelines by time.
• Agreement @ k (k = 1, 2, 3): % of RTs for a user where the
first k categories in the recommended sequence and the actual
sequence are an exact match.
• Recommendation algorithms:
• N-count recommendation algorithm (SeqNCSeqRec)
• Bi-gram-based sequence recommender (BiGramSeqRec)
• Popularity-based sequence recommender (PopSeqRec)
Insight Centre for Data Analytics RecTour 2017 Slide 13
Recommendation Performance (Level 2)
Figure: Mean percentage agreements for recommended sequences for
SeqNCSeqRec (top 10% neighbours) and baseline algorithms using
timelines constructed from categories at level 2 in the hierarchy.
Insight Centre for Data Analytics RecTour 2017 Slide 14
Recommendation Performance
• Some level 2 activities are semantically closer than others.
• ‘true’ performance between those at level 2 and level 1.
Food
Outdoors
Shopping
Nightlife
Entertainment
Community
Mexican
Travel
Asian
Italian
Coffee Shop
South American/Latin
Fish & Chips
Starbucks
Dunkin Donuts... ...
Level 1 Level 2 Level 3
(7) (134) (151)
Insight Centre for Data Analytics RecTour 2017 Slide 15
Recommendation Performance
• Some level 2 activities are semantically closer than others.
• ‘true’ performance between those at level 2 and level 1.
Food
Outdoors
Shopping
Nightlife
Entertainment
Community
Live Music
Travel
Casino
Art Museum
Ice Skating
Theatre
Zoo
...
Level 1 Level 2 Level 3
(7) (134) (151)
Insight Centre for Data Analytics RecTour 2017 Slide 16
Recommendation Performance (Level 1)
Figure: Mean percentage agreements for recommended sequences for
SeqNCSeqRec (top 10% neighbours) and baseline algorithms using
timelines constructed from categories at level 1 in the hierarchy.
Insight Centre for Data Analytics RecTour 2017 Slide 17
Conclusion
• A generic activity recommendation framework to recommend
the next sequence of activities to users based on past activity
patterns and context [Kumar et al., 2014, 2016, 2017].
• Experiments demonstrate the efficacy of our approach in
recommending sequences given a diverse variety of activities
and user activity patterns.
Insight Centre for Data Analytics RecTour 2017 Slide 18
Future Work
• Consider alternative approaches to suggest sequences of
activities (for example, using RNNs).
• Introduce new evaluation metrics for evaluating sequence
recommendation.
• Investigate the recommendation of context (for example,
where, when, with whom etc.) associated with each of the
suggested sequence of activities.
• Consider socio-economic characteristics, user demographics,
and travel variables.
Insight Centre for Data Analytics RecTour 2017 Slide 19
Thank You
gunjan.kumar@insight-centre.org
@gunjanthesystem
References I
G. Dong and J. Pei. Sequence Data Mining (Advances in Database Systems).
Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2007. ISBN 0387699368.
G. Kumar, H. Jerbi, C. Gurrin, and M. P. O’Mahony. Towards activity
recommendation from lifelogs. In Proceedings of the 16th International Conference
on Information Integration and Web-based Applications & Services, iiWAS ’14,
pages 87–96, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-3001-5. doi:
10.1145/2684200.2684298. URL http://doi.acm.org/10.1145/2684200.2684298.
G. Kumar, H. Jerbi, and M. P. O’Mahony. Personalised recommendations for modes
of transport: A sequence-based approach. The 5th ACM SIGKDD International
Workshop on Urban Computing (UrbComp 2016), 2016.
G. Kumar, H. Jerbi, and M. P. O’Mahony. Towards the recommendation of
personalised activity sequences in the tourism domain. The 2nd ACM RecSys
Workshop on Recommenders in Tourism (RecTour 2017), 2017.
Z. Xing, J. Pei, and E. Keogh. A brief survey on sequence classification. SIGKDD
Explor. Newsl., 12(1):40–48, Nov. 2010. ISSN 1931-0145. doi:
10.1145/1882471.1882478. URL http://doi.acm.org/10.1145/1882471.1882478.
Some Extra Bits
Insight Centre for Data Analytics RecTour 2017 Slide 2
Recommendation Algorithm
User Timeline
Time
00 hrs 00 hrs 00 hrs
Insight Centre for Data Analytics RecTour 2017 Slide 3
Recommendation Algorithm
User Timeline
Time
00 hrs 00 hrs 00 hrs Target Activity Seq
< aot1
, aot2
, aot3
>
Current
Activity
(aoc)
? ? ?
Insight Centre for Data Analytics RecTour 2017 Slide 3
Recommendation Algorithm
00 hrs Target Activity Seq
< aot1
, aot2
, aot3
>
Current
Activity
(aoc)
? ? ?
Current Timeline
Insight Centre for Data Analytics RecTour 2017 Slide 3
Recommendation Algorithm
00 hrs Target Activity Seq
< aot1
, aot2
, aot3
>
Current
Activity
(aoc)
? ? ?
Current Timeline
Candidate Timeline #2
Candidate Timeline #1
Candidate Timeline #j
...
Insight Centre for Data Analytics RecTour 2017 Slide 3
Recommendation Algorithm
Target Activity Seq
< aot1
, aot2
, aot3
>
Current
Activity (aoc)
? ? ?
Current Timeline
Candidate Timeline #2
Candidate Timeline #1
Candidate Timeline #j
... N-count matching (N = 4)
124 3
124 3
124 3
124 3
[Kumar et al., 2016]
Matching unit
Determines the length of the subsequences to be compared.
Insight Centre for Data Analytics RecTour 2017 Slide 3
Similarity Assessment
Target Activity Seq
< aot1
, aot2
, aot3
>
Current
Activity (aoc)
? ? ?
Current Timeline
Candidate Timeline #2
Candidate Timeline #1
Candidate Timeline #j
...
124 3
124 3
124 3
124 3
Two-level Edit Distance
[Kumar et al., 2014]
Insight Centre for Data Analytics RecTour 2017 Slide 4
Ranking
Target Activity Seq
< aot1
, aot2
, aot3
>
Current
Activity (aoc)
? ? ?
Current Timeline
Candidate Timeline #2
Candidate Timeline #1
Candidate Timeline #j
...
124 3
124 3
124 3
124 3
j
Ranked
1
Score(aoj
rec) = 1 −
d(Tj, Tc) − min
Tp∈T
d(Tp, Tc)
max
Tp∈T
d(Tp, Tc) − min
Tp∈T
d(Tp, Tc)
Insight Centre for Data Analytics RecTour 2017 Slide 5
Recommending
Target Activity Seq
< aot1
, aot2
, aot3
>
Current
Activity (aoc)
? ?
Current Timeline
Candidate Timeline #2
Candidate Timeline #1
Candidate Timeline #j
...
124 3
124 3
124 3
124 3
j
Ranked
1
aorec1
Insight Centre for Data Analytics RecTour 2017 Slide 6
Recommending
Target Activity Seq
< aot2 , aot3>
Current
Activity (aoc)
? ?
Current Timeline
Candidate Timeline #2
Candidate Timeline #1
Candidate Timeline #j
...
124 3
124 3
124 3
124 3
j
Ranked
1
aorec1
Insight Centre for Data Analytics RecTour 2017 Slide 6
Recommending
Target Activity Seq
< aot3>
Current
Activity (aoc)
? ?
Current Timeline
124 3
aorec2
aorec1
Recommendation Algorithm
( SeqNCSeqRec)
Insight Centre for Data Analytics RecTour 2017 Slide 6
Recommending
?
Recommended Sequence
aorec2
Recommendation Algorithm
( SeqNCSeqRec)
aorec1
aorec3
Insight Centre for Data Analytics RecTour 2017 Slide 6
Two-level Distance
• Inspired by edit distance
• Adapted for sequence of objects
09:00 10:05 10:35 10:50 11:3211:25
Two-level Distance
• Inspired by edit distance
• Adapted for sequence of objects
09:00 10:05 10:35 10:50 11:32
Step 1
Align activities
11:25
( (
dactivity (T1, T2) =
r
i=1
wobj × cins +
s
j=1
wobj × cdel +
t
k=1
wobj × csub
Two-level Distance
• Inspired by edit distance
• Adapted for sequence of objects
09:00 10:05 10:35 10:50 11:32
Step 1
Step 2
Align activities
Align features
11:25
(
(
(
(
d(T1, T2) = dactivity (T1, T2) +
n
i=1
dfeature(ao1
i , ao2
i ) .
Methodology
Learning Optimal Matching Unit range
• Wrapper attribute selection: C4.5 algorithm, greedy
backward search and area under ROC curve as evaluation
measure.
• Classification: pruned attribute vectors for each user fed into
a C4.5 induction algorithm to predict optimal matching unit
range.
Insight Centre for Data Analytics RecTour 2017 Slide 8
Attribute Extraction: Timeline Decomposition
• Each user represented by an attribute vector.
• For attribute extraction:
timelines are decomposed into features-sequence :
User Timeline
Time
00 hrs 00 hrs 00 hrs
Attribute Extraction: Timeline Decomposition
• Each user represented by an attribute vector.
• For attribute extraction:
timelines are decomposed into features-sequence :
User Timeline
Time
00 hrs 00 hrs 00 hrs
dist-travel
start-geo
start-time
Attribute Extraction: Timeline Decomposition
• Each user represented by an attribute vector.
• For attribute extraction:
timelines are decomposed into features-sequence :
User Timeline
Time
00 hrs 00 hrs 00 hrs
dist-travel
start-geo
dist-travel
start-geo
start-time
start-time
Timeline Attributes
Timeline Attributes
Regularity
Repetition
Timeline Attributes
Regularity Attributes: Sample Entropy
1. SampEnp
z : sample entropy of a feature sequence Sz for epoch
length p,
2. µSampEnp
T : mean sample entropy over all feature sequences
Sz , z = 1, 2, ..., m of timeline T for epoch length p,
3. σSampEnp
T : standard deviation of sample entropy over all
feature sequences Sz , z = 1, 2, ..., m of timeline T for epoch
length p.
Insight Centre for Data Analytics RecTour 2017 Slide 11
Timeline Attributes
Repetition Attributes: k-gram attributes
Previously used for sequence classification, biological sequence
analysis and text classification [Xing et al., 2010; Dong and Pei, 2007].
1. ηk
z : total number of distinct k-grams in feature sequence Sz ,
normalised by total number of k-grams occurring in Sz ,
2. µf k
z : mean frequency of occurrence of distinct k-grams in
feature sequence Sz , normalised by total number of k-grams
occurring in Sz ,
3. σf k
z : standard deviation of frequency of occurrence of distinct
k-grams in feature sequence Sz , normalised by length of Sz .
Insight Centre for Data Analytics RecTour 2017 Slide 12
Trend across matching-unit N
5
6
7
8
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 3333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333 4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444 6666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666 8888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888
N−count
Mean%agreement
Group1[0]
Group2[1,4]
Group3[5+)
Figure: MRR versus matching unit for three user groups for first
sequence index.

More Related Content

Similar to Gunjan insight student conference v2

Dominik Kowald PhD Defense Recommender Systems
Dominik Kowald PhD Defense Recommender SystemsDominik Kowald PhD Defense Recommender Systems
Dominik Kowald PhD Defense Recommender SystemsDominik Kowald
 
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...Dr. Cornelius Ludmann
 
Paving the way to open and interoperable research data service workflows
Paving the way to open and interoperable research data service workflowsPaving the way to open and interoperable research data service workflows
Paving the way to open and interoperable research data service workflowsThe University of Edinburgh
 
Parts 1 & 2: WWW 2018 Tutorial: Understanding User Needs & Tasks
Parts 1 & 2: WWW 2018 Tutorial: Understanding User Needs & TasksParts 1 & 2: WWW 2018 Tutorial: Understanding User Needs & Tasks
Parts 1 & 2: WWW 2018 Tutorial: Understanding User Needs & TasksRishabh Mehrotra
 
Surveying, Planning and Scheduling For A Hill Road Work at Kalrayan Hills by ...
Surveying, Planning and Scheduling For A Hill Road Work at Kalrayan Hills by ...Surveying, Planning and Scheduling For A Hill Road Work at Kalrayan Hills by ...
Surveying, Planning and Scheduling For A Hill Road Work at Kalrayan Hills by ...IJSRD
 
SIZE ESTIMATION OF OLAP SYSTEMS
SIZE ESTIMATION OF OLAP SYSTEMSSIZE ESTIMATION OF OLAP SYSTEMS
SIZE ESTIMATION OF OLAP SYSTEMScscpconf
 
Size estimation of olap systems
Size estimation of olap systemsSize estimation of olap systems
Size estimation of olap systemscsandit
 
cloudworkloadanalysisandsimulation-140521153543-phpapp02
cloudworkloadanalysisandsimulation-140521153543-phpapp02cloudworkloadanalysisandsimulation-140521153543-phpapp02
cloudworkloadanalysisandsimulation-140521153543-phpapp02PRIYANKA MEHTA
 
Markov chain for the recommendation of
Markov chain for the recommendation ofMarkov chain for the recommendation of
Markov chain for the recommendation ofIJCSEA Journal
 
MARKOV CHAIN FOR THE RECOMMENDATION OF MATERIALIZED VIEWS IN REAL-TIME DATA W...
MARKOV CHAIN FOR THE RECOMMENDATION OF MATERIALIZED VIEWS IN REAL-TIME DATA W...MARKOV CHAIN FOR THE RECOMMENDATION OF MATERIALIZED VIEWS IN REAL-TIME DATA W...
MARKOV CHAIN FOR THE RECOMMENDATION OF MATERIALIZED VIEWS IN REAL-TIME DATA W...IJCSEA Journal
 
Markov Chain for the Recommendation of Materialized Views in Real-Time Data W...
Markov Chain for the Recommendation of Materialized Views in Real-Time Data W...Markov Chain for the Recommendation of Materialized Views in Real-Time Data W...
Markov Chain for the Recommendation of Materialized Views in Real-Time Data W...IJCSEA Journal
 
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...Data Con LA
 
Review of Existing Methods in K-means Clustering Algorithm
Review of Existing Methods in K-means Clustering AlgorithmReview of Existing Methods in K-means Clustering Algorithm
Review of Existing Methods in K-means Clustering AlgorithmIRJET Journal
 
Paving the way to open and interoperable research data service workflows Prog...
Paving the way to open and interoperable research data service workflows Prog...Paving the way to open and interoperable research data service workflows Prog...
Paving the way to open and interoperable research data service workflows Prog...ResearchSpace
 
IRJET- Analysis of Music Recommendation System using Machine Learning Alg...
IRJET-  	  Analysis of Music Recommendation System using Machine Learning Alg...IRJET-  	  Analysis of Music Recommendation System using Machine Learning Alg...
IRJET- Analysis of Music Recommendation System using Machine Learning Alg...IRJET Journal
 
A Comparison of Different Strategies for Automated Semantic Document Annotation
A Comparison of Different Strategies for Automated Semantic Document AnnotationA Comparison of Different Strategies for Automated Semantic Document Annotation
A Comparison of Different Strategies for Automated Semantic Document AnnotationAnsgar Scherp
 
Cloud workload analysis and simulation
Cloud workload analysis and simulationCloud workload analysis and simulation
Cloud workload analysis and simulationPrabhakar Ganesamurthy
 
IRJET- Comparative Analysis between Critical Path Method and Monte Carlo S...
IRJET- 	  Comparative Analysis between Critical Path Method and Monte Carlo S...IRJET- 	  Comparative Analysis between Critical Path Method and Monte Carlo S...
IRJET- Comparative Analysis between Critical Path Method and Monte Carlo S...IRJET Journal
 
IRJET- Review on Different Recommendation Techniques for GRS in Online Social...
IRJET- Review on Different Recommendation Techniques for GRS in Online Social...IRJET- Review on Different Recommendation Techniques for GRS in Online Social...
IRJET- Review on Different Recommendation Techniques for GRS in Online Social...IRJET Journal
 

Similar to Gunjan insight student conference v2 (20)

Dominik Kowald PhD Defense Recommender Systems
Dominik Kowald PhD Defense Recommender SystemsDominik Kowald PhD Defense Recommender Systems
Dominik Kowald PhD Defense Recommender Systems
 
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...
 
Paving the way to open and interoperable research data service workflows
Paving the way to open and interoperable research data service workflowsPaving the way to open and interoperable research data service workflows
Paving the way to open and interoperable research data service workflows
 
Parts 1 & 2: WWW 2018 Tutorial: Understanding User Needs & Tasks
Parts 1 & 2: WWW 2018 Tutorial: Understanding User Needs & TasksParts 1 & 2: WWW 2018 Tutorial: Understanding User Needs & Tasks
Parts 1 & 2: WWW 2018 Tutorial: Understanding User Needs & Tasks
 
sigir16
sigir16sigir16
sigir16
 
Surveying, Planning and Scheduling For A Hill Road Work at Kalrayan Hills by ...
Surveying, Planning and Scheduling For A Hill Road Work at Kalrayan Hills by ...Surveying, Planning and Scheduling For A Hill Road Work at Kalrayan Hills by ...
Surveying, Planning and Scheduling For A Hill Road Work at Kalrayan Hills by ...
 
SIZE ESTIMATION OF OLAP SYSTEMS
SIZE ESTIMATION OF OLAP SYSTEMSSIZE ESTIMATION OF OLAP SYSTEMS
SIZE ESTIMATION OF OLAP SYSTEMS
 
Size estimation of olap systems
Size estimation of olap systemsSize estimation of olap systems
Size estimation of olap systems
 
cloudworkloadanalysisandsimulation-140521153543-phpapp02
cloudworkloadanalysisandsimulation-140521153543-phpapp02cloudworkloadanalysisandsimulation-140521153543-phpapp02
cloudworkloadanalysisandsimulation-140521153543-phpapp02
 
Markov chain for the recommendation of
Markov chain for the recommendation ofMarkov chain for the recommendation of
Markov chain for the recommendation of
 
MARKOV CHAIN FOR THE RECOMMENDATION OF MATERIALIZED VIEWS IN REAL-TIME DATA W...
MARKOV CHAIN FOR THE RECOMMENDATION OF MATERIALIZED VIEWS IN REAL-TIME DATA W...MARKOV CHAIN FOR THE RECOMMENDATION OF MATERIALIZED VIEWS IN REAL-TIME DATA W...
MARKOV CHAIN FOR THE RECOMMENDATION OF MATERIALIZED VIEWS IN REAL-TIME DATA W...
 
Markov Chain for the Recommendation of Materialized Views in Real-Time Data W...
Markov Chain for the Recommendation of Materialized Views in Real-Time Data W...Markov Chain for the Recommendation of Materialized Views in Real-Time Data W...
Markov Chain for the Recommendation of Materialized Views in Real-Time Data W...
 
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
 
Review of Existing Methods in K-means Clustering Algorithm
Review of Existing Methods in K-means Clustering AlgorithmReview of Existing Methods in K-means Clustering Algorithm
Review of Existing Methods in K-means Clustering Algorithm
 
Paving the way to open and interoperable research data service workflows Prog...
Paving the way to open and interoperable research data service workflows Prog...Paving the way to open and interoperable research data service workflows Prog...
Paving the way to open and interoperable research data service workflows Prog...
 
IRJET- Analysis of Music Recommendation System using Machine Learning Alg...
IRJET-  	  Analysis of Music Recommendation System using Machine Learning Alg...IRJET-  	  Analysis of Music Recommendation System using Machine Learning Alg...
IRJET- Analysis of Music Recommendation System using Machine Learning Alg...
 
A Comparison of Different Strategies for Automated Semantic Document Annotation
A Comparison of Different Strategies for Automated Semantic Document AnnotationA Comparison of Different Strategies for Automated Semantic Document Annotation
A Comparison of Different Strategies for Automated Semantic Document Annotation
 
Cloud workload analysis and simulation
Cloud workload analysis and simulationCloud workload analysis and simulation
Cloud workload analysis and simulation
 
IRJET- Comparative Analysis between Critical Path Method and Monte Carlo S...
IRJET- 	  Comparative Analysis between Critical Path Method and Monte Carlo S...IRJET- 	  Comparative Analysis between Critical Path Method and Monte Carlo S...
IRJET- Comparative Analysis between Critical Path Method and Monte Carlo S...
 
IRJET- Review on Different Recommendation Techniques for GRS in Online Social...
IRJET- Review on Different Recommendation Techniques for GRS in Online Social...IRJET- Review on Different Recommendation Techniques for GRS in Online Social...
IRJET- Review on Different Recommendation Techniques for GRS in Online Social...
 

Recently uploaded

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 

Recently uploaded (20)

#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

Gunjan insight student conference v2

  • 1. A Sequence-based and Context Modelling Framework for Recommendation Gunjan Kumar, Houssem Jerbi, and Michael P. O’Mahony Insight Centre for Data Analytics University College Dublin Insight Student Conference 2017, Cork Sep 8, 2017
  • 3. Ubiquitous Sensors Wearable Camera Phone sensors & apps Wearable Devices
  • 4. Rich User Activity Data • Sequential nature of user activities • Activities have associated features/context, e.g. location, time, weather, etc.
  • 5. Rich User Activity Data For Recommender Systems Facilitates real time recommendations for a given user and context (e.g. time, location, weather, etc.) Our Work A framework for sequence- and context-based recommendation: • Recommending the next activity to perform. - Lifelogging - Transportation • Recommending the next sequence of activities to perform. - Tourism Insight Centre for Data Analytics RecTour 2017 Slide 4
  • 7. Related Work Capturing Sequence Capturing Context Recommending Sequences Hierarchical-graph-based models [Li et al., 2008; Zheng et al., 2009; Yoon et al., 2010] All-kth -order Markov models [Bohnenberger and Jameson, 2001; Deshpande and Karypis, 2004; Shani et al., 2005] Tensor and matrix factorization models [Zheng et al., 2010, 2012; Wang et al., 2010; Symeonidis et al., 2011; Adomavicius et al., 2011; Braunhofer et al., 2013] Music Playlists [Baccigalupo and Plaza, 2006; Chen et al., 2012] POI/Itinerary [Tai et al., 2008; Yoon et al., 2012] Activity Recommendation Framework [Kumar et al., 2017] Stochastic Modelling [Sun et al., 2016] Activity Recommendation Framework [Kumar et al., 2014, 2016]
  • 8. Our Contribution • A generic activity recommendation framework to recommend the next sequence of activities to users based on past activity patterns and context. • A ML approach to learn optimal subsequence length for matching current and past subsequences of user activity patterns. • Application of the proposed approach in lifelogging [iiWAS 2014], transportation [KDD UrbComp 2016] and tourism [RecSys RecTour 2017] domains. Insight Centre for Data Analytics RecTour 2017 Slide 7
  • 9. Framework Overview Ranking User Data TimelinesData Modelling Timeline Matching Append top ranked activity to timeline/RS Similarity Assesment Recommended Sequence (RS)
  • 10. Data Model Activity Object A single occurrence of an activity and consists of a set of features describing the activity or the context. Activity Timeline A chronological sequence of n activity objects performed by the user during a time interval δ: T =< ao1, ao2, ..., aon > Museum, 10:15, (53.343N,-6.317W), 11015 Theatre, 16:50, (53.340,-6.263W), 2008 Italian-Food, 12:31, (53.339N,-6.277W), 128 ao1 ao2 ao3
  • 11. Recommendation Algorithm • Extracting candidate timelines from user timelines. • ML: learning the optimal matching unit based on regularity (sample entropy) and repetition (k-grams) in user activity patterns in timelines. • Timeline matching approaches: • N-count matching • N-hours matching • Daywise matching • Similarity Assessment: Two-level Edit Distance • Ranking • Predicting the next sequence of activities Insight Centre for Data Analytics RecTour 2017 Slide 10
  • 12. Recommendation Algorithm: Overview Target Activity Seq < aot1 , aot2 , aot3 > Current Activity (aoc) ? ? ? Current Timeline Candidate Timeline #2 Candidate Timeline #1 Candidate Timeline #j ... 124 3 124 3 124 3 124 3 j Ranked 1 Score(aoj rec) = 1 − d(Tj, Tc) − min Tp∈T d(Tp, Tc) max Tp∈T d(Tp, Tc) − min Tp∈T d(Tp, Tc) Insight Centre for Data Analytics RecTour 2017 Slide 11
  • 13. Datasets 1. Lifelogging: 5 users, 41.4k images 2. Geolife Modes of transport: 18 users, 334 trips 3. Gowalla Checkins: 9.2k users, 6 million checkins • median # checkins per user per day: 10 - 270 • median # of distinct categories per user per day : 2 - 76 0 500 1,000 1,500 2,000 2,500 3,000 3,500 0 25 50 75 100 Median #checkins per user per day Numofusers 0 1000 2000 3000 4000 5000 0 10 20 30 40 50 60 70 80 Median #distinct categories per user per day Numofusers
  • 14. Methodology User Timeline Time 00 hrs 00 hrs 00 hrs Target Activity Seq < aot1 , aot2 , aot3 > Current Activity (aoc) ? ? ? Recommendation time (RT) • Leave-one-out evaluation: Each user’s complete timeline is split into training and test timelines by time. • Agreement @ k (k = 1, 2, 3): % of RTs for a user where the first k categories in the recommended sequence and the actual sequence are an exact match. • Recommendation algorithms: • N-count recommendation algorithm (SeqNCSeqRec) • Bi-gram-based sequence recommender (BiGramSeqRec) • Popularity-based sequence recommender (PopSeqRec) Insight Centre for Data Analytics RecTour 2017 Slide 13
  • 15. Recommendation Performance (Level 2) Figure: Mean percentage agreements for recommended sequences for SeqNCSeqRec (top 10% neighbours) and baseline algorithms using timelines constructed from categories at level 2 in the hierarchy. Insight Centre for Data Analytics RecTour 2017 Slide 14
  • 16. Recommendation Performance • Some level 2 activities are semantically closer than others. • ‘true’ performance between those at level 2 and level 1. Food Outdoors Shopping Nightlife Entertainment Community Mexican Travel Asian Italian Coffee Shop South American/Latin Fish & Chips Starbucks Dunkin Donuts... ... Level 1 Level 2 Level 3 (7) (134) (151) Insight Centre for Data Analytics RecTour 2017 Slide 15
  • 17. Recommendation Performance • Some level 2 activities are semantically closer than others. • ‘true’ performance between those at level 2 and level 1. Food Outdoors Shopping Nightlife Entertainment Community Live Music Travel Casino Art Museum Ice Skating Theatre Zoo ... Level 1 Level 2 Level 3 (7) (134) (151) Insight Centre for Data Analytics RecTour 2017 Slide 16
  • 18. Recommendation Performance (Level 1) Figure: Mean percentage agreements for recommended sequences for SeqNCSeqRec (top 10% neighbours) and baseline algorithms using timelines constructed from categories at level 1 in the hierarchy. Insight Centre for Data Analytics RecTour 2017 Slide 17
  • 19. Conclusion • A generic activity recommendation framework to recommend the next sequence of activities to users based on past activity patterns and context [Kumar et al., 2014, 2016, 2017]. • Experiments demonstrate the efficacy of our approach in recommending sequences given a diverse variety of activities and user activity patterns. Insight Centre for Data Analytics RecTour 2017 Slide 18
  • 20. Future Work • Consider alternative approaches to suggest sequences of activities (for example, using RNNs). • Introduce new evaluation metrics for evaluating sequence recommendation. • Investigate the recommendation of context (for example, where, when, with whom etc.) associated with each of the suggested sequence of activities. • Consider socio-economic characteristics, user demographics, and travel variables. Insight Centre for Data Analytics RecTour 2017 Slide 19
  • 22. References I G. Dong and J. Pei. Sequence Data Mining (Advances in Database Systems). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2007. ISBN 0387699368. G. Kumar, H. Jerbi, C. Gurrin, and M. P. O’Mahony. Towards activity recommendation from lifelogs. In Proceedings of the 16th International Conference on Information Integration and Web-based Applications & Services, iiWAS ’14, pages 87–96, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-3001-5. doi: 10.1145/2684200.2684298. URL http://doi.acm.org/10.1145/2684200.2684298. G. Kumar, H. Jerbi, and M. P. O’Mahony. Personalised recommendations for modes of transport: A sequence-based approach. The 5th ACM SIGKDD International Workshop on Urban Computing (UrbComp 2016), 2016. G. Kumar, H. Jerbi, and M. P. O’Mahony. Towards the recommendation of personalised activity sequences in the tourism domain. The 2nd ACM RecSys Workshop on Recommenders in Tourism (RecTour 2017), 2017. Z. Xing, J. Pei, and E. Keogh. A brief survey on sequence classification. SIGKDD Explor. Newsl., 12(1):40–48, Nov. 2010. ISSN 1931-0145. doi: 10.1145/1882471.1882478. URL http://doi.acm.org/10.1145/1882471.1882478.
  • 23. Some Extra Bits Insight Centre for Data Analytics RecTour 2017 Slide 2
  • 24. Recommendation Algorithm User Timeline Time 00 hrs 00 hrs 00 hrs Insight Centre for Data Analytics RecTour 2017 Slide 3
  • 25. Recommendation Algorithm User Timeline Time 00 hrs 00 hrs 00 hrs Target Activity Seq < aot1 , aot2 , aot3 > Current Activity (aoc) ? ? ? Insight Centre for Data Analytics RecTour 2017 Slide 3
  • 26. Recommendation Algorithm 00 hrs Target Activity Seq < aot1 , aot2 , aot3 > Current Activity (aoc) ? ? ? Current Timeline Insight Centre for Data Analytics RecTour 2017 Slide 3
  • 27. Recommendation Algorithm 00 hrs Target Activity Seq < aot1 , aot2 , aot3 > Current Activity (aoc) ? ? ? Current Timeline Candidate Timeline #2 Candidate Timeline #1 Candidate Timeline #j ... Insight Centre for Data Analytics RecTour 2017 Slide 3
  • 28. Recommendation Algorithm Target Activity Seq < aot1 , aot2 , aot3 > Current Activity (aoc) ? ? ? Current Timeline Candidate Timeline #2 Candidate Timeline #1 Candidate Timeline #j ... N-count matching (N = 4) 124 3 124 3 124 3 124 3 [Kumar et al., 2016] Matching unit Determines the length of the subsequences to be compared. Insight Centre for Data Analytics RecTour 2017 Slide 3
  • 29. Similarity Assessment Target Activity Seq < aot1 , aot2 , aot3 > Current Activity (aoc) ? ? ? Current Timeline Candidate Timeline #2 Candidate Timeline #1 Candidate Timeline #j ... 124 3 124 3 124 3 124 3 Two-level Edit Distance [Kumar et al., 2014] Insight Centre for Data Analytics RecTour 2017 Slide 4
  • 30. Ranking Target Activity Seq < aot1 , aot2 , aot3 > Current Activity (aoc) ? ? ? Current Timeline Candidate Timeline #2 Candidate Timeline #1 Candidate Timeline #j ... 124 3 124 3 124 3 124 3 j Ranked 1 Score(aoj rec) = 1 − d(Tj, Tc) − min Tp∈T d(Tp, Tc) max Tp∈T d(Tp, Tc) − min Tp∈T d(Tp, Tc) Insight Centre for Data Analytics RecTour 2017 Slide 5
  • 31. Recommending Target Activity Seq < aot1 , aot2 , aot3 > Current Activity (aoc) ? ? Current Timeline Candidate Timeline #2 Candidate Timeline #1 Candidate Timeline #j ... 124 3 124 3 124 3 124 3 j Ranked 1 aorec1 Insight Centre for Data Analytics RecTour 2017 Slide 6
  • 32. Recommending Target Activity Seq < aot2 , aot3> Current Activity (aoc) ? ? Current Timeline Candidate Timeline #2 Candidate Timeline #1 Candidate Timeline #j ... 124 3 124 3 124 3 124 3 j Ranked 1 aorec1 Insight Centre for Data Analytics RecTour 2017 Slide 6
  • 33. Recommending Target Activity Seq < aot3> Current Activity (aoc) ? ? Current Timeline 124 3 aorec2 aorec1 Recommendation Algorithm ( SeqNCSeqRec) Insight Centre for Data Analytics RecTour 2017 Slide 6
  • 34. Recommending ? Recommended Sequence aorec2 Recommendation Algorithm ( SeqNCSeqRec) aorec1 aorec3 Insight Centre for Data Analytics RecTour 2017 Slide 6
  • 35. Two-level Distance • Inspired by edit distance • Adapted for sequence of objects 09:00 10:05 10:35 10:50 11:3211:25
  • 36. Two-level Distance • Inspired by edit distance • Adapted for sequence of objects 09:00 10:05 10:35 10:50 11:32 Step 1 Align activities 11:25 ( ( dactivity (T1, T2) = r i=1 wobj × cins + s j=1 wobj × cdel + t k=1 wobj × csub
  • 37. Two-level Distance • Inspired by edit distance • Adapted for sequence of objects 09:00 10:05 10:35 10:50 11:32 Step 1 Step 2 Align activities Align features 11:25 ( ( ( ( d(T1, T2) = dactivity (T1, T2) + n i=1 dfeature(ao1 i , ao2 i ) .
  • 38. Methodology Learning Optimal Matching Unit range • Wrapper attribute selection: C4.5 algorithm, greedy backward search and area under ROC curve as evaluation measure. • Classification: pruned attribute vectors for each user fed into a C4.5 induction algorithm to predict optimal matching unit range. Insight Centre for Data Analytics RecTour 2017 Slide 8
  • 39. Attribute Extraction: Timeline Decomposition • Each user represented by an attribute vector. • For attribute extraction: timelines are decomposed into features-sequence : User Timeline Time 00 hrs 00 hrs 00 hrs
  • 40. Attribute Extraction: Timeline Decomposition • Each user represented by an attribute vector. • For attribute extraction: timelines are decomposed into features-sequence : User Timeline Time 00 hrs 00 hrs 00 hrs dist-travel start-geo start-time
  • 41. Attribute Extraction: Timeline Decomposition • Each user represented by an attribute vector. • For attribute extraction: timelines are decomposed into features-sequence : User Timeline Time 00 hrs 00 hrs 00 hrs dist-travel start-geo dist-travel start-geo start-time start-time
  • 43. Timeline Attributes Regularity Attributes: Sample Entropy 1. SampEnp z : sample entropy of a feature sequence Sz for epoch length p, 2. µSampEnp T : mean sample entropy over all feature sequences Sz , z = 1, 2, ..., m of timeline T for epoch length p, 3. σSampEnp T : standard deviation of sample entropy over all feature sequences Sz , z = 1, 2, ..., m of timeline T for epoch length p. Insight Centre for Data Analytics RecTour 2017 Slide 11
  • 44. Timeline Attributes Repetition Attributes: k-gram attributes Previously used for sequence classification, biological sequence analysis and text classification [Xing et al., 2010; Dong and Pei, 2007]. 1. ηk z : total number of distinct k-grams in feature sequence Sz , normalised by total number of k-grams occurring in Sz , 2. µf k z : mean frequency of occurrence of distinct k-grams in feature sequence Sz , normalised by total number of k-grams occurring in Sz , 3. σf k z : standard deviation of frequency of occurrence of distinct k-grams in feature sequence Sz , normalised by length of Sz . Insight Centre for Data Analytics RecTour 2017 Slide 12
  • 45. Trend across matching-unit N 5 6 7 8 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 3333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333 4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444 6666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666 8888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888 N−count Mean%agreement Group1[0] Group2[1,4] Group3[5+) Figure: MRR versus matching unit for three user groups for first sequence index.