How 
Big 
Data 
is 
Changing 
User 
Engagement 
Mounia Lalmas 
Yahoo Labs London 
mounia@acm.org 
Strata + HW Barcelona – 19 November 2014
This talk 
§ User engagement 
› Definitions 
› Characteristics 
› Metrics 
§ Case studies: Absence time 
› Within sessions 
› Across sessions
User engagement
What is user engagement? 
“User engagement is a quality of the user 
experience that emphasizes the phenomena 
associated with wanting to use a technological 
resource longer and frequently” (Attfield et al, 2011) 
self-report: happy, sad, 
enjoyment, … 
analytics: click, upload, 
read, comment, share … 
physiology: gaze, body heat, 
mouse movement, … 
emotional, cognitive and behavioural connection 
that exists, at any point in time and over time, between 
a user and a technological resource 
4 
focus of this talk … Big Data
Why is it important to engage users? 
§ In today’s wired world, users have enhanced expectations 
about their interactions with technology 
… resulting in increased competition amongst the 
purveyors and designers of interactive systems. 
§ In addition to utilitarian factors, such as usability, we must 
consider the hedonic and experiential factors of interacting 
with technology, such as fun, fulfillment, play, and user 
engagement. 
(O’Brien, Lalmas & Yom-Tov, 2014)
Online sites differ with respect to 
their engagement pattern 
Games 
Users spend 
much time per 
visit 
Search 
Users come 
frequently and 
do not stay long 
Social media 
Users come 
frequently and 
stay long 
Niche 
Users come on 
average once 
a week e.g. weekly 
post 
News 
Users come 
periodically, 
e.g. morning and 
evening 
Service 
Users visit site, 
when needed, 
e.g. to renew 
subscription 
(Lehmann etal, 2012)
Why is it important interpret user 
engagement measurement well? 
CTR 
new version
Characteristics of user engagement 
Focused attention 
Positive Affect 
Aesthetics 
Endurability 
Novelty 
Richness and control 
Reputation, trust and 
expectation 
Motivation, interests, 
incentives, and benefits 
(Attfield etal, 2011)
Large scale measurements – analytics 
• within-session engagement measures success in attracting user to the 
site. 
• across-session engagement measured by observing user long-term 
value. 
within-session measures across-session measures 
• Dwell time 
• Session duration 
• Bounce rate 
• Play time (video) 
• Click through rate (CTR) 
• Number of pages viewed 
• Conversion rate 
• Number of UCG (comments). 
• Time between visits 
• … 
• Fraction of return visits 
• Total view time per month (video) 
• Lifetime value (number of actions) 
• Number of sessions per unit of time 
• Total usage time per unit of time 
• Number of friends on site (social 
networks) 
• Number of UCG (comments) 
• Time between visits 
• … 
loyalty 
popularity 
activity
Two case studies: Absence time 
news facebook news news mail news 
absence time 
within sessions across sessions
Case study I 
Absence time within 
sessions
The context – Online multitasking 
Browsing the “old way” 
1min 2min 1min 3min 
facebook news news news news mail 
Dwell time during a visit on news site: 7min on average 
news site 
Browsing “nowadays” 
1min 2min 1min 3min 
news facebook news news mail news 
Dwell time during a visit on news site: 2.33min on average (1min | 3min | 3min)
The dataset … Big Data 
› July 2012 
› 2.5M users 
› 785M page views 
› Categorization of the most 
frequent accessed sites 
• 11 categories (e.g. news), 33 
subcategories (e.g. news finance, 
news society) 
• 760 sites from 70 countries/regions 
short sessions: average 3.01 distinct sites visited with revisitation rate 10% 
long sessions: average 9.62 distinct sites visited with revisitation rate 22%
Time spent at each revisit 
0.33 
0.28 
0.23 
mail sites social media sites news (finance) sites news (tech) sites 
p-value = 0.09 
m = -0.01 
p-value = 0.07 
m = -0.02 
p-value = 0.79 
m = 0.00 
1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 
100% 
50% 
0% 
decreasing attention increasing attention constant attention complex attention 
Teleportation Hyperlinking Backpaging 
Proportion of total 
dwell time on site 
Percentage 
of navigation type 
ith visit on site ith visit on site ith visit on site ith visit on site 
Four patterns: decreasing, increasing, constant and complex 
Backpaging increasingly used at each revisit 
Metrics: 
1. AttShift: 1 (-1) when time spent at each revisit increases (decreases); 
0 when no pattern 
2. AttRange: 0 when time spent at each revisit is constant
Time spent between each visit 
• 50% of sites are revisited after 
less than 1min 
interruption of a task? 
• There are revisits after a long 
break 
returning to a site to perform a 
new task? 
1.00 
0.75 
0.50 
0.25 
0.00 
news (finance) 
news (tech) 
social media 
mail 
10ï2 10ï1 100 101 102 
Cumulative probability 
Absence time [min] 
2nd 
1st 3rd 
… visit 
absence time 
Metric: 
1. CumAct: increases with time spent between visits
Models of online multitasking 
§ Clustering of sites using multitasking and standard engagement 
metrics: 
› CumAct, AttShift, AttRange 
› Visit, Session 
§ Five models 
C4: 74 sites 
0.75 
0.25 
-0.25 
-0.75 
C5: 166 sites 
0.75 
0.25 
-0.25 
-0.75 
C3: 156 sites 
0.75 
0.25 
-0.25 
-0.75 
C2: 108 sites 
0.75 
0.25 
-0.25 
-0.75 
C1: 172 sites 
0.75 
0.25 
-0.25 
-0.75 
Visitdt [min] CumActdt,3 AttShiftdt,4 AttRangedt,4 Sessiondt [min]
One task during a session 
C2: 108 sites 
auctions, front page, 
shopping, dating 
0.75 
0.25 
-0.25 
-0.75 
C1: 172 sites 
mail, maps, news, 
Visitdt [min] CumActdt,3 AttShiftdt,4 AttRangedt,4 Sessiondt [min] 
news (soc.) 
0.75 
0.25 
-0.25 
-0.75 
§ High dwell time per visit and during 
entire session 
§ Users return to continue a task 
(short absence time) 
§ C1: attention is shifting to another 
site 
§ C2: attention is shifting slowly 
towards the site
Several tasks during a session 
C4: 74 sites 
front page, search, 
download 
C3: 156 sites 
auctions, search, 
front page, shopping 
0.75 
0.25 
-0.25 
-0.75 
0.75 
0.25 
-0.25 
-0.75 
§ Users perform several tasks during 
a session 
§ No simple activity pattern 
§ C3: Dwell time per visit is low, but 
dwell time per session is high 
Visitdt [min] CumActdt,3 AttShiftdt,4 AttRangedt,4 Sessiondt [min]
Sites with low activity 
C5: 166 sites 
service, download, 
blogging, news (soc.) 
0.75 
0.25 
-0.25 
-0.75 
§ Users do not spend a lot of time on 
these sites 
§ Time between visits is short 
§ Attention is shifting towards the site 
Visitdt [min] CumActdt,3 AttShiftdt,4 AttRangedt,4 Sessiondt [min]
Browsing behavior can differ between 
sites of the same category 
C2: 108 sites 
auctions, front page, 
shopping, dating 
0.75 
0.25 
-0.25 
-0.75 
C3: 156 sites 
auctions, search, 
front page, shopping 
0.75 
0.25 
-0.25 
-0.75 
§ C2: users visit site once to perform 
their task 
§ C3: users visit site several times to 
perform task(s) 
Visitdt [min] CumActdt,3 AttShiftdt,4 AttRangedt,4 Sessiondt [min]
The message – absence time within 
sessions 
Online multitasking happens 
How it affects a site depends 
on the site 
Towards a taxonomy of online multitasking 
One main continuous task versus user loyalty 
New metrics that account for time spent between visits and at each visit 
Applications: story-focused reading & network of sites 
(Lehmann etal, 2013)
Case study II 
Absence time across 
sessions
The context – search experience
The context – search experience
Absence time and survival analysis 
Users (%) who read story 2 but did not come back after 10 hours 
story 1 
story 2 
story 3 
story 4 
story 5 
story 6 
story 7 
story 8 
story 9 
SURVIVE 
0 5 10 15 20 
0.0 0.2 0.4 0.6 0.8 1.0 
Users (%) who did come back 
DIE 
DIE = RETURN TO SITE èSHORT ABSENCE TIME
The dataset … Big Data 
Ranking function on Yahoo Answer Japan 
Two-weeks click data on Yahoo Answer Japan: search 
One millions users 
Six ranking functions 
30-minute session boundary
Examples of search session metrics 
§ Number of clicks 
§ Click at given position 
§ Time to first click 
§ Skipping 
§ Abandonment rate 
§ Number of query reformulations
Absence time and number of clicks on 
search result page 
survival analysis: high hazard rate (die quickly) = short absence 
5 clicks 
control = no click 
3 clicks
Absence time – search experience 
search session metrics absence time 
1. No click means a bad user experience 
2. Clicking between 3-5 results leads to same user experience 
3. Clicking on more than 5 results reflects poorer user experience; users 
cannot find what they are looking for 
4. Clicking lower in the ranking (2nd, 3rd) suggests more careful choice from 
the user (compared to 1st) 
5. Clicking at bottom is a sign of low quality overall ranking 
6. Users finding their answers quickly (click sooner) return sooner to the 
search application 
7. Returning to the same search result page is a worse user experience than 
reformulating the query
The message – absence time across 
sessions 
short absence is 
a sign of loyalty 
important indication 
of user satisfaction 
with search portal 
(Dupret & Lalmas, 2013) 
Easy to implement and 
interpret 
Can compare many things in 
one go 
No need to estimate 
baselines 
But need lots of data to 
account for noise 
Applications: direct display & 
churn rate & advertising
Some final words
§ “If you cannot measure it, you cannot 
improve it” – William Thomson (Lord Kelvin) 
§ “You cannot control what you cannot 
measure” – DeMarco 
§ “The way you measure is more important 
than what you measure” – Art Gust

How Big Data is Changing User Engagement

  • 1.
    How Big Data is Changing User Engagement Mounia Lalmas Yahoo Labs London mounia@acm.org Strata + HW Barcelona – 19 November 2014
  • 2.
    This talk §User engagement › Definitions › Characteristics › Metrics § Case studies: Absence time › Within sessions › Across sessions
  • 3.
  • 4.
    What is userengagement? “User engagement is a quality of the user experience that emphasizes the phenomena associated with wanting to use a technological resource longer and frequently” (Attfield et al, 2011) self-report: happy, sad, enjoyment, … analytics: click, upload, read, comment, share … physiology: gaze, body heat, mouse movement, … emotional, cognitive and behavioural connection that exists, at any point in time and over time, between a user and a technological resource 4 focus of this talk … Big Data
  • 5.
    Why is itimportant to engage users? § In today’s wired world, users have enhanced expectations about their interactions with technology … resulting in increased competition amongst the purveyors and designers of interactive systems. § In addition to utilitarian factors, such as usability, we must consider the hedonic and experiential factors of interacting with technology, such as fun, fulfillment, play, and user engagement. (O’Brien, Lalmas & Yom-Tov, 2014)
  • 6.
    Online sites differwith respect to their engagement pattern Games Users spend much time per visit Search Users come frequently and do not stay long Social media Users come frequently and stay long Niche Users come on average once a week e.g. weekly post News Users come periodically, e.g. morning and evening Service Users visit site, when needed, e.g. to renew subscription (Lehmann etal, 2012)
  • 7.
    Why is itimportant interpret user engagement measurement well? CTR new version
  • 8.
    Characteristics of userengagement Focused attention Positive Affect Aesthetics Endurability Novelty Richness and control Reputation, trust and expectation Motivation, interests, incentives, and benefits (Attfield etal, 2011)
  • 9.
    Large scale measurements– analytics • within-session engagement measures success in attracting user to the site. • across-session engagement measured by observing user long-term value. within-session measures across-session measures • Dwell time • Session duration • Bounce rate • Play time (video) • Click through rate (CTR) • Number of pages viewed • Conversion rate • Number of UCG (comments). • Time between visits • … • Fraction of return visits • Total view time per month (video) • Lifetime value (number of actions) • Number of sessions per unit of time • Total usage time per unit of time • Number of friends on site (social networks) • Number of UCG (comments) • Time between visits • … loyalty popularity activity
  • 10.
    Two case studies:Absence time news facebook news news mail news absence time within sessions across sessions
  • 11.
    Case study I Absence time within sessions
  • 12.
    The context –Online multitasking Browsing the “old way” 1min 2min 1min 3min facebook news news news news mail Dwell time during a visit on news site: 7min on average news site Browsing “nowadays” 1min 2min 1min 3min news facebook news news mail news Dwell time during a visit on news site: 2.33min on average (1min | 3min | 3min)
  • 13.
    The dataset …Big Data › July 2012 › 2.5M users › 785M page views › Categorization of the most frequent accessed sites • 11 categories (e.g. news), 33 subcategories (e.g. news finance, news society) • 760 sites from 70 countries/regions short sessions: average 3.01 distinct sites visited with revisitation rate 10% long sessions: average 9.62 distinct sites visited with revisitation rate 22%
  • 14.
    Time spent ateach revisit 0.33 0.28 0.23 mail sites social media sites news (finance) sites news (tech) sites p-value = 0.09 m = -0.01 p-value = 0.07 m = -0.02 p-value = 0.79 m = 0.00 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 100% 50% 0% decreasing attention increasing attention constant attention complex attention Teleportation Hyperlinking Backpaging Proportion of total dwell time on site Percentage of navigation type ith visit on site ith visit on site ith visit on site ith visit on site Four patterns: decreasing, increasing, constant and complex Backpaging increasingly used at each revisit Metrics: 1. AttShift: 1 (-1) when time spent at each revisit increases (decreases); 0 when no pattern 2. AttRange: 0 when time spent at each revisit is constant
  • 15.
    Time spent betweeneach visit • 50% of sites are revisited after less than 1min interruption of a task? • There are revisits after a long break returning to a site to perform a new task? 1.00 0.75 0.50 0.25 0.00 news (finance) news (tech) social media mail 10ï2 10ï1 100 101 102 Cumulative probability Absence time [min] 2nd 1st 3rd … visit absence time Metric: 1. CumAct: increases with time spent between visits
  • 16.
    Models of onlinemultitasking § Clustering of sites using multitasking and standard engagement metrics: › CumAct, AttShift, AttRange › Visit, Session § Five models C4: 74 sites 0.75 0.25 -0.25 -0.75 C5: 166 sites 0.75 0.25 -0.25 -0.75 C3: 156 sites 0.75 0.25 -0.25 -0.75 C2: 108 sites 0.75 0.25 -0.25 -0.75 C1: 172 sites 0.75 0.25 -0.25 -0.75 Visitdt [min] CumActdt,3 AttShiftdt,4 AttRangedt,4 Sessiondt [min]
  • 17.
    One task duringa session C2: 108 sites auctions, front page, shopping, dating 0.75 0.25 -0.25 -0.75 C1: 172 sites mail, maps, news, Visitdt [min] CumActdt,3 AttShiftdt,4 AttRangedt,4 Sessiondt [min] news (soc.) 0.75 0.25 -0.25 -0.75 § High dwell time per visit and during entire session § Users return to continue a task (short absence time) § C1: attention is shifting to another site § C2: attention is shifting slowly towards the site
  • 18.
    Several tasks duringa session C4: 74 sites front page, search, download C3: 156 sites auctions, search, front page, shopping 0.75 0.25 -0.25 -0.75 0.75 0.25 -0.25 -0.75 § Users perform several tasks during a session § No simple activity pattern § C3: Dwell time per visit is low, but dwell time per session is high Visitdt [min] CumActdt,3 AttShiftdt,4 AttRangedt,4 Sessiondt [min]
  • 19.
    Sites with lowactivity C5: 166 sites service, download, blogging, news (soc.) 0.75 0.25 -0.25 -0.75 § Users do not spend a lot of time on these sites § Time between visits is short § Attention is shifting towards the site Visitdt [min] CumActdt,3 AttShiftdt,4 AttRangedt,4 Sessiondt [min]
  • 20.
    Browsing behavior candiffer between sites of the same category C2: 108 sites auctions, front page, shopping, dating 0.75 0.25 -0.25 -0.75 C3: 156 sites auctions, search, front page, shopping 0.75 0.25 -0.25 -0.75 § C2: users visit site once to perform their task § C3: users visit site several times to perform task(s) Visitdt [min] CumActdt,3 AttShiftdt,4 AttRangedt,4 Sessiondt [min]
  • 21.
    The message –absence time within sessions Online multitasking happens How it affects a site depends on the site Towards a taxonomy of online multitasking One main continuous task versus user loyalty New metrics that account for time spent between visits and at each visit Applications: story-focused reading & network of sites (Lehmann etal, 2013)
  • 22.
    Case study II Absence time across sessions
  • 23.
    The context –search experience
  • 24.
    The context –search experience
  • 25.
    Absence time andsurvival analysis Users (%) who read story 2 but did not come back after 10 hours story 1 story 2 story 3 story 4 story 5 story 6 story 7 story 8 story 9 SURVIVE 0 5 10 15 20 0.0 0.2 0.4 0.6 0.8 1.0 Users (%) who did come back DIE DIE = RETURN TO SITE èSHORT ABSENCE TIME
  • 26.
    The dataset …Big Data Ranking function on Yahoo Answer Japan Two-weeks click data on Yahoo Answer Japan: search One millions users Six ranking functions 30-minute session boundary
  • 27.
    Examples of searchsession metrics § Number of clicks § Click at given position § Time to first click § Skipping § Abandonment rate § Number of query reformulations
  • 28.
    Absence time andnumber of clicks on search result page survival analysis: high hazard rate (die quickly) = short absence 5 clicks control = no click 3 clicks
  • 29.
    Absence time –search experience search session metrics absence time 1. No click means a bad user experience 2. Clicking between 3-5 results leads to same user experience 3. Clicking on more than 5 results reflects poorer user experience; users cannot find what they are looking for 4. Clicking lower in the ranking (2nd, 3rd) suggests more careful choice from the user (compared to 1st) 5. Clicking at bottom is a sign of low quality overall ranking 6. Users finding their answers quickly (click sooner) return sooner to the search application 7. Returning to the same search result page is a worse user experience than reformulating the query
  • 30.
    The message –absence time across sessions short absence is a sign of loyalty important indication of user satisfaction with search portal (Dupret & Lalmas, 2013) Easy to implement and interpret Can compare many things in one go No need to estimate baselines But need lots of data to account for noise Applications: direct display & churn rate & advertising
  • 31.
  • 32.
    § “If youcannot measure it, you cannot improve it” – William Thomson (Lord Kelvin) § “You cannot control what you cannot measure” – DeMarco § “The way you measure is more important than what you measure” – Art Gust