2. Agenda Overview
• Introduction: Slides 3-4
• The Experiments: Slide 5
• The Need: Slides 6-7
• Measurement Framework: Slides 8-11
• Putting it to Use: Slides 12-18
• Closing / Q&As: Slides 19-20
3. Guardian US Mobile Innovation Lab
Mission & background
A small multidisciplinary innovation team of editors, reporters, product managers, and engineers in the Guardian US newsroom, testing new mobile storytelling formats without the constraint of monetization
• Funded by the John S. and James L. Knight Foundation for two years
• Enabled by The Guardian
• Working for the industry
4. The team
• Sarah Schmalbach, Product Manager
• Sasha Koren, Editor
• Alastair Coote, Engineer
• Madeline Welsh, Associate Editor
• Dylan Greif, Product Designer
• Mazin Sidhamed, Reporter
• Connor Jennings, Engineer
5. What do our notification experiments look like?
• Daily leaderboard
• Real-time medal alerts
• Quizzes
6. What were we hoping to learn?
• Are publishers taking full advantage of notification technology for better storytelling on mobile?
• Will audiences understand and enjoy new types of news notifications?
• Are audiences willing to subscribe to personalized notifications that cater to their interests?
• Does the audience find real-time data, quizzes, and polls in notifications useful?
• How does the audience interact with the notifications?
7. Challenges with established web metrics
Higher engagement with the website does not equate to a better experience with notifications:
• Sessions
• Pageviews
• Time on site
8. Redefining success measurement with a framework
A metrics framework was established to provide a scientific way of making sense of the data and measuring success across experiments.
Goals defined by the team:
• High-level goal: Determine which formats for delivering content are the most successful
• Strategic goal #1: Understand how users are engaging with the notifications
• Strategic goal #2: Understand what users think of the user experience
9. Custom data collection
How are users engaging?
• Custom Google Analytics implementation with event tracking
What did users think?
• Survey data
10. A new KPI: net interaction rate
How are users engaging?

Net interaction rate = (number of positive engagements − number of negative engagements) / number of notifications shown

• Signals whether or not the experiment was a success, based on how positive or negative the experience was as reflected in the types of interactions
11. Additional new KPIs: net survey scores
What did users think?

Net score for each surveyed aspect (1-4 scale, useful/interesting):

(average user rating − 2.5) / 1.5 × 100%

Net score for re-enrollment (yes/no/maybe):

(respondents who answered yes − respondents who answered no) / total respondents × 100%
12. The Olympics: putting the framework to use
Analyzing the leaderboard alerts
• Trackable user interactions:
• Tapping on the notification itself
• Tapping on the leaderboard button
• Tapping on manage updates
• Swiping to dismiss
13. The Olympics: value of the net interaction rate
Analysis by total interactions is misleading
[Chart: total interactions (engagements) per day, 8/5 through 8/19, on a 0-600 scale]
15. The Olympics: value of the net interaction rate
Net interaction rate aggregates the positive and negative engagements
16. The Olympics: evaluating success
Net interaction rate:
• Alerts: -53%
• Polls: -20%
• Quizzes: 9%
Re-enrollment survey KPI:
• Quizzes: -24%
• Polls: -10%
• Alerts: 70%
Alerts had a negative net interaction rate from dismisses but a high re-enrollment score.
So were the alerts successful?
17. The Olympics: evaluating success
Re-evaluating assumptions: are dismisses actually negative?
New survey question: If you dismissed the alert, what was the most common reason?
• I was informed and didn't need it anymore: 84%
• I was annoyed and didn't want it
• I wanted to unsubscribe but didn't know how
• Other
18. The Olympics: evaluating success
14% Net Interaction Rate (dismisses as neutral) • 70% Re-enrollment Rate
• Overall success with the alerts
• High re-enrollment rate
• Demonstrated user interest and the opportunities for more custom alerts
• Positive net interaction rate
• Majority of users gained utility from the alert and some engaged with the full leaderboard content
• Signals that alerts can provide utility on their own and also give convenient access to deeper coverage
19. Takeaways / Q&As
Create a framework around project goals
• Discuss project hypotheses, and positive and negative outcomes, in advance
Build upon that framework over time
• Think ahead about how this framework might be built upon in the future for similar projects
Be flexible, persistent, and inclusive during analysis and interpretation
• As with building the framework, include the entire team in the interpretation and analysis, and be open to all viewpoints
Open communication is necessary
• Incorporating many viewpoints often strengthens the quality and sharpens the focus of the data you gather
In order to answer the questions Sarah mentioned, it is important to analyze engagement by looking at specific metrics. However, we couldn't just look at existing web metrics such as sessions, pageviews, and time on site. These metrics are specific to engagement on the website, and not all notifications actually drove users to the website. If we looked only at web metrics, we would miss any of the interactions occurring before someone hit the website. Additionally, higher engagement with the website doesn't equate to a better experience with the notifications: maybe someone accidentally clicked through but had no idea what the buttons were meant to do.
Since existing metrics weren't the best form of measurement, we set out to create a metrics framework. A metrics framework essentially outlines which metrics to use based on your business goals; it provides a scientific way of making sense of the quantitative data. So the first step for us as a team was to determine what the actual goal of the experiments was. After a lot of discussion, we agreed that the high-level goal was to determine which formats for delivering content are the most successful. Under that umbrella sat two additional strategic goals:
Strategic goal #1: Understand how users are engaging with the notifications
Strategic goal #2: Understand what users think of the user experience in order to improve future experiments
Now that we had a goal in mind, we needed to ensure that sufficient data was being captured in order to calculate KPIs and create benchmarks.
Strategic goal #1: Understand how users are engaging with the notifications
Since the default Google Analytics implementation does not track interactions prior to a user landing on a website, a custom implementation must be used
Each user interaction is tracked as an event with detailed granularity to differentiate the type of interaction (e.g. expand, click-through, dismiss, stop notifications, etc.)
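To make the custom event tracking concrete, here is a minimal sketch of reporting one such interaction via the Measurement Protocol of (legacy) Universal Analytics, the GA version in use at the time. The tracking ID, client ID, and category/action/label values below are placeholders of mine, not the Lab's actual configuration; the payload string would be POSTed to the protocol's collection endpoint.

```python
from urllib.parse import urlencode

def build_ga_event(client_id: str, category: str, action: str, label: str = "") -> str:
    """Build a Universal Analytics Measurement Protocol event payload.

    Field names (v, tid, cid, t, ec, ea, el) are the protocol's own;
    the tracking ID is a placeholder, not a real property.
    """
    params = {
        "v": "1",             # protocol version
        "tid": "UA-XXXXX-Y",  # placeholder tracking ID
        "cid": client_id,     # anonymous client ID
        "t": "event",         # hit type
        "ec": category,       # event category, e.g. the experiment name
        "ea": action,         # event action, e.g. the interaction type
        "el": label,          # event label, e.g. which notification
    }
    return urlencode(params)

# A dismiss of the day-3 leaderboard alert (illustrative names)
payload = build_ga_event("555", "olympics-leaderboard", "dismiss", "day-3")
print(payload)
```

Tracking each interaction type as a distinct event action is what later lets the analysis split engagements into positive and negative groups.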
Strategic goal #2: Understand what users think of the user experience in order to improve future experiments
In order to best understand user opinion, a survey is provided at the end of the experiment
While surveys are qualitative in nature, questions with dichotomous or numerical scale answer choices can be measured quantitatively
Essential to ask consistent questions
Overall, what was your impression of the experimental alerts? [Useful/Interesting]
1-4 rating scale
Would you sign up for these notifications again?
Yes/No/Maybe
With data collection in place, we developed two KPIs to help measure success. It's not just about getting as many engagements as possible; the type of engagement matters as well. Users engage in various ways and for different reasons.
The net interaction rate signals whether or not an experiment was a success based on how positive or negative the experience was, as reflected in specific types of interactions.
Specific positive and negative engagements would need to be defined by the team based on deep thinking about the interactions they are trying to drive.
E.g. is someone closing a notification bad? Is someone sharing their quiz results good?
Developing this metric requires teams to talk upfront about the user experience
Promotes open communication with a multidisciplinary team (editorial, product, analytics, development, design) and solidifies that everyone has a shared vision before launching
Establishing common definitions ensures that everyone can consistently interpret the data
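One way those shared definitions could be captured is as a single valence mapping that both the team discussion and the analysis code refer to. This is a sketch under my own naming (the event names mirror the leaderboard interactions described on the slides; the Lab's actual implementation is not public), with dismisses given the -1 valence of the original analysis:

```python
# Team-agreed valence of each tracked interaction for the leaderboard alerts.
# Event names are illustrative; dismisses were later recategorized as neutral.
ENGAGEMENT_VALENCE = {
    "tap": +1,               # tapped the notification itself
    "leaderboard_tap": +1,   # tapped the leaderboard button
    "manage_updates_tap": +1,
    "dismiss": -1,           # swiped to dismiss
}

def net_interaction_rate(events: list[str], notifications_shown: int) -> float:
    """(positive engagements - negative engagements) / notifications shown."""
    net = sum(ENGAGEMENT_VALENCE.get(e, 0) for e in events)
    return net / notifications_shown

# Ten notifications shown, five tracked interactions
log = ["tap", "dismiss", "dismiss", "leaderboard_tap", "dismiss"]
print(f"{net_interaction_rate(log, 10):+.0%}")  # -10%
```

Keeping the valences in one mapping means a recategorization (e.g. making dismisses neutral) is a one-line change that every downstream calculation picks up consistently.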
We also created a set of KPIs based on the key survey questions to understand what users thought. We had a net score for usefulness as well as for interestingness, based on the 1-4 rating scale. The equation takes the 1-4 scale and creates a score such that if all users rated it 4, it would have a score of 100%, and if all rated it 1, it would be -100%. The score makes it easy to understand success on a spectrum. The same goes for re-enrollment.
Having these quantifiable scores makes it easy to create benchmarks and compare against other experiment formats.
User satisfaction metrics (quantifiable from survey question answers)
Net scores for relevance, usefulness, and interestingness based on the 1-4 rating scale
2.5 is the midpoint score of an ambivalent user
1.5 is the denominator, normalizing for the maximum possible distance from the midpoint
Net score for re-enrollment:
Assign value of +1 for “Yes”
Assign value of -1 for “No”
Assign value of 0 for “Maybe”
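Both survey KPIs reduce to a few lines of arithmetic. This sketch uses my own function names but follows the slide's formulations: (average − 2.5) / 1.5 for the 1-4 aspect scale, and the +1/-1/0 assignment above for re-enrollment.

```python
def net_aspect_score(ratings: list[int]) -> float:
    """Net score for a 1-4 rated aspect, as a percentage.

    +100 if everyone rates 4, -100 if everyone rates 1,
    0 at the ambivalent midpoint of 2.5.
    """
    average = sum(ratings) / len(ratings)
    return (average - 2.5) / 1.5 * 100

def net_reenrollment_score(yes: int, no: int, maybe: int) -> float:
    """Net re-enrollment score: yes = +1, no = -1, maybe = 0."""
    return (yes - no) / (yes + no + maybe) * 100

print(net_aspect_score([4, 4, 4]))      # 100.0
print(net_reenrollment_score(3, 1, 0))  # 50.0
```

Because both scores live on the same -100% to +100% spectrum, they can be benchmarked directly against each other and across experiment formats, as the notes above describe.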
Update of medal counts sent in a daily news notification during the Rio Olympics
Trackable possible user interactions:
Tapping on the notification itself: Drove to the live blog page
Tap on leaderboard: Drove to an interactive page with stats on each team’s performance
Tap on manage updates
Dismissing the notification
By only examining total interactions over time, such a KPI might lead you to think the experiment was more successful towards the end of the Olympics.
Breaking the engagements out into positive and negative groupings revealed a different story.
Defined by the team:
Positive: Taps, Leaderboard Taps, Manage Update Taps
Negative: Dismisses
Although the total number of interactions increased over time, the increase was driven by negative engagements
Using the net interaction rate aggregates the positive and negative engagements
Graphing the rate over time shows a clear downward linear trend
The net interaction rate decreasing over time suggested that users were more likely to close the notification after receiving the leaderboard alert every day for two weeks
This showed us that we needed to recategorize dismisses and re-evaluate the net interaction rate so that dismisses weren't counted as negative. While we had to modify the components of the metrics framework, having the framework in place gave us a place to start. These experiments were brand new, so an iterative learning process for measurement was required. There was no way, prior to the experiment, for us to know that dismisses weren't in fact negative, since this was a new kind of notification with new types of user engagement. Unlike typical measurement frameworks, working with these experiments really requires you to be flexible and adapt.
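The recategorization amounts to recomputing the KPI with a different valence for dismisses. The counts below are invented for illustration (the deck reports only the aggregate -53% and 14% figures for the alerts), but the sketch shows how a dismiss-heavy day flips sign under the two conventions:

```python
def net_rate(counts: dict[str, int], shown: int, dismiss_valence: int) -> float:
    """Net interaction rate with a configurable valence for dismisses
    (-1: original definition; 0: after recategorizing dismisses as neutral)."""
    positive = counts["tap"] + counts["leaderboard_tap"] + counts["manage_tap"]
    return (positive + dismiss_valence * counts["dismiss"]) / shown

# Invented counts for one late-Olympics day dominated by dismisses
day = {"tap": 50, "leaderboard_tap": 20, "manage_tap": 10, "dismiss": 120}
print(f"dismisses as negative: {net_rate(day, 400, -1):+.0%}")  # -10%
print(f"dismisses as neutral:  {net_rate(day, 400, 0):+.0%}")   # +20%
```

Parameterizing the dismiss valence rather than hard-coding it keeps the original and revised analyses comparable side by side, which is what the re-evaluation described above required.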