Jonathan Roberts
Chief Innovation Officer, Dotdash.com
Companies make many decisions for many different reasons, some of which are sometimes based in data. As a data scientist, your research skills can get you to the correct answer. But the real value of a data science team comes when you can identify the correct question. Jon will talk about his four-year transition from research physicist to senior executive, how data science research helped drive the biggest change in the company’s history, and drove one of the biggest experiments ever undertaken on the internet.
3. First convince people that science can
be applied to their problems
Physics is all about
extracting meaning from
messy time series data
Turns out the internet has some
messy time series data too
6. Spend as much time getting to the right
question as getting to the right answer.
Asking the right question is rarer
than being able to answer it
8. Don’t mix up machine learning and
human learning.
Interpretability matters. Never use a
neural net if a linear fit will work.
11. Make every slide title the conclusion.
Your slides will be taken out of context.
12. The result? You get a lot of questions
• What makes people click on a link?
• Is there a ‘best’ length of a piece of content?
• Do twitter followers affect search traction?
• Is the concept of a ‘kitten’ more closely related to the concept of a ‘cat’ or the concept
of ‘cuteness’?
What is About.com?
14. A Little History - and a Problem
• About is one year older than Google
• It was sold by the New York Times in 2012
• When IAC bought it, there were millions of pieces of content, covering over a thousand
topics, read by millions of people every day.
• We had 1000 writers, 200 full-time staff, 10 editors, and 1 problem:
What was About.com?
19. Understanding our company was a
big data problem.
1. Categorise all the content
2. Provide one clear simple plot
3. Use our content to understand our audience and tell data-driven stories.
23. Are Americans more interested in
gymnastics during an Olympics?
Or in football during the Super Bowl?
It’s Gymnastics. For every Olympics in
the 21st Century.
26. “We don’t have a millennial audience”
Yes we do – we are representative of the
internet. And let me tell you about them.
27. Millennial women are 3x more interested
in going to Paris than non-millennial
women
Millennial guys are just as uninterested
in going to Paris as non-millennial guys.
29. Build a stable baseline, and look for
exceptions.
• We don’t need to use February 2017
to predict March 2017.
• We can use 20 years of Marches to
predict each day of March 2017.
• Every day we know whether the
country is behaving as expected.
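The baseline idea above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the talk: the function names (`seasonal_baseline`, `find_exceptions`), the 10% tolerance, and the synthetic traffic numbers are all assumptions made for the example. It averages each calendar day over every earlier year, then flags days in the target year that stray too far from that expectation. A real traffic series would also need to account for weekday effects and long-term growth, which this sketch ignores.

```python
from datetime import date
from statistics import mean


def seasonal_baseline(history, target_year):
    """Average each calendar day (month, day) over all years before target_year.

    `history` maps datetime.date -> metric value (hypothetical input format).
    """
    by_day = {}
    for day, value in history.items():
        if day.year < target_year:
            by_day.setdefault((day.month, day.day), []).append(value)
    return {key: mean(values) for key, values in by_day.items()}


def find_exceptions(observed, baseline, tolerance=0.10):
    """Return {date: fractional deviation} for days outside the tolerance band."""
    exceptions = {}
    for day, value in observed.items():
        expected = baseline.get((day.month, day.day))
        if not expected:
            continue  # no history (or a zero baseline) for this calendar day
        deviation = (value - expected) / expected
        if abs(deviation) > tolerance:
            exceptions[day] = deviation
    return exceptions


# Synthetic data: 20 years of Marches (1997-2016) with an identical daily shape.
history = {
    date(year, 3, d): 100.0 + d
    for year in range(1997, 2017)
    for d in range(1, 32)
}
baseline = seasonal_baseline(history, target_year=2017)

# March 2017 behaves as expected, except for one invented 40% drop on the 10th.
observed = {date(2017, 3, d): 100.0 + d for d in range(1, 32)}
observed[date(2017, 3, 10)] = 0.6 * (100.0 + 10)

exceptions = find_exceptions(observed, baseline)
for day, deviation in sorted(exceptions.items()):
    print(f"{day}: {deviation:+.0%} vs. seasonal baseline")
```

Because the expectation comes from many past years rather than from last month, a single odd month does not distort it, and each new day can be checked as soon as its number arrives.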
30. On November 9th the world changed
• Health interest dropped 40% in one hour
• It did not return to normal for three days
• We see the same pattern after the Super Bowl, and after
holidays
The election gave the country a three-day hangover
31. So when you listen to 20 years of
data, what does it tell you?
• One great recipe is better than 10 slight variations
We focused on deepening our best content
• Updating good content beats writing new articles
Our articles are now updated at least every six months
• Get out of the way of users, and answer the question they have right now
We prioritised page speed, clean simple design, and recommendations based on the
user’s intent
• You can’t be all things to all people
About.com is the wrong product for today’s internet
33. And it totally worked.
• With a stable baseline, you can see the world change:
• Through Q3, traffic in 2016 followed the daily prediction within a few percent
• We launched Lifewire on October 15th
• After a short (expected) dip – visits went through the roof.
[Figure: Lifewire traffic vs. Seasonal Prediction]
34. All five mature launches are now top 10 sites, and by
far the fastest growing in their categories.
35. Conclusions
• Companies (and the internet) are big data problems.
• Executives aren’t used to seeing clear answers on complex systems.
• Spend as much time on the right question, as on the right answer.
• Don’t sacrifice human learning for the sake of machine learning.
• One simple plot is much more valuable than a deep dive into methodology.
• Find a way to tell a story: anyone can retell a story, very few can retell a research
paper.