Web Science: How is it different?
Daniel Tunkelang, LinkedIn
Keynote Address at ACM Web Science 2014 Conference
The scientific method of observation, measurement, and experiment may be our greatest achievement as a species. The technological innovation we enjoy today is the product of a culture of systematized scientific experimentation.
But historically scientific experimentation has been expensive. Experiments consumed natural resources, took a long time to conduct, and required even more time and labor to analyze. In order to be productive, scientists have had to factor these costs into their work and to optimize accordingly.
Web science is different. Not, as some have speciously argued, because big data has made the scientific method obsolete. The key difference is that web science has changed the economics of scientific experimentation. Thus, even as web scientists apply the traditional scientific method, they optimize based on very different economics.
In this talk, I'll survey how web science has changed our approach to experimentation, for better and for worse. Specifically, I'll talk about differences in hypothesis generation, offline analysis, and online testing.
Bio
Daniel Tunkelang is Head of Query Understanding at LinkedIn, where he previously formed and led the product data science team. LinkedIn search allows members to find people, companies, jobs, groups, and other content. His team aims to provide users with the best possible results to satisfy their information needs and to help them gain insights from professional data. Tunkelang has BS and MS degrees in computer science and math from MIT, and a PhD in computer science from CMU. He co-founded the annual symposium on human-computer interaction and information retrieval (HCIR) and wrote the first book on Faceted Search (Morgan and Claypool, 2009). Prior to joining LinkedIn, Tunkelang was Chief Scientist of Endeca (acquired by Oracle in 2011 for $1.1B) and leader of the local search quality team at Google, mapping local businesses to their home pages. He is the co-inventor of 20 patents.
4. How have the web and big data changed science?
Let’s ask some of the experts.
5. “You have to kiss a lot of frogs to find one prince.
So how can you find your prince faster? By finding
more frogs and kissing them faster and faster.”
Mike Moran
Do It Wrong Quickly: How the Web Changes the Old Marketing Rules, 2007
Cited by Kohavi in Online Controlled Experiments at Large Scale, 2013
7. “The cost of experimentation is now the same or
less than the cost of analysis. You can get more
value…by doing a quick experiment than from
doing a sophisticated analysis.”
Michael Schrage
Value-Creation, Experiments, and Why IT Does Matter, 2010
9. “with massive data, this approach to science —
hypothesize, model, test — is becoming obsolete…
Petabytes allow us to say: "Correlation is enough."
We can stop looking for models…analyze the data
without hypotheses…throw the numbers into the
biggest computing clusters the world…and let…
algorithms find patterns where science cannot.”
Chris Anderson
The End of Theory, 2008
32. But we pay the price.
Example: search engine improvements in batch
evaluations don’t always predict real user benefits.
[Hersh et al., 2000] Do Batch and User Evaluations Give the Same Results?
[Turpin & Hersh, 2001] Why Batch and User Evaluations do not Give the Same Results
[Turpin & Scholer, 2006] User Performance versus Precision Measures for Simple Search Tasks
But also see…
[Smucker & Jethani, 2010] Human Performance and Retrieval Precision Revisited
34. To summarize: how is
web science different?
• Online testing is cheaper and scalable.
• Data exploration tools make hypothesis
generation cheaper and easier.
• But the experiments that are easy and
cheap aren’t always the most valuable.
• Easy to forget our biases as scientists.
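To make the first bullet concrete, here is a minimal sketch of how the result of a simple online test might be analyzed, assuming a two-variant test with a binary outcome such as a click or conversion. The function name `two_proportion_z_test` and all the counts are hypothetical and chosen for illustration; the talk does not prescribe a particular statistical test.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a, conv_b: number of users who converted in variants A and B.
    n_a, n_b: number of users exposed to variants A and B.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 1,000 users per variant, 100 vs. 150 conversions.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A test like this takes seconds to run, which is the point of the first bullet. It says nothing about whether the experiment was worth running, which is the point of the third.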
35. Take-Aways
• The scientific method is alive and well. Big
data has just changed the economics.
• Cheaper hypothesis testing and generation
has already been transformative. That’s why
big data matters.
• But we neglect the human side of scientific
experimentation at our peril.
James Lind thought that scurvy was caused by putrefaction of the body, which acids could help, and so he included acidic dietary supplements in his experiment. He began it after two months at sea, when the ship was afflicted with scurvy. He divided twelve scorbutic sailors into six groups of two. All received the same diet; in addition, group one was given a quart of cider daily, group two twenty-five drops of elixir of vitriol (sulfuric acid), group three six spoonfuls of vinegar, group four half a pint of seawater, group five two oranges and one lemon, and group six a spicy paste plus a drink of barley water. The treatment of group five stopped after six days when the fruit ran out, but by then one sailor was fit for duty and the other had almost recovered. Of the remaining groups, only group one showed any effect from its treatment.
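The design described above, twelve subjects split into six treatment groups of two, can be sketched in a few lines. The passage does not say how sailors were allotted to groups, so the random assignment below is an assumption reflecting modern practice, and the function name `assign_groups`, the sailor labels, and the seed are all hypothetical.

```python
import random

# The six supplements named in the passage, one per group.
TREATMENTS = [
    "cider",
    "elixir of vitriol",
    "vinegar",
    "seawater",
    "oranges and lemon",
    "spicy paste and barley water",
]


def assign_groups(subjects, treatments, seed=None):
    """Randomly split subjects into equal-sized groups, one per treatment."""
    if len(subjects) % len(treatments) != 0:
        raise ValueError("subjects must divide evenly among treatments")
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    size = len(shuffled) // len(treatments)
    return {
        treatment: shuffled[i * size:(i + 1) * size]
        for i, treatment in enumerate(treatments)
    }


# Hypothetical labels for the twelve sailors.
sailors = [f"sailor_{i}" for i in range(1, 13)]
groups = assign_groups(sailors, TREATMENTS, seed=42)
for treatment, members in groups.items():
    print(f"{treatment}: {members}")
```

With only two subjects per group, no statistical test would carry much weight, which illustrates how costly each data point was in this kind of experiment.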