Improving Research Efficiency: User and Content Fingerprinting
1. Improving Research Efficiency: User and Content Fingerprinting
Kevin Cohn
Chief Operating Officer
@Atypon
Academic Publishing in Europe, Berlin
30 January 2013
3. • Provider of Software as a Service content
delivery for publishers
• Literatum platform used to deliver 15M journal
articles and 70,000 eBooks
• 1.5 billion user sessions in 2012
About Atypon
4. • Research efficiency can be greatly improved if
publishers tap into their huge volume of data to
better connect users to content.
Thesis
11. • Relevancy is the only order that matters
• More than 50% of clicks go to the first result
• More than 90% of clicks are on the first page of results
• Filters/facets aren’t used
Observations
12. • Give users what they want: a simple, Google-like search interface
• But use proprietary data to calculate relevancy
for each individual user
Objectives
14. • Based on a statistical model called latent
Dirichlet allocation (LDA)
• Creates “topics”: collections of words that occur
together with great frequency
Topic #1: {mammal, primate, hominoidea}
Topic #2: {academic, publishing, europe}
Automatic Topic Modeling
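To make the idea of a "topic" concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling on a toy corpus built from the two example topics above. It is an illustration only and not Atypon's implementation: the function name lda_gibbs, the hyperparameters alpha and beta, the iteration count, and the four toy documents are all assumptions chosen for the example.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling (toy-scale, pure Python).

    docs is a list of token lists. Returns (theta, phi):
    theta[d][k] = estimated share of topic k in document d,
    phi[k][word] = estimated probability of word under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    vocab_size = len(vocab)

    # Count tables updated as topic assignments change.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Resample: topics popular in this document and topics
                # that already favour this word get more weight.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1

    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta) for w in vocab}
        for k in range(num_topics)
    ]
    return theta, phi


# Toy corpus (hypothetical) using the words from the two example topics.
docs = [
    "mammal primate hominoidea primate mammal".split(),
    "primate hominoidea mammal hominoidea".split(),
    "academic publishing europe publishing academic".split(),
    "europe academic publishing europe".split(),
]
theta, phi = lda_gibbs(docs, num_topics=2)
for k, dist in enumerate(phi):
    print(k, sorted(dist, key=dist.get, reverse=True)[:3])
```

Each row of theta is a probability distribution over topics for one document, and each entry of phi is a probability distribution over the vocabulary for one topic. A production system would more likely rely on an established library and a far larger corpus.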
25. • My search for “APE” returns results about this
conference, not primates
• The same is true for recommendations
• Better related articles (topics 1 and 2 are not
related, despite sharing “APE”)
Applications
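One way the "APE" example could work is sketched below, assuming each article carries a topic vector and each user has a fingerprint derived from what they have read. The averaging rule, the cosine similarity measure, the function names, and the article titles are assumptions made for illustration; the slides do not specify how fingerprints are built or compared.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def user_fingerprint(read_topic_vectors):
    """Assumed rule: a user's fingerprint is the mean topic vector
    of the articles they have read."""
    if not read_topic_vectors:
        raise ValueError("need at least one article to build a fingerprint")
    n = len(read_topic_vectors)
    return [sum(column) / n for column in zip(*read_topic_vectors)]


def rerank(results, fingerprint):
    """Order search results by similarity to the user's fingerprint."""
    return sorted(
        results,
        key=lambda r: cosine(r["topics"], fingerprint),
        reverse=True,
    )


# Topic order in every vector: [primates, academic publishing].
# Titles and weights are hypothetical.
results_for_ape = [
    {"title": "Great ape social behaviour", "topics": [0.95, 0.05]},
    {"title": "APE 2013 conference programme", "topics": [0.10, 0.90]},
]

# A user who mostly reads about scholarly publishing.
publishing_reader = user_fingerprint([[0.05, 0.95], [0.20, 0.80]])
# A user who mostly reads about primates.
primatologist = user_fingerprint([[0.90, 0.10], [0.85, 0.15]])

print([r["title"] for r in rerank(results_for_ape, publishing_reader)])
print([r["title"] for r in rerank(results_for_ape, primatologist)])
```

The same query produces a different order for each reader, and the same similarity score could drive recommendations and related-article lists.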
26. • Topics are self-updating = low-cost, low-maintenance
• Flat (not hierarchical) = avoids troublesome
questions about classification
• Probabilistic (not binary) = better at expressing
relevancy to topics
Not a Taxonomy/Ontology...
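The difference between probabilistic and binary topic membership can be shown with a small sketch. The article, its topic names, and its weights are hypothetical, and binary_label stands in for any scheme that files a document under exactly one category.

```python
def binary_label(topic_probs):
    """Hard assignment: keep only the single most probable topic."""
    best = max(topic_probs, key=topic_probs.get)
    return {topic: (1.0 if topic == best else 0.0) for topic in topic_probs}


def relevance(topic_probs, topic):
    """Relevance of a document to a topic is simply its weight."""
    return topic_probs.get(topic, 0.0)


# Hypothetical article that spans two subjects.
article = {"primates": 0.55, "academic publishing": 0.40, "licensing": 0.05}
hard = binary_label(article)

print(relevance(article, "academic publishing"))  # 0.4
print(relevance(hard, "academic publishing"))     # 0.0
```

With a hard label the article's substantial secondary subject disappears, whereas the probabilistic weights still let it surface, at a lower rank, for readers interested in that subject.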
29. • Topics are “collections of words that occur
together with great frequency”
• Knowing that “APE” is an acronym for
“Academic Publishing in Europe”
• Knowing that “CC0” and “CC BY” are Creative
Commons license types
...But Is Helped by Them
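A controlled vocabulary could help by normalizing known phrases before the text reaches the topic model, so that equivalent terms count as the same word. The sketch below assumes a small hand-built mapping; the CONTROLLED_TERMS table, the replacement tokens, and the normalize function are hypothetical and not part of any described system.

```python
import re

# Hypothetical controlled vocabulary: phrase -> canonical token.
CONTROLLED_TERMS = {
    "cc0": "creative_commons_license",
    "cc by": "creative_commons_license",
    "academic publishing in europe": "ape_conference",
}


def normalize(text, terms=CONTROLLED_TERMS):
    """Lower-case the text, replace known phrases with canonical tokens,
    and split into tokens ready for topic modeling."""
    text = text.lower()
    # Replace longer phrases first so they are not broken up by shorter ones.
    for phrase in sorted(terms, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, terms[phrase], text)
    return text.split()


tokens = normalize("Released under CC BY at Academic Publishing in Europe")
print(tokens)
```

The bare acronym "APE" is deliberately left out of the mapping, because it is ambiguous on its own; in this sketch, resolving it is left to the topic model and the surrounding words.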
30. • We didn’t invent automatic topic modeling (ATM) or LDA
• Our implementation started as a collaboration
with academic researchers...
• ...and will require considerable experimentation
and testing to get right
Worth Mentioning
31. • Usage is not personally identifiable
• Usage is not shared with third parties
• Users can opt out of personalization
Privacy
32. • ATM uses proprietary data to calculate
relevancy for each individual user
• Gives users what they want: a simple, Google-like search interface
• Improves research efficiency by freeing up
searching time for reading
Summary
33. Thank You
KCohn@Atypon.com
Kevin Cohn
Chief Operating Officer, Atypon