Improving Research Efficiency
User and Content Fingerprinting
Kevin Cohn
Chief Operating Officer
@Atypon
Academic Publishing in Europe, Berlin
30 January 2013
About Atypon
• Provider of Software as a Service content delivery for publishers
• Literatum platform used to deliver 15M journal articles and 70,000 eBooks
• 1.5 billion user sessions in 2012
3 Improving Research Efficiency
Thesis
• Research efficiency can be greatly improved if publishers tap into their huge volume of data to better connect users to content.
Users don’t want “advanced search...”
...but they do want relevant results.
This is the APE I’m looking for.
Data can drive this behavior.
Observations
• Relevancy is the only order that matters
• > 50% of clicks are to the first result
• > 90% of clicks are on the first page
• Filters/facets aren’t used
Objectives
• Give users what they want: a simple, Google-like search interface
• But use proprietary data to calculate relevancy for each individual user
Automatic Topic Modeling
• Based on a statistical model called latent Dirichlet allocation (LDA)
• Creates “topics”: collections of words that occur together with great frequency
Topic #1: {mammal, primate, hominoidea}
Topic #2: {academic, publishing, europe}
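To make the idea of LDA topics concrete, here is a minimal sketch of a collapsed Gibbs sampler on a toy corpus. It is an illustration only, not Atypon's implementation: the function name `fit_lda`, the four toy documents, and the hyperparameters `alpha` and `beta` are all assumptions chosen for the example. Topic numbering is arbitrary and on such a tiny corpus the split between topics can vary with the random seed.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenised documents.

    Returns (doc_topic, topic_word): per-document topic probabilities and
    per-topic word probabilities. Hyperparameters are illustrative defaults.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [Counter() for _ in range(num_topics)]
    topic_totals = [0] * num_topics

    # Start from a random topic assignment for every word occurrence.
    assignments = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic_counts[d][z] += 1
            topic_word_counts[z][w] += 1
            topic_totals[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                z = assignments[d][i]
                doc_topic_counts[d][z] -= 1
                topic_word_counts[z][w] -= 1
                topic_totals[z] -= 1
                # Resample in proportion to (topic share in this document)
                # times (word share in this topic).
                weights = [
                    (doc_topic_counts[d][k] + alpha)
                    * (topic_word_counts[k][w] + beta)
                    / (topic_totals[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic_counts[d][z] += 1
                topic_word_counts[z][w] += 1
                topic_totals[z] += 1

    doc_topic = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic_counts)
    ]
    topic_word = [
        {
            w: (topic_word_counts[k][w] + beta)
            / (topic_totals[k] + vocab_size * beta)
            for w in vocab
        }
        for k in range(num_topics)
    ]
    return doc_topic, topic_word


# Toy corpus (hypothetical): "ape" appears in both subject areas.
docs = [
    ["ape", "mammal", "primate", "hominoidea"],
    ["primate", "mammal", "ape", "primate"],
    ["ape", "academic", "publishing", "europe"],
    ["academic", "publishing", "europe", "ape", "publishing"],
]
doc_topic, topic_word = fit_lda(docs, num_topics=2)
for k, dist in enumerate(topic_word):
    top_words = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"Topic #{k + 1}:", top_words)
```

Each document ends up with a probability for every topic rather than a single label, which is what lets a shared word such as "ape" be read differently depending on the words around it.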
Applications
• My search for “APE” returns results about this conference, not primates
• The same is true for recommendations
• Better related articles (topics 1 and 2 are not related, despite sharing “APE”)
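One way the search application could work is sketched below: build a user "fingerprint" from the topic distributions of articles the user has read, then order results by similarity to it. The slides do not say how relevancy is actually calculated, so the use of cosine similarity, the function names, and the two-topic distributions here are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def user_fingerprint(read_docs):
    """Average the topic distributions of the documents a user has read.

    Assumes read_docs is a non-empty list of equal-length distributions.
    """
    n = len(read_docs)
    return [sum(column) / n for column in zip(*read_docs)]


def rerank(results, fingerprint):
    """Sort (title, topic_distribution) pairs by similarity to the user."""
    return sorted(results, key=lambda r: cosine(r[1], fingerprint), reverse=True)


# Hypothetical topic distributions over [primates, publishing].
history = [[0.1, 0.9], [0.2, 0.8]]  # this user mostly reads about publishing
results = [
    ("Ape locomotion in the wild", [0.95, 0.05]),
    ("APE 2013 conference programme", [0.05, 0.95]),
]
ranked = rerank(results, user_fingerprint(history))
print([title for title, _ in ranked])
```

Both results match the query “APE” equally well on text alone; the fingerprint is what moves the conference programme to the top for this particular user.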
Not a Taxonomy/Ontology...
• Topics are self-updating = low-cost, low-maintenance
• Flat (not hierarchical) = avoids troublesome questions about classification
• Probabilistic (not binary) = better at expressing relevancy to topics
...But Is Helped by Them
• Topics are “collections of words that occur together with great frequency”
• Knowing that “APE” is an acronym for “Academic Publishing in Europe”
• Knowing that “CC0” and “CC BY” are Creative Commons license types
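A small sketch of how a controlled vocabulary might feed the topic model: known multi-word terms are collapsed into single tokens before documents are tokenised, so the model sees “CC BY” as one license term rather than two unrelated words. The `CONTROLLED_TERMS` table, the token names, and the `tokenize` function are hypothetical; the slides do not describe the actual mechanism.

```python
import re

# Hypothetical controlled vocabulary: phrase -> single normalised token.
CONTROLLED_TERMS = {
    "academic publishing in europe": "ape_conference",
    "cc by": "creative_commons_license",
    "cc0": "creative_commons_license",
}


def tokenize(text, terms=CONTROLLED_TERMS):
    """Lower-case the text, collapse known terms, then split into tokens."""
    lowered = text.lower()
    # Replace longer phrases first so they are not broken up by shorter ones.
    for phrase in sorted(terms, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        lowered = re.sub(pattern, terms[phrase], lowered)
    return re.findall(r"[a-z0-9_]+", lowered)


tokens = tokenize("Released under CC BY at Academic Publishing in Europe")
print(tokens)
```

The bare word “ape” is deliberately left alone here: only the spelled-out conference name is mapped, and telling the acronym apart from the animal is left to the topic model.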
Worth Mentioning
• We didn’t invent ATM (or LDA)
• Our implementation started as a collaboration with academic researchers...
• ...and will require considerable experimentation and testing to get right
Privacy
• Usage is not personally identifiable
• Usage is not shared with third parties
• Users can opt out of personalization
Summary
• ATM uses proprietary data to calculate relevancy for each individual user
• Gives users what they want: a simple, Google-like search interface
• Improves research efficiency by freeing up searching time for reading
Thank You
KCohn@Atypon.com
Kevin Cohn
Chief Operating Officer, Atypon
