Data for Science: How Elsevier is using data science to empower researchers

Each month, 12 million people use Elsevier’s ScienceDirect platform. The Mendeley social network has 4.6 million registered users. 3,500 institutions use ClinicalKey to bring the latest medical research to doctors and nurses. How can we help these users be more effective? In this talk, I give an overview of how Elsevier is employing data science to improve its services, from recommendation systems to natural language processing and analytics. While data science is changing how Elsevier serves researchers, it is also changing research practice itself. In that context, I discuss the impact that large amounts of open research data are having and the challenges researchers face in making use of them, particularly in terms of data integration and reuse. We are just beginning to see how technology and data are changing science; correspondingly, this affects how best to empower those who practice it.

Published in: Technology

Data for Science: How Elsevier is using data science to empower researchers

  1. DATA FOR SCIENCE: HOW ELSEVIER IS USING DATA SCIENCE TO EMPOWER RESEARCHERS
     Paul Groth | @pgroth | pgroth.com
     Disruptive Technology Director, Elsevier Labs | @elsevierlabs
     European Data Forum 2016
  2. 12 million people per month
  3. 40 million reactions, 75 million compounds, 500 million facts
  4. 3 EXAMPLES
     • Personalized: what should I read?
     • Actionable: who should I collaborate with?
     • Consumable: how do I make my data available?
  5. RECOMMENDATIONS AT MENDELEY
     • Maya Hristakeva
     • Data Scientist at Mendeley
     • @mayahhf
     • Spark Summit 2015
     • http://www.slideshare.net/SparkSummit/sparking-science-up-with-research-recommendations-by-maya-hristakeva
  6. MENDELEY BUILDS TOOLS TO HELP RESEARCHERS …
     • Read & Organize
     • Search & Discover
     • Collaborate & Network
     • Experiment & Synthesize
  7. BEING THE BEST RESEARCHER YOU CAN BE!
     • Good researchers are on top of their game
     • Large amount of research produced
     • Takes time to get what you need
     • Help researchers by recommending relevant research
  8. PERSONALIZED ARTICLE RECOMMENDATION
     Input: User libraries
     Output: Suggested articles to read
     Algorithms:
     • Collaborative Filtering
       – Item-based
       – User-Based
       – Matrix Factorization
     • Content-based
  9. [Figure: cost versus quality quadrant chart. Quadrants: Costly & Good, Costly & Bad, Cheap & Good, Cheap & Bad. Systems plotted: Tuned IB Mahout, Tuned UB Mahout, Tuned UB Spark, Tuned IB Spark, UB DimSum Spark MLlib, ALS Matrix Fact. Spark MLlib. Annotations: Performance +100%, +150%, ~$50]
  10. CALCULATING 75 TRILLION METRICS
     • Benchmark 4600 institutions & 220 countries updated weekly
     • 40 terabytes of data
     • HPCC massively parallel compute system – 40 node system
  11. ALL DATA ISN’T CURATED
  12. 60 % OF TIME IS SPENT ON DATA PREPARATION
  13. 10 ASPECTS OF HIGHLY EFFECTIVE RESEARCH DATA
     https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
  14. http://data.mendeley.com/
     • Each dataset receives a versioned DOI, so it can be cited
     • The citation for the associated article is displayed
  15. ACADEMIC COLLABORATIONS
  16. CONCLUSION
     • Researchers are faced with an ever growing amount of data and content
     • Data Science is key to making systems that help them
     • I’ve shown three Elsevier examples. Many more!
     • Antonio Gulli’s codingplayground.blogspot.nl
     • labs.elsevier.com
     • Of course, we’re hiring
     Contact: Paul Groth @pgroth
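
The personalized article recommendation slide takes user libraries as input and lists item-based collaborative filtering among its algorithms, but the transcript does not show how it is computed. The following is a minimal sketch of that idea in plain Python, not Mendeley's implementation (the slides mention Mahout and Spark). The toy `libraries` data, the function names, and the choice of cosine similarity over library co-occurrence are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy input: user -> set of article ids in that user's library.
libraries = {
    "alice": {"a1", "a2", "a3"},
    "bob": {"a1", "a2", "a4"},
    "carol": {"a2", "a3", "a5"},
    "dave": {"a1", "a4"},
}


def item_similarities(libs):
    """Cosine similarity between articles, from co-occurrence in user libraries."""
    counts = defaultdict(int)  # number of libraries containing each article
    co = defaultdict(int)      # number of libraries containing both articles
    for articles in libs.values():
        for a in articles:
            counts[a] += 1
            for b in articles:
                if a != b:
                    co[(a, b)] += 1
    return {
        (a, b): n / sqrt(counts[a] * counts[b])
        for (a, b), n in co.items()
    }


def recommend(user, libs, top_n=3):
    """Score unseen articles by summed similarity to the user's own articles."""
    sims = item_similarities(libs)
    owned = libs[user]
    scores = defaultdict(float)
    for (a, b), s in sims.items():
        if a in owned and b not in owned:
            scores[b] += s
    # Highest score first; ties broken by article id for a stable order.
    return sorted(scores, key=lambda art: (-scores[art], art))[:top_n]


if __name__ == "__main__":
    print(recommend("alice", libraries))
```

The user-based variant listed on the same slide would instead compare users by the overlap of their libraries and suggest articles held by the most similar users. At the scale described in the talk, the pairwise counting above would need a distributed implementation.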
