Small Data: How Elsevier Might Help with Research Data Management


  • Metadata are not captured digitally
  • Open to misinterpretation
  • Lack of credit
  • Intellectual property or possible patent issues
  • Easier contradiction
  • No incentive, little value to the sharer, even disincentivized by current reward models
  • Different skill sets
  • Takes time and mindset away from research
  • Requires common nomenclature
    – missing in many domains
    – nomenclature convergence only happens in mature science
    – many researchers are invested in nomenclature discussions
  • Privacy and security concerns
  • Cost
  • It is a long-tail problem: thousands of narrow solutions provide the best value
  • Analytics at scale
    – MEDai: analyze every treatment event in a hospital for protocol variations
    – Risk Solutions: analyze public data for fraud detection and prediction
    – Shepardizing legal cases
  • Funding from free
    – Reaxys (chemical reactions database, literature and patents)
    – Chemical resistance of plastics (manufacturer data normalized)
    – Pathway Studio (enzymatic pathways for drug discovery, eventually personalized medicine)
    – Geofacets (geologic information for exploration)
    – LexisNexis
  • Small Data: How Elsevier Might Help with Research Data Management

    1. Small Data: How Elsevier Might Help With Research Data Management
       David Marques, 27 February 2013
       Research Data Symposium, Columbia University
    2. Assertions
       • We share a common goal: an open system of ubiquitous sharing of research data in repositories that are
         – discipline-specific
         – controlled-vocabulary annotated
         – normalized
       • A very small portion of research data is being shared to the discipline-specific repositories
    3. Problem statement
       • There are a lot of barriers to sharing of data
       • There are problems with sustainable funding for repositories
    4. Points of this presentation
       • We can help remove the barriers by
         – applying rigorous yet efficient process
         – using discipline-specific informatics skills
         – providing credit assignment and assessment
         – helping capture metadata early and digitally
       • It is possible, and we can help, to create sustainable funding models for open data repositories
    5. Big Data vs Research Data
       [Figure: the data life cycle, taken from DataONE (Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze), shown twice: once labeled "Big Data Emphasis" and once labeled "Research Data Pain".]
    6. Dataset Repositories: MANY solutions
       • Figshare (Digital Science)
       • GigaDB (BioMed Central)
       • DataDryad
       • Australian National Data Service
         – but: their goal is to move from
       • Amazon’s Glacier
    7. Problem 1: Barriers to Data Disclosure and Sharing
       • Non-digital metadata
       • Different skill sets
       • Takes time and mindset away from research
       • Requires common nomenclature
       • Cost
       • It is a long-tail problem: thousands of narrow solutions provide the best value
       • Open to misinterpretation
       • Lack of credit
       • Intellectual property or possible patent issues
       • Easier contradiction
       • No incentive, little value to the sharer
       • Privacy and security concerns
    8. Are supplemental files the answer?
       • Scope
         – 15% of 2012 Elsevier articles had supplemental files
         – ~1% have spreadsheets
         – ~2% have either spreadsheets or zip files
       • Extracting value
         – no rules for supplemental files
         – no common nomenclatures
         – analytics, comparisons, trends are hard
       • Elsevier recommends (and some journals, such as Cell Press journals, require) that authors share/deposit data in discipline repositories
       • Linking helps overcome the credit barrier
         – Elsevier links articles to/from datasets in open repositories
         – 35 today (including EarthChem)
         – 10 more in progress
    13. Problem 2: Sustainability
       • Many are grant-funded initially, as research projects
         – and funding bodies often do not intend to fund repositories long term
       • Can we fund from a Gold Open Access model?
       • Can we fund from high-end analytics subscriptions?
       • Can we fund some of them from health care and corporate use?
    14. [Figure: percentage breakdown across repository activities. Activity labels: Plan/Propose; Support Services; Acquisition; Access; Storage, Data Management; Ingest; Produce/Publish; Manage. Detail labels: submission agreement; data formats; IP rules; user guides and support; searching and ordering; user documentation and delivery of result sets; reports; receive; QA and validation; transform; create metadata (taxonomies); updates; reference linking. Percentages shown: 10%, 25%, 19%, 15%, 6%, 25%.]
       Summary of data in: Keeping Research Data Safe 2, Beagrie et al., 2010, funded by JISC
    15. Pain Points and Elsevier Strengths and Expertise
       • Taxonomies
         – 50+ discipline-specific taxonomies
         – core to Elsevier
       • At-scale, efficient, best-practices process
       • At-scale analytics
       • Turning freely-available data into high-value solutions for corporate use without advertising (advertising models require very large customer groups)
       • Impact analysis and reporting
    16. Research Data Services – new group at Elsevier
       • Goals
         – Increase archiving and sharing of research data (as requested by funding bodies)
         – Increase the value of shared data (with metadata)
         – Foster and assist with the credit and impact assessment of research data for the researcher, the institution, and the funding bodies
         – Increase the sustainability of data repositories
       • Principles
         – Open data: all data remain open and available
         – Collaborative: with institutions, the research community, funding bodies
         – Transparent business model: if we make money, some goes back to fund the repositories
    17. Research Data Management
       [Figure: the data life cycle (Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze) annotated with pilot projects. Other diagram labels include Data Management Plan, Linked Data Repositories, RDM Method Tools (VizTrails), Repositories, Data Mgmt Plans, Taxonomies, and Directories; the remaining labels are not legible in the extracted text.]
       • Pilot: see if we can scale a repository and make it financially sustainable
       • Pilot: collecting data with an app, integrating and sharing with a dashboard
       • Pilot: use LDR to connect data from different repositories to create insight
       • Pilot: collect and standardize method and provenance
       • IEDA/EarthChem collaboration with Kerstin Lehnert
       • Pilot: annotate data and methods with standard taxonomies
       • Pilot: create directories to help discover data in shared repositories
    18. Disclosure Pilot Benefits for the Researcher
       • Immediate visibility and overview of the research (PI Dashboard)
       • Enhanced discoverability of research data attributable to the university and the research team
       • Credit/impact for the university, the research team, and the funding bodies
       • Acknowledgement by the funding bodies of the disclosure/sharing of the data
       • [better, faster science]
    19. Disclosure Pilot Benefits for the Institution
       • Increased rigor of data management
         – consistency
         – best practices
         – overview metadata in research management information systems
       • Step toward completeness of research data management
       • Compliance with funding body requirements, stronger base from which to request
       • Increased visibility, discoverability, credit
    20. Disclosure Pilot Benefits for the Funding Body
       • Increased data disclosure and sharing
       • Increased discoverability of data (with funding body credit)
       • Increased opportunity for ‘fourth paradigm’ (analytics-derived) science
         – better, faster science
       • Credit/impact for sponsored research
       • Standardization and best practices in data management plans and actual data curation/preservation
    21. Today’s funding models
       [Figure labels: Research funding; Data mgmt (Gold OA); FREE; License or subs.]
    22. Increasingly common models
       [Figure labels: Research funding; Data mgmt (Gold OA); FREE; License or subs.; Translational Medicine Analytics]
    23. Working together, we could do this
       [Figure labels: Research funding; Data mgmt (Gold OA); FREE; License or subs.; Task-specific Analytics]
    24. An interesting quote at the IDCC13 cost workshop [loosely quoted, I did not catch it verbatim]
       "We can’t do this by ourselves. We should get someone with business savvy to partner with us."
    25. ?