Developing data services: a tale from two Oregon universities

While generating or collecting large, complex research datasets is becoming easier and less expensive all the time, researchers often lack the knowledge and skills necessary to manage them properly. These skills are paramount in ensuring data quality, integrity, discoverability, integration, reproducibility, and reuse over time. Librarians have been preserving, managing, and disseminating information for thousands of years. As scholarly research is increasingly carried out digitally, and the products of research have expanded from primarily text-based manuscripts to include datasets, metadata, maps, software code, etc., it is a natural expansion of scope for libraries to be involved in the stewardship of these materials as well. This evolution requires that libraries bring in faculty with new skills and collaborate more closely with researchers throughout the research data lifecycle, and this is exactly what is happening in academic libraries across the country. In this webinar, two researchers-turned-data-specialists, both based in academic libraries, will share their experiences with developing research data services at their respective institutions and their perspectives on the important role that libraries can play in helping researchers manage, preserve, and share their data.

Published in: Education, Technology

  1. 1. Developing data services: A tale from two Oregon universities. NN/LM, Pacific Northwest Region, PNR Rendezvous | 18 June 2014. Melissa Haendel, OHSU Library; Amanda Whitmire, OSU Libraries
  2. 2. About Amanda… Not a librarian. B.S. in Aquatic Biology, 2000; worked in a bioluminescence laboratory. Ph.D. in Oceanography, emphasis in biological oceanography, 2008; dissertation study area: bio-optics, using optical tools to study ocean ecology (N. California Current). Post-doc in Oceanography, emphasis in biological oceanography, 2008-2012; study area: bio-optics, using optical tools to study ocean ecology in low oxygen zones (N. Chile). Assistant Professor, Data Management Specialist, Sept. 2012 - present.
  3. 3. About Melissa… Not a librarian. B.A. in Chemistry, 1990; modeled drug-receptor ligand binding. Ph.D. in Neuroscience, 1999; dissertation study area: identification of novel genes involved in neural development in the mouse. Post-doc, 2000-2002; study area: role of thyroid hormone during neural cell death in zebrafish. Post-doc, 2002-2004; study area: toxic effects of biocides in zebrafish and salmon. Post-doc, 2002-2004; study area: ontologies, data models, gene nomenclature, biocuration. Assistant Professor, Library, 2010 – present; lead semantic research team. ?
  4. 4. Questions: Do you have any data-related tasks or responsibilities in your job description or duties? [Yes/No] What role do you believe metadata plays in the modern research cycle? [big, small, none, other]
  5. 5. Why data management?; The researcher perspective; Why libraries?; Why bring in non-librarians?; Amanda & Melissa share their experiences; Wrap-up. Image credit: http://www.flickr.com/photos/54803625@N08/8296296949/
  6. 6. Research data is: “…the recorded factual material commonly accepted in the scientific community as necessary to validate research findings.” U.S. Office of Management and Budget, Circular A-110
  7. 7. What is research data? “Unlike other types of information, research data are collected, observed, or created, for the purposes of analysis to produce and validate original research results.” University of Edinburgh MANTRA Research Data Management Training, ‘Research Data Explained’
  8. 8. Data management: Actions that contribute to effective storage, use, preservation, and reuse of data and documentation throughout the research lifecycle.
  9. 9. Why data management?
  10. 10. Images collected by DataONE.org
  11. 11. Data deluge: Data is collected from sensors, sensor networks, remote sensing, observations, and more - this calls for increased attention to data management and stewardship. Photos courtesy of www.carboafrica.net, http://modis.gsfc.nasa.gov/, and http://www.futurlec.com; CC images by tajai and CIMMYT on Flickr; image collected by Viv Hutchinson. Slide credit: http://www.dataone.org/education-modules
  12. 12. Federal movement toward open data. 1985: National Research Council; 1999: OMB Circular A-110 revisions; 2003: NIH Data Sharing Policy; 2008: NIH Public Access Policy; 2011: NSF DMP requirement; 2012: NEH, Office of Digital Humanities DMP requirement; 2013: NSF biosketch change; 2013: OSTP memo on public access to results of federally funded data
  13. 13. More funder mandates are coming 22 Feb. 2013
  14. 14. The memorandum states that “digitally formatted scientific data resulting from unclassified research supported wholly or in part by Federal funding should be stored and publicly accessible to search, retrieve, and analyze.” To this end, federal agencies must create a public access plan that includes the following mandates: • Maximize public access to data while protecting personal privacy and confidentiality and intellectual property, and balancing costs with long-term benefits; • Ensure that investigators create data management plans that describe strategies for long-term preservation of and access to data; • Allow costs of data management to be included in proposal budgets; • Ensure that the merits of data management plans are properly evaluated; • Implement mechanisms to ensure that investigators comply with their data management plans and policies; • Promote deposition of data into publicly accessible repositories; • Encourage private and public cooperation to improve data access and interoperability; • Develop and standardize approaches to data citation/attribution; • Support training in data management best practices; • Assess needs and strategies for the long-term preservation of data.
  15. 15. Journal data policies
  16. 16. Information propagation tales: The researcher’s perspective
  17. 17. Data isn’t always what it seems
  18. 18. Assertion: “β amyloid, known for its role in injuring brain in Alzheimer’s disease, is also produced by and injures skeletal muscle fibres in the muscle disease sporadic inclusion body myositis.” Greenberg 2009
  19. 19. All 242 papers point to 4 from same lab, and very few to the ones with negative results. Greenberg, 2009. BMJ 2009;339:b2680 doi:10.1136/bmj.b2680
  20. 20. How do we believe what we think we know? • Is it true or do we just believe it because everyone else does? • How do we transcend “follow the leader”? What tools can we build to help us?
  21. 21. How reproducible is science? Let’s start simple. Do we know what the ingredients were?
  22. 22. Journal guidelines for methods are often poor and space is limited. “All companies from which materials were obtained should be listed.” - A well-known journal. Reproducibility depends, at a minimum, on using the same resources. But…
  23. 23. How identifiable are resources in the published literature? An experiment in reproducibility. Gather journal articles across 5 domains (Immunology, Cell biology, Neuroscience, Developmental biology, General biology) and 3 impact factors (High, Medium, Low): 84 journals, 248 papers, 707 antibodies, 104 cell lines, 258 constructs, 210 knockdown reagents, 437 model organisms
  24. 24. Only ~50% of resources were identifiable (Vasilevsky et al., 2013, PeerJ)
  25. 25. There is no correlation between impact factor and resource identification. [Figure: fraction of resources identified (0.0 to 1.0) versus journal impact factor (0 to 40), for antibodies, cell lines, constructs, knockdown reagents, and organisms]
  26. 26. Maybe labs are just disorganized?
  27. 27. Meet the Urban Lab
  28. 28. A+ organization! The Urban lab antibodies
  29. 29. Of 9 antibodies published in 5 articles, only 44% were identifiable. [Figure: percent identifiable (0% to 100%) by criterion: commercial Ab identifiable, catalog number reported, source organism reported, target uniquely identifiable]
  30. 30. Resource information is not adequately getting into the literature, EVEN THOUGH IT IS READILY AVAILABLE. The problem is a lack of standards, review, and tools. LIBRARIES CAN HELP!
  31. 31. http://www.force11.org/Resource_Identification_Initiative Numerous endorsers https://www.force11.org/RII/SignUp Implementation of the new standard http://biosharing.org/bsg-000532
  32. 32. Publishing Workflow: 1. Researcher submits a manuscript for publication; 2. Editor or Publisher OR LIBRARIAN! asks for inclusion of RRID; 3. Author goes to Resource Identification Portal to locate RRID; 4. RRID is included in Methods section and as Keyword. Sample citation: Polyclonal rabbit anti-MAPK3 antibody, Abgent, Cat# AP7251E, RRID:AB_2140114
  33. 33. http://www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble
  34. 34. $1.3 million grant from the Laura and John Arnold Foundation to validate 50 landmark cancer biology studies. Partnership between Science Exchange, PLOS, figshare, Mendeley, and some of us scientists
  35. 35. Librarians can help researchers understand: • How to be critical of data and where it came from • Data provenance and meeting data standards • That there is a need to reinterpret data when new information comes to light • That reproducibility depends on many things, including very basic things • Why both retrospective and prospective efforts are needed to ensure data quality, consistency, and utility
  36. 36. Amanda’s dissertation: The spectral backscattering properties of marine particles. Observations: ship-based sampling & moored instruments; Simulation results: scattering & absorption of light; Experimental: optical properties of phytoplankton cultures; Derived variables: endless things; Compiled observations: global oceanic bio-optical observations [self + from peers]; Reference: global oceanic bio-optical observations [NASA]
  37. 37. Why libraries? OSU Libraries Digital Collections | http://oregondigital.org/u?/archives,31
  38. 38. image: http://www.beautiful-libraries.com/7200-1.html
  39. 39. Agricultural Sciences; Engineering; Education; Business; Liberal Arts; Public Health & Human Sciences; Veterinary Medicine; Science; Pharmacy; Forestry; Earth, Ocean & Atmospheric Sci.; Libraries
  40. 40. Libraries
  41. 41. http://www.ala.org/acrl/sites/ala.org.acrl/files/content/publications/whitepapers/Tenopir_Birch_Allard.pdf “Only a small minority of academic libraries in the United States and Canada currently offer research data services (RDS), but a quarter to a third of all academic libraries are planning to offer some services within the next two years.” “Few academic libraries are responsible for developing research data policies. Being able to serve as a clearinghouse of ideas and to provide expertise to build these policies is an opportunity for libraries to be members of the knowledge creation process.” “Reassigning existing library staff is the most common tactic for offering RDS.”
  42. 42. Our experiences http://clubads.com/photos/custom/fish-OutOfWAter.jpg
  43. 43. Timeline of data services at OSU. Late 2011: UL & library admin. recognize need for role of RDS on campus that requires a dedicated FTE; Sept. 2012: Data Management Specialist starts; Jan. 2013: Strategic Agenda in place*; Oct. 2013: Data survey launches; Jan. 2014: GRAD 521 launches; ESI. *Sutton, Shan; Barber, David; Whitmire, Amanda L. (2013): Oregon State University Libraries and Press Strategic Agenda for Research Data Services. Oregon State University Libraries. http://hdl.handle.net/1957/38794.
  44. 44. OSU Data stewardship survey Interview by Sarah Abraham from The Noun Project
  45. 45. Responses to the question, “Please indicate whether or not you generate each of the following data format(s) as a part of your research process. Select Yes or No for each.” Color scale indicates what percentage of respondents in each college or unit selected ‘Yes’ for each data type. The number in each tile shows the number of faculty responses for that data type and college/unit.
  46. 46. Scope of Data Services at OSU
  47. 47. Research: The DART Project (Data management plans As a Research Tool). Analysis of data management plans as a means to inform and empower academic librarians in providing research data support. National Leadership Grant LG-07-13-0328, Oct 2014 – Sept 2015
  48. 48. Consultations
  49. 49. Teaching: GRAD 521. Logistical details: http://bit.ly/GRAD521; all course materials on figshare; 2 credits; discipline-agnostic; offered annually, winter quarter. Topics covered: overview of RDM; types, formats & stages of data; RDM planning; storage, backup & security; documentation & metadata; legal & ethical considerations; sharing & reuse; archive and preservation
  50. 50. Timeline of data activities at OHSU. Late 2009: OHSU library awarded eagle-i; Sept. 2012: Monarch Initiative awarded; April 2013: Beyond the PDF 1K challenge award; Oct. 2013: Data survey launches; Now: OHSU hiring CRIO position; ESI; NIH BD2K program
  51. 51. OHSU Data stewardship survey Interview by Sarah Abraham from The Noun Project
  52. 52. How do you reference your data when you publish, either in the context of a journal publication, or by direct publication of data sets? [Chart, axis 0% to 60%. Response options: Specific Uniform Resource Identifier (URI) or other URL where data is held; Contact information of the data steward; Reference to a public repository where the data is held; Provide supplementary data to the journal; SPARQL endpoint and/or Linked Open Data; Digital Object Identifier (DOI); I don't know; Other (please specify)]
  53. 53. Are there any professional community standards in your research area regarding data management, sharing, storage, archiving, and/or producing metadata or other descriptive information that would apply to your research data? Number of responses by role (Yes / No / I don't know):
      Instructor: 1 / 1 / 1
      Assistant Professor, Research Assistant Professor, or Assistant Scientist: 9 / 8 / 19
      Associate Professor or Associate Scientist: 5 / 9 / 13
      Professor or Senior Scientist: 16 / 15 / 14
      Director, Division Head, Department Head: 6 / 1 / 4
      PostDoc/ResAssoc/PhD: 13 / 10 / 19
  54. 54. Scope of Data Services at OHSU: Open houses, Lib Guides, NIH proposals to improve data education, hosting fellows; New IR, research profiling tools; Participation in national efforts: BD2K, Force11, Galaxy, Biocuration Society; Data consults, collaborations
  55. 55. Consultations
  56. 56. NIH Big Data to Knowledge Initiative http://bd2k.nih.gov/
  57. 57. Libraries, in summary… 1 | Can facilitate the creation of a smarter body of literature for future research 2 | Can train researchers to use metadata standards to enable data reuse 3 | Can facilitate researchers’ understanding of available resources
  58. 58. PNW Research Data Geeks Group. Members from: Oregon Health & Science University, Oregon State University, University of Oregon, University of Idaho, University of Washington, Portland State University, Reed College. Join us @ bit.ly/pnwdatalibs. Also, we need a logo: free data science training for good suggestions! Image: http://commons.wikimedia.org/wiki/File:DARPA_Big_Data.jpg
  59. 59. How do you think libraries can best facilitate best practices in data management?
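
The sample citation on slide 32 (description, vendor, catalog number, RRID) can be sketched as a small Python helper that a librarian or editor might use to check a Methods entry. This is only an illustration under stated assumptions: the function names are hypothetical, and the pattern below covers only identifiers shaped like the slide's example (`RRID:` followed by a registry prefix, an underscore, and an identifier). Identifiers issued by the Resource Identification Portal may take other forms.

```python
import re

# Assumed shape, based only on the slide's example "RRID:AB_2140114":
# "RRID:" + alphabetic registry prefix + "_" + identifier.
# This is an illustrative pattern, not an official RRID grammar.
RRID_PATTERN = re.compile(r"RRID:[A-Za-z]+_[A-Za-z0-9_-]+")


def looks_like_rrid(value: str) -> bool:
    """Return True if the whole string matches the assumed RRID shape."""
    return RRID_PATTERN.fullmatch(value) is not None


def format_resource_citation(description: str, vendor: str,
                             catalog_number: str, rrid: str) -> str:
    """Build a citation in the order used on the slide:
    description, vendor, catalog number, RRID."""
    if not looks_like_rrid(rrid):
        raise ValueError(f"Does not look like an RRID: {rrid!r}")
    return f"{description}, {vendor}, Cat# {catalog_number}, {rrid}"


if __name__ == "__main__":
    print(format_resource_citation(
        "Polyclonal rabbit anti-MAPK3 antibody",
        "Abgent",
        "AP7251E",
        "RRID:AB_2140114",
    ))
```

A check like this only confirms that an identifier is present and well-formed under the assumed pattern; confirming that it points to the right resource still requires looking it up in the portal.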
