Towards reusable experiments: making metadata while you measure

Slides from my short talk at INCF 2013 (the neuroinformatics annual meeting) in Stockholm. I talk about the realities of data sharing and propose making it easier through the use and adoption of electronic lab notebooks. The project is a collaboration between Carnegie Mellon University and Elsevier Research Data Services.

  1. Towards Reusable Experiments: Making Metadata While You Measure
     Shreejoy Tripathy, PhD student, Carnegie Mellon
     Email: stripat3@gmail.com
     Twitter: @neuronJoy
  2. Lots of great tools for data sharing…
  3. Barriers to data sharing
     • Social
       – “What’s in it for me? How will I get credit?”
       – “It’s my data, not yours”
       – “The benefit to me isn’t worth the time I put into it”
       – “What if I get scooped?”
     • Methodological
       – “How do I share data? What do I share?”
       – “Going back and annotating my files to share is super time-consuming”
       – Specifying file formats, data standards
       – Building FTP servers and nice user interfaces
  4. Project idea
     • How can we make a standard neuroscience wet lab more data-sharing savvy?
     • Incorporate structured workflows into the daily practice of a typical electrophysiology lab (the Urban Lab at CMU)
       – What does it take?
       – Where are points of conflict?
  5. Key insights/motivations
     1. Effective data sharing includes raw data files + experimental metadata (typically stored in a lab notebook)
     [Slide image: example raw data file SDB_MC_12_voltages.mat]
  6. Key insights/motivations
     1. Share raw data files + experimental metadata
     2. You know the most about an experiment when you’re performing it
  7. Key insights/motivations
     1. Share raw data files + experimental metadata
     2. You know the most about an experiment when you’re performing it
     3. Improved data practices should make labs more productive
  8. Project schematic
  9. Project schematic
  10. Metadata data app
     • Electronic lab notebook models the sequential slice-electrophysiology workflow
       – Replaces the pen-and-paper lab notebook
  11. Metadata data entry
     • Electronic lab notebook allows structured data entry
     [Slide image: “Animal Strain” entry field]
  12. Metadata data entry
     • Electronic lab notebook allows structured data entry (e.g., dropdown menus)
       – Allows incorporation of semantic ontologies
     • Important to strike a balance between structure and flexibility
     [Slide image: ontology identifier MGI:3719486]
  13. Metadata data entry
     • Electronic lab notebook facilitates entry of new content, such as registering recorded neurons to a brain atlas
     [Slide image: ontology identifier MGI:3719486]
  14. Data integration
     • Metadata app and electrophysiology data acquisition are synced via a server
       – Each trace of experimental data is annotated with metadata
     • Currently IGOR Pro-specific; support for pClamp and other acquisition packages to be added later as needed
  15. Data dashboard (web-based)
  16. Data dashboard (future steps)
     • Use collected metadata to sort experiments
       – e.g., by mouse strain, neuron type, animal age
     • Enable in-browser analyses
       – Track provenance of analyzed data back to raw data
  17. Next steps
     • Use the tools we have built
       – Populate the data server with many experiments
     • Is the e-notebook too burdensome to use?
       – If yes, continue to iterate
       – What can we ask now that we couldn’t before?
     • It is much easier to ask exploratory questions, like
       – How is the cell type that Shawn records different from the one that Matt records?
     • Expose data to neuroscience databases
       – NIF, INCF Dataspace, neuroelectro.org
     • How adaptable are these solutions for use in other labs?
     • Who is going to pay for this?
  18. Acknowledgements
     • Carnegie Mellon
       – Shreejoy Tripathy
       – Nathan Urban
       – Shawn Burton
       – Rick Gerkin
       – Santosh Chandrasekaran
       – Matthew Geramita
     • Elsevier Research Data Services
       – Anita de Waard
       – Mark Harviston
       – Jez Alder
       – Sarah Tyrchniewicz
       – David Marques
       – (funding!)
  19. Next steps
     • Roll out the updated app to experimentalists
     • Populate the database with the contents of many experiments
     • Flesh out data dashboard functionality
     • Investigate the new things that we can achieve given these tools
  20. Effective data sharing is…
     • Not just the experimental data file
       – But also the experimental metadata: What was done? What does this variable mean?
       – This is usually stored in PHYSICAL lab notebooks, understandable only by the experimenter
     • Effective data sharing
       – Someone who is not the person who collected the data can understand the experiment and data
  21. App user testing
     • “I don’t like the way the app forces me through a specific workflow, I want to enter experimental data when I see fit”
     • “I’m not opposed to the idea of dropdowns, but I want more flexibility, more text fields”
     • “When I use a lab notebook, I only write down the absolute minimum. Can the app’s fields be prepopulated with the results of an old experiment?”
  22. What is effective data sharing?
     • Effective data sharing
       – Someone who is not the person who collected the data can understand the experiment and data
       – i.e., datasets should be more or less self-describing
       – >90% of data sharing use cases are an experimentalist sharing data with a future version of herself or with a labmate
  23. Neuroinformatics successes don’t come from large-scale multi-lab data sharing
     • NeuroSynth
     • NeuroElectro?
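Slide 10 says the electronic lab notebook models the sequential slice-electrophysiology workflow, while the user feedback on slide 21 asks not to be forced through it. One way to reconcile the two, sketched in Python as an illustration and not as the app's actual design: keep the stages in a fixed order, accept entries for any stage at any time, and only suggest the next unfilled stage. The stage names and the `Session` class are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Step(Enum):
    # Hypothetical stages of one slice-electrophysiology session, in workflow order.
    ANIMAL = "animal"
    SLICING = "slicing"
    RECORDING = "recording"


@dataclass
class Session:
    # Maps each Step to a dict of the fields entered for it.
    entries: dict = field(default_factory=dict)

    def record(self, step: Step, **values) -> None:
        # Entries may arrive in any order; later values for a step update earlier ones.
        self.entries.setdefault(step, {}).update(values)

    def next_step(self):
        # Suggest the first stage, in workflow order, that has no entry yet (None when done).
        for step in Step:
            if step not in self.entries:
                return step
        return None
```

Because `next_step` only suggests and never blocks, an experimenter can jump ahead to the recording stage and fill in slicing details afterwards.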
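Slides 11 and 12 describe structured entry through dropdown menus backed by semantic ontologies, balanced against flexibility. A minimal Python sketch of that balance, assuming a hypothetical vocabulary table: the dropdown choice is validated and mapped to an ontology identifier, and a free-text `notes` field catches whatever the menu cannot express. MGI:3719486 is the identifier shown on slide 12; the label paired with it here is a placeholder, not its real term name.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary: dropdown label -> ontology identifier.
# The label is a placeholder; only the identifier comes from the slide.
STRAIN_TERMS = {
    "Example strain (placeholder label)": "MGI:3719486",
}


@dataclass
class StrainEntry:
    label: str        # value chosen from the dropdown
    ontology_id: str  # identifier looked up from the vocabulary
    notes: str = ""   # free text for anything the dropdown cannot capture


def make_strain_entry(label: str, notes: str = "") -> StrainEntry:
    """Validate a dropdown choice and attach its ontology identifier."""
    if label not in STRAIN_TERMS:
        raise ValueError(f"unknown strain {label!r}; choose from {sorted(STRAIN_TERMS)}")
    return StrainEntry(label, STRAIN_TERMS[label], notes)
```

Storing the identifier alongside the human-readable label is what would let a downstream database match entries across labs without parsing free text.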
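Slide 14 says each acquired trace is annotated with metadata by syncing the notebook app and the acquisition software through a server, but it does not say how traces and notebook entries are matched. The sketch below assumes one possible rule, matching by timestamp on a shared clock: every trace inherits the most recent notebook entry that began at or before it. `NotebookEntry`, `Trace` and `annotate` are hypothetical names, and nothing here talks to IGOR Pro or pClamp.

```python
import bisect
from dataclasses import dataclass


@dataclass
class NotebookEntry:
    start_s: float  # time (seconds) the experimenter opened this entry, e.g. a new cell
    metadata: dict


@dataclass
class Trace:
    trace_id: str
    recorded_s: float  # acquisition timestamp, assumed to share the notebook's clock


def annotate(traces, entries):
    """Pair each trace with the latest notebook entry that started at or before it."""
    entries = sorted(entries, key=lambda e: e.start_s)
    starts = [e.start_s for e in entries]
    out = {}
    for t in traces:
        # bisect_right finds how many entries started at or before the trace time.
        i = bisect.bisect_right(starts, t.recorded_s) - 1
        # Traces recorded before the first entry get None rather than a wrong guess.
        out[t.trace_id] = entries[i].metadata if i >= 0 else None
    return out
```

Returning `None` for unmatched traces makes gaps visible instead of silently attaching the wrong cell's metadata.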
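Slide 16 proposes sorting experiments by collected metadata and tracking the provenance of analyzed data back to raw data. A small Python sketch of both, with hypothetical field names: `group_by` buckets experiment IDs by one metadata field, and the example analysis returns a result that carries the IDs of the raw experiments it was computed from.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Experiment:
    exp_id: str
    strain: str       # hypothetical metadata fields, mirroring the slide's examples
    neuron_type: str
    age_days: int


def group_by(experiments, key):
    """Group experiment IDs by one metadata field, e.g. 'strain' or 'neuron_type'."""
    groups = defaultdict(list)
    for e in experiments:
        groups[getattr(e, key)].append(e.exp_id)
    return dict(groups)


@dataclass(frozen=True)
class AnalysisResult:
    name: str
    value: float
    source_ids: tuple  # IDs of the raw experiments this result came from (provenance)


def mean_age(experiments):
    """Example analysis whose result records which raw experiments it used."""
    ages = [e.age_days for e in experiments]
    return AnalysisResult("mean_age_days", sum(ages) / len(ages),
                          tuple(e.exp_id for e in experiments))
```

Because provenance travels with the result object, a dashboard could link any summary number back to the raw files behind it.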
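One request on slide 21 is to prepopulate the app's fields with the results of an old experiment. A minimal sketch, assuming records are plain dictionaries and that the user picks which fields carry over; `prepopulate` and the field names in the test are hypothetical.

```python
import copy


def prepopulate(previous: dict, overrides: dict, carry_over: set) -> dict:
    """Start a new notebook record from selected fields of an old one.

    Only keys listed in `carry_over` are copied (for example, solutions or rig
    settings); `overrides` then replaces or adds values for today's experiment.
    """
    # Deep-copy so editing the new record never alters the stored old one.
    record = {k: copy.deepcopy(v) for k, v in previous.items() if k in carry_over}
    record.update(overrides)
    return record
```

Copying only whitelisted fields avoids dragging per-experiment values such as dates or cell counts into the new record.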
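Slide 22 defines effective sharing as datasets that are more or less self-describing. A sketch of one way to enforce that at save time, assuming a hypothetical list of required fields: the record is checked for missing values and, if complete, serialized as a JSON sidecar that names the raw file it describes. The example file name in the test is the one shown on slide 5.

```python
import json

# Hypothetical minimum fields a labmate would need to interpret a raw file unaided.
REQUIRED = ("experimenter", "date", "strain", "units", "variable_descriptions")


def missing_fields(metadata: dict) -> list:
    """List required fields that are absent or empty."""
    return [k for k in REQUIRED if not metadata.get(k)]


def sidecar_json(data_file: str, metadata: dict) -> str:
    """Serialize metadata for one raw file, refusing incomplete records."""
    missing = missing_fields(metadata)
    if missing:
        raise ValueError(f"{data_file}: missing {', '.join(missing)}")
    return json.dumps({"data_file": data_file, **metadata}, sort_keys=True)
```

Checking at save time follows slide 6's point that you know the most about an experiment while performing it, so gaps are cheapest to fill then.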