Towards Incidental Collaboratories; Research Data Services
Research Data Services: Towards a Framework for Incidental Collaboratories
Anita de Waard
VP Research Data Collaborations, Elsevier RDS
Jericho, VT, USA
Brief bio:
• Background:
  – Low-temperature physics (Leiden & Moscow)
  – Joined Elsevier in 1988 as publisher in solid state physics
  – 1991: ArXiV => publishers will go out of business very soon!
• 1997-now: Disruptive Technologies Director, focus on better representation of scientific knowledge:
  – Identifying key knowledge elements in articles (linguistics thesis)
  – Building claim-evidence networks (through collaborations)
  – Help build communities to accelerate rate of change (Force11)
• Starting 1/1/2013: VP Research Data Collaborations - why?
  – Douglas Engelbart’s thinking: connect minds!
  – My (non-biologist’s) understanding of biology:
The big problem in biology:
• Interspecies variability: A specimen is not a species
• Gene expression variability: Knowing genes is not knowing how they are expressed
• Microbiome: An animal is an ecosystem
• Systems biology: A whole is more than the sum of its parts
Reductionist science doesn’t work for living systems!
Image: http://en.wikipedia.org/wiki/File:Duck_of_Vaucanson.jpg
Statistics to the rescue!
With enough observations, trends and anomalies can be detected:
• “Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far.” The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234
• “The large sample size — 4,298 North Americans of European descent and 2,217 African Americans — has enabled the researchers to mine down into the human genome.” Nidhi Subbaraman, Nature News, 28 November 2012, High-resolution sequencing study emphasizes importance of rare variants in disease.
• “A profile unique for a DNA sample source is obtained … a series of numbers are generated which can be used as a bar code for that DNA source. A registry of bar codes would make it easy to compare DNA samples” Roland M. Nardone, Ph.D., Eradication of Cross-Contaminated Cell Lines: A Call for Action, http://www.sivb.org/publicPolicy_Eradication.pdf
Enable ‘incidental collaboratories’:
• Collect: store data at the level of the experiment:
  – Accessible through a single interface
  – Add enough metadata to know what was done/seen
• Connect: allow analyses over:
  – Similar experiment types
  – Experiments done with/on similar biological ‘things’:
    • Species, strains, systems, cells
    • Anatomical components (e.g. spleen, hypothalamus)
    • Antibodies, biomarkers, bioactive chemicals, etc.
• Keep:
  – Long-term preservation of data and software (Olive)
  – Fulfill Data Management Plan requirements
  – Allow gated access, if needed
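The Collect and Connect steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the RDS design: the `ExperimentRecord` class, its field names, the `connect` function and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentRecord:
    """Hypothetical record: one experiment plus the metadata describing it."""
    experiment_id: str
    lab: str
    experiment_type: str                        # e.g. "immunostaining"
    entities: set = field(default_factory=set)  # species, anatomy, antibodies...
    gated: bool = False                         # True if access is restricted


def connect(records, entity=None, experiment_type=None, include_gated=False):
    """Return records that share a biological 'thing' and/or experiment type."""
    hits = []
    for rec in records:
        if rec.gated and not include_gated:
            continue  # respect gated access unless explicitly allowed
        if entity is not None and entity not in rec.entities:
            continue
        if experiment_type is not None and rec.experiment_type != experiment_type:
            continue
        hits.append(rec)
    return hits


# Invented sample data, for illustration only.
records = [
    ExperimentRecord("exp-001", "lab-A", "immunostaining", {"mouse", "hypothalamus"}),
    ExperimentRecord("exp-002", "lab-B", "patch-clamp", {"mouse", "olfactory bulb"}),
    ExperimentRecord("exp-003", "lab-C", "immunostaining", {"rat", "spleen"}, gated=True),
]

# Labs that worked on the same species become 'incidental' collaborators.
mouse_labs = sorted({rec.lab for rec in connect(records, entity="mouse")})
print(mouse_labs)  # ['lab-A', 'lab-B']
```

The point of the sketch is that two labs never have to coordinate in advance: sharing the same entity tag in their metadata is enough for their experiments to be analysed together.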
Problem: biological research is quite insular
• Biology is small: because objects/equipment are 10^-5 to 10^2 m, you can work alone (‘King’ and ‘subjects’).
• Biology is messy: it doesn’t happen behind a terminal.
• Biology is competitive: different people with similar skill sets, vying for the same grants.
• In summary: biology does not inherently promote collaboration (unlike, for instance, big physics or astronomy).
[Figure labels: Prepare, Ponder, Observe, Communicate, Analyze]
Try to pop the ‘lab bubble’!
Labs go from being information islands, to being ‘sensors in a network’.
[Figure: several lab cycles (Prepare, Analyze, Communicate, Think), each with its Observations]
Some objections, and rebuttals:
• Objection: “But our lab notebooks are all on paper”
  – Rebuttal: Develop smart phone/tablet apps for data input
• Objection: “I need to see a direct benefit from something I spend my time on”
  – Rebuttal: Develop ‘data manipulation dashboard’ for PI to allow better access to full experimental output for his/her lab
• Objection: “I am afraid other people might scoop my discoveries”
  – Rebuttal: Develop intra-lab data communication systems first and allow timed/granular data export
• Objection: “I want things to be peer reviewed before I expose them”
  – Rebuttal: Allow reviewers access to experimental database before publication (of data or paper)
• Objection: “I don’t really trust anyone else’s data – well, except for the guys I went to Grad School with…”
  – Rebuttal: Add a social networking component to this data repository so you know who (to the individual) created that data point.
Elsevier Research Data Services: Goals
1. Help increase the amount of data shared from the lab, enabling incidental collaboratories
2. Help increase the value of the data shared by increasing annotation, normalization and provenance, enabling enhanced interoperability
3. Help measure and deliver credit for shared data to the researchers, the institute, and the funding body, enabling more sustainable platforms
RDS Guiding Principles:
• In principle, all open data stays open and URLs, front end etc. stay where they are (i.e. with repository)
• Collaboration is tailored to data repositories’ unique needs/interests and of a ‘service-model’ type:
  – Aspects where collaboration is needed are discussed
  – A collaboration plan is drawn up using a Service-Level Agreement: agree on time, conditions, etc.
  – All communication, finance, IPR etc. is completely transparent at all times.
• Very small (2-3 people) department; immediate communication; instant deployment of ideas
RDS Approach:
• Collaborate and build on relationships with data repositories (life science, earth science, others)
• Integrate with other content sources, if possible
• Build annotation and standardisation tools and processes to implement this
• Develop next-generation infrastructure solutions for back-end integration
• Explore creative revenue opportunities
NIF Antibody Registry:
Problem:
• 95 antibodies were identified in 8 papers
• 52 did not contain enough information to determine the antibody used
• Some provided details in another paper
• Failed to give species, vendor, catalog #
Solution #1:
• Journals ask authors to provide antibody catalog number
• Link to NIF Registry from manufacturers’/vendors’ sites
Solution #2:
• Pilot with a lab:
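The reporting problem above (citations missing species, vendor or catalog number) suggests a simple completeness check at submission time. The sketch below is a hypothetical illustration only: the field names, the `missing_fields` helper and the sample citations are invented, and it does not use or represent the NIF Registry's actual interface.

```python
# Fields the slide names as commonly missing from antibody citations.
REQUIRED_FIELDS = ("species", "vendor", "catalog_number")


def missing_fields(citation):
    """Return the required fields that are absent or blank in a citation dict."""
    return [f for f in REQUIRED_FIELDS if not str(citation.get(f) or "").strip()]


# Invented example citations; names, vendor and catalog number are placeholders.
citations = [
    {"name": "anti-GFAP", "species": "rabbit",
     "vendor": "ExampleVendor", "catalog_number": "EV-123"},
    {"name": "anti-NeuN", "species": "mouse",
     "vendor": "", "catalog_number": None},
]

report = {c["name"]: missing_fields(c) for c in citations}
incomplete = [name for name, missing in report.items() if missing]
print(incomplete)  # ['anti-NeuN']
```

A journal or lab tool could run a check like this before accepting a methods section, asking the author to fill in whatever is missing.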
Let’s start with the Urban Lab
• Getting antibodies
• And messy bits
• From the notebook
• Into Nathan Urban’s command center
• By providing
  – 7” Tablets
  – Links to IgorPro
  – A dashboard UI
My questions to you:
• Thoughts on this approach:
  – In principle?
  – In practice?
• Do you see serious hurdles:
  – Are we overlapping with other initiatives; if so, are we complementary?
  – How does this connect to libraries/local repositories?
  – Are there sensitivities/pain points we are overlooking?
• Where to start:
  – How to collaborate?
  – Who to talk to – funding agencies, societies: who else?
  – Thoughts on data repositories/platforms to connect to?
Your questions to me?
firstname.lastname@example.org
http://elsatglabs.com/labs/anita/
http://www.slideshare.net/anitawaard
Thanks go to:
• Anita Bandrowski and Maryann Martone, NIF
• Nathan Urban, Shreejoy Tripathy, CMU
• David Marques, SVP RDS