Incidental Collaboratories For Experimental Data, Or: Why life is so complicated (and what we might be able to do about it)
Anita de Waard
VP Research Data Collaborations, Elsevier RDS
Jericho, VT, USA
Outline
• Brief bio
• The problem: life is complicated
• What we can do to understand it
• About Elsevier Research Data Services
• A pilot project
• Some questions.
Brief bio:
• Background:
– Low-temperature physics (Leiden & Moscow)
– Joined Elsevier in 1988 as publisher in solid state physics
– 1991: arXiv => publishers will go out of business very soon!
• 1997-now: Disruptive Technologies Director, focus on better representation of scientific knowledge:
– Identifying key knowledge elements in articles (linguistics thesis)
– Building claim-evidence networks (through collaborations)
– Helping build communities to accelerate rate of change (Force11)
• Starting 1/1/2013: VP Research Data Collaborations - why?
– Douglas Engelbart’s thinking: connect minds!
– My (non-biologist’s) understanding of biology:
Problem: a rose is not a rose:
• “Single specimens of C. ermineus show unchanged injected venom mass spectra and HPLC profiles over time. However, there was significant variability of the injected venom composition from specimen to specimen, in spite of their common biogeographic origin.” Jose A. Rivera-Ortiz, Herminsul Cano, Frank Marí, Intraspecies variability of the injected venom of Conus ermineus, doi:10.1016/j.peptides.2010.11.014
• “D. desulfuricans CFA profiles for all intestinal strains (group 1) were approximately identical (98.2 to 99.8% similarity). A 92.4% similarity was evaluated in a group 2, containing six soil strains. The members of this group had 87% similarity with the type soil strain. All intestinal strains and soil strains were similar at the 85.5% level. Strains DV-3/84 DV-7/84 (group 3) showed 76.6% similarity to each other and were similar to all other strains at the 67.6% level.” Zofia Dzierżewicz et al., Intraspecies variability of Desulfovibrio desulfuricans strains determined by the genetic profiles, FEMS Microbiology Letters, Volume 219, Issue 1, 14 February 2003, Pages 69–74, doi:10.1016/S0378-1097(02)01199-0
=> A specimen is not a species!
Problem: gene expression varies with:
Age: “SIRT1-Associated genes are deregulated in the aged brain” Philipp Oberdoerffer et al., SIRT1 Redistribution on Chromatin Promotes Genomic Stability but Alters Gene Expression during Aging, Cell, Volume 135, Issue 5, 28 November 2008, Pages 907–918, doi:10.1016/j.cell.2008.10.025
Smell: “…major urinary proteins […] mediate the pregnancy blocking effects of male urine” P.A. Brennan et al., Patterns of expression of the immediate-early gene egr-1 in the accessory olfactory bulb of female mice exposed to pheromonal constituents of male urine, Neuroscience, Volume 90, Issue 4, June 1999, Pages 1463–1470, doi:10.1016/S0306-4522(98)00556-9
Hunger: “Out of the ~30K genes, about 10K are differentially expressed in liver cells when an animal is in different states of satiety.” Zhang F, Xu X, Zhou B, He Z, Zhai Q (2011) Gene Expression Profile Change and Associated Physiological and Pathological Effects in Mouse Liver Induced by Fasting and Refeeding. PLoS ONE 6(11): e27553. doi:10.1371/journal.pone.002755
Light: “Longer-term enrichment training also altered the mRNA levels of many genes associated with structural changes that occur during neuronal growth.” Cailotto C., et al. (2009) Effects of Nocturnal Light on (Clock) Gene Expression in Peripheral Organs: A Role for the Autonomic Innervation of the Liver. PLoS ONE 4(5): e5650. doi:10.1371/journal.pone.0005650
=> Knowing genes is not knowing how they are expressed!
Problem: no man (or mouse) is an island…
• “We found the diversity and abundance of each habitat’s signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals.” The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234
• “Colonization of an infant’s gastrointestinal tract begins at birth. The acquisition and normal development of the neonatal microflora is vital for the healthy maturation of the immune system.” Mackie RI, Sghir A, Gaskins HR., Developmental microbial ecology of the neonatal gastrointestinal tract. Am J Clin Nutr. 1999 May;69(5):1035S-1045S
=> An animal is an ecosystem!
Problem: system interactions create even greater complexity:
• Computing cancer: “No amount of information about what happens inside a single cell can ever tell you what a tissue is going to do,” [Glazier] says. “Much of the information and complexity of tissues and life is embedded in the way cells talk to each other and the extracellular environment.”
• Megadata: “These complex emergent systems are impossible to understand,” [Agus] says. “Our level of understanding is just so cursory that we have to start to look for what they call, in physics, coarse-grained elements.” “[we] founded Applied Proteomics to create a protein diagnostic that reveals not just where a cancer is, but how it interacts with the body”
Nature Special Issue Vol. 491 No. 7425, ‘Physical Scientists Take On Cancer’
=> The whole is more than the sum of its parts!
Big problem:
=> A specimen is not a species
=> Knowing genes is not knowing how they are expressed
=> An animal is an ecosystem
=> The whole is more than the sum of its parts
LIFE IS COMPLICATED!!
http://en.wikipedia.org/wiki/File:Duck_of_Vaucanson.jpg
Statistics to the rescue!
With enough observations, trends and anomalies can be detected:
• “Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far.” The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234
• “The large sample size — 4,298 North Americans of European descent and 2,217 African Americans — has enabled the researchers to mine down into the human genome.” Nidhi Subbaraman, Nature News, 28 November 2012, High-resolution sequencing study emphasizes importance of rare variants in disease.
• “A profile unique for a DNA sample source is obtained … a series of numbers are generated which can be used as a bar code for that DNA source. A registry of bar codes would make it easy to compare DNA samples” Roland M. Nardone, Ph.D., Eradication of Cross-Contaminated Cell Lines: A Call for Action, http://www.sivb.org/publicPolicy_Eradication.pdf
We need ‘incidental collaboratories’
• Collect: store data at the level of the experiment:
– Accessible through a single interface
– With enough metadata to know what was done/seen
• Connect: allow analyses over:
– Similar experiment types
– Experiments done with/on similar biological ‘things’:
  • Species, strains, systems, cells
  • Anatomical components (e.g. spleen, hypothalamus)
  • Antibodies, biomarkers, bioactive chemicals, etc.
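The Collect/Connect idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `ExperimentRecord`, `Collaboratory`, `collect`, `connect_by_entity` and `connect_by_type`, the field names, and the example labs and experiments are all hypothetical and not part of any existing RDS or NIF system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentRecord:
    """One experiment, stored with enough metadata to know what was done/seen.

    All field names are illustrative assumptions, not a real schema.
    """
    lab: str
    experiment_type: str
    entities: frozenset  # biological 'things': species, anatomy, antibodies...
    what_was_done: str
    what_was_seen: str


class Collaboratory:
    """Hypothetical single interface over experiment-level records."""

    def __init__(self):
        self._by_entity = defaultdict(list)
        self._by_type = defaultdict(list)

    def collect(self, record):
        # Collect: index each record by experiment type and by every entity.
        self._by_type[record.experiment_type].append(record)
        for entity in record.entities:
            self._by_entity[entity].append(record)

    def connect_by_entity(self, entity):
        # Connect: all experiments done with/on the same biological 'thing'.
        return list(self._by_entity.get(entity, []))

    def connect_by_type(self, experiment_type):
        # Connect: all experiments of a similar type.
        return list(self._by_type.get(experiment_type, []))


# Made-up example data from two hypothetical labs.
repo = Collaboratory()
repo.collect(ExperimentRecord(
    lab="lab_a",
    experiment_type="immunostaining",
    entities=frozenset({"mouse", "hypothalamus"}),
    what_was_done="stained tissue sections",
    what_was_seen="signal in a subset of cells",
))
repo.collect(ExperimentRecord(
    lab="lab_b",
    experiment_type="electrophysiology",
    entities=frozenset({"rat", "hypothalamus"}),
    what_was_done="recorded from slices",
    what_was_seen="increased firing rate",
))

shared = repo.connect_by_entity("hypothalamus")
print(sorted(record.lab for record in shared))
```

Neither lab set out to collaborate, but because both records name the same anatomical component, a query on that component returns both. That is the 'incidental' part.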
Problem: biological research is quite insular:
• Biology is small: because objects/equipment are 10^-5 to 10^2 m, you can work alone (‘King’ and ‘subjects’).
• Biology is messy: it doesn’t happen behind a terminal.
• Biology is competitive: different people with similar skill sets, vying for the same grants.
• In summary: it does not promote inherent collaboration (vs., for instance, big physics or astronomy).
[Figure: cycle diagram with the labels Prepare, Observe, Analyze, Ponder, Communicate]
We need to pop the lab bubble!
Labs go from being information islands, to being ‘sensors in a network’.
[Figure: diagram with three sets of the labels Prepare, Analyze, Communicate and Observations, plus Think]
Some objections, and rebuttals:
• Objection: “But our lab notebooks are all on paper”
– Rebuttal: Develop smart phone/tablet apps for data input
• Objection: “I need to see a direct benefit from something I spend my time on”
– Rebuttal: Develop ‘data manipulation dashboard’ for PI to allow better access to full experimental output for his/her lab
• Objection: “I am afraid other people might scoop my discoveries”
– Rebuttal: Develop intra-lab data communication systems first and allow timed/granular data export
• Objection: “I want things to be peer reviewed before I expose them”
– Rebuttal: Allow reviewers access to experimental database before publication (of data or paper)
• Objection: “I don’t really trust anyone else’s data – well, except for the guys I went to Grad School with…”
– Rebuttal: Add a social networking component to this data repository so you know who (to the individual) created that data point.
Elsevier Research Data Services: Goals
1. Help add more data into (existing, open) data repositories: more data in, annotated, available
2. Make them more interoperable: work towards collaboratory model by connecting databases
3. Find ways to make them sustainable, e.g.:
– Service-level agreements: to funders/institutes
– With Lab notebook: subscriptions to projects
– Back-end analytics: to companies
RDS Guiding Principles:
• In principle, all open data stays open and URLs, front end etc. stay where they are (i.e. with repository)
• Collaboration is tailored to data repositories’ unique needs/interests and of a ‘service-model’ type:
– Aspects where collaboration is needed are discussed
– A collaboration plan is drawn up using a Service-Level Agreement: agree on time, conditions, etc.
– All communication, finance, IPR etc. is completely transparent at all times.
• Very small (2 or 3 people) department; immediate communication; instant deployment of ideas
RDS Approach:
• Collaborate and build on relationships with data repositories
• Integrate with other content sources, if possible
• Build annotation and standardisation tools and processes to implement this
• Develop next-generation infrastructure solutions for back-end integration
• Explore creative revenue opportunities
NIF Antibody Registry:
Problem:
• 95 antibodies were identified in 8 papers
• 52 did not contain enough information to determine the antibody used
• Some provided details in another paper
• Failed to give species, vendor, catalog #
Solution #1:
• Journals ask authors to provide antibody catalog number
• Link to NIF Registry from manufacturers’/vendors’ sites
Solution #2:
• Pilot with a lab:
Let’s start with the Urban Lab
• Getting antibodies
• And messy bits
• From the notebook
• Into Nathan Urban’s command center
• By providing
– 7” Tablets
– Links to IgorPro
– A dashboard UI
My questions to you:
• Thoughts on this approach:
– In principle?
– In practice?
• Do you see serious hurdles:
– Are we overlapping with other initiatives; if so, are we complementary?
– How does this connect to libraries/local repositories?
– Are there sensitivities/pain points we are overlooking?
• Where to start:
– Are antibodies OK?
– Is a neuroscience lab OK?
– Thoughts on data repositories/platforms to connect to?
Your questions to me?
email@example.com
http://elsatglabs.com/labs/anita/
http://www.slideshare.net/anitawaard
Thanks go to:
• Anita Bandrowski and Maryann Martone, NIF
• Nathan Urban, Shreejoy Tripathy, CMU
• David Marques, SVP RDS