Towards Incidental Collaboratories For Experimental Data
Anita de Waard, VP Research Data Collaborations, Elsevier RDS, Jericho, VT, USA
Thanks: Maryann Martone, Anita Bandrowski, NIF, UCSD; Nathan Urban, Shreejoy Tripathy, CMU; Ed Hovy, Gully Burns, ISI/CMU; Phil Bourne, UCSD
Problem: a rose is not a rose:
• “…there was significant variability of the injected venom composition from specimen to specimen, in spite of their common biogeographic origin.” Jose A. Rivera-Ortiz, Herminsul Cano, Frank Marí, Intraspecies variability of the injected venom of Conus ermineus, doi:10.1016/j.peptides.2010.11.014
• “…Strains DV-3/84 DV-7/84 (group 3) showed 76.6% similarity to each other and were similar to all other strains at the 67.6% level.” Zofia Dzierżewicz et al., Intraspecies variability of Desulfovibrio desulfuricans strains determined by the genetic profiles, FEMS Microbiology Letters, Volume 219, Issue 1, 14 February 2003, Pages 69–74, doi:10.1016/S0378-1097(02)01199-0
=> A specimen is not a species!
Problem: gene expression varies with:
• Age: “SIRT1-Associated genes are deregulated in the aged brain” Philipp Oberdoerffer et al., SIRT1 Redistribution on Chromatin Promotes Genomic Stability but Alters Gene Expression during Aging, Cell, Volume 135, Issue 5, 28 November 2008, Pages 907–918, doi:10.1016/j.cell.2008.10.025
• Smell: “…major urinary proteins […] mediate the pregnancy blocking effects of male urine” P.A. Brennan et al., Patterns of expression of the immediate-early gene egr-1 in the accessory olfactory bulb of female mice exposed to pheromonal constituents of male urine, Neuroscience, Volume 90, Issue 4, June 1999, Pages 1463–1470, doi:10.1016/S0306-4522(98)00556-9
• Hunger: “Out of the ~30K genes, about 10K are differentially expressed in liver cells when an animal is in different states of satiety.” Zhang F, Xu X, Zhou B, He Z, Zhai Q (2011) Gene Expression Profile Change and Associated Physiological and Pathological Effects in Mouse Liver Induced by Fasting and Refeeding. PLoS ONE 6(11): e27553. doi:10.1371/journal.pone.002755
• Light: “Longer-term enrichment training also altered the mRNA levels of many genes associated with structural changes that occur during neuronal growth.” Cailotto C., et al. (2009) Effects of Nocturnal Light on (Clock) Gene Expression in Peripheral Organs: A Role for the Autonomic Innervation of the Liver. PLoS ONE 4(5): e5650. doi:10.1371/journal.pone.0005650
=> Knowing genes is not knowing how they are expressed!
Problem: no man (or mouse) is an island…
• “We found the diversity and abundance of each habitat’s signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals.” The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234
• “Colonization of an infant’s gastrointestinal tract begins at birth. The acquisition and normal development of the neonatal microflora is vital for the healthy maturation of the immune system.” Mackie RI, Sghir A, Gaskins HR., Developmental microbial ecology of the neonatal gastrointestinal tract. Am J Clin Nutr. 1999 May;69(5):1035S-1045S
=> An animal is an ecosystem!
Interactions create more complexity:
• Computing cancer: “No amount of information about what happens inside a single cell can ever tell you what a tissue is going to do,” [Glazier] said. “Much of the information and complexity of tissues and life is embedded in the way cells talk to each other and the extracellular environment.”
• Megadata: “These complex emergent systems are impossible to understand,” “[we] founded Applied Proteomics to create a protein diagnostic that reveals not just where a cancer is, but how it interacts with the body.”
Nature Special Issue Vol. 491 No. 7425 ‘Physical Scientists Take On Cancer’
=> The whole is more than the sum of its parts!
Big problems in biology:
• Intraspecies variability => A specimen is not a species!
• Gene expression variability => Knowing genes is not knowing how they are expressed!
• Microbiome => An animal is an ecosystem!
• Systems biology => The whole is more than the sum of its parts!
• Models vs. experiment => Are we talking about the same things? In a way we can all use?
• Dynamics => Life is not in equilibrium!
Life is complicated! Reductionism doesn’t work for living systems.
http://en.wikipedia.org/wiki/File:Duck_of_Vaucanson.jpg
Statistics to the rescue! With enough observations, trends and anomalies can be detected:
• “Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far.” The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234
• “The large sample size — 4,298 North Americans of European descent and 2,217 African Americans — has enabled the researchers to mine down into the human genome.” Nidhi Subbaraman, Nature News, 28 November 2012, High-resolution sequencing study emphasizes importance of rare variants in disease.
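As a rough illustration of why more observations help: under the textbook assumption of independent measurements with a fixed spread, the uncertainty of an estimated mean shrinks with the square root of the sample size. The sketch below is not taken from the cited studies; the spread `sigma = 1.0` is an arbitrary assumption, and only the sample sizes (242 adults; 4,298 + 2,217 = 6,515 genomes) come from the quotes above. A single-lab-sized sample of 4 is added as a hypothetical comparison point.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of a sample mean for n independent observations
    with standard deviation sigma (textbook formula: sigma / sqrt(n))."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)

# sigma = 1.0 is an assumed, arbitrary spread used only for comparison.
sigma = 1.0
# 4 is a hypothetical single-lab sample; 242 and 6515 are the cohort
# sizes quoted on the slide (6515 = 4298 + 2217).
for n in (4, 242, 4298 + 2217):
    print(f"n = {n:5d}  standard error = {standard_error(sigma, n):.4f}")
```

The same reasoning is what makes pooled data from many small labs attractive: no single lab needs a large cohort if observations can be combined.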
Enable ‘incidental collaboratories’:
• Collect: store data at the level of the experiment:
– Accessible through a single interface
– Add enough metadata to know what was done/seen
• Connect: allow analyses over:
– Similar experiment types
– Experiments done with/on similar biological ‘things’ (species, strains, systems, cells etc.)
– In a way that can be used by modelers!
• Keep:
– Long-term preservation of data and software
– Fulfill Data Management Plan requirements
– Allow ‘gated’ access when and to whom researcher wants
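The Collect / Connect / Keep points above can be sketched as a small data structure. This is a hypothetical illustration, not a description of any existing system: the class `ExperimentRecord`, its field names, and the helper `find_similar` are all invented here. It shows an experiment-level record with metadata, a ‘gated’ read check (owner, trusted parties, or anyone after a release date), and a query over experiment type and biological entity.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set

@dataclass
class ExperimentRecord:
    """Hypothetical experiment-level record (all field names are assumptions)."""
    experiment_id: str
    experiment_type: str            # e.g. "patch-clamp"
    entities: Set[str]              # species, strains, cells, reagents...
    owner: str
    trusted: Set[str] = field(default_factory=set)
    release_date: Optional[date] = None   # None means "not yet public"

    def can_read(self, user: str, today: date) -> bool:
        """Gated access: the owner and trusted parties can always read;
        everyone else only on or after the release date."""
        if user == self.owner or user in self.trusted:
            return True
        return self.release_date is not None and today >= self.release_date

def find_similar(records: List[ExperimentRecord], user: str, today: date,
                 experiment_type: Optional[str] = None,
                 entity: Optional[str] = None) -> List[ExperimentRecord]:
    """Return readable records matching an experiment type and/or entity."""
    hits = []
    for rec in records:
        if not rec.can_read(user, today):
            continue
        if experiment_type is not None and rec.experiment_type != experiment_type:
            continue
        if entity is not None and entity not in rec.entities:
            continue
        hits.append(rec)
    return hits

# Hypothetical usage: two labs, one record still gated.
records = [
    ExperimentRecord("exp-1", "patch-clamp", {"mouse", "mitral cell"},
                     owner="lab_a", release_date=date(2013, 1, 1)),
    ExperimentRecord("exp-2", "patch-clamp", {"rat", "mitral cell"},
                     owner="lab_b", trusted={"lab_a"}),
]
visible = find_similar(records, "lab_c", date(2013, 6, 1), entity="mitral cell")
print([r.experiment_id for r in visible])
```

Putting the access check inside the query means a modeler only ever sees records the researcher has chosen to expose, which is the ‘when and to whom’ condition in the Keep bullet.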
Problem: biological research is quite insular
• Biology is small: size 10^-5 to 10^2 m, scientist can work alone (‘King’ and ‘subjects’).
• Biology is messy: it doesn’t happen behind a terminal.
• Biology is competitive: many people with similar skill sets, vying for the same grants
• In summary: the structure of biological research does not inherently promote collaboration (vs., for instance, big physics or astronomy).
[Diagram: research cycle with the steps Prepare, Observe, Analyze, Ponder, Communicate]
Let’s look at a typical lab:
• How to get the right antibody IDs
• And messy bits
• From the lab notebook
• Into the PI’s command center?
Objections and rebuttals re. data sharing
• Objection: “But our lab notebooks are all on paper”
  Rebuttal: Develop smart phone/tablet apps for data input
• Objection: “I need to see a direct benefit from something I spend my time on”
  Rebuttal: Develop ‘data manipulation dashboard’ for PI to allow better access to full experimental output for his/her lab
• Objection: “I want things to be peer reviewed before I expose them”
  Rebuttal: Allow reviewers access to experimental database before publication (of data or paper)
• Objection: “I don’t really trust anyone else’s data – well, except for the guys I went to Grad School with…”
  Rebuttal: Add a social networking component to this data repository so you know who (to the individual) created that data point.
• Objection: “I am afraid other people might scoop my discoveries”
  Rebuttal: => Reward system moves from a competition to a ‘shared mission’
Data sharing enables collaboratories:
• Labs go from being information islands to being ‘sensors in a network’
• ‘Conglomeration of evidence’ can happen
• Allow place to share negative data – reproducing experiments.
[Diagram: several labs, each cycling through Prepare, Analyze, Communicate, Think, feeding shared Observations]
So we can do joint experiments: Across labs, experiments: track reagents and how they are used
[Diagram: several labs, each cycling through Prepare, Analyze, Communicate, feeding shared Observations]
So we can do joint experiments: Compare outcome of interactions with these entities
[Diagram: several labs, each cycling through Prepare, Analyze, Communicate, feeding shared Observations]
So we can do joint experiments: Build a ‘virtual reagent spectrogram’ by comparing how different entities interacted in different experiments
[Diagram: several labs, each cycling through Prepare, Analyze, Communicate, feeding shared Observations]
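One possible reading of the ‘virtual reagent spectrogram’ idea, sketched under stated assumptions: if each lab reports observations that name the reagent used, the entity it was used on, and an outcome, then grouping those observations by reagent and entity gives a cross-lab profile of how that reagent behaved. The record layout (`lab`, `reagent`, `entity`, `outcome`), the identifiers, and the function `reagent_profile` are hypothetical; the slides do not specify a format.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def reagent_profile(observations: List[dict],
                    reagent_id: str) -> Dict[str, List[Tuple[str, str]]]:
    """Group all observations for one reagent by the entity it was used on.

    Returns {entity: [(lab, outcome), ...]} so outcomes from different
    labs and experiments can be compared side by side.
    """
    profile = defaultdict(list)
    for obs in observations:
        if obs["reagent"] == reagent_id:
            profile[obs["entity"]].append((obs["lab"], obs["outcome"]))
    return dict(profile)

# Hypothetical observations from three labs (identifiers are made up).
observations = [
    {"lab": "lab_a", "reagent": "AB-001", "entity": "mouse cortex", "outcome": "strong signal"},
    {"lab": "lab_b", "reagent": "AB-001", "entity": "mouse cortex", "outcome": "no signal"},
    {"lab": "lab_c", "reagent": "AB-001", "entity": "rat liver", "outcome": "weak signal"},
    {"lab": "lab_a", "reagent": "AB-002", "entity": "mouse cortex", "outcome": "strong signal"},
]

for entity, results in reagent_profile(observations, "AB-001").items():
    print(entity, results)
```

Disagreements within one entity (here, two labs reporting different outcomes for the same reagent on the same tissue) are exactly the kind of signal that only becomes visible once observations are pooled.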
A single environment to perform, store, share and report on experiments:
1. Store metadata on all materials
2. Track the methods while doing them
3. Write papers that ‘wrap around’ this metadata
4. Don’t ‘send’ your papers – just expose them to the outside world
5. Invite reviews; open data to trusted parties, at trusted time
6. Allow apps/tools to integrate
[Diagram: a paper wrapped around metadata, with sample text “Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-”, surrounded by the activities Calculate, coordinate…; Review; Revise; Edit; Compile, comment, compare…]
Elsevier Research Data Services:
1. Help increase the amount of data shared from the lab, enabling incidental collaboratories
2. Help increase the value of the data shared by increasing annotation, normalization, provenance, enabling enhanced interoperability
3. Help measure and deliver credit for shared data, the researchers, the institute, and the funding body, enabling more sustainable platforms
Plans with CMU/Neuroelectro.org:
• Do a pilot in Q3 2013, using:
– 7” Tablets for data input
– Can we link to barcodes for antibodies (ABs), scanned on the tablet, so we can include the batch’s provenance?
– Links to local software to connect to runs
– Dashboard for the PI to keep track/play with experiments
– Gated exports to Neuroelectro.org and NIF
– Address NSF Data Management Plan requirements?
In summary:
• Life is complicated!
• We need to connect experiments
• To do so, overcome technical barriers and social barriers (more difficult)
• Maybe 3D VC can help define a common mission?
firstname.lastname@example.org