This document discusses the need for open science and data sharing to advance genomics research and drug development. It outlines the vision and mission of Sage Bionetworks to create a "Commons" where integrative models of disease are built and evolved through data sharing and collaborative research. Some key barriers to open science are addressed, such as standards, tools, recognition of contributors, and changing incentives around data hoarding. The opportunity of networks to identify disease drivers and therapies from large genomic datasets is highlighted.
18. Opportunity The stunning technologies coming will generate heaps of genomic data Bionetworks using integrative genomic approaches can highlight the non-redundant components- can find drivers of the disease and of therapies Need to develop ways to host massive amounts of data, evolving representations of disease as represented by these probabilistic causal disease models
19. Drivers Recognition that the benefits of bionetwork based molecular models of diseases are powerful but that they require significant resources Appreciation that it will require decades of evolving representations as real complexity emerges and needs to be integrated with therapeutic interventions Realizing the donation by Merck might seed a “commons” allowing a potential long term gain to the whole community provided by evolving models of disease built via a contributor network
20. Mission 14 Sage Bionetworks is a non-profit organization with a vision to create a “Commons” where integrative bionetworks are evolved by contributor scientists with a shared vision to accelerate the elimination of human disease
21. Sage Bionetworks:a busy first year $5m LSDF Grant Partnership with Merck 14 Staff move into Sage Offices at FHCRC 1st Sage Commons Congress in SF $8m NCI grant for new CCSB Partnership with Pfizer 2009 2010 First Board of Directors Meeting 501(c)(3) determination Catalyst Funding from Listwin, CHDI and Quintiles NIH New Institution Review First NIH grant payment
24. Global Coherent Data Sets A data set containing genome-wide DNA variation and intermediate trait, as well as physiological phenotype data across a population of individuals large enough to power association or linkage studies, typically 50 or more individuals. To be coherent, the data needs to be matched with consistent identifiers. Intermediate traits are typically gene expression, but may also include proteomic, metabolomic, and other molecular data. GCDs are current state of knowledge and subject to change as more information becomes available to Sage
27. Barriers: Designing a simple-to-use model for uploading and processing data Data interoperability Data standartization, Data Quality consistent data format and metadata Tools and standards: allow the reosuce to gown and evolve, capture metadata in a standardized way and quality measures and quality control The Commons will need to resolve issues surrounding protection of human subjects data if the information is to be widely shared. IRB and protection of human subjects platform independence Visualization tools building the critical mass of contributors legal/licensing framework enormous curation effort needed to correct for incompatible study designs, incomplete data gathering Ability to capture structured content
34. Need for multi-layer mega datasets and the vanishing ‘price’ for genes provides incentive for pre-competitive space for genomics
35. Incentives: Sociology and policy. Getting people to share and building trust. IMHO, the central challenge will be community adoption. This is a social (political) experiment/ entreprise as much as a scientific challenge. How to motivate individuals not community inclined might be key. Researcher "Turf" /lack of experience sharing politic: competitive funding versus communal goal Willingness by the community to share data and key ancillary information (e.g. pathology/clinical data for profiled samples) We need a team that will take the time to make sure we create a set of tools that can interoperate , rather than a set of tools that perform discrete independent tasks. Changing culture of individual recognition, publication, rewards, incentives Business case for contributing and sharing resources and information is unclear to many, while business case for hoarding them is well articulated and obvious. Buy-in from tool developers, data producers and data users The theory is great, the practice needs commitment from a wide variety of players
36. The Federation Experiment Stephen FriendSage Bionetworks Andrea CalifanoColumbia U. Eric SchadtPacBio - UCSF Atul ButteStanford Med Trey IdekerUCSD
37. Sage Bionetworks Focused on improving treatment of disease Working through extensive partnerships to enable research and drug development Cultural challenges may eclipse technical and operational hurtles www.sagebase.org
Editor's Notes
This is us (data on 230 attendees from ~100 survey respondents) [[with black background]]
Funny thing about collaborations… “IT Experts” claim to collaborate with everyone, but only 3 other groups acknowledge collaborating with “IT Experts”[[No return arrows from “Funders”, “Media” and “Policy”]]