The document discusses using "data intensive science" and building better disease maps through comprehensive monitoring of disease and molecular traits in large populations. It describes constructing co-expression networks from gene expression measures across hundreds of samples to identify modules of genes that interact. Preliminary probabilistic models have been built using these networks to directly identify genes that are causal for disease.
Stephen Friend Molecular Imaging Program at Stanford (MIPS) 2011-08-15
1. Use of Bionetworks to Build Maps of Diseases
Stephen Friend MD PhD
Sage Bionetworks (Non-Profit Organization)
Seattle/ Beijing/ San Francisco
MIPS Seminar Series
August 15th, 2011
2. why consider the fourth paradigm- data intensive science
thinking beyond the narrative, beyond pathways
advantages of an open innovation compute space
it is more about how than what
3. Treating Symptoms vs. Modifying Diseases
Alzheimer’s, Diabetes, Cancer, Obesity
Will it work for me?
8. WHY NOT USE
“DATA INTENSIVE” SCIENCE
TO BUILD BETTER DISEASE MAPS?
9. “Data Intensive Science”- “Fourth Scientific Paradigm”
For building: “Better Maps of Human Disease”
Equipment capable of generating
massive amounts of data
IT Interoperability
Open Information System
Evolving Models hosted in a
Compute Space- Knowledge Expert
10. It is now possible to carry out comprehensive
monitoring of many traits at the population level
Monitor disease and molecular traits in
populations
Putative causal gene
Disease trait
11. what will it take to understand disease?
DNA RNA PROTEIN (dark matter)
MOVING BEYOND ALTERED COMPONENT LISTS
13. How is genomic data used to understand biology?
[Figure labels: Tumors, RNA amplification, Microarray hybridization, Gene Index, trait]
“Standard” GWAS Approaches: Identify causative DNA variation but provide NO mechanism
Profiling Approaches: Genome scale profiling provides correlates of disease. Many examples BUT what is cause and effect?
“Integrated” Genetics Approaches:
• Provide unbiased view of molecular physiology as it relates to disease phenotypes
• Insights on mechanism
• Provide causal relationships and allow predictions
14. Integration of Genotypic, Gene Expression & Trait Data
Schadt et al. Nature Genetics 37: 710 (2005)
Millstein et al. BMC Genetics 10: 23 (2009)
Causal Inference
“Global Coherent Datasets”
• population based
• 100s-1000s individuals
Chen et al. Nature 452:429 (2008)
Zhu et al. Cytogenet Genome Res. 105:363 (2004)
Zhang & Horvath. Stat.Appl.Genet.Mol.Biol. 4: article 17 (2005)
Zhu et al. PLoS Comput. Biol. 3: e69 (2007)
15. Constructing Co-expression Networks
Start with expression measures for the most variant genes across 100s++ samples
[Figure: expression plotted against brain sample for genes 1 to 4]
Establish a 2D correlation matrix for all gene pairs
Correlation Matrix (Note: NOT a gene expression heatmap)
      1     2     3     4
1     1    0.8   0.2  -0.8
2    0.8    1    0.1  -0.6
3    0.2   0.1    1   -0.1
4   -0.8  -0.6  -0.1    1
Define Threshold, e.g. >0.6 for edge
Connection Matrix
    1  2  3  4
1   1  1  0  1
2   1  1  0  1
3   0  0  1  0
4   1  1  0  1
Hierarchically cluster
Clustered Connection Matrix
    1  2  4  3
1   1  1  1  0
2   1  1  1  0
4   1  1  1  0
3   0  0  0  1
Identify modules
Network Module: sets of genes for which many pairs interact (relative to the total number of pairs in that set)
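The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the method used in the talk. It assumes the threshold is applied to the absolute correlation with an inclusive cut-off (|r| >= 0.6), which is what reproduces the connection matrix shown on the slide, and it uses connected components as a simple stand-in for the hierarchical clustering step. The function names are hypothetical.

```python
# Correlation matrix for the four genes on the slide (rows/columns = genes 1-4).
CORR = [
    [1.0, 0.8, 0.2, -0.8],
    [0.8, 1.0, 0.1, -0.6],
    [0.2, 0.1, 1.0, -0.1],
    [-0.8, -0.6, -0.1, 1.0],
]


def connection_matrix(corr, threshold=0.6):
    """Binarize the matrix: 1 where |correlation| >= threshold, else 0.

    Assumption: absolute value and an inclusive cut-off, chosen because
    they reproduce the connection matrix on the slide.
    """
    return [[1 if abs(r) >= threshold else 0 for r in row] for row in corr]


def modules(adj):
    """Group genes into modules as connected components of the network.

    Assumption: a simple stand-in for the hierarchical clustering step.
    """
    n = len(adj)
    seen = set()
    result = []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node + 1)  # report 1-based gene labels
            for other in range(n):
                if adj[node][other] and other not in seen:
                    seen.add(other)
                    stack.append(other)
        result.append(sorted(component))
    return result


adj = connection_matrix(CORR)
gene_modules = modules(adj)
print(adj)
print(gene_modules)
```

On this toy input, genes 1, 2 and 4 end up in one module and gene 3 stands alone, matching the clustered connection matrix above. Real data sets with hundreds of samples and thousands of genes would call for a proper clustering method.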
16. Preliminary Probabilistic Models- Rosetta /Schadt
Networks facilitate direct identification of genes that are causal for disease
Evolutionarily tolerated weak spots
Gene symbol | Gene name | Variance of OFPM explained by gene expression* | Mouse model | Source
Zfp90 | Zinc finger protein 90 | 68% | tg | Constructed using BAC transgenics
Gas7 | Growth arrest specific 7 | 68% | tg | Constructed using BAC transgenics
Gpx3 | Glutathione peroxidase 3 | 61% | tg | Provided by Prof. Oleg Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]
Lactb | Lactamase beta | 52% | tg | Constructed using BAC transgenics
Me1 | Malic enzyme 1 | 52% | ko | Naturally occurring KO
Gyk | Glycerol kinase | 46% | ko | Provided by Dr. Katrina Dipple (UCLA) [13]
Lpl | Lipoprotein lipase | 46% | ko | Provided by Dr. Ira Goldberg (Columbia University, NY) [11]
C3ar1 | Complement component 3a receptor 1 | 46% | ko | Purchased from Deltagen, CA
Tgfbr2 | Transforming growth factor beta receptor 2 | 39% | ko | Purchased from Deltagen, CA
Nat Genet (2005) 205:370
17. List of Influential Papers in Network Modeling
50 network papers
http://sagebase.org/research/resources.php
19. Recognition that the benefits of bionetwork based molecular
models of diseases are powerful but that they require
significant resources
Appreciation that it will require decades of evolving
representations as real complexity emerges and needs to be
integrated with therapeutic interventions
20. Sage Mission
Sage Bionetworks is a non-profit organization with a vision to
create a commons where integrative bionetworks are evolved by
contributor scientists with a shared vision to accelerate the
elimination of human disease
Building Disease Maps Data Repository
Commons Pilots Discovery Platform
Sagebase.org
22. Engaging Communities of Interest
NEW MAPS
Disease Map and Tool Users-
( Scientists, Industry, Foundations, Regulators...)
PLATFORM
Sage Platform and Infrastructure Builders-
( Academic Biotech and Industry IT Partners...)
RULES AND GOVERNANCE
Data Sharing Barrier Breakers-
(Patients Advocates, Governance and Policy Makers, Funders...)
NEW TOOLS
Data Tool and Disease Map Generators-
(Global coherent data sets, Cytoscape, Clinical Trialists, Industrial Trialists, CROs…)
PILOTS= PROJECTS FOR COMMONS
Data Sharing Commons Pilots-
(Federation, CCSB, Inspire2Live....)
23. Platform Commons Research
[Diagram; labels as recovered from the slide]
Research: Cancer, Neurological Disease, Metabolic Disease
Building Disease Maps; Curation/Annotation
Data Repository: CTCAP, Public Data, Merck Data, TCGA/ICGC
Outposts: Federation, CCSB
Pfizer, Merck, Takeda, Astra Zeneca, CHDI, Gates, NIH
Commons Pilots: LSDF-WPP, Inspire2Live, POC
Hosting Data, Hosting Tools, Hosting Models
Bayesian Models, Co-expression Models
Discovery Platform; Tools & Methods; KDA/GSVA
LSDF
24. Example 1: Breast Cancer
Coexpression Networks
Module combination
Partition BN
Bayesian Network
Survival Analysis
Zhang B et al., manuscript
25. Generation of Co-expression & Bayesian Networks from published Breast Cancer Studies
4 Public Breast Cancer Datasets
NKI (295 samples): van de Vijver et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002 Dec 19;347(25):1999-2009.
Wang (286 samples): Wang Y et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005 Feb 19-25;365(9460):671-9.
Miller (159 samples): Pawitan Y et al. Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Res. 2005;7(6):R953-64.
Christos (189 samples): Sotiriou C et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst. 2006 Feb 15;98(4):262-72.
26. Recovery of EGFR and Her2 oncoproteins
downstream pathways by super modules
28. Key Driver Analysis
• Identify key regulators for a list of genes h and a network N
• Check the enrichment of h in the downstream of each node in N
• The nodes significantly enriched for h are the candidate drivers
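The three bullets above can be read as a small algorithm. The sketch below is one possible reading, not the Sage implementation: it treats the network as a directed graph, takes "downstream" to mean every node reachable from a candidate, and scores enrichment with a one-sided hypergeometric test over all network nodes. The toy network, the gene list and the 0.05 cut-off are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def downstream(network, node):
    """All nodes reachable from `node` by following directed edges."""
    seen = set()
    stack = list(network.get(node, []))
    while stack:
        current = stack.pop()
        if current in seen or current == node:
            continue
        seen.add(current)
        stack.extend(network.get(current, []))
    return seen


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n items from N, of which K are hits."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def key_drivers(network, gene_list, alpha=0.05):
    """Return (node, p-value) pairs whose downstream set is enriched for gene_list."""
    population = set(network)
    hits_in_network = gene_list & population
    drivers = []
    for node in network:
        down = downstream(network, node)
        if not down:
            continue  # leaf nodes regulate nothing in this reading
        overlap = len(down & hits_in_network)
        p = hypergeom_upper_tail(overlap, len(population), len(hits_in_network), len(down))
        if p < alpha:
            drivers.append((node, p))
    return sorted(drivers, key=lambda pair: pair[1])


# Hypothetical toy network N: parent -> list of children.
TOY_NETWORK = {
    "A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": [],
    "F": ["G"], "G": ["H"], "H": [], "I": [], "J": [],
}
# Hypothetical gene list h.
GENE_LIST = {"B", "C", "D", "E"}

drivers = key_drivers(TOY_NETWORK, GENE_LIST)
print(drivers)
```

In this toy case only node A passes the cut-off, because all four genes in the list sit downstream of it. On a real network the number of candidate nodes is large, so some correction for multiple testing would be needed.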
29. A) Cell Cycle (blue) B) Chromatin modification (black)
C) Pre-mRNA Processing (brown) D) mRNA Processing (red)
Global driver
Global driver & RNAi validation
31. Example 2. The Sage Non-Responder Project in Cancer
Purpose:
• To identify Non-Responders to approved drug regimens so we can improve outcomes, spare patients unnecessary toxicities from treatments that have no benefit to them, and reduce healthcare costs
Leadership:
• Co-Chairs Stephen Friend, Todd Golub, Charles Sawyers & Rich Schilsky
Initial Studies:
• AML (at first relapse)
• Non-Small Cell Lung Cancer
• Ovarian Cancer (at first relapse)
• Breast Cancer
• Renal Cell
• Multiple Myeloma
Sage Bionetworks • Non-Responder Project
32. Model of Alzheimer’s Disease (Bin Zhang, Jun Zhu)
[Figure: paired AD and normal panels; label: Cell cycle]
http://sage.fhcrc.org/downloads/downloads.php
33. New Type II Diabetes Disease Models (Anders Rosengren)
Global expression data from 64 human islet donors
340 genes in islet-specific open chromatin regions
Blue module: 3000 genes
Associated with
• Type 2 diabetes
• Elevated HbA1c
• Reduced insulin secretion
168 overlapping genes, which have
• Higher connectivity
• Markedly stronger association with
– Type 2 diabetes
– Elevated HbA1c
– Reduced insulin secretion
• Enrichment for beta-cell transcription factors and exocytotic proteins
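To get a feel for how large an overlap of 168 genes is, one can compare it with the overlap expected by chance. The arithmetic below is only a sketch: the slide does not state how many genes were measured in total, so the universe of 20,000 genes is a hypothetical placeholder, and the resulting fold value changes with it.

```python
def expected_overlap(size_a, size_b, universe):
    """Overlap expected by chance between two random gene sets."""
    return size_a * size_b / universe


def fold_enrichment(observed, size_a, size_b, universe):
    """Observed overlap divided by the overlap expected by chance."""
    return observed / expected_overlap(size_a, size_b, universe)


MODULE_SIZE = 3000        # blue module (from the slide)
OPEN_CHROMATIN = 340      # genes in islet-specific open chromatin (from the slide)
OBSERVED = 168            # overlapping genes (from the slide)
UNIVERSE = 20000          # HYPOTHETICAL: total genes measured is not given

expected = expected_overlap(MODULE_SIZE, OPEN_CHROMATIN, UNIVERSE)
fold = fold_enrichment(OBSERVED, MODULE_SIZE, OPEN_CHROMATIN, UNIVERSE)
print(f"expected by chance: {expected:.1f}, fold enrichment: {fold:.2f}")
```

Under that assumed universe, about 51 genes would overlap by chance, so 168 is roughly a threefold excess. A formal assessment would use the actual number of genes assayed and a hypergeometric test.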
34. New Type II Diabetes Disease Models (Anders Rosengren)
• Search across 1300 datasets in MetaGEO at Sage for similar expression profiles
Top hit: Islet dedifferentiation study where the 168 genes were upregulated in
mature islets and downregulated in dedifferentiated islets (Kutlu et al., Phys Gen 2009)
• Analyses of expression-SNPs and clinical SNPs as well as Causal Inference Test
• Identification of candidate key genes affecting beta-cell differentiation and chromatin
Working hypothesis:
Normal beta-cell: open chromatin in islet-specific regions,
high expression of beta-cell transcription factors,
differentiated beta-cells and normal insulin secretion
Diabetic beta-cell: lower expression of beta-cell transcription
factors affecting the identified module, dedifferentiation,
reduced insulin secretion and hyperglycemia
Next steps: Validation of hypothesis and suggested key genes in human islets
35. Clinical Trial Comparator Arm
Partnership (CTCAP)
Description: Collate, Annotate, Curate and Host Clinical Trial Data
with Genomic Information from the Comparator Arms of Industry and
Foundation Sponsored Clinical Trials: Building a Site for Sharing
Data and Models to evolve better Disease Maps.
Public-Private Partnership of leading pharmaceutical companies,
clinical trial groups and researchers.
Neutral Conveners: Sage Bionetworks and Genetic Alliance
[nonprofits].
Initiative to share existing trial data (molecular and clinical) from
non-proprietary comparator and placebo arms to create powerful
new tool for drug development.
36. Examples: The Sage Federation
• Founding Lab Groups
– Seattle- Sage Bionetworks
– New York- Columbia: Andrea Califano
– Palo Alto- Stanford: Atul Butte
– San Diego- UCSD: Trey Ideker
– San Francisco: UCSF/Sage: Eric Schadt
• Initial Projects
– Aging
– Diabetes
– Warburg
• Goals: Share all datasets, tools, models
Develop interoperability for human data
38. Federation’s Genome-wide Network and Modeling Approach
Califano group at Columbia, Sage Bionetworks, Butte group at Stanford
39. Human Aging Project
Data: Brain A (n=363), Brain B (n=145), Brain C (n=400), Blood A (n=~1000), Blood B (n=~1000), Adipose (n=~700)
Transformations: Interactome, TF Activity Profile, Gene Set / Pathway Variation Analysis
Machine Learning: Elastic Net, Network Prior Models, Tree Classifiers
Output: Age Model
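Of the machine learning methods named on this slide, the elastic net is the easiest to show compactly. The sketch below fits an elastic net by coordinate descent on synthetic data in which "age" depends on the first of three simulated genes. Everything here is illustrative: the data are random, the penalty settings are arbitrary, and the function names are hypothetical. The objective follows the common form (1/2n)·||y − Xw||² + alpha·l1_ratio·||w||₁ + ½·alpha·(1 − l1_ratio)·||w||².

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the L1 part of the update)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Fit elastic net weights by cyclic coordinate descent on centered data."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    residual = [v - y_mean for v in y]  # equals y - Xw while w is all zeros
    w = [0.0] * p
    z = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant feature carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(Xc[i][j] * residual[i] for i in range(n)) / n + w[j] * z[j]
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z[j] + alpha * (1 - l1_ratio))
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * Xc[i][j]
                w[j] = new_w
    intercept = y_mean - sum(w[j] * col_means[j] for j in range(p))
    return w, intercept


# Synthetic example: age depends on gene 0 only; genes 1 and 2 are noise.
random.seed(0)
n_samples = 60
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n_samples)]
ages = [50 + 10 * row[0] + random.gauss(0, 1) for row in X]

weights, intercept = elastic_net(X, ages)
print(weights, intercept)
```

The fitted weight for the first gene should dominate, while the two noise genes are shrunk toward zero. With real expression data there are far more genes than samples, which is the setting where this kind of penalized regression is typically chosen.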
41. Inferring Prostate Cancer Regulatory Modules for Glycolysis & Glycogenesis Metabolism Pathway
Sage Bionetworks approach
Prostate cancer global coherent data set (GSE21032). Taylor BS. et al (2010) Cancer Cell 18(1):11-22
Integrated Bayesian Approach. Zhu J. et al (2008) Nature Genetics 40(7):854-61
Inferred Transcriptional Regulatory Network in Prostate Cancer
Glycolysis and Glycogenesis Metabolism Gene Set (GGMSE)
Prostate Cancer Regulatory Modules for GGMSE and Other Metabolism Pathways. Duarte N. et al (2006) PNAS 107(6):1777-1782
Cox Proportional-Hazards Regression model based on individual gene for recurrence free survival
Metabolism pathways with regulatory modules enriched by poor prognosis genes for prostate cancer
42. Genes Associated with Poor Prognosis are disproportionately found among the networks regulating the “glycolysis” Genes
[Figure panels: Inferred regulatory module for GGMSE; Inferred regulatory module for Oxidative Phosphorylation and Sphingolipid Metabolism genes]
P-Value<0.005. Size of the node proportional to -log10 P value for recurrence free survival.
>5 fold enrichment of recurrence free prognostic genes with the Glycolysis BN module than random selection (p<1e-100)
43. Federated Aging Project: Combining analysis + narrative = Sweave Vignette
R code + narrative
PDF (plots + text + code snippets)
HTML
Data objects
Sage Lab, Califano Lab, Ideker Lab
Shared Data Repository
JIRA: Source code repository & wiki
Submitted Paper
44. Why not share clinical/genomic data and model building in the ways currently used by the software industry (power of tracking workflows and versioning)?
45. Synapse as a GitHub for building models of disease
61. Absurdity of Current R&D Ecosystem
• $200B per year in biomedical and drug discovery R&D
• Handful of new medicines approved each year
• Productivity in steady decline since 1950
• 90% of novel drugs entering clinical trials fail
• NIH and EU just started spending billions to duplicate process
• Significant pharma revenues going off patent in next 5 years
• >30,000 pharma employees fired in each of last four years
• Number of R&D sites in Europe down from 29 to 16 since 2009
62. What is the problem?
• Regulatory hurdles too high?
• Low hanging fruit picked?
• Payers unwilling to pay?
• Genome has not delivered?
• Valley of death?
• Companies not large enough to execute on strategy?
• Internal research costs too high?
• Clinical trials in developed countries too expensive?
In fact, all are true but none is the real problem
63. What is the problem?
• The current system is designed as if every new program is destined to
deliver an approved drug
• Past 20 years prove this assumption wrong (again and again)
• Why do promising early results rarely translate into approved drugs?
• Bottom line: we have poor understanding of biology
• Lack of early-data sharing within closed information systems dooms
drug discovery for frequent avoidable failure
64. What is the problem?
We need to rebuild the drug discovery process so that we
better understand disease biology before testing proprietary
compounds on sick patients
65. The solution – Arch2POCM
1. Create an Archipelago of clinicians and scientists from public
and private sectors to take projects from ideas to Proof of
Clinical Mechanism (POCM)
2. Arch2POCM is a collaborative, data-sharing network of
scientists, whose drug discovery objective is to use robust
compounds against new targets to disentangle the complexity
of human biology, not to create a medicine
3. Success?
• A compound that provides proof of concept for a novel target-
allowing companies to use this common information to compete,
with dramatic increased chances of success
• Culling targets with doomed mechanisms before multiple companies
waste money exploring them - at $50M a pop
66. Why data sharing through to Phase IIb?
• Most rapidly reveals limitations and opportunities associated with the
target
• Increases probability of success for internal proprietary programs
• Scientific decisions are not influenced by market considerations or
biased internal thinking
• Target mechanism is only properly tested at Phase IIb
67. Why no IP on “Common Stream” compounds?
• Allows multiple groups to test diverse indications without funds
from Arch2POCM- crowdsourcing drug discovery
• Broader and faster data dissemination
• Far fewer legal agreements to negotiate
• Generates “freedom to operate” on target because there are
no patent thickets to wade through
• Efficient way to access world’s top scientists and doctors
without hassle
69. First major milestones
2013- First Compound in clinical trials
2014- Go and No-Go Decisions from common stream of targets driving
Proprietary Programs
2014- Full complement of target programs activated
2014- Core Clinical Programs joined by crowdsourced clinical trials
70. why consider the fourth paradigm- data intensive science
thinking beyond the narrative, beyond pathways
advantages of an open innovation compute space
it is more about how than what
71. OPPORTUNITIES FOR MIPS COMMUNITY
Data sets, Tools and Models
Joining Synapse Communities
Joining Federation Projects
Joining Arch2POCM
Change reward structures for sharing data
(patients and academics)