Branch: An interactive, web-based tool for building decision tree 
classifiers 
Benjamin M. Good, Karthik Gangavarapu, Vyshakh Babji, Max Nanis, Andrew I. Su 
ABSTRACT 
A crucial task in modern biology is the prediction of complex 
phenotypes, such as breast cancer prognosis, from genome-wide 
measurements. Machine learning algorithms can sometimes infer 
predictive patterns, but there is rarely enough data to train and test 
them effectively, and the patterns that they identify are often 
expressed in forms (e.g. support vector machines, neural networks, 
random forests composed of tens of thousands of trees) that are 
very difficult to understand. In addition, it is generally unclear 
how to include prior knowledge in the course of their construction. 
Decision trees provide an intuitive visual form that can capture 
complex interactions between multiple variables. Effective methods 
exist for inferring decision trees automatically, but it has been shown 
that these techniques can be improved through manual 
intervention by experts. Here, we introduce Branch, a new Web-based 
tool for the interactive construction of decision trees from 
genomic datasets. Branch offers the ability to: (1) upload and share 
datasets intended for classification tasks (in progress), (2) construct 
decision trees by manually selecting features such as genes for a 
gene expression dataset, (3) collaboratively edit decision trees, (4) 
create feature functions that aggregate content from multiple 
independent features into single decision nodes (e.g. pathways) and 
(5) evaluate decision tree classifiers in terms of precision and recall. 
The tool is optimized for genomic use cases through the inclusion of 
gene and pathway-based search functions. 
Branch enables expert biologists to easily engage directly with high-throughput 
datasets without the need for a team of 
bioinformaticians. The tree building process allows researchers to 
rapidly test hypotheses about interactions between biological 
variables and phenotypes in ways that would otherwise require 
extensive computational sophistication. In so doing, this tool can 
both inform biological research and help to produce more accurate, 
more meaningful classifiers. 
A prototype of Branch is available at http://biobranch.org/ 
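The abstract mentions evaluating classifiers in terms of precision and recall. As a minimal sketch of what those two measures compute (the function name and the example labels are hypothetical, and this is not taken from Branch's own code):

```python
def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall) for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical true and predicted survival classes for five samples.
y_true = ["<10yr", "<10yr", "<10yr", ">10yr", ">10yr"]
y_pred = ["<10yr", "<10yr", ">10yr", "<10yr", ">10yr"]
precision, recall = precision_recall(y_true, y_pred, positive="<10yr")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the fraction of predicted positives that are correct; recall is the fraction of true positives that are found.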
The Scripps Research Institute 
Background 
Feature types 
REFERENCES 
CONTACT 
Benjamin Good: bgood@scripps.edu @bgood 
Andrew Su: asu@scripps.edu @andrewsu 
Dataset library 
Building a decision tree 
Research reported in this poster was supported by the National Institute of General Medical Sciences 
of the National Institutes of Health under award numbers R01GM089820 and R01GM083924, and by 
the National Center for Advancing Translational Sciences of the National Institutes of Health under 
award number UL1TR001114. 
Goals 
(1) Find patterns 
(2) Make predictions on new samples 
[Figure: training samples labeled by survival class (< 10 years, > 10 years); new samples with unknown class (< 10 years? > 10 years?)] 
1. Griffith et al. (2013) A robust prognostic signature for hormone-positive node-negative 
breast cancer. Genome Medicine. 
2. Dutkowski and Ideker (2011) Protein Networks as Logic Functions in Development and 
Cancer. PLoS Computational Biology. 
3. Winter et al. (2012) Google Goes Cancer: Improving Outcome Prediction for Cancer 
Patients by Network-Based Ranking of Marker Genes. PLoS Computational Biology. 
4. Liu et al. (2012) Identifying dysregulated pathways in cancers from pathway interaction 
networks. BMC Bioinformatics. 
5. Paik et al. (2004) A Multigene Assay to Predict Recurrence of Tamoxifen-Treated, 
Node-Negative Breast Cancer. The New England Journal of Medicine. 
6. Ankerst et al. (1999) Visual classification: an interactive approach to decision tree 
construction. Proceedings of the Fifth ACM SIGKDD International Conference on 
Knowledge Discovery and Data Mining. 
7. Ware et al. (2002) Interactive machine learning: letting users build classifiers. 
International Journal of Human-Computer Studies. 
Example: breast cancer survival prediction 
Gene expression data (+ CNVs, SNPs, etc.) 
(3) Understand the biology that the pattern indicates 
Statistics and machine learning 
• Example: random forests [1] 
• Good at (1) finding patterns 
• Have mixed results at (2) identifying patterns that 
generalize well across cohorts 
• Sometimes offer little help for (3) increasing 
understanding of the underlying biology 
Prior knowledge 
• Known relationships between the data elements 
(e.g. genes) can be used to improve predictor 
accuracy and generalizability. 
• Examples of inputs to automated methods: protein-protein 
interactions [2,3], pathway databases [4] 
• Manual consideration by domain experts is a vital 
aspect of the inference of new classifiers and is 
fundamental to the formation of understanding. 
See, for example, the creation of the OncoTypeDx 
predictor for breast cancer prognosis [5] 
Funding 
Decision Trees 
• Can be inferred automatically, but... 
• Engaging domain experts in their creation 
(1) provides access to prior knowledge, (2) results in 
smaller, more understandable trees, (3) can improve 
predictive performance, and (4) can increase users' 
comprehension of both the classifier and the data [6,7] 
Clicking on a node shows the 
percentage of the dataset that 
passes through it and its 
accuracy. 
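The two numbers shown for a node can be sketched as follows. This is an illustration only: it assumes that a node's accuracy is the fraction of its samples that belong to its majority class, which the poster does not state, and the function name and data are hypothetical.

```python
from collections import Counter

def node_stats(node_labels, total_samples):
    """Return (coverage, accuracy) for one tree node.

    node_labels: class labels of the samples that reach the node.
    total_samples: number of samples in the whole dataset.
    Assumption: the node predicts its majority class.
    """
    if not node_labels or total_samples <= 0:
        return 0.0, 0.0
    coverage = len(node_labels) / total_samples
    majority_count = Counter(node_labels).most_common(1)[0][1]
    accuracy = majority_count / len(node_labels)
    return coverage, accuracy

# Hypothetical node reached by 8 of 20 samples.
labels = ["<10yr"] * 6 + [">10yr"] * 2
coverage, accuracy = node_stats(labels, total_samples=20)
print(f"{coverage:.0%} of dataset, accuracy {accuracy:.0%}")
```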
View/use trees shared by community 
• Gene (e.g. expression) 
• Non-gene (e.g. clinical data) 
• Custom feature (manually created feature 
combination) 
• Classifier node (e.g. a trained SVM) 
• Pre-existing tree 
• Visual (manually defined decision 
boundary using GUI) 
• Create a classifier node. 
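A custom feature that combines several genes into one decision node might look like the sketch below. The aggregation rule (mean expression) and the gene names are assumptions made for illustration; the poster does not specify how Branch combines features.

```python
from statistics import mean

def make_pathway_feature(gene_names):
    """Build a feature function that averages expression over the named genes."""
    def feature(sample):
        # sample maps gene name -> expression value
        return mean(sample[g] for g in gene_names)
    return feature

# Hypothetical two-gene "pathway" and one hypothetical sample.
pathway_score = make_pathway_feature(["GENE_A", "GENE_B"])
sample = {"GENE_A": 1.0, "GENE_B": 3.0, "GENE_C": 9.0}
print(pathway_score(sample))
```

The returned function can then be used anywhere a single-gene feature would be, for example as the value tested at a split.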
Iteratively select a feature to create each split (an if-then rule) 
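The iterative process can be sketched as repeated application of a threshold rule, with the user choosing the feature at each step. Gene names, thresholds, and sample values below are hypothetical, and the code is an illustration, not Branch's implementation.

```python
def split(samples, feature, threshold):
    """One if-then rule: IF sample[feature] <= threshold THEN low ELSE high."""
    low = [s for s in samples if s[feature] <= threshold]
    high = [s for s in samples if s[feature] > threshold]
    return low, high

# Hypothetical expression values and survival classes, for illustration only.
samples = [
    {"GENE_A": 0.2, "GENE_B": 1.1, "label": ">10yr"},
    {"GENE_A": 0.4, "GENE_B": 0.3, "label": ">10yr"},
    {"GENE_A": 1.6, "GENE_B": 0.2, "label": ">10yr"},
    {"GENE_A": 1.8, "GENE_B": 1.4, "label": "<10yr"},
    {"GENE_A": 2.1, "GENE_B": 1.9, "label": "<10yr"},
]

# Step 1: the user picks GENE_A for the root split.
low_a, high_a = split(samples, "GENE_A", threshold=1.0)
# Step 2: the still-mixed branch is refined with a second user-chosen feature.
low_b, high_b = split(high_a, "GENE_B", threshold=1.0)

leaves = [
    ("GENE_A <= 1.0", low_a),
    ("GENE_A > 1.0 and GENE_B <= 1.0", low_b),
    ("GENE_A > 1.0 and GENE_B > 1.0", high_b),
]
for rule, node in leaves:
    print(rule, [s["label"] for s in node])
```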
• Test datasets loaded: 
• Breast cancer survival (gene expression) 
• Kidney transplant rejection (gene expression) 
• HIV coreceptor usage (amino acid sequences) 
• Coming soon: upload your own data 
The number of colored squares indicates the 
number of samples that pass through the 
node. The colors are associated with the 
classes to be predicted. Ideal leaf nodes are 
'pure' in that they contain samples of only 
one class. 
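Leaf purity can be quantified in several ways. The sketch below shows two common ones, the majority-class fraction and Gini impurity; these are standard measures chosen for illustration, and the poster does not say which, if any, Branch reports.

```python
from collections import Counter

def purity(labels):
    """Fraction of samples in a node that belong to its most common class."""
    if not labels:
        return 0.0
    return Counter(labels).most_common(1)[0][1] / len(labels)

def gini_impurity(labels):
    """Gini impurity: 0.0 for a pure node, larger when classes are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

pure_leaf = ["<10yr", "<10yr", "<10yr"]
mixed_leaf = ["<10yr", ">10yr"]
print(purity(pure_leaf), gini_impurity(pure_leaf))
print(purity(mixed_leaf), gini_impurity(mixed_leaf))
```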
Decision trees can be made 
private or shared with the 
public when saved. Public 
trees may be used as a 
starting point for others. 
For collaboratively authored 
trees, the author associated 
with each node is tracked. 

 
2016 bd2k bgood_wikidata
2016 bd2k bgood_wikidata2016 bd2k bgood_wikidata
2016 bd2k bgood_wikidata
 
2016 mem good
2016 mem good2016 mem good
2016 mem good
 
(Poster) Knowledge.Bio: an Interactive Tool for Literature-based Discovery
(Poster) Knowledge.Bio: an Interactive Tool for Literature-based Discovery (Poster) Knowledge.Bio: an Interactive Tool for Literature-based Discovery
(Poster) Knowledge.Bio: an Interactive Tool for Literature-based Discovery
 
Gene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2KGene Wiki and Mark2Cure update for BD2K
Gene Wiki and Mark2Cure update for BD2K
 
2015 6 bd2k_biobranch_knowbio
2015 6 bd2k_biobranch_knowbio2015 6 bd2k_biobranch_knowbio
2015 6 bd2k_biobranch_knowbio
 
(Bio)Hackathons
(Bio)Hackathons(Bio)Hackathons
(Bio)Hackathons
 
Citizen sciencepanel2015 pdf
Citizen sciencepanel2015 pdfCitizen sciencepanel2015 pdf
Citizen sciencepanel2015 pdf
 

Recently uploaded

一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 

Recently uploaded (20)

一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 

Branch: An interactive, web-based tool for building decision tree classifiers

Benjamin M. Good, Karthik Gangavarapu, Vyshakh Babji, Max Nanis, Andrew I. Su
The Scripps Research Institute

ABSTRACT
A crucial task in modern biology is the prediction of complex phenotypes, such as breast cancer prognosis, from genome-wide measurements. Machine learning algorithms can sometimes infer predictive patterns, but there is rarely enough data to train and test them effectively, and the patterns that they identify are often expressed in forms (e.g. support vector machines, neural networks, random forests composed of tens of thousands of trees) that are highly difficult to understand. In addition, it is generally unclear how to include prior knowledge in the course of their construction. Decision trees provide an intuitive visual form that can capture complex interactions between multiple variables. Effective methods exist for inferring decision trees automatically, but it has been shown that these techniques can be improved upon via the manual interventions of experts. Here, we introduce Branch, a new Web-based tool for the interactive construction of decision trees from genomic datasets. Branch offers the ability to: (1) upload and share datasets intended for classification tasks (in progress), (2) construct decision trees by manually selecting features such as genes for a gene expression dataset, (3) collaboratively edit decision trees, (4) create feature functions that aggregate content from multiple independent features into single decision nodes (e.g. pathways) and (5) evaluate decision tree classifiers in terms of precision and recall. The tool is optimized for genomic use cases through the inclusion of gene and pathway-based search functions.
Branch enables expert biologists to easily engage directly with high-throughput datasets without the need for a team of bioinformaticians. The tree building process allows researchers to rapidly test hypotheses about interactions between biological variables and phenotypes in ways that would otherwise require extensive computational sophistication. In so doing, this tool can both inform biological research and help to produce more accurate, more meaningful classifiers. A prototype of Branch is available at http://biobranch.org/

Background

Goals
(1) Find patterns
(2) Make predictions on new samples
(3) Understand the biology that the pattern indicates

Example: breast cancer survival prediction (< 10 year vs. > 10 year) from gene expression data (+ CNVs, SNPs, etc.)

Statistics and machine learning
  • Example: Random Forests [1]
  • Good at (1) finding patterns
  • Have mixed results at (2) identifying patterns that generalize well across cohorts
  • Sometimes offer little help for (3) increasing understanding of the underlying biology

Prior knowledge
  • Known relationships between the data elements (e.g. genes) can be used to improve predictor accuracy and generalizability.
  • Examples of inputs to automated methods: protein-protein interactions [2,3], pathway databases [4]
  • Manual consideration by domain experts is a vital part of inferring new classifiers and is fundamental to the formation of understanding. See for example the creation of the OncoTypeDx predictor for breast cancer prognosis [5]

Decision Trees
  • Can be inferred automatically, but engaging domain experts in their creation: (1) provides access to prior knowledge, (2) results in smaller, more understandable trees, (3) can improve predictive performance, (4) can increase the user's comprehension of both the classifier and the data [6,7]

Funding
Research reported in this poster was supported by the National Institute of General Medical Sciences of the National Institutes of Health under award numbers R01GM089820 and R01GM083924, and by the National Center for Advancing Translational Sciences of the National Institutes of Health under award number UL1TR001114.

REFERENCES
1. Griffith et al. (2013) A robust prognostic signature for hormone-positive node-negative breast cancer. Genome Medicine.
2. Dutkowski and Ideker (2011) Protein Networks as Logic Functions in Development and Cancer. PLoS Computational Biology.
3. Winter et al. (2012) Google Goes Cancer: Improving Outcome Prediction for Cancer Patients by Network-Based Ranking of Marker Genes. PLoS Computational Biology.
4. Liu et al. (2012) Identifying dysregulated pathways in cancers from pathway interaction networks. BMC Bioinformatics.
5. Paik et al. (2004) A Multigene Assay to Predict Recurrence of Tamoxifen-Treated, Node-Negative Breast Cancer. The New England Journal of Medicine.
6. Ankerst et al. (1999) Visual classification: an interactive approach to decision tree construction. Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining.
7. Ware et al. (2002) Interactive machine learning: letting users build classifiers. International Journal of Human-Computer Studies.

CONTACT
Benjamin Good: bgood@scripps.edu @bgood
Andrew Su: asu@scripps.edu @andrewsu

Dataset library
  • Test datasets loaded:
      • Breast cancer survival (gene expression)
      • Kidney transplant rejection (gene expression)
      • HIV coreceptor usage (amino acid sequences)
  • Coming soon: upload your own data

Feature types
  • Gene (e.g. expression)
  • Non-gene (e.g. clinical data)
  • Custom feature (manually created feature combination)
  • Classifier node (e.g. a trained SVM)
  • Pre-existing tree
  • Visual (manually defined decision boundary using GUI)

View/use trees shared by community
Decision trees can be made private or shared with the public when saved. Public trees may be used as a starting point for others. For collaboratively authored trees, the author associated with each node is tracked.

Building a decision tree
  • Iteratively select a feature to create each split (If, Then rule).
  • Create a classifier node.
  • Clicking on a node shows the percentage of the dataset that passes through it and its accuracy.
  • The number of colored squares indicates the number of samples that pass through the node. The colors are associated with the classes to be predicted.
  • Ideal leaf nodes are 'pure' in that they contain only one class.
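
The poster describes trees built from If/Then splits, per-node statistics (share of the dataset passing through a node, node accuracy, leaf purity), and evaluation by precision and recall. The following is a minimal Python sketch of those ideas, not Branch's actual implementation. The gene names `GENE_A` and `GENE_B`, the thresholds, the sample values, and all function names are hypothetical and chosen only for illustration. Statistics are computed on the same samples used to build the tree, so they say nothing about performance on new samples.

```python
from collections import Counter

# Hypothetical toy data: (feature values, class label). All values are invented.
SAMPLES = [
    ({"GENE_A": 2.1, "GENE_B": 0.3}, "<10 year"),
    ({"GENE_A": 1.8, "GENE_B": 0.9}, "<10 year"),
    ({"GENE_A": 1.5, "GENE_B": 0.2}, "<10 year"),
    ({"GENE_A": 1.2, "GENE_B": 0.4}, ">10 year"),
    ({"GENE_A": 0.4, "GENE_B": 0.8}, ">10 year"),
    ({"GENE_A": 0.2, "GENE_B": 0.9}, ">10 year"),
    ({"GENE_A": 0.6, "GENE_B": 0.1}, "<10 year"),
    ({"GENE_A": 0.3, "GENE_B": 0.2}, "<10 year"),
]


def leaf_for(features):
    """Hand-built tree: each split is one If/Then rule on a chosen feature."""
    if features["GENE_A"] > 1.0:      # hypothetical threshold
        return "high_A"
    if features["GENE_B"] > 0.5:      # hypothetical threshold
        return "low_A_high_B"
    return "low_A_low_B"


def leaf_stats(samples):
    """Per-leaf share of the dataset, majority-class accuracy, and purity."""
    by_leaf = {}
    for features, label in samples:
        by_leaf.setdefault(leaf_for(features), Counter())[label] += 1
    stats = {}
    for leaf, counts in by_leaf.items():
        n = sum(counts.values())
        majority, hits = counts.most_common(1)[0]
        stats[leaf] = {
            "prediction": majority,            # class predicted at this leaf
            "fraction": n / len(samples),      # share of samples reaching the leaf
            "accuracy": hits / n,              # majority-class share within the leaf
            "pure": len(counts) == 1,          # only one class present
        }
    return stats


def precision_recall(samples, stats, positive):
    """Precision and recall of the tree's predictions for one class."""
    tp = fp = fn = 0
    for features, label in samples:
        predicted = stats[leaf_for(features)]["prediction"]
        if predicted == positive and label == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif label == positive:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


STATS = leaf_stats(SAMPLES)
PRECISION, RECALL = precision_recall(SAMPLES, STATS, "<10 year")

if __name__ == "__main__":
    for leaf, s in STATS.items():
        print(leaf, s)
    print("precision:", PRECISION, "recall:", RECALL)
```

In this toy example the `high_A` leaf receives half of the samples and is impure, while the other two leaves are pure. A held-out test set would be needed to estimate how well such a tree generalizes.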