Heat map of gene expression data for
22,225 genes and 442 subjects
Introduction
Infectious diseases are a primary contributor to morbidity and
mortality. Each unique pathogen elicits a relatively distinct set of
signs and symptoms from its host, allowing physicians to diagnose,
treat, and often cure patients.
As these signs and symptoms are produced by the body’s response
to the pathogen, they must arise from some fundamental
change at the host’s cellular level. Thus, infected cells must have
changes at the genetic level.
This project seeks to classify a host’s infection based on its gene
expression data. Further, in building a classification model,
individual genes which form the relevant decision boundary can be
identified.
Results
Model Validation
Feature Elimination
Predictive Genes
Enrichment Analysis (influenza)
Methods
Data acquisition
Data was acquired from the NCBI’s Gene Expression Omnibus.
Studies were included if they:
‣were conducted in humans;
‣used an Affymetrix Human Genome assay; and
‣originally studied disease expression.
Data was collected from studies examining eight conditions:
‣Human Immunodeficiency Virus (HIV);
‣Tuberculosis (TB);
‣Hepatitis C Virus (HCV);
‣measles;
‣influenza;
‣rhinovirus;
‣S. pneumoniae; and
‣malaria.
Data was standardized and normalized by log transformation,
use of internal controls, and quantile normalization. Each
pathogen was matched to a different pathogen or healthy control
from the same tissue type.
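The normalization steps can be illustrated with a minimal sketch. It assumes a genes-by-samples matrix of positive intensities, a log2 transform, and the common mean-of-sorted-columns form of quantile normalization. The poster does not state the exact procedure or software used, and the internal-control step is omitted here. The function name and toy values are hypothetical.

```python
import numpy as np

def log_quantile_normalize(intensities):
    """Log2-transform, then quantile-normalize each column (sample)."""
    logged = np.log2(np.asarray(intensities, dtype=float))
    # Rank of each value within its own column (ties broken by position).
    order = np.argsort(logged, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")
    # Reference distribution: mean across samples of the sorted columns.
    reference = np.sort(logged, axis=0).mean(axis=1)
    # Replace each value with the reference value at its rank.
    return reference[ranks]

# Toy matrix: 4 genes (rows) x 3 samples (columns); values are made up.
raw = np.array([[ 8.0,  16.0,  4.0],
                [64.0,  32.0, 32.0],
                [ 2.0,   4.0,  2.0],
                [16.0, 128.0, 16.0]])
normalized = log_quantile_normalize(raw)
```

After this step every sample shares the same distribution of values, while the rank order of genes within each sample is preserved.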
Model building
Support vector machines (SVMs) for each individual disease were
trained with a linear kernel, using 10-fold cross-validation
to calculate sensitivity, specificity, and ROC/AUC. The models were
further validated on two studies whose data was held out
entirely.
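The cross-validation procedure can be sketched as follows, assuming scikit-learn (the poster does not name its software). The data is synthetic, the folds are stratified and shuffled, and `C=1.0` is an arbitrary choice; all of these are assumptions for illustration. Validation on entirely held-out studies is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix: 200 subjects x 50 genes.
X, y = make_classification(n_samples=200, n_features=50, n_informative=10,
                           random_state=0)

model = SVC(kernel="linear", C=1.0)
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Out-of-fold decision scores: each subject is scored by a model
# that was trained without it.
scores = cross_val_predict(model, X, y, cv=folds, method="decision_function")
predicted = (scores > 0).astype(int)

tn, fp, fn, tp = confusion_matrix(y, predicted).ravel()
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
auc = roc_auc_score(y, scores)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} AUC={auc:.2f}")
```

Computing the metrics from out-of-fold predictions pools all ten folds into a single confusion matrix; averaging per-fold metrics is an equally plausible reading of the method.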
Feature elimination
For each disease, the 30% of genes which contributed the least to
the SVM (based on the absolute value of the weight) were
eliminated and the SVM was retrained. This was repeated until
only one gene remained.
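A minimal sketch of this elimination loop, assuming scikit-learn and synthetic data. How the 30% is rounded and what happens once fewer than four genes remain are not stated on the poster; here the count is rounded down and at least one gene is dropped per round. The function name is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

def eliminate_features(X, y, drop_fraction=0.3):
    """Repeatedly fit a linear SVM and drop the lowest-|weight| features.

    Returns one array of surviving feature indices per round,
    ending with a single feature.
    """
    kept = np.arange(X.shape[1])
    history = [kept]
    while kept.size > 1:
        svm = SVC(kernel="linear", C=1.0).fit(X[:, kept], y)
        importance = np.abs(svm.coef_).ravel()
        # Assumed rounding rule: drop floor(30%) of the remaining features,
        # but always at least one and never all of them.
        n_drop = min(max(1, int(drop_fraction * kept.size)), kept.size - 1)
        survivors = np.argsort(importance)[n_drop:]
        kept = np.sort(kept[survivors])
        history.append(kept)
    return history

# Synthetic stand-in: 120 subjects x 40 genes.
X, y = make_classification(n_samples=120, n_features=40, n_informative=5,
                           random_state=0)
history = eliminate_features(X, y)
print([len(k) for k in history])
```

scikit-learn's `RFE` with a fractional `step` is similar, but it removes a fixed number of features per round based on the original feature count, so the explicit loop is closer to the description above.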
Enrichment Analysis
For each disease, genes were considered predictive if they were
included in the smallest model (fewest features) trained during
the feature elimination process that still attained at least
90% sensitivity.
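The selection rule can be sketched as a small helper. The function name, gene labels, and sensitivities below are hypothetical and only illustrate the rule; they are not results from the poster.

```python
def smallest_qualifying_model(rounds, min_sensitivity=0.90):
    """Return the round with the fewest genes that still meets the threshold.

    `rounds` is a list of (genes, sensitivity) pairs, one per elimination
    round. Returns None if no round qualifies.
    """
    qualifying = [r for r in rounds if r[1] >= min_sensitivity]
    if not qualifying:
        return None
    return min(qualifying, key=lambda r: len(r[0]))

# Made-up elimination rounds for one disease.
rounds = [
    (["g1", "g2", "g3", "g4"], 0.97),
    (["g1", "g2", "g3"], 0.95),
    (["g1", "g3"], 0.91),
    (["g3"], 0.84),
]
genes, sensitivity = smallest_qualifying_model(rounds)
```

In this toy case the two-gene model is selected, because the single-gene model falls below the 90% threshold.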
Genes predictive of a disease were submitted to the Gene Ontology
(GO) online enrichment analysis tool, and significance was
determined after Bonferroni correction.
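Bonferroni correction multiplies each raw p-value by the number of tests performed and caps the result at 1. A minimal sketch follows; the significance level of 0.05 and the raw p-values are assumptions for illustration, since the poster does not state the threshold or the number of terms tested.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-value, is_significant) for each raw p-value."""
    m = len(p_values)  # number of tests performed
    results = []
    for p in p_values:
        adjusted = min(1.0, p * m)
        results.append((adjusted, adjusted <= alpha))
    return results

# Made-up raw p-values for four tested terms.
raw_p = [0.0001, 0.004, 0.03, 0.2]
corrected = bonferroni(raw_p)
for p, (adj, significant) in zip(raw_p, corrected):
    print(f"raw={p:.4f} adjusted={adj:.4f} significant={significant}")
```

With four tests, a raw p-value of 0.03 no longer passes at the 0.05 level after correction, which is the conservative behavior the method is chosen for.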
Classifying Disease from Host Gene Expression Patterns
John Schrom
Disease          Predictive genes
HCV              ASB9, 216822_x_at
HIV              MEOX1, SLC16A5, HIST1H1D
Malaria          CUX1, ITPK1, LAP3, UBQLN2
rhinovirus       IL24, IFI44, FA2H
S. pneumoniae    RGS4, STC1, AL137403, ADD3, PNMAL1
TB               SECISBP2L
Biological/Molecular Function                       p-value
CXCR3 chemokine receptor binding                    0.0081
Positive regulation of response to stimulus         0.0068
Positive regulation of cAMP-mediated signaling      0.0076
Cell-cell signaling                                 0.0083
Negative regulation of cAMP biosynthetic process    0.0102
