SlideShare a Scribd company logo
1 of 1
Coordinated Gene Activity in Pattern Sets (CoGAPS)
Conor Kelton and John Stansfield; Faculty Mentor: Dr. Michael F. Ochs
Department of Math/Statistics, Department of Biology, The College of New Jersey
Future Work
Acknowledgements
We would like to thank Dr. Ochs and his team, Elana Fertig, Wai Lee, and Alexander Favorov,
for mentoring us on this project. We also would like to thank Ondrej Maxian for his work on
modularizing and adding to the c++ code.
References
1.Hanahan, Douglas, and Robert A. Weinberg. "Hallmarks of Cancer: The Next Generation." Cell 144.5 (2011): 646-74.
2.Kossenkov, Andrew V., Aidan J. Peterson, and Michael F. Ochs. "Determining Transcription Factor Activity from Microarray Data Using Bayesian Markov Chain Monte Carlo Sampling." MEDINFO (2007)
3.Ochs, Michael F., et Al. "CoGAPS: Stochastic Nonnegative Matrix Factorization by Bayesian Markov Chain Monte Carlo.“
4.Ochs, M. F. "Knowledge-based Data Analysis Comes of Age." Briefings in Bioinformatics 11.1 (2010): 30-39.
5.Ochs, Michael F., and Et Al. "CoGAPS: Stochastic Nonnegative Matrix Factorization by Bayesian Markov Chain Monte Carlo."
Presented at TCNJ MUSE 2015 Poster Session, September 12, 2015Funded by Nation Library of Medicine Grant R01LM011000
Biology of Cancer
Integration of Signaling Networks
• Various biological databases for cell signaling data were
searched for gene interactions in order to compile relevant
networks for head and neck cancer.
• The biological goal of the project is to integrate the
information from cell signaling networks as prior statistical
information.
The source of many cancers is a mutation or multiple
mutations in the cellular signaling networks that
eukaryotic cells have to control growth and
metabolism. Common pathways involved in the onset
of cancer are the RAS-RAF, JAK-STAT, and TGFbR
signaling pathways. Mutations in these pathways can
cause proliferative growth signals or block signals that
inhibit growth and proliferation. CoGAPS uses gene
expression data from cancer studies as input.
In order to aid in statistical analysis
and data visualization, as well as to
ease prospective research
reproducibility, R and MATLAB
interfaces were created to pass data
to and receive results from, the main
C++ algorithm. The RCPP package
and MEX files were used to allow for
data conversions and piping on their
respective software.
The CoGAPS Algorithm Visualizations of CoGAPS:
• Atomic domain representation of the
decomposed A and P matrices. Samples
are randomly generated on the atomic
domains and added to the matrices.
Computational Reliabilities and Results
• Signals reach a cell at a receptor on the cell membrane. The
signal is relayed and activates/deactivates transcription factors
which then change the gene expression of the cell (See above
figure).
• This prior comes in the form of changing the probability
weightings of certain genes in the MCMC algorithm. It will allow
CoGAPS to more accurately find the correct solutions to the
decomposition problem, i.e. finding which transcription factors
are most related to HNSCC.
• Further work needs to be done to produce a final, more accurate
signaling network with extraneous information removed.
• Once the final network is constructed we will analyze the
network to build additional probability models to integrate into
the CoGAPS algorithm.
• Since CoGAPS is currently O(hours) compared to other
competitors on the order of O(minutes) a future goal is to
parallelize the algorithm in C++. The original data matrix can be
broken up into smaller matrices, of which can then be run
through the algorithm. The results from each will be compared.
• Graph a shows a heatmap of the decomposed A matrix. This
allows us to see the expressions of the genes across individual
patterns.
• Graph b shows the patters in the P matrix. Each color is a row
and the x axis is the column number. This shows us where the
patterns lie in each row.
• CoGAPS’s approach to this problem is to use Markov
Chain Monte Carlo (MCMC), a Gibbs Sampler that
uses an atomic domain representation of A and P.
This allows for exploration of the posterior
probability space (the data matrix) to produce a
biologically relevant solution to the decomposition.
• A major advantage to CoGAPS over other
techniques are that it produces a mean and
standard deviation for every cell of A and P. By not
obtaining a simple point estimate we can show how
well we are associating each gene and each sample
with each pattern.

More Related Content

What's hot (8)

Aaas Data Intensive Science And Grid
Aaas Data Intensive Science And GridAaas Data Intensive Science And Grid
Aaas Data Intensive Science And Grid
 
Rasmol
RasmolRasmol
Rasmol
 
Molecular visualization
Molecular visualizationMolecular visualization
Molecular visualization
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016
 
Applications of Matrix
Applications of MatrixApplications of Matrix
Applications of Matrix
 
SEQUENCE ANALYSIS
SEQUENCE ANALYSISSEQUENCE ANALYSIS
SEQUENCE ANALYSIS
 
Data handling metabolomics
Data handling metabolomicsData handling metabolomics
Data handling metabolomics
 
AWS Customer Presentation- Pathwork Diagnostics
AWS Customer Presentation- Pathwork DiagnosticsAWS Customer Presentation- Pathwork Diagnostics
AWS Customer Presentation- Pathwork Diagnostics
 

Similar to MUSEPosterCoGAPS

Making effective use of graphics processing units (GPUs) in computations
Making effective use of graphics processing units (GPUs) in computationsMaking effective use of graphics processing units (GPUs) in computations
Making effective use of graphics processing units (GPUs) in computationsOregon State University
 
AI for automated materials discovery via learning to represent, predict, gene...
AI for automated materials discovery via learning to represent, predict, gene...AI for automated materials discovery via learning to represent, predict, gene...
AI for automated materials discovery via learning to represent, predict, gene...Deakin University
 
Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?Manuel Martín
 
WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...Chris Evelo
 
Una estrategia para la integración de ontologías, servicios web y PLN en el a...
Una estrategia para la integración de ontologías, servicios web y PLN en el a...Una estrategia para la integración de ontologías, servicios web y PLN en el a...
Una estrategia para la integración de ontologías, servicios web y PLN en el a...Anubis Hosein
 
OVium Bioinformatic Solutions
OVium Bioinformatic SolutionsOVium Bioinformatic Solutions
OVium Bioinformatic SolutionsOVium Solutions
 
презентация за варшава
презентация за варшавапрезентация за варшава
презентация за варшаваValeriya Simeonova
 
Visual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient StratificationVisual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient StratificationNils Gehlenborg
 
Comparative study of artificial neural network based classification for liver...
Comparative study of artificial neural network based classification for liver...Comparative study of artificial neural network based classification for liver...
Comparative study of artificial neural network based classification for liver...Alexander Decker
 
2 partners ed_kickoff_dtai
2 partners ed_kickoff_dtai2 partners ed_kickoff_dtai
2 partners ed_kickoff_dtaiSirris
 
Correcting bias and variation in small RNA sequencing for optimal (microRNA) ...
Correcting bias and variation in small RNA sequencing for optimal (microRNA) ...Correcting bias and variation in small RNA sequencing for optimal (microRNA) ...
Correcting bias and variation in small RNA sequencing for optimal (microRNA) ...Christos Argyropoulos
 
RUCK 2017 김성환 R 패키지 메타주성분분석(MetaPCA)
RUCK 2017 김성환 R 패키지 메타주성분분석(MetaPCA)RUCK 2017 김성환 R 패키지 메타주성분분석(MetaPCA)
RUCK 2017 김성환 R 패키지 메타주성분분석(MetaPCA)r-kor
 
NetBioSIG2013-Talk Gang Su
NetBioSIG2013-Talk Gang SuNetBioSIG2013-Talk Gang Su
NetBioSIG2013-Talk Gang SuAlexander Pico
 
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from UnculturedMicrobial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from UnculturedJonathan Eisen
 
Bioinformatics-R program의 실례
Bioinformatics-R program의 실례Bioinformatics-R program의 실례
Bioinformatics-R program의 실례mothersafe
 
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMM
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMMRodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMM
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMM​Iván Rodríguez
 

Similar to MUSEPosterCoGAPS (20)

Making effective use of graphics processing units (GPUs) in computations
Making effective use of graphics processing units (GPUs) in computationsMaking effective use of graphics processing units (GPUs) in computations
Making effective use of graphics processing units (GPUs) in computations
 
AI for automated materials discovery via learning to represent, predict, gene...
AI for automated materials discovery via learning to represent, predict, gene...AI for automated materials discovery via learning to represent, predict, gene...
AI for automated materials discovery via learning to represent, predict, gene...
 
Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?
 
WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...
 
Una estrategia para la integración de ontologías, servicios web y PLN en el a...
Una estrategia para la integración de ontologías, servicios web y PLN en el a...Una estrategia para la integración de ontologías, servicios web y PLN en el a...
Una estrategia para la integración de ontologías, servicios web y PLN en el a...
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
OVium Bioinformatic Solutions
OVium Bioinformatic SolutionsOVium Bioinformatic Solutions
OVium Bioinformatic Solutions
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
презентация за варшава
презентация за варшавапрезентация за варшава
презентация за варшава
 
Visual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient StratificationVisual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient Stratification
 
Comparative study of artificial neural network based classification for liver...
Comparative study of artificial neural network based classification for liver...Comparative study of artificial neural network based classification for liver...
Comparative study of artificial neural network based classification for liver...
 
2 partners ed_kickoff_dtai
2 partners ed_kickoff_dtai2 partners ed_kickoff_dtai
2 partners ed_kickoff_dtai
 
Correcting bias and variation in small RNA sequencing for optimal (microRNA) ...
Correcting bias and variation in small RNA sequencing for optimal (microRNA) ...Correcting bias and variation in small RNA sequencing for optimal (microRNA) ...
Correcting bias and variation in small RNA sequencing for optimal (microRNA) ...
 
Folker Meyer: Metagenomic Data Annotation
Folker Meyer: Metagenomic Data AnnotationFolker Meyer: Metagenomic Data Annotation
Folker Meyer: Metagenomic Data Annotation
 
RUCK 2017 김성환 R 패키지 메타주성분분석(MetaPCA)
RUCK 2017 김성환 R 패키지 메타주성분분석(MetaPCA)RUCK 2017 김성환 R 패키지 메타주성분분석(MetaPCA)
RUCK 2017 김성환 R 패키지 메타주성분분석(MetaPCA)
 
NetBioSIG2013-Talk Gang Su
NetBioSIG2013-Talk Gang SuNetBioSIG2013-Talk Gang Su
NetBioSIG2013-Talk Gang Su
 
Cray HPC + D + A = HPDA
Cray HPC + D + A = HPDACray HPC + D + A = HPDA
Cray HPC + D + A = HPDA
 
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from UnculturedMicrobial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
 
Bioinformatics-R program의 실례
Bioinformatics-R program의 실례Bioinformatics-R program의 실례
Bioinformatics-R program의 실례
 
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMM
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMMRodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMM
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMM
 

MUSEPosterCoGAPS

  • 1. Coordinated Gene Activity in Pattern Sets (CoGAPS) Conor Kelton and John Stansfield; Faculty Mentor: Dr. Michael F. Ochs Department of Math/Statistics, Department of Biology, The College of New Jersey Future Work Acknowledgements We would like to thank Dr. Ochs and his team, Elana Fertig, Wai Lee, and Alexander Favorov, for mentoring us on this project. We also would like to thank Ondrej Maxian for his work on modularizing and adding to the c++ code. References 1.Hanahan, Douglas, and Robert A. Weinberg. "Hallmarks of Cancer: The Next Generation." Cell 144.5 (2011): 646-74. 2.Kossenkov, Andrew V., Aidan J. Peterson, and Michael F. Ochs. "Determining Transcription Factor Activity from Microarray Data Using Bayesian Markov Chain Monte Carlo Sampling." MEDINFO (2007) 3.Ochs, Michael F., et Al. "CoGAPS: Stochastic Nonnegative Matrix Factorization by Bayesian Markov Chain Monte Carlo.“ 4.Ochs, M. F. "Knowledge-based Data Analysis Comes of Age." Briefings in Bioinformatics 11.1 (2010): 30-39. 5.Ochs, Michael F., and Et Al. "CoGAPS: Stochastic Nonnegative Matrix Factorization by Bayesian Markov Chain Monte Carlo." Presented at TCNJ MUSE 2015 Poster Session, September 12, 2015Funded by Nation Library of Medicine Grant R01LM011000 Biology of Cancer Integration of Signaling Networks • Various biological databases for cell signaling data were searched for gene interactions in order to compile relevant networks for head and neck cancer. • The biological goal of the project is to integrate the information from cell signaling networks as prior statistical information. The source of many cancers is a mutation or multiple mutations in the cellular signaling networks that eukaryotic cells have to control growth and metabolism. Common pathways involved in the onset of cancer are the RAS-RAF, JAK-STAT, and TGFbR signaling pathways. Mutations in these pathways can cause proliferative growth signals or block signals that inhibit growth and proliferation. CoGAPS uses gene expression data from cancer studies as input. In order to aid in statistical analysis and data visualization, as well as to ease prospective research reproducibility, R and MATLAB interfaces were created to pass data to and receive results from, the main C++ algorithm. The RCPP package and MEX files were used to allow for data conversions and piping on their respective software. The CoGAPS Algorithm Visualizations of CoGAPS: • Atomic domain representation of the decomposed A and P matrices. Samples are randomly generated on the atomic domains and added to the matrices. Computational Reliabilities and Results • Signals reach a cell at a receptor on the cell membrane. The signal is relayed and activates/deactivates transcription factors which then change the gene expression of the cell (See above figure). • This prior comes in the form of changing the probability weightings of certain genes in the MCMC algorithm. It will allow CoGAPS to more accurately find the correct solutions to the decomposition problem, i.e. finding which transcription factors are most related to HNSCC. • Further work needs to be done to produce a final, more accurate signaling network with extraneous information removed. • Once the final network is constructed we will analyze the network to build additional probability models to integrate into the CoGAPS algorithm. • Since CoGAPS is currently O(hours) compared to other competitors on the order of O(minutes) a future goal is to parallelize the algorithm in C++. The original data matrix can be broken up into smaller matrices, of which can then be run through the algorithm. The results from each will be compared. • Graph a shows a heatmap of the decomposed A matrix. This allows us to see the expressions of the genes across individual patterns. • Graph b shows the patters in the P matrix. Each color is a row and the x axis is the column number. This shows us where the patterns lie in each row. • CoGAPS’s approach to this problem is to use Markov Chain Monte Carlo (MCMC), a Gibbs Sampler that uses an atomic domain representation of A and P. This allows for exploration of the posterior probability space (the data matrix) to produce a biologically relevant solution to the decomposition. • A major advantage to CoGAPS over other techniques are that it produces a mean and standard deviation for every cell of A and P. By not obtaining a simple point estimate we can show how well we are associating each gene and each sample with each pattern.

Editor's Notes

  1. New layout