http://www.mrsymbiomath.eu
This work has been partially supported by the Mr.Symbiomath IAPP (Project Code: 324554); the ‘Plataforma de Recursos Biomoleculares y Bioinformaticos (ISCIII-PT13.0001.0012)’ and ‘Proyecto de Excelencia Junta de Andalucia
(P10-TIC-6108)’
Alex Upton1, Priscill Orue1, Oswaldo Trelles1,2
1 Computer Architecture Department, University of Malaga (UMA), Spain
2 RISC Software GmbH. Hagenberg, Austria
Abstract
It is widely agreed that complex diseases are typically caused by joint effects of multiple genetic variations, rather than a single genetic variation [1]. Multi-SNP interactions, also known as epistatic
interactions, have the potential to provide information about the causes of complex diseases, and build on GWAS, which look at associations between single SNPs and phenotypes. However, epistatic
analysis methods are computationally expensive and, being command-line based, have limited accessibility for biologists wanting to analyse GWAS datasets.
Here we present APPistatic, a prototype desktop version of a pipeline for epistatic analysis of GWAS datasets. This application combines ease of use, via a GUI, with accelerated implementations of
the BOOST [2] and FaST-LMM [3] epistatic analysis methods.
Pipeline
Conclusions
• Implementation of the analysis methods via a GUI results in improved accessibility, thereby making epistatic analysis tools a viable option for end users, such as biologists, who are not comfortable
with command-line based tools. This allows further analysis of GWAS data sets, potentially building on existing analyses and leading to the discovery of additional genetic information.
• A notable improvement in execution time is also obtained, compared to default execution of the epistatic analysis tools. Future HPC deployment makes analysis of typical GWAS data sets feasible: a relatively
small GWAS dataset, with 100,000 SNPs that pass quality control, has approximately 5 x 10^9 pairwise interactions, which would take approximately two years to calculate on a desktop computer. Using HPC, this
can be executed in a number of days, aiding the analysis of genetic variants of disease.
• In addition, a cloud-based version of the pipeline could also be developed using Web services, which could be accessed via a client such as jORCA [6]. Cloud Computing allows researchers to rent
computational and storage resources on an ad-hoc basis for large scale data processing, allowing access to High Performance Computing. Furthermore, this implementation could join up with
existing cloud-based pipelines to create an all-in-one process. Additionally, we are exploring the option of exporting results directly to visualisation software for visual inspection of the results.
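The pair count quoted in the conclusions follows from the number of unordered SNP pairs, n(n - 1)/2. The short sketch below is an illustration only, not part of APPistatic; the function name is hypothetical. It compares the 10,000-SNP demo data set with the 100,000-SNP example to show how quickly the search space grows.

```python
from math import comb


def pair_count(n_snps: int) -> int:
    """Number of unordered SNP pairs, n * (n - 1) / 2."""
    return comb(n_snps, 2)


demo_pairs = pair_count(10_000)    # size of the demo data set
small_pairs = pair_count(100_000)  # the "relatively small" GWAS example

print(f"{demo_pairs:,} pairs for 10,000 SNPs")
print(f"{small_pairs:,} pairs for 100,000 SNPs")
print(f"growth factor: {small_pairs / demo_pairs:.1f}")
```

Ten times more SNPs gives roughly one hundred times more pairs, which is why exhaustive pairwise analysis quickly outgrows a single desktop machine.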
Accessibility
Analysis of GWAS Data
The application provides an easy-to-use all-in-one analysis of GWAS data by
incorporating a number of analysis steps which are shown in Figure 1 below.
Steps Involved
(1) End user loads GWAS files of interest. These can be either in VCF or PLINK format.
For end users with raw .CEL files, one recommended tool for obtaining VCF files is
the Cloud-based GWAS Analysis Pipeline for Clinical Researchers [4].
(2) Prior to epistatic analysis, it is of interest to carry out single SNP association analysis.
This is performed using the widely used tool PLINK [5].
(3) The next step is to carry out an epistatic analysis using an optimised implementation
of BOOST that takes advantage of the multi-core environment of modern computers.
(4) The next step is to prepare for the FaST-LMM [3] analysis tools. Before these can be used, the user
files are converted to ensure compatibility.
(5) The next step is to carry out a single SNP association analysis with FaST-LMM, which
corrects for population structure.
(6) The final step is to carry out an epistatic analysis using FaST-LMM. As with BOOST,
implementation has been optimised to take advantage of multiple cores.
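Step (2) above relies on PLINK. As a rough illustration of what a GUI front end might assemble behind the scenes, the sketch below builds a basic PLINK association command. The poster does not state which options APPistatic actually passes, so the function and the file prefixes are hypothetical; `--bfile`, `--assoc` and `--out` are standard PLINK options for a binary fileset, a basic case/control association test and the output prefix.

```python
import shlex
from typing import List


def plink_assoc_command(bfile_prefix: str, out_prefix: str) -> List[str]:
    """Build (but do not run) a basic PLINK single-SNP association command."""
    return [
        "plink",
        "--bfile", bfile_prefix,  # prefix of the .bed/.bim/.fam fileset
        "--assoc",                # basic case/control association test
        "--out", out_prefix,      # prefix for PLINK output files
    ]


# Hypothetical prefixes, used only for illustration.
cmd = plink_assoc_command("study", "study_single_snp")
print(shlex.join(cmd))
```

Keeping the command as a list of arguments, rather than a single string, lets it be passed to `subprocess.run` without shell quoting problems.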
Acceleration
Desktop PC Implementation
The execution of APPistatic on a typical desktop PC results in a speedup of between 4
and 8 times for epistatic analysis, depending on the number of cores. The screenshot
above shows the default acceleration, using 4 tasks and 256MB RAM per task.
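The poster does not describe how the work is divided among tasks. One common approach, assumed here purely for illustration, is to number all SNP pairs and give each task a contiguous, near-equal range of pair indices. The function names below are hypothetical.

```python
from itertools import combinations, islice
from typing import Iterator, List, Tuple


def task_ranges(n_pairs: int, n_tasks: int) -> List[Tuple[int, int]]:
    """Split pair indices 0..n_pairs-1 into n_tasks half-open ranges of near-equal size."""
    base, extra = divmod(n_pairs, n_tasks)
    ranges = []
    start = 0
    for task in range(n_tasks):
        size = base + (1 if task < extra else 0)  # first `extra` tasks get one more pair
        ranges.append((start, start + size))
        start += size
    return ranges


def pairs_in_range(n_snps: int, start: int, stop: int) -> Iterator[Tuple[int, int]]:
    """Yield the SNP index pairs (i, j), i < j, with pair index in [start, stop).

    islice skips the first `start` pairs one by one, which is fine for a sketch
    but would be replaced by direct index arithmetic for large data sets.
    """
    return islice(combinations(range(n_snps), 2), start, stop)


n_snps = 6
n_pairs = n_snps * (n_snps - 1) // 2  # 15 pairs
for task_id, (start, stop) in enumerate(task_ranges(n_pairs, 4)):
    print(task_id, list(pairs_in_range(n_snps, start, stop)))
```

Each range could then be handed to a separate worker process, since the test for one SNP pair does not depend on the result for any other pair.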
HPC Implementation
A greater speedup, which makes the analysis of typical GWAS datasets feasible, is obtained by
using High Performance Computing (HPC). Initial HPC deployment using 100 cores
shows a promising speedup of over 114 times. Table 2 below shows the execution times
of BOOST and FaST-LMM epistatic analysis of a demo data set on both a typical
desktop PC running Windows and the initial HPC deployment. Note that the
demo data set contains 10,000 SNPs. The faster execution time of BOOST is due to its
use of a linear regression model, compared to the linear mixed model used by
FaST-LMM.
Computational Environment               BOOST Epistatic       FaST-LMM Epistatic
                                        Execution Time (s)    Execution Time (s)
Standard Implementation (a)             25.4                  15123
APPistatic Deployed on Desktop PC (b)   4.8                   1903
Deployment on HPC (c)                   1.2                   132
(a) Default execution of applications from command line on Desktop PC (detailed below)
(b) Desktop PC with Intel Core 2 Quad 2.66 GHz CPU and 4 GB RAM running Windows 7
(c) Split into 100 tasks with 4 cores and 8 GB RAM assigned to each task
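The speedups quoted in the text can be recomputed from the execution times in the table. The sketch below uses only the table values; the dictionary layout and function name are illustrative.

```python
# Execution times in seconds, copied from the table above.
times = {
    "standard": {"BOOST": 25.4, "FaST-LMM": 15123.0},
    "desktop": {"BOOST": 4.8, "FaST-LMM": 1903.0},
    "hpc": {"BOOST": 1.2, "FaST-LMM": 132.0},
}


def speedup(tool: str, environment: str) -> float:
    """Speedup of `tool` in `environment` relative to the standard implementation."""
    return times["standard"][tool] / times[environment][tool]


for environment in ("desktop", "hpc"):
    for tool in ("BOOST", "FaST-LMM"):
        print(f"{tool:9s} {environment:8s} {speedup(tool, environment):6.1f}x")
```

The FaST-LMM figures give the "over 114 times" HPC speedup, and both desktop figures fall within the quoted range of 4 to 8 times.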
References
[1] Anunciação, Orlando, Susana Vinga, and Arlindo L. Oliveira. "Using Information Interaction to Discover Epistatic Effects in Complex Diseases." PLoS ONE 8, no. 10 (2013): e76300.
[2] Wan, Xiang, Can Yang, Qiang Yang, Hong Xue, Xiaodan Fan, Nelson LS Tang, and Weichuan Yu. "BOOST: A fast approach to detecting gene-gene interactions in genome-wide case-control studies." The American Journal of Human Genetics 87, no. 3 (2010): 325-340.
[3] Lippert, Christoph, Jennifer Listgarten, Ying Liu, Carl M. Kadie, Robert I. Davidson, and David Heckerman. "FaST linear mixed models for genome-wide association studies." Nature Methods 8, no. 10 (2011): 833-835.
[4] Heinzlreiter, P., J. Perkins, O. Torreño Tirado, J. Karlsson, A. Mitterecker, M. Blanca, and O. Trelles. "A Cloud-based GWAS Analysis Pipeline for Clinical Researchers." 4th International Conference on Cloud Computing and Services Science, CLOSER 2014.
[5] Purcell, Shaun, Benjamin Neale, Kathe Todd-Brown, Lori Thomas, Manuel AR Ferreira, David Bender, Julian Maller et al. "PLINK: a tool set for whole-genome association and population-based linkage analyses." The American Journal of Human Genetics 81, no. 3 (2007): 559-575.
[6] Martín-Requena, Victoria, Javier Ríos, Maximiliano García, Sergio Ramírez, and Oswaldo Trelles. "jORCA: easily integrating bioinformatics Web Services." Bioinformatics 26, no. 4 (2010): 553-559.
Figure 1: Overview of Pipeline
Graphical User Interface
Providing GUI access to epistatic analysis
methods, along with single SNP association
methods, improves their accessibility: multiple
tools are accessed in the same manner,
allowing the target users, who are not expert
computer users (e.g. biologists), to easily analyse
their GWAS datasets without having to learn
different commands for each tool. The GUI is shown in
Figure 2 on the left. Note the easily configurable
options for acceleration. The prototype version
of APPistatic can be downloaded from:
http://chirimoyo.ac.uma.es/appistatic
Figure 2: Implementation Results
Figure 2: APPistatic GUI

Accelerating GWAS epistatic interaction analysis methods

"BOOST: A fast approach to detecting gene-gene interactions in genome-wide case-control studies." The American Journal of Human Genetics 87, no. 3 (2010): 325-340. [3] Lippert, Christoph, Jennifer Listgarten, Ying Liu, Carl M. Kadie, Robert I. Davidson, and David Heckerman. "FaST linear mixed models for genome-wide association studies." Nature Methods 8, no. 10 (2011): 833-835. [4] P. Heinzlreiter, J. Perkins, O. Torreñno Tirado, J. Karlsson, A. Mitterecker, M. Blanca and O. Trelles. "A Cloud-based GWAS Analysis Pipeline for Clinical Researchers" 4th International Conference on Cloud Computing and Services Science, CLOSER 2014. [5] Purcell, Shaun, Benjamin Neale, Kathe Todd-Brown, Lori Thomas, Manuel AR Ferreira, David Bender, Julian Maller et al. "PLINK: a tool set for whole-genome association and population-based linkage analyses." The American Journal of Human Genetics 81, no. 3 (2007): 559-575. [6] Martín-Requena, Victoria, Javier Ríos, Maximiliano García, Sergio Ramírez, and Oswaldo Trelles. "jORCA: easily integrating bioinformatics Web Services." Bioinformatics 26, no. 4 (2010): 553-559. Figure 1: Overview of Pipeline Graphical User Interface Providing GUI access to epistatic analysis methods, along with single SNP association methods, improves their accessibility as multiple tools are accessed in the same manner, allowing targeted non-expert computer users, e.g. biologists, to easily analyse their GWAS datasets without having to learn different commands for each tool. The GUI is shown in Figure 2 on the left. Note the easily configurable options for acceleration. The prototype version of APPistatic can be downloaded from: Figure 2: Implementation Results http://chirimoyo.ac.uma.es/appistaticFigure 2: APPistatic GUI
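As a rough check on the figures quoted in the Conclusions and Acceleration sections, the number of pairwise interactions among n SNPs is n(n-1)/2, and each speedup is the standard execution time in Table 2 divided by the accelerated one. The Python sketch below is illustrative only: the function names and the TIMES dictionary are introduced here for the example and are not part of APPistatic, BOOST or FaST-LMM. The timings are copied from Table 2.

```python
from math import comb

# Execution times in seconds, copied from Table 2 (demo data set, 10,000 SNPs).
# The dictionary layout and key names are assumptions made for this example.
TIMES = {
    "BOOST": {"standard": 25.4, "desktop": 4.8, "hpc": 1.2},
    "FaST-LMM": {"standard": 15123.0, "desktop": 1903.0, "hpc": 132.0},
}


def pairwise_interactions(n_snps: int) -> int:
    """Number of unordered SNP pairs, n * (n - 1) / 2."""
    return comb(n_snps, 2)


def speedup(baseline_s: float, accelerated_s: float) -> float:
    """Speedup factor: baseline time divided by accelerated time."""
    return baseline_s / accelerated_s


def speedup_table(times: dict) -> dict:
    """Speedup of each accelerated environment over the standard run."""
    return {
        tool: {
            env: speedup(t["standard"], t[env])
            for env in ("desktop", "hpc")
        }
        for tool, t in times.items()
    }


if __name__ == "__main__":
    # Pair counts for the 100,000-SNP example and the 10,000-SNP demo data set.
    for n in (100_000, 10_000):
        print(f"{n:>7} SNPs -> {pairwise_interactions(n):,} pairs")

    # Speedups relative to the standard command line execution.
    for tool, envs in speedup_table(TIMES).items():
        for env, factor in envs.items():
            print(f"{tool:<9} {env:<8} speedup: {factor:.1f}x")
```

With these inputs, 100,000 SNPs give 4,999,950,000 pairs (roughly 5 x 10^9), and the FaST-LMM timings give speedups of about 7.9 on the desktop PC and about 114.6 on HPC, in line with the "between 4 and 8 times" and "over 114 times" figures reported above.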