GenomeSnip: Fragmenting the
Genomic Wheel to augment discovery
in cancer research
Maulik R. Kamdar, Aftab Iqbal, Muhammad ...
Outline
 Introduction
 Integrative Genomics
 Genomic Visualization
 Motivation

 Methodology

 Integration of Data S...
Integrative Genomics
 Advanced computation analyses of genomics datasets –
Genome-wide Association studies (GWAS), identi...
Genomic Visualization
Genome browsers navigate across the human genome in an intuitive,
linear fashion. Data points are ma...
Motivation
 With the rise of high-throughput gene sequencing technologies, data
analysis has replaced data generation as ...
GenomeSnip Platform
 A semantic, visual analytics prototype devised to expedite knowledge
exploration and discovery in ca...
Integration of Data Sources
UCSC Genome Browser:
UCSC Genome Browser:
Chromosome Bands
Chromosome Bands
(Ideograms) in the...
Integration of Data Sources
CellBase: Coordinates and
CellBase: Coordinates and
descriptions of the proteindescriptions of...
Integration of Data Sources
Cancer Gene Census:
Cancer Gene Census:
Catalogued oncogenes,
Catalogued oncogenes,
which bear...
Integration of Data Sources
UniProt: Disease-gene
UniProt: Disease-gene
mappings exposed by the
mappings exposed by the
EB...
Integration of Data Sources
Kyoto Encyclopedia of
UniProt: Disease-gene
UniProt: Disease-gene
Kyoto Encyclopedia of
Genes ...
Integration of Data Sources
Pubmed2Ensembl:
UniProt: Disease-gene
UniProt: Disease-gene
Pubmed2Ensembl:
Customized exposed...
Integration of Data Sources
Linked TCGA:Informationof
Pubmed2Ensembl: of
KyotoTCGA:Coordinates
UniProt: Disease-gene
Cance...
Co-occurrence Data Cubes
 Genes mapped with diseases, pathways and publications are
extracted a priori to instantiate co-...
Similarity Measures
 Inspired from Tversky's feature-based similarity measure.
 Similarity between two entities (chromos...
Similarity Measures (contd.)
 The maximum similarity for any chromosome, is calculated by using
the evaluating Equation 1...
Genomic Wheel
 The human genome is laid in a
circular layout.
 Chromosomes form the arcs
on the perimeter, whose
length ...
Genomic Wheel (contd.)
 Hovering the mouse over each
chord displays slice of
co-occurrence data cube.
 Hierarchical cate...
Genomic Wheel (contd.)
 Clicking on each arc, the
represented
segment
is
highlighted and flares out to
display subsequent...
Genomic Wheel (contd.)
 Genes catalogued in the
Cancer Gene Census, whose
gene sequences bear somatic
and
germline
mutati...
Genomic Tracks Display

Select any tumor and
Select any tumor and
associated patients from
associated patients from
Linked...
Genomic Tracks Display

Stacked Display of
Stacked Display of
patients’ Exon
patients’ Exon
Expression (Green) and
Express...
Genomic Tracks Display

Catalogued Cancer-related
Catalogued Cancer-related
mutations extracted from
mutations extracted f...
Genomic Tracks Display
Simultaneous analysis on different genome tracks
Simultaneous analysis on different genome tracks
Genomic Tracks Display
Zooming and Automated Scrolling controls
Zooming and Automated Scrolling controls
Comparative Evaluation
 Empirical Evaluation conducted through aaquestionnaire (http://goo.gl/vnLtX4)
 Empirical Evaluat...
Comparative Evaluation
Genome
Genome
Circos Regulome
Maps
Snip

UCSC

IGV

Savant

Linear
Coordinates













...
Comparative Evaluation
Genome
Genome
Circos Regulome
Maps
Snip
Integration with JBrowse or
Integration with JBrowse or
Gen...
Comparative Evaluation
Genome
Genome
Circos Regulome
Maps
Snip

UCSC

IGV

Savant

Linear
Coordinates







Circular P...
Comparative Evaluation
Genome
Genome
Circos Regulome
Maps
Snip

UCSC

IGV

Savant

Linear
Coordinates













...
Applicability of GenomeSnip
Formulating Improved Hypotheses:
 Integration of the knowledge extracted from GWAS, available...
Future Work on GenomeSnip
 Integrate prostate cancer prediction models into the platform, for
validation using Linked TCG...
Thank You! Questions?
Upcoming SlideShare
Loading in …5
×

GenomeSnip: Fragmenting the Genomic Wheel to augment discovery in cancer research

855 views

Published on

Presentation in the Conference on Semantics in Healthcare and Life Sciences (CSHALS) 2014, Boston.
Abstract: Cancer genomics researchers have greatly benefited from high-throughput technologies for the characterization of genomic alterations in patients. These voluminous genomics datasets when supplemented with the appropriate computational tools have led towards the identification of 'oncogenes' and cancer pathways. However, if a researcher wishes to exploit the datasets in conjunction with this extracted knowledge his cognitive abilities need to be augmented through advanced visualizations. In this paper, we present GenomeSnip, a visual analytics platform, which facilitates the intuitive exploration of the human genome and displays the relationships between different genomic features. Knowledge, pertaining to the hierarchical categorization of the human genome, oncogenes and abstract, co-occurring relations, has been retrieved from multiple data sources and transformed a priori. We demonstrate how cancer experts could use this platform to interactively isolate genes or relations of interest and perform a comparative analysis on the 20.4 billion triples Linked Cancer Genome Atlas (TCGA) datasets.

Published in: Health & Medicine
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
855
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
7
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide
  • Most of the popular genome browsers provide navigation across the human genome in a linear fashion. Whereas automated mining tools are more adept towards the linear genomic analysis, the perceptive faculties of humans are more developed towards interpreting patterns in depth compared to length. Moreover, linear visualizations fail to account for inter- and intra-chromosomal relations, which could be easily interpreted and used by humans but difficult for machines.
  • ‘Snip’ means ‘Clip’ – IN genomics Snipping is a common verb used in conjunction with the chromosome.
    We have combined the salient features of the linear genome browsers and circular plots, and present an interactive alternative displaying only those genomic regions and its inherent relations which are of actual interest to the cancer expert.
    Technologies Why??:
    No dependence on proprietary frameworks like Adobe Flash and Silverlight for interactivity, whereas the support across traditional browsers improves the interoperability.
    D3JS primarily relies on SVG, are suitable for developing interactive visualizations for smaller datasets - the functionality is deeply impacted when rendering larger datasets as SVG stores the rendered objects directly in the browser DOM.
    HTML5 Canvas, which creates a raster graphic of the entire visualization prior to rendering in the browser window.
  • Approach:
    Knowledge retrieval from various data sources
    Determine the co-occurrence matrices between genomic features
    Assemble the generated insights into an aggregated, interactive visualization ‘Genomic Wheel’.
    SNP – Single Nucleotide Polymorphism
    CNV – Copy Number Variation
    LinkedTCGA – 20.4 billion triples
  • Gene identifier conversions were done using HGNC table – this step is required because different data sources use different nomenclatures and ids (EntrezGene, Ensembl or KEGG) to represent the gene.
  • HGTotalDis = 1435, HGTotalPath = 83970, HGTotalPub = 129035
    Total number of co-occurrence pairs extracted from the entire human genome in current version
    = 1,  = 0.4,  = 0.1 (In current version)
    We hope to allow users to decide the weights (, , ) using input controls later.
  • The relative similarity impact is a visualization parameter used in the Genomic Wheel to represent the thickness of the chord connecting two chromosomes on the side of one chromosomal arc.
  • We hope to allow users to decide the weights (, , ) using input controls later.
  • PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX tcga: <http://tcga.deri.ie/schema/>
    SELECT DISTINCT * WHERE {
    <PatientID> tcga:result ?exonResult.
    ?exonResult tcga:chromosome ``17''; tcga:RPKM ?value;
    tcga:start ?start; tcga:stop ?stop
    FILTER(xsd:double(?start) > 37844393 &&
    xsd:double(?stop) < 37884915) }
  • PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX tcga: <http://tcga.deri.ie/schema/>
    SELECT DISTINCT * WHERE {
    <PatientID> tcga:result ?exonResult.
    ?exonResult tcga:chromosome ``17''; tcga:RPKM ?value;
    tcga:start ?start; tcga:stop ?stop
    FILTER(xsd:double(?start) > 37844393 &&
    xsd:double(?stop) < 37884915) }
  • PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX tcga: <http://tcga.deri.ie/schema/>
    SELECT DISTINCT * WHERE {
    <PatientID> tcga:result ?exonResult.
    ?exonResult tcga:chromosome ``17''; tcga:RPKM ?value;
    tcga:start ?start; tcga:stop ?stop
    FILTER(xsd:double(?start) > 37844393 &&
    xsd:double(?stop) < 37884915) }
  • PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX tcga: <http://tcga.deri.ie/schema/>
    SELECT DISTINCT * WHERE {
    <PatientID> tcga:result ?exonResult.
    ?exonResult tcga:chromosome ``17''; tcga:RPKM ?value;
    tcga:start ?start; tcga:stop ?stop
    FILTER(xsd:double(?start) > 37844393 &&
    xsd:double(?stop) < 37884915) }
  • PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX tcga: <http://tcga.deri.ie/schema/>
    SELECT DISTINCT * WHERE {
    <PatientID> tcga:result ?exonResult.
    ?exonResult tcga:chromosome ``17''; tcga:RPKM ?value;
    tcga:start ?start; tcga:stop ?stop
    FILTER(xsd:double(?start) > 37844393 &&
    xsd:double(?stop) < 37884915) }
  • Jbrowse and GenomeMaps provide a very interactive web-based Genomic Tracks Viewer with the ability to load and remove tracks related to SNPs, exons, etc. in a better way.
    We hope to provide a Data Upload Utility for allowing cancer researchers to upload their own high-throughput data to visually compare against Linked TCGA Datasets.
  • Jbrowse and GenomeMaps provide a very interactive web-based Genomic Tracks Viewer with the ability to load and remove tracks related to SNPs, exons, etc. in a better way.
    We hope to provide a Data Upload Utility for allowing cancer researchers to upload their own high-throughput data to visually compare against Linked TCGA Datasets.
  • Jbrowse and GenomeMaps provide a very interactive web-based Genomic Tracks Viewer with the ability to load and remove tracks related to SNPs, exons, etc. in a better way.
    We hope to provide a Data Upload Utility for allowing cancer researchers to upload their own high-throughput data to visually compare against Linked TCGA Datasets.
  • Jbrowse and GenomeMaps provide a very interactive web-based Genomic Tracks Viewer with the ability to load and remove tracks related to SNPs, exons, etc. in a better way.
    We hope to provide a Data Upload Utility for allowing cancer researchers to upload their own high-throughput data to visually compare against Linked TCGA Datasets.
  • Jbrowse and GenomeMaps provide a very interactive web-based Genomic Tracks Viewer with the ability to load and remove tracks related to SNPs, exons, etc. in a better way.
    We hope to provide a Data Upload Utility for allowing cancer researchers to upload their own high-throughput data to visually compare against Linked TCGA Datasets.
  • Prostate adenocarcinoma, which is one of the most common malignancy to affect men, could be diagnosed through a combined evaluation of an individual's genomic and clinical data on a personalized scale [Boyd, 2012].
  • GenomeSnip: Fragmenting the Genomic Wheel to augment discovery in cancer research

    1. 1. GenomeSnip: Fragmenting the Genomic Wheel to augment discovery in cancer research Maulik R. Kamdar, Aftab Iqbal, Muhammad Saleem, Helena F. Deus and Stefan Decker Conference on Semantics in Healthcare and Life Sciences (CSHALS), February 26-28, 2014, Boston
    2. 2. Outline  Introduction  Integrative Genomics  Genomic Visualization  Motivation  Methodology  Integration of Data Sources  Co-occurrence Data Cubes  Similarity Measures  GenomeSnip Platform  Genomic Wheel  Genomic Tracks Display  Comparative Evaluation  Discussion  Applicability  Future Work
    3. 3. Integrative Genomics  Advanced computation analyses of genomics datasets – Genome-wide Association studies (GWAS), identify several susceptible loci (point alterations and oncogene) associated with various types of cancer, and are catalogued.  Network-based approaches isolate genes implicated together (‘Co-occurrence’) in any disease or pathway, i.e. macro-molecular associations on a higher, abstract level.  Linked Data and Semantic Web Technologies address challenges on integration of multiple knowledgebases.  Advantages: Representation using Resource Description Framework (RDF), structured querying, integration with other biomedical datasets, data mining and knowledge extraction.
    4. 4. Genomic Visualization Genome browsers navigate across the human genome in an intuitive, linear fashion. Data points are mapped to the genomic coordinates and datasets are visualized as charts or heatmaps. Disadvantages: The perceptive faculties of humans interpret patterns in depth compared to length. Fail to display inter- and intra-chromosomal relations, which could be easily interpreted and used by humans. Circular plots overcome the cognitive barriers to grasp relations between disjoint genomic features on an abstract level. Disadvantages: Focus on the entire human genome or chromosome. Visualizations rendered as static images and lack interactivity.
    5. 5. Motivation  With the rise of high-throughput gene sequencing technologies, data analysis has replaced data generation as the rate-limiting step for the interpretation of genomic patterns and discovery.  Analyzing genomic datasets using the knowledge extracted through GWAS could help discovering newer tumour risk hypotheses and diagnosing cancer on a personalized basis.  Studying Co-occurrence networks of genes could predict hidden protein-protein interactions (PPI) and functions.  Improved, interactive, intuitive, genomic visualization solutions, which infuse knowledge insights into cancer research, need to be perfected to augment discovery.
    6. 6. GenomeSnip Platform  A semantic, visual analytics prototype devised to expedite knowledge exploration and discovery in cancer research.  Idea: ‘Snip’ the human genome informatively in fragments through interaction with an aggregative, circular visualization, the ‘Genomic Wheel’ (circular) and introspectively analyze the snipped fragments in a ‘Genomic Tracks’ (linear) display.  Technologies: Web-based client application developed using native technologies like HTML5 Canvas, JavaScript and JSON. KineticJS library, an HTML5 Canvas JavaScript framework, is used for node nesting, layering, caching and event handling.  Availability: http://srvgal78.deri.ie/genomeSnip/
    7. 7. Integration of Data Sources UCSC Genome Browser: UCSC Genome Browser: Chromosome Bands Chromosome Bands (Ideograms) in the Human (Ideograms) in the Human Genome Assembly Genome Assembly (GRCh37/hg19, (GRCh37/hg19, Feb 2009) Feb 2009)
    8. 8. Integration of Data Sources CellBase: Coordinates and CellBase: Coordinates and descriptions of the proteindescriptions of the proteincoding genes, genomic coding genes, genomic variants like cancer-related variants like cancer-related mutations (COSMIC) and mutations (COSMIC) and SNPs (dbSNP). SNPs (dbSNP).
    9. 9. Integration of Data Sources Cancer Gene Census: Cancer Gene Census: Catalogued oncogenes, Catalogued oncogenes, which bear somatic and which bear somatic and germline mutations in their germline mutations in their sequences sequences
    10. 10. Integration of Data Sources UniProt: Disease-gene UniProt: Disease-gene mappings exposed by the mappings exposed by the EBI RDF platform. EBI RDF platform.
    11. 11. Integration of Data Sources Kyoto Encyclopedia of UniProt: Disease-gene UniProt: Disease-gene Kyoto Encyclopedia of Genes and Genomes by mappingsGenomes by mappings exposed Genes and exposed the EBI RDF (KEGG): Pathway-gene the EBI RDF (KEGG): Pathway-gene platform.exposedas a Mappings exposed as a Mappings platform. SPARQL Endpoint. SPARQL Endpoint.
    12. 12. Integration of Data Sources Pubmed2Ensembl: UniProt: Disease-gene UniProt: Disease-gene Pubmed2Ensembl: Customized exposed by mappings service mappings service Customized exposed by the EBI to provide geneextended RDF the EBI to provide geneextended RDF platform. related publication related publication platform. information. information.
    13. 13. Integration of Data Sources Linked TCGA:Informationof Pubmed2Ensembl: of KyotoTCGA:Coordinates UniProt: Disease-gene CancerGenome CellBase: Disease-gene UCSC Genome Pubmed2Ensembl: UniProt: Information CancerEncyclopedia Linked Gene Census: Kyoto Gene Census: CellBase: Coordinates UCSC Encyclopedia related andexposed Customized genomic Genes to theGenomesthe mappings Chromosome Cataloguedexposed by and descriptions ofby Browser:theGenomesthe Customized service mappings genomic Catalogued service related and Genes to and descriptions of Browser: Chromosome extended RDF genes, the EBI RDFDNA oncogenes, provide in alterations likewhich (KEGG): Pathway-gene protein-coding genes, Bands (Ideograms) in extended likeDNA the EBI to which oncogenes, provide alterations to (KEGG): Pathway-gene protein-coding Bands (Ideograms) gene-relatedGenome platform.variants like bearHumanSNPs, CNVs methylations, SNPs, as Mappings exposedlike genomic variantsCNVs a the Humanexposed as a methylations, gene-related and Mappings platform. bear somatic and genomic the somaticGenome publication germlineexpression and gene expression SPARQL Endpoint. in cancer-related Assembly mutations in and gene publication SPARQL germline Endpoint. cancer-related Assembly mutations changes, in cancer information. their sequences mutationscancerpatients, (GRCh37/hg19,patients, changes, in (COSMIC) information. their sequences mutations (COSMIC) (GRCh37/hg19, mapped to (dbSNP). and SNPsthe genomic loci Feb 2009)thegenomic loci mapped to and SNPs Feb 2009) (dbSNP).
    14. 14. Co-occurrence Data Cubes  Genes mapped with diseases, pathways and publications are extracted a priori to instantiate co-occurrence pairs between all possible combinations of genes, after identifier conversions.  A co-occurrence data cube of 3 dimensions (segment 1, segment 2 and data source type) and 2 measures (co-occurrence count and names of mapped resources) is created between the genes, ideograms and chromosomes.  The data cubes are transformed to RDF (N-triples syntax) using RDF Data Cube Vocabulary and are stored in a Triple Store. Chr1 Chr2 Chr3 … Total Dis Chr1 Chr2 Chr3 Path Pub Dis Path Pub Dis Path Pub … Dis Path Pub 28 4049 4522 15 4251 2915 19 4811 2667 … 226 75044 49827 15 4251 2915 14 1662 3589 14 2532 1748 … 155 42513 31424 19 4811 2667 14 2532 1748 16 1338 1390 … 224 44310 26817
    15. 15. Similarity Measures  Inspired from Tversky's feature-based similarity measure.  Similarity between two entities (chromosomes) is a weighted summation of their common features (total number of co-occurrence pairs of contained genes). Sim12 = α * Chr1 I Chr 2 HGTotalDis Dis +β* Chr1 I Chr 2 HGTotalPath Path +γ * Chr 1 I Chr 2 Pub HGTotalPub  HGTotalDis = Total number of gene-gene co-occurrence pairs extracted from the entire Human Genome (HG) for a particular data source. HGTotalDis = 1435, HGTotalPath = 83970, HGTotalPub = 129035 15 4251 2915 Sim12 = α * +β* +γ * 1435 83970 129035
    16. 16. Similarity Measures (contd.)  The maximum similarity for any chromosome, is calculated by using the evaluating Equation 1, with the total number of gene-gene cooccurrence pairs, in which genes of Chromosome 1 participate. Chr1TotalDis Chr1TotalPath Chr 1TotalPub Sim1Max = α * +β* +γ * HGTotalDis HGTotalPath HGTotalPub  Evaluating with the values of last column in the data cube 226 75044 49827 Sim1Max = α * +β* +γ * 1435 83970 129035  The relative similarity impact on the side of any chromosome is calculated by dividing the similarity measure with the maximum similarity for the chromosome (Sim12/Sim1max).
    17. 17. Genomic Wheel  The human genome is laid in a circular layout.  Chromosomes form the arcs on the perimeter, whose length is proportional to the size of each chromosome.  The thickness of the connecting chords is proportional to relative similarity impact between connected chromosomes.  α = 1, β = 0.4, γ = 0.1
    18. 18. Genomic Wheel (contd.)  Hovering the mouse over each chord displays slice of co-occurrence data cube.  Hierarchical categorization chromosome, ideogram, gene and cancer point mutations forms layers in the representative arc.  Only the ‘chromosome’ and ‘ideogram’ layer are shown in the initial layout. Chord’s thickness is Chord’s thickness is proportional to relative proportional to relative similarity impact similarity impact between Chromosome 11 between Chromosome and 22on the connected and on the connected side (tapering). side (tapering).
    19. 19. Genomic Wheel (contd.)  Clicking on each arc, the represented segment is highlighted and flares out to display subsequent layers.  At the genetic level the relation chords are represented using distinct colors (Red, green and blue for diseases, pathways and publications respectively) to enable visual discernibility.
    20. 20. Genomic Wheel (contd.)  Genes catalogued in the Cancer Gene Census, whose gene sequences bear somatic and germline mutations responsible for cancer, are represented by shades of red.  Hovering the mouse pointer over any gene displays additional information.
    21. 21. Genomic Tracks Display Select any tumor and Select any tumor and associated patients from associated patients from LinkedTCGA datasets LinkedTCGA datasets
    22. 22. Genomic Tracks Display Stacked Display of Stacked Display of patients’ Exon patients’ Exon Expression (Green) and Expression (Green) and DNA Methylation (Red) DNA Methylation (Red) Results Results
    23. 23. Genomic Tracks Display Catalogued Cancer-related Catalogued Cancer-related mutations extracted from mutations extracted from COSMIC COSMIC
    24. 24. Genomic Tracks Display Simultaneous analysis on different genome tracks Simultaneous analysis on different genome tracks
    25. 25. Genomic Tracks Display Zooming and Automated Scrolling controls Zooming and Automated Scrolling controls
    26. 26. Comparative Evaluation  Empirical Evaluation conducted through aaquestionnaire (http://goo.gl/vnLtX4)  Empirical Evaluation conducted through questionnaire (http://goo.gl/vnLtX4) embedded within the GenomeSnip platform. embedded within the GenomeSnip platform.  44PhD students and 55researchers researching in Biotechnology (primarily protein  PhD students and researchers researching in Biotechnology (primarily protein engineering) and Bioinformatics responded. engineering) and Bioinformatics responded.
    27. 27. Comparative Evaluation Genome Genome Circos Regulome Maps Snip UCSC IGV Savant Linear Coordinates        Circular Plot        Web Interface        Data Upload        Third-party modules TCGA Demonstratio n GWAS Insights Chromosomal Relations Simultaneous Analysis Knowledge Integration                                           Heatma p    Network        Heatma Other p Visualizations Dynamicity 
    28. 28. Comparative Evaluation Genome Genome Circos Regulome Maps Snip Integration with JBrowse or Integration with JBrowse or GenomeMaps for aabetter     GenomeMaps for better Genome Tracks View Genome Tracks View     UCSC IGV Savant Linear Coordinates    Circular Plot    Web Interface        Data Upload        Third-party modules TCGA Demonstratio n GWAS Insights Chromosomal Relations Simultaneous Analysis Knowledge Integration                                           Heatma p    Network        Heatma Other p Visualizations Dynamicity 
    29. 29. Comparative Evaluation Genome Genome Circos Regulome Maps Snip UCSC IGV Savant Linear Coordinates    Circular Plot    Web Interface    Data Upload    Third-party modules TCGA Demonstratio n GWAS Insights Chromosomal Relations Simultaneous Analysis Knowledge Integration                                       Heatma p    Network        Heatma Other p Visualizations Dynamicity        Data Upload  genomicsdata Data Uploadof genomicsdata of  of new patients for visual of new patients for visual    comparison comparison        
    30. 30. Comparative Evaluation Genome Genome Circos Regulome Maps Snip UCSC IGV Savant Linear Coordinates        Circular Plot        Web Interface        Data Upload        Third-party modules TCGA Demonstratio n GWAS Insights Chromosomal Relations Simultaneous Analysis Knowledge Integration                                       Heatma p    Heatma Other p Visualizations Dynamicity    Introduce Heatmap plotsfor Introduce Heatmap plotsfor exon expression data Network   exon expression data      
    31. 31. Applicability of GenomeSnip Formulating Improved Hypotheses:  Integration of the knowledge extracted from GWAS, available literature and pathways, and interpretation through an intuitive visualization, allowing isolation and comparative analysis of interesting genomic segments using the Linked TCGA datasets. Discovering Protein Interactions:  Usage and improved visual discernibility of co-occurrence networks of genes, along with analysis of highly co-occurrent gene pairs using gene/protein expression datasets. Predicting Tumour Risk on a personalized level:  Evaluation of the genomic alterations, expression levels and clinical data of a new patient against those patients registered under the TCGA project, enabling personalized medicine.
    32. 32. Future Work on GenomeSnip  Integrate prostate cancer prediction models into the platform, for validation using Linked TCGA datasets as a training set, and predicting tumour risk in new patients.  Provide an ‘interaction’ overlay by integrating knowledge on proteinprotein interactions (BioGrid), gene co-expression (CoExpresDB) and functions (Gene Ontology).  Extract information like disease variant mutations from UniProt to interlink the ‘point mutations’ layer in the ‘Genomic Wheel’ with other segments.  Improve the granularity of the ‘Genomic Wheel’ and extensively test the user experience and the usability of this platform by conducting a user-driven evaluation.
    33. 33. Thank You! Questions?

    ×