Investigating The Role of Non Coding Mutations in Cancer - Dilmi Perera

Medical research has focused primarily on protein-coding mutations, owing to the difficulty of interpreting non-coding mutations. However, projects such as ENCODE [1] and the Epigenome Atlas [2] have made large data sets from a variety of genome-wide experiments publicly available, which has improved the annotation of the non-coding genome. Integrating these data sets has allowed us to better understand the functions of non-coding sequences, the rich regulatory models and the epistatic interactions underlying disease associations.

Interest in non-coding mutations has also arisen only recently because sequencing costs have dropped so rapidly over the past few years that it is now feasible to sequence the whole genomes of cancer samples at a reasonable cost, which enables researchers to catalogue both coding and non-coding mutations. The Cancer Genome Atlas (TCGA) is a publicly available database that holds over 1,000 such cancer genomes waiting to be analyzed, which makes it a potential gold mine for this type of research.

Aim
To develop a scoring mechanism for somatic mutations within regulatory or non-coding regions that identifies or predicts possible causal mutations.

Expected Use
The scoring mechanism will help researchers find mutations of interest and narrow down the candidates for experimental verification.

Approach
How interesting is the region in which the mutation occurs, in terms of chromatin structure, epigenetics, conservation, etc.? Figure 1 shows the type of mutation of interest.
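As a rough illustration of this kind of region-level annotation, the Python sketch below flags whether a mutation position falls inside any interval of a given track (for example DNase1 hypersensitive sites or a histone mark) and reports the closest transcription start site. It is not the authors' implementation: the function names, the toy coordinates, the half-open interval convention and the "position minus TSS" sign convention are all assumptions made for the example, and strand is ignored.

```python
from bisect import bisect_right


def overlaps_any(pos, intervals):
    """Return 1 if pos lies in any half-open [start, end) interval, else 0.

    Assumes `intervals` is sorted by start and non-overlapping.
    """
    starts = [start for start, _ in intervals]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return int(i >= 0 and pos < intervals[i][1])


def nearest_tss(pos, tss_by_gene):
    """Return (gene, pos - TSS) for the closest TSS, or (None, None) if empty.

    Strand is ignored here; a real tool would need to account for it.
    """
    if not tss_by_gene:
        return None, None
    gene = min(tss_by_gene, key=lambda g: abs(pos - tss_by_gene[g]))
    return gene, pos - tss_by_gene[gene]


def annotate(chrom, pos, tracks, tss):
    """Collect one flag per track plus the nearest gene for a single mutation."""
    flags = {
        name: overlaps_any(pos, by_chrom.get(chrom, []))
        for name, by_chrom in tracks.items()
    }
    gene, dist = nearest_tss(pos, tss.get(chrom, {}))
    return {"chrom": chrom, "pos": pos, "gene": gene, "dist_to_tss": dist, **flags}


# Toy data (hypothetical chromosome, genes and coordinates).
tracks = {
    "DNase1HS": {"chrA": [(100, 200), (500, 650)]},
    "H3K27ac": {"chrA": [(150, 300)]},
}
tss = {"chrA": {"GENE1": 120, "GENE2": 900}}

row = annotate("chrA", 180, tracks, tss)
print(row)
```

Each mutation ends up as one record of flags and distances, which is the kind of per-mutation value collection a scoring step could consume.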


Transcript

  • 1. Investigating the role of non-coding mutations in Cancer

Dilmi Perera, John Pimanda, Jason Wong
Adult Cancer Program, Lowy Cancer Research Centre, University of New South Wales, NSW 2052, Australia
Prince of Wales Clinical School
Email: dilmi.perera@student.unsw.edu.au

Method

Overview
Integrate a collection of data sets to build a more detailed description of the regulatory region in which each mutation occurs.
– DNase-seq data: Is the mutation in a nucleosome-free region?
– Histone marks: What types of histone marks are in the flanking regions?
– Conservation score: Is the mutation in a highly conserved region?
– DNA methylation: Is the mutation in a methylated site?
– Motifs: What motifs were created or removed by the mutation?
– Mapping to closest gene: Which gene is the regulatory region most likely associated with?
– Gene expression: Is there a change in the expression level of the mapped gene?

Results

Application of the proposed method
Data set used:
• Catalogue of somatic mutations in 21 breast cancers identified by Peter Campbell's lab at the Wellcome Trust Sanger Institute.
• Somatic mutations identified:
– 183,916 base substitutions
– 2,869 indels
– 1,192 structural variants
• Gene expression data were available for 17 of the 21 samples.

Chrom  Mutation   Patient ID  Gene      Fold change  p-value   Distance to TSS  Conservation  Relative conservation  DNase1HS  H3K4me3  H3K4me1  H3K27ac
chr17  13298849   PD3851a     HS3ST3A1  2.64         1.76E-12  206395           0             0                      1         0        1        1
chr3   50563605   PD3851a     CACNA2D2  2.35         0.670158  -22713           0             0                      1         0        0        0
chr8   43291380   PD3890a     HGSNAT    2.54         2.4E-13   295788           0.383         1.048916               1         0        0        0
chr16  69458536   PD3890a     CYB5B     2.97         3.03E-16  38               0             0                      1         1        0        1
chr8   128685264  PD3890a     MYC       3.75         0.000362  -63051           0             0                      1         0        1        1
chr6   137661563  PD3890a     IFNGR1    2.07         5.06E-09  -120996          0             0                      1         0        1        0
chr1   182808165  PD3890a     DHX9      2.20         3.13E-08  -274             0.001         0.027892               1         1        0        1
chr1   182995563  PD3890a     LAMC1     2.09         5.68E-08  2968             0             0                      1         1        1        1
chr12  5498239    PD3904a     NTF3      2.15         0.414065  -43041           0.002         0.97561                1         0        1        0
chr18  26296475   PD3904a     CDH2      2.01         0.441215  -539030          0.026         0.706522               1         0        0        0
chr14  34630618   PD3904a     EGLN3     2.09         9.54E-08  -210334          0.02          0.14966                1         0        1        1
chr8   37612092   PD3904a     PROSC     3.20         1.79E-12  -8009            0.227         4.156338               1         0        1        0
chr8   37612579   PD3904a     PROSC     3.20         1.79E-12  -7522            0             0                      1         0        1        0
chr8   37727778   PD3904a     BRF2      6.43         2.51E-18  -20347           0.001         0.558823               1         0        1        1
chr19  51479891   PD3904a     KLK7      2.61         0.174688  7180             0             0                      1         0        1        1
chr15  63693712   PD3904a     CA12      5.01         0.000173  -19637           0.002         0.025916               1         0        1        0
chr3   104348126  PD3904a     ALCAM     2.04         0.004936  -737431          0             0                      1         0        1        1
chr11  131942579  PD3904a     NTM       3.30         7.07E-09  161867           0.459         0.657515               1         0        1        1
chr1   210472959  PD3904a     HHAT      3.31         1.36E-09  -28637           0.002         0.005128               1         0        1        0

Figure 3: Sample output from integrating the different data sets. The final outcome of the integration process is a collection of values for each mutation, which will then be used to calculate the score.

Figure 4: These two mutations were particularly interesting, as both showed a very high fold change and had all the other important characteristics of a mutation of interest.
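To show how output of this kind might be used to narrow down candidates, here is a minimal Python sketch that keeps rows with a small p-value in open chromatin and ranks them by fold change. The six rows are copied from the sample output (only the columns needed here); the `Row` type, the `shortlist` function and the 0.05 cut-off are assumptions for illustration and are not the scoring mechanism proposed on the poster.

```python
from typing import NamedTuple


class Row(NamedTuple):
    chrom: str
    pos: int
    patient: str
    gene: str
    fold_change: float
    p_value: float
    dnase1hs: int
    h3k4me3: int
    h3k4me1: int
    h3k27ac: int


# A subset of the sample output rows.
ROWS = [
    Row("chr3", 50563605, "PD3851a", "CACNA2D2", 2.35, 0.670158, 1, 0, 0, 0),
    Row("chr16", 69458536, "PD3890a", "CYB5B", 2.97, 3.03e-16, 1, 1, 0, 1),
    Row("chr12", 5498239, "PD3904a", "NTF3", 2.15, 0.414065, 1, 0, 1, 0),
    Row("chr8", 37727778, "PD3904a", "BRF2", 6.43, 2.51e-18, 1, 0, 1, 1),
    Row("chr19", 51479891, "PD3904a", "KLK7", 2.61, 0.174688, 1, 0, 1, 1),
    Row("chr15", 63693712, "PD3904a", "CA12", 5.01, 0.000173, 1, 0, 1, 0),
]


def shortlist(rows, max_p=0.05, require_open_chromatin=True):
    """Keep rows below the p-value cut-off, then sort by fold change (highest first).

    max_p=0.05 is an assumed threshold, not one stated on the poster.
    """
    kept = [
        r for r in rows
        if r.p_value < max_p and (r.dnase1hs == 1 or not require_open_chromatin)
    ]
    return sorted(kept, key=lambda r: r.fold_change, reverse=True)


top = shortlist(ROWS)
for r in top:
    print(r.gene, r.chrom, r.pos, r.fold_change, r.p_value)
```

A simple filter like this only orders candidates by expression change; it does not weigh histone marks, conservation or motif changes against each other, which is what a proper score would need to do.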
[Figure labels: ETS, COL9A1, Mutation]

Figure 1: Overview: the type of non-coding mutations that will be identified as potential causal mutations.

Figure 2: Flow diagram of the data integration and analysis process.

Two implementations
Web server in which all the data for each cell type will be stored (for biologists)
• User input: upload the mutation list and gene expression data (optional) to the website and select the cell type.
• Output: a score for each mutation, along with all the other relevant details such as motifs, histone marks, etc.
Stand-alone application (mostly for bioinformaticians)
• Pre-installation: BEDTools [3] and FIMO [4] need to be installed.
• User input: mutation list, gene expression data (optional), and all the other relevant data for the cell type, such as DNase1, histone marks, etc.
• Output: a score for each mutation, along with all the other details.

Future Work
• Scoring mechanism for the mutations
• Analysing the data from ENCODE and the Epigenome Atlas for the different cell types

References
[1] The ENCODE (ENCyclopedia Of DNA Elements) Project. Science, Vol. 306, Issue 5696, 636-640, 22 October 2004.
[2] Bernstein BE et al. The NIH Roadmap Epigenomics Mapping Consortium. Nat Biotechnol. 2010 Oct;28(10):1045-8.
[3] Quinlan AR and Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26(6):841–842, 2010.
[4] Grant CE, Bailey TL and Noble WS. FIMO: Scanning for occurrences of a given motif. Bioinformatics 27(7):1017–1018, 2011.