
Webinar about JASPAR BioPython module and MANTA.


In early 2014 we upgraded JASPAR, the largest open-access, manually curated database of transcription factor (TF) binding profiles (PMID:24194598), and we are now preparing the 2016 release. A new BioPython module for accessing and using the TF binding profiles stored in JASPAR is available; we will introduce it in the first part of the webinar.

In the second part of the webinar, we will introduce the MANTA (Mongodb for the ANalysis of Tfbs Alteration) database, which we used for the analysis of cis-regulatory somatic mutations in B-cell lymphomas (PMID:25903198). The database stores the positions of TFBSs predicted in ChIP-seq data using JASPAR TF binding profiles. We will describe the database and show how to access and use it.

Published in: Science

  1. www.cmmt.ubc.ca
     JASPAR BioPython & MANTA
     Anthony Mathelier, David Arenillas & Wyeth Wasserman, Wasserman Lab
     anthony.mathelier@gmail.com & dave@cmmt.ubc.ca
  2. Outline
     ● JASPAR BioPython module
       – What is JASPAR?
       – How to construct matrices from JASPAR files using the JASPAR BioPython module.
     ● MANTA
       – What is stored in MANTA?
       – How to interrogate the MANTA DB using Python and our web application.
  3. http://jaspar.genereg.net
     Mathelier et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res. 2014. PMID 24194598
  4. Modelling Transcription Factor Binding Sites (TFBS)
     Example: FOXD1
     Known binding sites (binding site in uppercase, flanking sequence in lowercase):
       gctaaGTAACAATgcgca   cttaaGTAAACATcgctc   ccaatGTAAACAAacgga   gaaagGTAAACAAtgggc
       GTAAACATgtact        cttgtGTAAACAAaaagc   cttaaGTAAACACgtccg   cttatGTCAACAGtgggt
       tGTAAACATtgcat       GTAAACAAtgcga        cttagGTAAACAT        tttcgTTAAGTAAaca
       caaaATAAACAAcgtgc    gctaaCTAAACAGagaga   gtgttGTAAACATtggaa   taatGTAAACAAtgcgg
       gaaagGTAAACATaagaa   cctaaGTAAACACaacgc   cctaaGTAAACATt       cttatGTAAACAGaggtc
     PFM (Position Frequency Matrix):
       A [ 1  0 19 20 18  1 20  7 ]
       C [ 1  0  1  0  1 18  0  2 ]
       G [17  0  0  0  1  0  0  3 ]
       T [ 1 20  0  0  0  1  0  8 ]
     [Figure: sequence logo of the FOXD1 PFM]
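Each PFM column is simply a tally of the bases observed at that position across the aligned binding sites. The following is a minimal pure-Python sketch of that counting step. It is not Biopython's implementation, and the function name `count_sites` is hypothetical. It rebuilds the FOXD1 counts from the uppercase cores of the 20 sites listed on the slide:

```python
from typing import Dict, List

BASES = "ACGT"


def count_sites(sites: List[str]) -> Dict[str, List[int]]:
    """Tally equal-length, aligned binding sites into a position frequency matrix."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pfm = {base: [0] * length for base in BASES}
    for site in sites:
        for position, base in enumerate(site.upper()):
            pfm[base][position] += 1  # raises KeyError on non-ACGT letters such as N
    return pfm


# Uppercase cores of the 20 FOXD1 sites shown on the slide (lowercase flanks dropped).
FOXD1_SITES = [
    "GTAACAAT", "GTAAACAT", "GTAAACAA", "GTAAACAA", "GTAAACAT",
    "GTAAACAA", "GTAAACAC", "GTCAACAG", "GTAAACAT", "GTAAACAA",
    "GTAAACAT", "TTAAGTAA", "ATAAACAA", "CTAAACAG", "GTAAACAT",
    "GTAAACAA", "GTAAACAT", "GTAAACAC", "GTAAACAT", "GTAAACAG",
]

if __name__ == "__main__":
    for base, row in count_sites(FOXD1_SITES).items():
        print(base, row)  # reproduces the PFM rows on the slide
```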
  5. Scoring putative TFBS sequences
     PFM:
       A [ 1  0 19 20 18  1 20  7 ]
       C [ 1  0  1  0  1 18  0  2 ]
       G [17  0  0  0  1  0  0  3 ]
       T [ 1 20  0  0  0  1  0  8 ]
     PWM (Position Weight Matrix):
       A [-1.5 -2.5  1.7  1.8  1.6 -1.5  1.8  0.4 ]
       C [-1.5 -2.5 -1.5 -2.5 -1.5  1.6 -2.5 -1.0 ]
       G [ 1.6 -2.5 -2.5 -2.5 -1.5 -2.5 -2.5 -0.6 ]
       T [-1.5  1.8 -2.5 -2.5 -2.5 -1.5 -2.5  0.6 ]
     Sequence: A C G A G T T A A A C A A G C T A
     Slide the PWM along the sequence and sum the score at each position (a PWM used this way is also known as a PSSM, Position Specific Scoring Matrix). The best window, TTAAACAA, scores
     -1.5 + 1.8 + 1.7 + 1.8 + 1.6 + 1.6 + 1.8 + 0.4 = 9.2
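The slide does not say how the PWM was derived from the PFM. One conversion that reproduces every PWM value shown, to one decimal, is a log2 odds ratio against a uniform background (0.25 per base), with a total pseudocount of sqrt(N) split evenly across the four bases (N = 20 sites per column here). Treat that formula as an assumption. The sketch below applies it, then scans the slide's example sequence on the forward strand with the PWM as printed. The function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
BACKGROUND = 0.25  # uniform background (assumed)

# FOXD1 counts from the slide (20 sites).
FOXD1_PFM = {
    "A": [1, 0, 19, 20, 18, 1, 20, 7],
    "C": [1, 0, 1, 0, 1, 18, 0, 2],
    "G": [17, 0, 0, 0, 1, 0, 0, 3],
    "T": [1, 20, 0, 0, 0, 1, 0, 8],
}

# PWM values as printed on the slide (one decimal).
SLIDE_PWM = {
    "A": [-1.5, -2.5, 1.7, 1.8, 1.6, -1.5, 1.8, 0.4],
    "C": [-1.5, -2.5, -1.5, -2.5, -1.5, 1.6, -2.5, -1.0],
    "G": [1.6, -2.5, -2.5, -2.5, -1.5, -2.5, -2.5, -0.6],
    "T": [-1.5, 1.8, -2.5, -2.5, -2.5, -1.5, -2.5, 0.6],
}


def pfm_to_pwm(pfm: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Log2-odds weights; assumed pseudocount: sqrt(N) in total, 0.25 * sqrt(N) per base."""
    pwm: Dict[str, List[float]] = {base: [] for base in BASES}
    for i in range(len(pfm["A"])):
        n = sum(pfm[base][i] for base in BASES)  # number of sites in this column
        pseudo = math.sqrt(n)
        for base in BASES:
            prob = (pfm[base][i] + BACKGROUND * pseudo) / (n + pseudo)
            pwm[base].append(math.log2(prob / BACKGROUND))
    return pwm


def scan(pwm: Dict[str, List[float]], sequence: str) -> List[Tuple[int, float]]:
    """Score every forward-strand window as the sum of the PWM weights of its bases."""
    width = len(pwm["A"])
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        hits.append((start, sum(pwm[base][i] for i, base in enumerate(window))))
    return hits


if __name__ == "__main__":
    for base, weights in pfm_to_pwm(FOXD1_PFM).items():
        print(base, [round(w, 1) for w in weights])  # same values as the slide's PWM
    start, score = max(scan(SLIDE_PWM, "ACGAGTTAAACAAGCTA"), key=lambda hit: hit[1])
    print(start, round(score, 1))  # prints "5 9.2" (0-based start of window TTAAACAA)
```

The pseudocount keeps zero counts from producing log2(0). This is why unobserved bases get a finite penalty of about -2.5 rather than negative infinity.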
  6. Overview of the JASPAR 2014 database
  7. JASPAR Biopython modules
     ➢ Bio.motifs.jaspar: read / write motifs encoded in the JASPAR flat file formats (sites, PFM and jaspar).
     ➢ Bio.motifs.jaspar.db: search / fetch motifs from a JASPAR formatted database.
     These modules extend Biopython's Bio.motifs module (http://biopython.org*) to support the construction of TFBS matrices from JASPAR-supported formats.
     * Cock et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009 Jun 1;25(11):1422-3. PMID: 19304878
  8. Constructing a matrix from a JASPAR sites formatted file: the JASPAR sites format consists of a list of known binding sites for a motif.
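In Biopython, a sites file is read with `motifs.read(handle, "sites")`. To show what such a file carries, here is a minimal pure-Python sketch. It is not Biopython's code, and `parse_sites` is a hypothetical name. The sketch assumes FASTA-style records in which the binding site is written in uppercase and the flanking sequence in lowercase, as in the FOXD1 example. The headers in the sample are made up.

```python
from typing import List, Optional, Tuple


def parse_sites(text: str) -> List[Tuple[str, str]]:
    """Return (header, site) pairs; the site is the uppercase part of each sequence line."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
        elif header is None:
            raise ValueError(f"sequence line without a header: {line!r}")
        else:
            site = "".join(ch for ch in line if ch.isupper())
            records.append((header, site))
            header = None  # assumes one sequence line per record
    if len({len(site) for _, site in records}) > 1:
        raise ValueError("binding sites differ in length")
    return records


# Hypothetical headers; sequences taken from the FOXD1 example.
SAMPLE_SITES = """
>FOXD1 site 1
gctaaGTAACAATgcgca
>FOXD1 site 2
cttaaGTAAACATcgctc
>FOXD1 site 12
tttcgTTAAGTAAaca
"""

if __name__ == "__main__":
    for header, site in parse_sites(SAMPLE_SITES):
        print(header, site)
```

The extracted sites are exactly what gets tallied, column by column, into the frequency matrix.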
  9. Constructing a matrix from a JASPAR pfm formatted file: the JASPAR pfm format simply describes a frequency matrix for a single motif.
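Biopython reads this format with `motifs.read(handle, "pfm")`. The sketch below is a minimal, hypothetical parser that shows the layout it assumes: four whitespace-separated rows of counts in A, C, G, T order, with one column per motif position.

```python
from typing import Dict, List

BASES = "ACGT"


def parse_pfm(text: str) -> Dict[str, List[float]]:
    """Parse a four-row frequency matrix (rows assumed to be A, C, G, T)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) != 4:
        raise ValueError(f"expected 4 rows, found {len(rows)}")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("rows have different lengths")
    return {base: [float(x) for x in row] for base, row in zip(BASES, rows)}


# FOXD1 counts from the earlier slide, written in the bare pfm layout.
FOXD1_PFM_TEXT = """
 1  0 19 20 18  1 20  7
 1  0  1  0  1 18  0  2
17  0  0  0  1  0  0  3
 1 20  0  0  0  1  0  8
"""

if __name__ == "__main__":
    for base, counts in parse_pfm(FOXD1_PFM_TEXT).items():
        print(base, counts)
```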
  10. Constructing matrices from a JASPAR jaspar formatted file: the JASPAR jaspar format allows for multiple motifs. Each record consists of a header line followed by four lines defining the frequency matrix. Because a file may contain several motifs, use the parse method rather than the read method.
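A jaspar file can hold several motifs, so a reader naturally yields records one at a time. That is the role `motifs.parse(handle, "jaspar")` plays in Biopython. Below is a minimal generator sketch, not Biopython's code. It assumes headers of the form `>ID name` followed by bracketed rows such as `A [ 1 0 19 ... ]`. The matrix IDs in the sample are hypothetical placeholders.

```python
import re
from typing import Dict, Iterator, List, Tuple

ROW = re.compile(r"^\s*([ACGT])\s*\[(.*)\]\s*$")  # e.g. "A  [ 1  0 19 ]"


def parse_jaspar(text: str) -> Iterator[Tuple[str, str, Dict[str, List[float]]]]:
    """Yield (matrix_id, name, counts) for each '>' header followed by four bracketed rows."""
    lines = [line for line in text.splitlines() if line.strip()]
    i = 0
    while i < len(lines):
        header = lines[i].strip()
        fields = header[1:].split(None, 1)
        if not header.startswith(">") or not fields:
            raise ValueError(f"expected a header line, got {header!r}")
        matrix_id = fields[0]
        name = fields[1] if len(fields) > 1 else ""
        counts: Dict[str, List[float]] = {}
        for line in lines[i + 1:i + 5]:
            match = ROW.match(line)
            if match is None:
                raise ValueError(f"malformed matrix row: {line!r}")
            counts[match.group(1)] = [float(x) for x in match.group(2).split()]
        if sorted(counts) != list("ACGT"):
            raise ValueError(f"record {matrix_id} needs exactly one row per base")
        yield matrix_id, name, counts
        i += 5


# Two records; IDs are hypothetical, FOXD1 counts come from the earlier slide.
SAMPLE_JASPAR = """>MA9999.1 FOXD1
A  [ 1  0 19 20 18  1 20  7 ]
C  [ 1  0  1  0  1 18  0  2 ]
G  [17  0  0  0  1  0  0  3 ]
T  [ 1 20  0  0  0  1  0  8 ]
>MA9998.1 TOY
A  [ 4  0 ]
C  [ 0  4 ]
G  [ 0  0 ]
T  [ 0  0 ]
"""

if __name__ == "__main__":
    for matrix_id, name, counts in parse_jaspar(SAMPLE_JASPAR):
        print(matrix_id, name, counts["A"])
```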
  11. Constructing matrices from a JASPAR jaspar formatted file (cont'd): the frequency portions of the file can also be specified in a simpler format identical to the pfm format.
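A reader therefore has to accept each frequency row either with its base label and brackets or as bare numbers. The helper below is a hypothetical sketch that handles both spellings. It assumes that unlabeled rows appear in A, C, G, T order.

```python
import re
from typing import Dict, List, Tuple

BASES = "ACGT"
LABELED_ROW = re.compile(r"^\s*([ACGT])\s*\[(.*)\]\s*$")


def parse_row(line: str, default_base: str) -> Tuple[str, List[float]]:
    """Parse 'A [ 1 2 3 ]' or a bare '1 2 3'; a bare row is labelled with default_base."""
    match = LABELED_ROW.match(line)
    if match:
        return match.group(1), [float(x) for x in match.group(2).split()]
    return default_base, [float(x) for x in line.split()]


def parse_matrix_rows(lines: List[str]) -> Dict[str, List[float]]:
    """Parse the four frequency rows of one record, in either style (A, C, G, T order assumed)."""
    if len(lines) != 4:
        raise ValueError(f"expected 4 rows, found {len(lines)}")
    matrix: Dict[str, List[float]] = {}
    for expected, line in zip(BASES, lines):
        base, values = parse_row(line, expected)
        if base != expected:
            raise ValueError(f"row for {base} found where {expected} was expected")
        matrix[base] = values
    if len({len(values) for values in matrix.values()}) != 1:
        raise ValueError("rows have different lengths")
    return matrix


BRACKETED = ["A [ 1  0 19 ]", "C [ 1  0  1 ]", "G [17  0  0 ]", "T [ 1 20  0 ]"]
BARE = [" 1  0 19", " 1  0  1", "17  0  0", " 1 20  0"]

if __name__ == "__main__":
    print(parse_matrix_rows(BRACKETED) == parse_matrix_rows(BARE))  # True
```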
  12. The JASPAR DB module: connect to a JASPAR database. The module is modelled after the Perl TFBS modules*; specifically, the Bio.motifs.jaspar.db.JASPAR5 BioPython class is modelled after the TFBS::DB::JASPAR5 Perl class. Fetch a specific motif by its JASPAR ID.
      * Lenhard et al. TFBS: Computational framework for transcription factor binding site analysis. Bioinformatics. 2002. PMID 12176838
  13. JASPAR DB module (cont'd): fetch multiple motifs according to various attributes. Example: fetch the motifs of all vertebrate and insect transcription factors in the CORE JASPAR collection that belong to the Forkhead family and have an information content of at least 12 bits. Note that selection criteria that allow multiple values (such as 'tax_group' and 'tf_family') may be specified either as a single value or as a list of values.
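Against a live database, this query is a single call such as `jdb.fetch_motifs(collection="CORE", tax_group=["vertebrates", "insects"], tf_family="Forkhead", min_ic=12)`. The in-memory sketch below illustrates how a criterion can be either a single value or a list. It filters hypothetical records whose fields and toy motifs are made up for illustration. Information content is computed here against a uniform background without pseudocounts. That is an assumption, and the database module may compute it differently.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Union

BASES = "ACGT"
Criterion = Union[None, str, Sequence[str]]


def as_set(value: Criterion) -> Optional[Set[str]]:
    """None means 'no filter'; a single string or a list of strings becomes a set."""
    if value is None:
        return None
    if isinstance(value, str):
        return {value}
    return set(value)


def information_content(counts: Dict[str, List[float]]) -> float:
    """Total IC in bits vs. a uniform background, without pseudocounts (assumed)."""
    total = 0.0
    for i in range(len(counts["A"])):
        column = [counts[base][i] for base in BASES]
        n = sum(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in column if c > 0)
        total += 2.0 - entropy
    return total


def fetch_motifs(records, collection="CORE", tax_group=None, tf_family=None, min_ic=0.0):
    """Hypothetical in-memory filter mimicking single-value-or-list selection criteria."""
    taxa, families = as_set(tax_group), as_set(tf_family)
    selected = []
    for record in records:
        if record["collection"] != collection:
            continue
        if taxa is not None and record["tax_group"] not in taxa:
            continue
        if families is not None and record["tf_family"] not in families:
            continue
        if information_content(record["counts"]) < min_ic:
            continue
        selected.append(record)
    return selected


def _pure_motif(consensus: str, n: float = 10.0) -> Dict[str, List[float]]:
    """Toy matrix in which every column holds a single base (2 bits per column)."""
    return {base: [n if c == base else 0.0 for c in consensus] for base in BASES}


FOXD1_COUNTS = {
    "A": [1, 0, 19, 20, 18, 1, 20, 7], "C": [1, 0, 1, 0, 1, 18, 0, 2],
    "G": [17, 0, 0, 0, 1, 0, 0, 3], "T": [1, 20, 0, 0, 0, 1, 0, 8],
}

# Illustrative records; metadata and TOY_* motifs are hypothetical.
RECORDS = [
    {"name": "FOXD1", "collection": "CORE", "tax_group": "vertebrates",
     "tf_family": "Forkhead", "counts": FOXD1_COUNTS},
    {"name": "TOY_INSECT", "collection": "CORE", "tax_group": "insects",
     "tf_family": "Forkhead", "counts": _pure_motif("TGTTTAC")},
    {"name": "TOY_PLANT", "collection": "CORE", "tax_group": "plants",
     "tf_family": "Forkhead", "counts": _pure_motif("TGTTTAC")},
    {"name": "TOY_OTHER", "collection": "CORE", "tax_group": "vertebrates",
     "tf_family": "Other", "counts": _pure_motif("CCGGAAG")},
]

if __name__ == "__main__":
    hits = fetch_motifs(RECORDS, tax_group=["vertebrates", "insects"],
                        tf_family="Forkhead", min_ic=12)
    print([r["name"] for r in hits])
```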
  14. For more information: for an overview and examples of using these modules, see the JASPAR sub-section under the “Reading motifs” section of the BioPython Tutorial and Cookbook: http://biopython.org/DIST/docs/tutorial/Tutorial.html
      For more technical information, see the Bio.motifs.jaspar section of the BioPython API docs: http://biopython.org/DIST/docs/api
  15. MANTA: MongoDB for Analysis of TFBS Alteration
      Mathelier et al. Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas. Genome Biology. 2015. PMID 25903198
  16. MANTA DB
      [Figure: sequences ...gctaaGTAACAATgcgca..., ...cttaaGTAAACATcgctc..., ...ccaatGTAAACAAacgga...]
      Adapted from Szalkowski and Schmid (2010). Briefings in Bioinformatics.
  17. MANTA Statistics
      ChIP-seq experiments     477
      Transcription factors    103
      TFBSs                    9,510,336
      Unique bases covered     76,160,599 (~2.25% of the human genome)
  18. Variations may impact TF binding
      [Figure: TF, 5' UTR and exons. When the TF recognizes its binding site, transcription is initiated; when the binding sequence is mutated, the TF fails to recognize the binding site and transcription fails to initiate.]
      Binding sequence:         AGCTAGCTATATTTAAACAACACTGTCTAGCATTGCCTGATAGATGAGCCGTCGCAGCTGGA
      Mutated binding sequence: AGCTAGCTATATTTAATCCACACTGTCTAGCATTGCCTGATAGATGAGCCGTCGCAGCTGGA
      Slide footer: AMIA TBI&CRI, March 19th-23rd, 2012
  19. Assessing the impact of variations on TF binding [Figure labels: DNA, TFBS]
  20. Assessing the impact of variations on TF binding [Figure labels: DNA, SNV]
  21-24. Assessing the impact of variations on TF binding [Figure labels: DNA, SNV]
  25. Assessing the impact of variations on TF binding: record the best TFBS hit with the mutated sequence [Figure labels: DNA, SNV]
  26. Assessing the impact of variations on TF binding [Figure labels: DNA, TFBS; density plot with x axis alt/ref (0.80 to 1.10) and y axis density (0 to 7)]
  27. Assessing the impact of variations on TF binding [Figure labels: DNA, SNV, Alternative; the same alt/ref density plot]
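These slides describe the procedure only pictorially. Here is a minimal sketch of one way to implement it. For every position of a predicted TFBS and every alternative allele, it mutates the sequence, rescans the forward-strand windows that overlap the mutated base, and keeps the best hit. It then reports that hit's score relative to the unmutated TFBS, which is the alt/ref ratio on the plotted x axis. Several choices here are assumptions, not MANTA's documented method: the conversion to relative scores, (score - min) / (max - min), which keeps the ratio well defined; forward-strand-only scanning; and the function names. The example uses the FOXD1 PWM and the binding sequence from slide 18, in which TTAAACAA (positions 13 to 20) is the TFBS.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"
# FOXD1 PWM as printed on the scoring slide.
PWM: Dict[str, List[float]] = {
    "A": [-1.5, -2.5, 1.7, 1.8, 1.6, -1.5, 1.8, 0.4],
    "C": [-1.5, -2.5, -1.5, -2.5, -1.5, 1.6, -2.5, -1.0],
    "G": [1.6, -2.5, -2.5, -2.5, -1.5, -2.5, -2.5, -0.6],
    "T": [-1.5, 1.8, -2.5, -2.5, -2.5, -1.5, -2.5, 0.6],
}
WIDTH = len(PWM["A"])
MIN_SCORE = sum(min(PWM[b][i] for b in BASES) for i in range(WIDTH))
MAX_SCORE = sum(max(PWM[b][i] for b in BASES) for i in range(WIDTH))


def relative_score(window: str) -> float:
    """PWM score rescaled to [0, 1] (assumed normalisation)."""
    raw = sum(PWM[base][i] for i, base in enumerate(window.upper()))
    return (raw - MIN_SCORE) / (MAX_SCORE - MIN_SCORE)


def best_overlapping_hit(sequence: str, pos: int) -> float:
    """Best relative score over forward-strand windows containing 0-based position pos."""
    first = max(0, pos - WIDTH + 1)
    last = min(pos, len(sequence) - WIDTH)
    return max(relative_score(sequence[s:s + WIDTH]) for s in range(first, last + 1))


def snv_impacts(sequence: str, tfbs_start: int) -> List[Tuple[int, str, float]]:
    """(0-based position, alt allele, alt/ref) for every substitution inside the TFBS."""
    ref_score = relative_score(sequence[tfbs_start:tfbs_start + WIDTH])  # assumed > 0
    impacts = []
    for pos in range(tfbs_start, tfbs_start + WIDTH):
        for alt in BASES:
            if alt == sequence[pos].upper():
                continue
            mutated = sequence[:pos] + alt + sequence[pos + 1:]
            impacts.append((pos, alt, best_overlapping_hit(mutated, pos) / ref_score))
    return impacts


# "Binding sequence" from slide 18; the TFBS TTAAACAA starts at 0-based index 12.
REF_SEQ = "AGCTAGCTATATTTAAACAACACTGTCTAGCATTGCCTGATAGATGAGCCGTCGCAGCTGGA"

if __name__ == "__main__":
    for pos, alt, ratio in snv_impacts(REF_SEQ, 12):
        print(pos + 1, alt, round(ratio, 3))  # 1-based position
```

Ratios below 1 mean the substitution weakens the best available hit. Ratios above 1 mean it creates a stronger one, which is why the plotted distribution extends past 1.0.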
  28. Example of Application of MANTA
      Mathelier et al. Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas. Genome Biology. 2015. PMID
  29. The MANTA Database
      Implemented with MongoDB (http://www.mongodb.org). Consists of 3 collections:
      - Experiments: experiment name, type, TF name, JASPAR matrix ID, etc.
      - Peaks: peak position (chromosome, start, end), score, position of maximum peak height, etc.
      - TFBSs / SNVs: position (chromosome, start, end), strand and score of the unmutated TFBS, plus similar information and an impact score for each position / alternative-allele mutation.
  30. MANTA DB with Python
      Example: connect to the MANTA DB and fetch all TFBSs affected by an SNV at position 6425005 on chromosome 19.
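The code from this slide is not reproduced in the transcript. Below is a minimal sketch of the query logic. The database layout it assumes is hypothetical: TFBS documents store `chr`, `start` and `end` fields with inclusive 1-based coordinates and chromosome names like "chr19". Under that assumption, a TFBS is affected when it spans the SNV position. With pymongo, the filter dictionary would be passed to `Collection.find()`. The small in-memory matcher exists only so the sketch runs without a server.

```python
from typing import Any, Dict, Iterable, List


def snv_filter(chrom: str, pos: int) -> Dict[str, Any]:
    """MongoDB filter for TFBSs spanning pos (field names are hypothetical)."""
    return {"chr": chrom, "start": {"$lte": pos}, "end": {"$gte": pos}}


def matches(doc: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Evaluate the equality / $lte / $gte subset of MongoDB query syntax used above."""
    for field, condition in flt.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            for op, bound in condition.items():
                if value is None:
                    return False
                if op == "$lte":
                    ok = value <= bound
                elif op == "$gte":
                    ok = value >= bound
                else:
                    raise ValueError(f"unsupported operator {op}")
                if not ok:
                    return False
        elif value != condition:
            return False
    return True


def find(docs: Iterable[Dict[str, Any]], flt: Dict[str, Any]) -> List[Dict[str, Any]]:
    """In-memory stand-in for collection.find(flt)."""
    return [doc for doc in docs if matches(doc, flt)]


# Hypothetical TFBS documents for illustration only.
EXAMPLE_TFBSS = [
    {"tf": "TF_A", "chr": "chr19", "start": 6424998, "end": 6425012},
    {"tf": "TF_B", "chr": "chr19", "start": 6425006, "end": 6425020},
    {"tf": "TF_C", "chr": "chr1", "start": 6424998, "end": 6425012},
    {"tf": "TF_D", "chr": "chr19", "start": 6424990, "end": 6425005},
]

if __name__ == "__main__":
    # With pymongo (not run here), roughly:
    #   MongoClient(host)[db_name][collection_name].find(snv_filter("chr19", 6425005))
    for doc in find(EXAMPLE_TFBSS, snv_filter("chr19", 6425005)):
        print(doc["tf"], doc["chr"], doc["start"], doc["end"])
```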
  31. MANTA Web Interface
      URL: http://manta.cmmt.ubc.ca/manta
      Source code: https://github.com/wassermanlab/MANTA
  34. Thanks! Any questions?
      Contacts:
        Anthony Mathelier, anthony.mathelier@gmail.com
        David Arenillas, dave@cmmt.ubc.ca
      URLs:
        Wasserman Lab: www.cisreg.ca
        BioPython: http://biopython.org
        MANTA: manta.cmmt.ubc.ca/manta
