interPopula
 

Usage Rights: CC Attribution-ShareAlike License


    interPopula Presentation Transcript

    • interPopula: Database and tool integration for population genetics, with a focus on the HapMap project. Tiago Rodrigues Antão, Liverpool School of Tropical Medicine, UK. http://popgen.eu/soft/interPop tiagoantao@gmail.com
    • Preamble – the HapMap project (and UCSC Known Genes)
    • HapMap: "The goal of the International HapMap Project is to develop a haplotype map of the human genome, the HapMap, which will describe the common patterns of human DNA sequence variation. The HapMap is expected to be a key resource for researchers to use to find genes affecting health, disease, and responses to drugs and environmental factors. The information produced by the Project will be made freely available." http://hapmap.ncbi.nlm.nih.gov/
    • What is there?
        • 11 populations, 90–180 individuals per population (some with family trios), more than 3 million SNPs
        • Frequencies (e.g. for population P and SNP S, 30% of the alleles are A and 70% are C)
        • Genotypes (data per individual)
        • Phasing data
        • Pedigree info
        • LD (linkage disequilibrium) computations
        • Copy Number Variation (CNV) info (new!)
        • Reference: A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851-861. 2007.
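The relation between the per-individual genotypes and the per-population frequencies listed above can be illustrated with a minimal sketch. It is written for Python 3 and does not use interPopula itself; the function name, the sample genotypes, and the use of "N" as a missing-call marker are all assumptions made for the example.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} for a list of two-letter genotypes.

    Genotypes containing 'N' (assumed here to mark a missing call)
    are skipped.
    """
    counts = Counter()
    for genotype in genotypes:
        if "N" in genotype:
            continue
        # Each diploid individual contributes two alleles.
        counts.update(genotype)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical genotypes for one SNP in one population (10 individuals).
sample = ["AC", "CC", "AC", "CC", "AA", "CC", "CC", "AC", "CC", "AC"]
freqs = allele_frequencies(sample)
print(freqs)
```

With these made-up genotypes there are 6 A alleles and 14 C alleles out of 20, which reproduces the 30%/70% split used as the example on the slide.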
    • UCSC Known Genes
        • A gene set constructed by an automated process, based on protein data from Swiss-Prot/TrEMBL (UniProt) and the associated mRNA data from GenBank
        • Part of the UCSC Genome Browser: http://genome.ucsc.edu/
        • Not only for humans (but options are limited: fewer than a handful of species)
        • Really useful for HapMap data (makes it much easier to relate SNPs to gene information than Entrez SNP does)
        • Hsu et al., Bioinformatics, 2006, 22(9):1036-1046 (but see the Genome Browser updates in NAR)
    • We now return to our regularly scheduled program – interPopula
    • Introduction – 1
        • A Python library to access HapMap and UCSC Known Genes data
        • A set of scripts providing integration examples: integrating interPopula with Biopython, matplotlib, Genepop and Entrez SNP
        • Interaction with the ecosystem of population genetics databases and Python tools is encouraged
        • A set of guidelines for dealing with inconsistencies across databases
        • Very easy to use, many examples
        • For Perl: Ensembl Variation API (Rios et al. BMC Bioinformatics 2010, 11:238)
    • Introduction – 2
        • Python (2.6) based; test coverage is very high
        • Uses sqlite (built into Python, no extra dependencies)
        • Creates a local SQL database from FTP data files
        • Can be disk and network intensive
        • Intelligent download: on demand, and never fetches the same data twice
        • Database is not normalized (for performance and space reasons)
        • Family support (triage of offspring)
        • Data export (Genepop); X and Y aware
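The "on demand, never twice" idea combined with a local sqlite database can be sketched roughly as follows. This is not interPopula's actual code or schema: the class, table and column names are invented for illustration, and the download step is replaced by a caller-supplied function so the example runs without network access.

```python
import sqlite3


class OnDemandStore:
    """Hypothetical local cache: loads a (chromosome, population) pair once."""

    def __init__(self, db_path, fetch):
        self.conn = sqlite3.connect(db_path)
        self.fetch = fetch  # callable(chrom, pop) -> iterable of (rs, pos)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS loaded ("
            "chrom TEXT, pop TEXT, PRIMARY KEY (chrom, pop))"
        )
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS snp ("
            "chrom TEXT, pop TEXT, rs TEXT, pos INTEGER)"
        )

    def require(self, chrom, pop):
        """Load the data if missing; return True only if a fetch happened."""
        cur = self.conn.execute(
            "SELECT 1 FROM loaded WHERE chrom = ? AND pop = ?", (chrom, pop)
        )
        if cur.fetchone() is not None:
            return False  # already in the local database, nothing to do
        rows = [(chrom, pop, rs, pos) for rs, pos in self.fetch(chrom, pop)]
        self.conn.executemany("INSERT INTO snp VALUES (?, ?, ?, ?)", rows)
        self.conn.execute("INSERT INTO loaded VALUES (?, ?)", (chrom, pop))
        self.conn.commit()
        return True

    def rs_in_interval(self, chrom, pop, start, end):
        """Return rs IDs with start <= pos <= end, ordered by position."""
        cur = self.conn.execute(
            "SELECT rs FROM snp WHERE chrom = ? AND pop = ? "
            "AND pos BETWEEN ? AND ? ORDER BY pos",
            (chrom, pop, start, end),
        )
        return [row[0] for row in cur]


calls = []


def fake_fetch(chrom, pop):
    # Stand-in for a real download; records each call and returns made-up SNPs.
    calls.append((chrom, pop))
    return [("rs1", 100), ("rs2", 250), ("rs3", 900)]


store = OnDemandStore(":memory:", fake_fetch)
first = store.require("22", "CEU")
second = store.require("22", "CEU")
hits = store.rs_in_interval("22", "CEU", 50, 300)
print(first, second, len(calls), hits)
```

Recording what has already been loaded in its own small table is one simple way to make repeated requests cheap; a real implementation would also have to handle interrupted downloads.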
    • HapMap example: to get a feel for the interface...
        freqDB = Frequency()
        # Ensure data for this chromosome and population is loaded
        freqDB.requireChrPop(chr, pop)
        RSs = freqDB.getRSsForInterval(chr, startPos, endPos)
        for rs in RSs:
            # We get frequency information
            freqSNP = freqDB.getPopSNPs(pop, rs)
            nuc1, nuc2 = freqSNP[5], freqSNP[6]
            a1a1, a2a2, a1a2 = freqSNP[7], freqSNP[8], freqSNP[9]
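As an illustration of what could be done with the three genotype counts read in the loop above, the following Python 3 sketch computes the frequency of the first allele and the observed and expected (Hardy-Weinberg) heterozygosity. The function is hypothetical and not part of interPopula; it takes the counts in the same order as the slide (a1a1, a2a2, a1a2), and the numbers in the usage lines are made up.

```python
def heterozygosity(a1a1, a2a2, a1a2):
    """Return (p, observed_het, expected_het) from genotype counts.

    a1a1, a2a2: counts of the two homozygotes; a1a2: heterozygote count.
    p is the frequency of allele 1; expected_het is 2p(1-p).
    """
    n = a1a1 + a2a2 + a1a2
    if n == 0:
        raise ValueError("no genotyped individuals")
    # Each homozygote carries two copies of its allele, each heterozygote one.
    p = (2 * a1a1 + a1a2) / (2 * n)
    observed = a1a2 / n
    expected = 2 * p * (1 - p)
    return p, observed, expected


# Made-up counts for a single SNP: 10 a1a1, 50 a2a2, 40 a1a2.
p, h_obs, h_exp = heterozygosity(a1a1=10, a2a2=50, a1a2=40)
print(p, h_obs, h_exp)
```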
    • UCSC Known Genes support
        • Everything is supported (which is not that much: just a long text file plus a link table)
        • Get different IDs (Accession ID, Prot ID, other links)
        • Find what is near a certain genomic position (chromosome and position within the chromosome)
        • Get the exons for a certain gene
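The two position-based queries above can be sketched without interPopula, assuming gene records that carry the chromosome, transcript start/end, and exon boundaries as comma-separated strings with a trailing comma (the layout used by the exonStarts and exonEnds columns of the UCSC knownGene table). The function names, dictionary keys, and gene records below are invented for the example.

```python
def parse_exons(exon_starts, exon_ends):
    """Turn '100,300,' and '200,400,' into [(100, 200), (300, 400)]."""
    starts = [int(x) for x in exon_starts.split(",") if x]
    ends = [int(x) for x in exon_ends.split(",") if x]
    if len(starts) != len(ends):
        raise ValueError("exon start and end lists differ in length")
    return list(zip(starts, ends))


def genes_near(genes, chrom, pos, window):
    """Return names of genes whose transcript lies within `window` bp of pos."""
    return [
        g["name"]
        for g in genes
        if g["chrom"] == chrom
        and g["txStart"] - window <= pos <= g["txEnd"] + window
    ]


# Hypothetical records; real rows would come from the Known Genes text file.
genes = [
    {"name": "geneA", "chrom": "chr1", "txStart": 1000, "txEnd": 5000,
     "exonStarts": "1000,3000,", "exonEnds": "1500,5000,"},
    {"name": "geneB", "chrom": "chr1", "txStart": 90000, "txEnd": 95000,
     "exonStarts": "90000,", "exonEnds": "95000,"},
    {"name": "geneC", "chrom": "chr2", "txStart": 1000, "txEnd": 5000,
     "exonStarts": "1000,", "exonEnds": "5000,"},
]

near = genes_near(genes, "chr1", 5500, window=1000)
exons = parse_exons(genes[0]["exonStarts"], genes[0]["exonEnds"])
print(near, exons)
```

A linear scan is enough for a sketch; for genome-wide use an indexed SQL query or an interval structure would be the more usual choice.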
    • Integration
        • Many examples provided on interoperability (with matplotlib, Entrez SNP, Genepop and Biopython)
        • Integrating heterogeneous databases: databases do use different reference assemblies
        • Example: the exon positions given by the latest version of the UCSC Table Browser are not compatible with HapMap (v37 vs v36)
        • This is a silent bug: applications rarely crash and the results seem correct
        • The issue is discussed in the context of HapMap/Table Browser/Entrez SNP and might be useful in other cases
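One defensive pattern against this kind of silent mismatch is to carry the assembly label alongside every set of coordinates and refuse to combine data from different builds. The Python 3 sketch below is an assumption about how such a guard could look, not something taken from interPopula; the exception class, function name, build labels, and coordinates are all made up, and exon intervals are treated as inclusive for simplicity.

```python
class AssemblyMismatch(ValueError):
    """Raised when coordinates from different reference builds are combined."""


def snps_in_exons(snp_positions, exons, snp_build, exon_build):
    """Return SNP positions that fall inside any exon (inclusive bounds).

    Fails loudly if the two inputs were mapped to different assemblies,
    instead of silently returning plausible-looking but wrong results.
    """
    if snp_build != exon_build:
        raise AssemblyMismatch(
            "SNPs are on build %s but exons are on build %s"
            % (snp_build, exon_build)
        )
    return [
        pos
        for pos in snp_positions
        if any(start <= pos <= end for start, end in exons)
    ]


exons = [(100, 200), (500, 650)]
snps = [90, 150, 300, 600]

same_build = snps_in_exons(snps, exons, "36", "36")

try:
    snps_in_exons(snps, exons, "36", "37")
    mismatch_detected = False
except AssemblyMismatch:
    mismatch_detected = True

print(same_build, mismatch_detected)
```

Converting one data set to the other build before intersecting would be the next step, but that requires a coordinate-conversion tool and is outside the scope of this sketch.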
    • Examples – Known Genes
    • Examples – HapMap/Integration
    • Future work
        • Focus on HapMap and maybe the 1000 Genomes project
        • Support for the whole UCSC Table Browser will be spun off later into a separate project
        • Copy Number Variation support (available in HapMap since June)
        • Phasing support due very soon (as in next week)
        • Provide examples with genome-wide association studies