Parkinson mibbi

  • CAMERA stands for Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis
  • SRF: sequence reads and IDs (the raw DNA sequences comprise base calls, quality values, and platform-specific information)
    AARF: define and format reference genome, consensus genome, absolute and relative assembly (Asim Siddiqui, Gabor Marth and Paul Flicek)

    UHTS quality workgroup: metrics for assessing quality for the different platforms
  • Parkinson mibbi

    1. MINSEQE standard, data formats, and storage
       • Helen Parkinson, PhD, EMBL-EBI
       • www.ebi.ac.uk/arrayexpress
       • EBI is an Outstation of the European Molecular Biology Laboratory.
    2. Scope: UHTS. Standardization at multiple levels, from sequence reads to interpreting results for transcriptomics experiments
       • Sequence reads: Sequence Read Format (SRF), http://srf.sourceforge.net/
       • Alignment / Assembly / Finishing: Proposed Assembly and Alignment Format (AAAF)
       • Local data storage
       • Minimum Information About a Genome Sequence (MIGS)
       • Minimum Information about a Metagenomic Sequence/Sample (MIMS)
       • Minimal Information about a high-throughput SEQuencing Experiment (MINSEQE), e.g., ArrayExpress, GEO
       • Experimental details for sequence identification
       • Experimental details for expression, binding, modifications, sequence changes
       • Public repositories, e.g., DDBJ/EMBL/GenBank
    3. MINSEQE Status
       • Proposal available: http://www.mged.org/minseqe
       • Data deposition support: ArrayExpress, GEO, short read archives
       • Builds on SRF and the proposed AAAF (Asim Siddiqui, Gabor Marth and Paul Flicek)
         • SRF: sequence reads and IDs (the raw DNA sequences comprise base calls, quality values, and platform-specific information)
         • AAAF: define and format reference genome, consensus genome, absolute and relative assembly
       • UHTS Quality Metrics Workgroup started (Marc Salit, NIST)
         • Metrics for assessing quality for the different platforms
    4. MINSEQE Implementation
       • ArrayExpress support from Jan 2011; algorithm designed and implemented for submissions from AE and GEO
       • SequenceScape, the Sanger data management system; working with other centres
       • ENA/EGA at EBI: submissions are split between ENA and ArrayExpress, with in-scope data coming directly to AE for curation and scoring
       • Bioconductor pipeline for UHTS data using MINSEQE information from AE (ArrayExpressHTS), submitted to Bioinformatics
    5. ArrayExpress implementation for all GEO and AE data
       • MINSEQE-compliant templates generated
       • Scoring algorithm designed and implemented
       • Standards applied at the point of submission by curators at EBI and NCBI GEO
       • All data scored at EBI, including presence/absence of raw data in ENA, SRA and variables
       • GUI modified to include search by MINSEQE scores, e.g., all RNA-based HTP sequencing experiments with raw data in EGA
    6. Future work
       • Release MINSEQE-supporting GUI for ArrayExpress
       • Implementation of taxon-specific rules for transcriptomics data at the EBI short read archive and SequenceScape
       • Likely revision based on changes in the file formats and processing software used in the community; SRF, for example, is not well supported
       • Solicitation of support from journals
       • Contributors:
         • FGED (formerly MGED), scientists, funders, journals, companies, especially Chris Stoeckert, FGED president
         • Wellcome Trust Sanger Institute (funder: Wellcome Trust)
         • EBI (EGA/ArrayExpress/ERA): EC funding
         • NCBI GEO: NIH
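
The slides mention a scoring algorithm and a search by MINSEQE score but do not describe either. The following is a minimal sketch of how such a score and filter might look, not the ArrayExpress implementation. It assumes one point per item in a five-item checklist loosely modelled on the kinds of information MINSEQE asks for. The class `Experiment`, its field names, the equal weighting, and the accessions are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Experiment:
    """Hypothetical record of what a submission provides (field names are assumptions)."""
    accession: str
    has_sample_description: bool   # biological system, samples, experimental variables
    has_raw_reads: bool            # sequence read data present in an archive
    has_processed_data: bool       # final processed or summary data
    has_experiment_overview: bool  # general information, sample-to-data relationships
    has_protocols: bool            # experimental and data processing protocols


def minseqe_score(exp: Experiment) -> int:
    """Return an illustrative score from 0 to 5: one point per checklist item present."""
    checks = (
        exp.has_sample_description,
        exp.has_raw_reads,
        exp.has_processed_data,
        exp.has_experiment_overview,
        exp.has_protocols,
    )
    return sum(checks)


def filter_by_score(experiments: List[Experiment], minimum: int) -> List[Experiment]:
    """Keep only experiments whose illustrative score is at least `minimum`."""
    return [exp for exp in experiments if minseqe_score(exp) >= minimum]


if __name__ == "__main__":
    # Placeholder accessions, not real ArrayExpress records.
    complete = Experiment("E-TEST-1", True, True, True, True, True)
    partial = Experiment("E-TEST-2", True, False, True, False, True)
    for exp in filter_by_score([complete, partial], minimum=4):
        print(exp.accession, minseqe_score(exp))
```

Treating the items as equally weighted booleans keeps the score easy to explain and to search on; a production scheme could weight raw data availability differently.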
