Friday Seminar 15 10 2004
  • Controlled vocabulary
  • 16 packages in total, 6 for design
  • Redundant information; clearer and more readable; easy to create (spreadsheet)
  • Easy to read, understand, and create -> redundant information
  • Relaxed: allows the usual mistakes (if they can be identified). Strict: the file must exactly match the specification. Complete mode: checks the whole data set; the process does not stop when an error is identified. Step-by-step mode: the process stops once an error is found, so errors can be corrected one by one (for small data sets or a known small number of errors).

Transcript

  • 1. Facilitating Standardization and Exchange of Array Design: ADF MAGE-ML Tool. Pierre Marguerite, Friday Seminar, EBI Microarray Informatics Team, 15 October 2004
  • 2. ADF MAGE-ML Tool
    • Application
      • stand-alone
      • platform independent
    • Supports:
      • Simple/complex microarray layouts
      • Different microarray applications
        • gene_expression
        • snp_detection
        • comparative_genomic_hybridization
        • binding_site_identification
        • Others (minimal)
    • Respects good practices
  • 3. conversion tool
  • 4. MAGE-ML (MAGE-OM): Description, Biosequence, Array, Array Design, DesignElement
  • 5. MAGE-ML (next)
  • 6. ADF (previous)
  • 7. Array Design File (adh, adr, adc): header, contacts, technical information
  • 8. Array Design File (adr, adc): reporters, features, feature/reporter
  • 9. Array Design File: composite characteristics, map to reporters
  • 10. ADF version differences
    • 3 parts (files) instead of 1
    • As Workbook or text files
    • No Reporter Identifier item
    • No Reporter Group [role] item
    • New Chromosome item
    • New Chromosome_band item
    • New Species item
  • 11.
    • 2 mandatory steps:
      • Validation
      • Conversion
  • 12. Validation
    • File format validation
    • File content validation
      • Validation of controlled vocabulary
        • MGED ontology terms
        • Approved Databases (Tags, Accession numbers)
      • Automatic curation (when possible)
  • 13. Validation
    • Two levels of checking:
      • Relaxed
      • Strict
    • Two execution modes:
      • A complete mode
      • A step-by-step mode
    • Error log: for correction
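The two checking levels and two execution modes can be pictured with a minimal Python sketch. Everything here is hypothetical and only illustrates the idea: the `validate` function, the `Report` class, and the small polymer-type vocabulary are assumptions, not the tool's actual code or configuration. Relaxed checking curates a recognizable mistake (wrong case, stray spaces), strict checking rejects it, and step-by-step mode stops at the first error while complete mode collects them all.

```python
from dataclasses import dataclass, field

# Illustrative controlled vocabulary (an assumption, not the tool's real list).
POLYMER_TYPES = {"DNA", "RNA", "protein"}


@dataclass
class Report:
    errors: list = field(default_factory=list)   # entries for the error log
    curated: list = field(default_factory=list)  # (line, original, corrected)


def validate(values, strict=False, step_by_step=False):
    """Check each value against the vocabulary.

    strict=False       -> relaxed: identifiable mistakes are curated
    step_by_step=True  -> stop at the first error instead of checking everything
    """
    canonical = {term.lower(): term for term in POLYMER_TYPES}
    report = Report()
    for line_no, value in enumerate(values, start=1):
        if value in POLYMER_TYPES:
            continue
        fixed = canonical.get(value.strip().lower())
        if fixed is not None and not strict:
            report.curated.append((line_no, value, fixed))
            continue
        report.errors.append(f"line {line_no}: invalid value {value!r}")
        if step_by_step:
            break
    return report
```

For example, `validate(["DNA", "rna ", "xyz"])` curates the second value and logs one error for the third, whereas `validate(["DNA", "rna ", "xyz"], strict=True, step_by_step=True)` stops at the second value.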
  • 14. Checking lists (header)
    • File/Data structure checklist:
      • Header file is a tab-delimited file
      • Item names are correct or can be identified
      • If an item cannot be identified, it is skipped
      • All mandatory items are present in the header
    • Data/file content checklist
      • Correct field value format
      • Possible value types:
        • "Integer"
        • "Free Text"
        • "Controlled vocabulary"
        • "MGED ontology term"
        • "DatabaseEntry"
        • "Sequence"
        • "Species"
      • Check whether a field takes a single value or multiple values
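The header checklist above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the item names in `HEADER_SPEC`, the `item<TAB>value` line layout, and the `check_header` function are all hypothetical and are not taken from the ADF specification or from the tool itself. Only the "Integer" value type is checked here.

```python
import csv
import io

# Hypothetical item specification: item name -> (value type, mandatory?)
HEADER_SPEC = {
    "Array Design Name": ("Free Text", True),
    "Number of Features": ("Integer", True),
    "Species": ("Species", False),
}


def check_header(text):
    """Return (errors, skipped_items) for a tab-delimited header file."""
    errors, skipped, seen = [], [], set()
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for line_no, row in enumerate(reader, start=1):
        if not any(cell.strip() for cell in row):
            continue  # ignore blank lines
        if len(row) < 2:
            errors.append(f"line {line_no}: expected 'item<TAB>value'")
            continue
        name, value = row[0].strip(), row[1].strip()
        if name not in HEADER_SPEC:
            skipped.append(name)  # unidentified items are skipped, not fatal
            continue
        seen.add(name)
        value_type, _ = HEADER_SPEC[name]
        # Only non-negative whole numbers are accepted for "Integer" items.
        if value_type == "Integer" and not value.isdigit():
            errors.append(f"line {line_no}: {name!r} must be an integer, got {value!r}")
    for name, (_, mandatory) in HEADER_SPEC.items():
        if mandatory and name not in seen:
            errors.append(f"missing mandatory item {name!r}")
    return errors, skipped
```

Unknown items end up in the second return value rather than in the error list, mirroring the rule that unidentified items are skipped.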
  • 15. Checking lists (feature reporter)
    • Feature Reporter file
    • File/Data structure checklist:
      • Header file is correct (structure and data)
      • FeatureReporter file is a tab-delimited file
      • Header item names are correct (unknown items are skipped)
      • All mandatory items are present. Item cardinalities and dependencies are correct.
      • Database tags are approved and database accession numbers are correct
      • Item order is correct (optional; does not fail the check)
      • Field dependencies are correct
    • Data/file content checklist
      • FeatureReporter file structure must be correct
      • Mandatory fields are present. Field cardinalities and field value multiplicities must be correct.
      • Field values are in the mandatory format
        • Database tags are approved by ArrayExpress and are supplied in lower case and between square brackets
        • Database IDs are correct
        • Ontology terms are correct (MGED ontology)
        • Sequences are correct for the associated polymer type (DNA, RNA, protein)
        • Integer field values are correct
      • Duplicate features must not exist
      • Duplicate reporters (equal names) must have the same characteristics.
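Three of the content checks above (database tag format, sequence alphabet per polymer type, duplicate detection) lend themselves to a short Python sketch. The approved tag list, the simplified alphabets (only `N` is kept as an ambiguity code), and all function names are assumptions made for illustration; the real approved list is maintained by ArrayExpress and is not reproduced here.

```python
import re

# Hypothetical approved tags; the real list is defined by ArrayExpress.
APPROVED_TAGS = {"embl", "ensembl"}

# A tag must be lower case and enclosed in square brackets, e.g. "[embl]".
TAG_PATTERN = re.compile(r"\[([a-z0-9_]+)\]")

# Simplified alphabets (assumption: only N is accepted as an ambiguity code).
ALPHABETS = {
    "DNA": set("ACGTN"),
    "RNA": set("ACGUN"),
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),
}


def tag_is_valid(tag):
    """True if the tag is bracketed, lower case, and on the approved list."""
    match = TAG_PATTERN.fullmatch(tag)
    return match is not None and match.group(1) in APPROVED_TAGS


def sequence_is_valid(sequence, polymer_type):
    """True if every residue belongs to the alphabet of the polymer type."""
    alphabet = ALPHABETS.get(polymer_type)
    if alphabet is None or not sequence:
        return False
    return set(sequence.upper()) <= alphabet


def duplicate_keys(keys):
    """Return each key that repeats an earlier one, in order of appearance."""
    seen, duplicates = set(), []
    for key in keys:
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

`duplicate_keys` works on any hashable key, so the same helper could flag repeated feature coordinates or repeated reporter names; deciding what identifies a feature is left to the caller.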
  • 16. Checking lists (composite)
    • CompositeSequence
    • File/Data structure checklist:
      • Feature Reporter file must be correct (structure and data)
      • CompositeSequence file is a tab-delimited file
      • Header item names are correct (unknown items are skipped)
      • All mandatory items are present. Header item cardinalities and dependencies are correct
      • Column order is correct (not mandatory)
    • Data/file content checklist
      • Composite file structure must be correct
      • All mandatory fields are present. Field cardinalities are correct
      • Field values are in expected format. Field multiplicity is correct (same as Feature/Reporter)
      • Names in map are reporter or composite sequence names
      • No duplicate CompositeSequences (same names)
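The last two composite checks (map targets must be known names, and no duplicate CompositeSequences) can be illustrated with a small Python sketch. The data layout, a list of `(name, mapped_names)` pairs, and the `check_composites` function are hypothetical and chosen only to keep the example short.

```python
def check_composites(composites, reporter_names):
    """Validate composite sequences.

    composites: list of (composite_name, [names it maps to])
    reporter_names: names defined in the Feature/Reporter file
    """
    errors = []
    composite_names = set()
    # First pass: composite names must be unique.
    for name, _ in composites:
        if name in composite_names:
            errors.append(f"duplicate CompositeSequence {name!r}")
        composite_names.add(name)
    # Second pass: every map target must be a reporter or a composite name.
    known = set(reporter_names) | composite_names
    for name, mapped in composites:
        for target in mapped:
            if target not in known:
                errors.append(f"{name!r} maps to unknown name {target!r}")
    return errors
```

Two passes are used so that a composite may map to another composite defined later in the file.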
  • 17. Checking lists
    • Header item names are correct
    • All mandatory items are present
    • All mandatory fields are present.
    • No duplicate features
    • Duplicate reporters (equal names) must have the same characteristics.
    • No duplicate CompositeSequences (same names)
    • Names in map are reporter or composite sequence names
  • 18.  
  • 19.  
  • 20. MGED Ontology / DAML+OIL
  • 21. Approved Databases
  • 22. User modes
  • 23. Implementation - technical choices
    • MAGE-stk
    • JAXB
    • Configuration (default parameters)
    • Performance: ~10 minutes for 4000 features
  • 24. Installer: IzPack http://www.izforge.com/izpack/
  • 25. http://www.ebi.ac.uk/adf/
  • 26.