Friday Seminar 15 10 2004

Published in: Technology
  • Controlled vocabulary
  • 16 packages in total, 6 for design
  • Redundant information; clearer and more readable; easy to create (spreadsheet)
  • Easy to read, understand, and create -> redundant information
  • Relaxed: allows the usual mistakes, provided they can be identified. Strict: the file must exactly match the specification. Complete mode: checks the whole data set; the process does not stop when an error is identified. Step-by-step mode: the process stops as soon as an error is found, so errors can be corrected one by one (for small data sets or a known small number of errors).

    1. 1. Facilitating Standardization and Exchange of Array Design ADF MAGE-ML Tool Pierre Marguerite – Friday Seminar EBI – Microarray Informatics Team 15 October 2004
    2. 2. ADF MAGE-ML Tool
        • Application
            • stand-alone
            • platform independent
        • Supports:
            • Simple/complex microarray layouts
            • Different microarray applications
                • gene_expression
                • snp_detection
                • comparative_genomic_hybridization
                • binding_site_identification
                • Others (minimal)
        • Respects good practices
    3. 3. conversion tool
    4. 4. MAGE-ML (MAGE-OM) [diagram labels: Description, Biosequence, Array, Array Design, DesignElement]
    5. 5. MAGE-ML (next)
    6. 6. ADF (previous)
    7. 7. Array Design File [diagram labels: adh, adr, adc; Header; contacts; Technical Information]
    8. 8. Array Design File [diagram labels: adr, adc; Reporters; Features; Feature/Reporter]
    9. 9. Array Design File [diagram labels: Composite; Characteristics; Map to reporters]
    10. 10. ADF version differences
        • 3 parts (files) instead of 1
        • As workbook or text files
        • No Reporter Identifier item
        • No Reporter Group [role] item
        • New Chromosome item
        • New Chromosome_band item
        • New Species item
    11. 11. 2 mandatory steps:
        • Validation
        • Conversion
    12. 12. Validation
        • File format validation
        • File content validation
            • Validation of controlled vocabulary
                • MGED ontology terms
                • Approved databases (tags, accession numbers)
            • Automatic curation (when possible)
    13. 13. Validation
        • Two levels of checking:
            • Relaxed
            • Strict
        • Two execution modes:
            • A complete mode
            • A step-by-step mode
        • Error log: for correction
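The two checking levels and two execution modes on this slide can be illustrated with a minimal Python sketch. It is not the tool's implementation: the names `Level`, `Mode`, `Report`, `validate`, `ALLOWED`, and `KNOWN_VARIANTS` are all hypothetical, and the "usual mistake" modelled here (a term written with spaces and capitals instead of underscores) is only an assumed example of an error that relaxed checking could identify.

```python
from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    RELAXED = "relaxed"  # tolerate common mistakes that can be identified
    STRICT = "strict"    # file must match the specification exactly


class Mode(Enum):
    COMPLETE = "complete"          # check everything, collect all errors
    STEP_BY_STEP = "step_by_step"  # stop at the first error


@dataclass
class Report:
    errors: list = field(default_factory=list)    # the error log
    warnings: list = field(default_factory=list)  # mistakes tolerated when relaxed


# Hypothetical data: allowed terms and spelling variants that can be identified.
ALLOWED = {"gene_expression", "snp_detection"}
KNOWN_VARIANTS = {"gene expression": "gene_expression"}


def validate(values, level, mode):
    """Check each value; behaviour depends on the level and the mode."""
    report = Report()
    for line_no, value in enumerate(values, start=1):
        if value in ALLOWED:
            continue
        fixed = KNOWN_VARIANTS.get(value.strip().lower())
        if level is Level.RELAXED and fixed is not None:
            report.warnings.append(f"line {line_no}: read {value!r} as {fixed!r}")
            continue
        report.errors.append(f"line {line_no}: invalid value {value!r}")
        if mode is Mode.STEP_BY_STEP:
            break  # let the user correct this error before continuing
    return report
```

In this sketch, complete mode suits large files where all problems should be logged in one pass, while step-by-step mode returns after the first entry in the error log.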
    14. 14. Checking lists (header)
        • File/data structure checklist:
            • Header file is a tab-delimited file
            • Item names are correct or can be identified
            • If an item is not identified, it is skipped
            • All mandatory items are present in the header
        • Data/file content checklist:
            • Correct field value format
            • Possible value types:
                • "Integer"
                • "Free Text"
                • "Controlled vocabulary"
                • "MGED ontology term"
                • "DatabaseEntry"
                • "Sequence"
                • "Species"
            • Check single/multiple values
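As an illustration of the header checklist, the following Python sketch checks a tab-delimited header, skips unidentified items, verifies that mandatory items are present, and validates values by type. The item names, the one-item-per-line `item<TAB>value` layout, the vocabulary, and the function names are assumptions made for the example; the actual ADF header specification is not reproduced here.

```python
# Hypothetical item specification: name -> (value type, mandatory?)
HEADER_ITEMS = {
    "array_design_name": ("free_text", True),
    "number_of_features": ("integer", True),
    "technology_type": ("controlled_vocabulary", False),
}
# Hypothetical controlled vocabulary for the example.
VOCABULARY = {"spotted", "in_situ_synthesized"}


def value_ok(value, value_type):
    """Check a field value against its declared type."""
    if value_type == "integer":
        return value.isdigit()
    if value_type == "controlled_vocabulary":
        return value in VOCABULARY
    return bool(value)  # free text: any non-empty string


def check_header(text):
    """Return a list of error messages for a header given as text."""
    errors, seen = [], set()
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        parts = line.split("\t")
        if len(parts) != 2:
            errors.append(f"line {line_no}: expected 'item<TAB>value'")
            continue
        # Normalise the item name so minor variations can still be identified.
        name = parts[0].strip().lower().replace(" ", "_")
        value = parts[1].strip()
        if name not in HEADER_ITEMS:
            continue  # unidentified items are skipped, not rejected
        seen.add(name)
        value_type, _ = HEADER_ITEMS[name]
        if not value_ok(value, value_type):
            errors.append(f"line {line_no}: {name} is not a valid {value_type}")
    for name, (_, mandatory) in HEADER_ITEMS.items():
        if mandatory and name not in seen:
            errors.append(f"missing mandatory item: {name}")
    return errors
```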
    15. 15. Checking lists (feature reporter)
        • Feature Reporter file
        • File/data structure checklist:
            • Header file is correct (structure and data)
            • FeatureReporter file is a tab-delimited file
            • Header item names are correct (unknown items are skipped)
            • All mandatory items are present; item cardinalities and dependencies are correct
            • Database tags are approved and database accession numbers are correct
            • Item order is correct (optional; does not fail the check)
            • Field dependencies are correct
        • Data/file content checklist:
            • FeatureReporter file structure must be correct
            • Mandatory fields are present; field cardinalities and field value multiplicities must be correct
            • Field values are in the mandatory format:
                • Database tags are approved by ArrayExpress and are supplied in lower case and between square brackets
                • Database IDs are correct
                • Ontology terms are correct (MGED ontology)
                • Sequences are correct for the associated polymer type (DNA, RNA, protein)
                • Integer field values are correct
            • Duplicate features must not exist
            • Duplicate reporters (equal names) must have the same characteristics
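Three of the content checks above (database tag format, sequence validity per polymer type, and duplicate features) could look roughly like the Python sketch below. The approved tag list, the simplified residue alphabets without ambiguity codes, the column name used in the usage note, and the idea of identifying a feature by a coordinate tuple are all assumptions for illustration, not details taken from the tool.

```python
import re

# Hypothetical approved database tags; the real list is maintained by ArrayExpress.
APPROVED_TAGS = {"embl", "ensembl"}
# A tag is lower case and enclosed in square brackets, e.g. "[embl]".
TAG_PATTERN = re.compile(r"\[([a-z0-9_]+)\]")

# Simplified alphabets (no ambiguity codes), assumed for the example.
ALPHABETS = {
    "DNA": set("ACGT"),
    "RNA": set("ACGU"),
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),
}


def tag_ok(column_name):
    """True if the column name carries an approved, lower-case [tag]."""
    match = TAG_PATTERN.search(column_name)
    return match is not None and match.group(1) in APPROVED_TAGS


def sequence_ok(sequence, polymer_type):
    """True if every residue belongs to the polymer type's alphabet."""
    alphabet = ALPHABETS.get(polymer_type)
    if alphabet is None or not sequence:
        return False
    return set(sequence.upper()) <= alphabet


def duplicate_features(coordinates):
    """Return coordinates that occur more than once, in first-seen order."""
    seen, duplicates = set(), []
    for coordinate in coordinates:
        if coordinate in seen and coordinate not in duplicates:
            duplicates.append(coordinate)
        seen.add(coordinate)
    return duplicates
```

For example, `tag_ok("Reporter Database Entry [EMBL]")` is rejected in this sketch because the tag is not lower case, which is the kind of mistake a relaxed check might instead correct automatically.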
    16. 16. Checking lists (composite)
        • CompositeSequence
        • File/data structure checklist:
            • Feature Reporter file must be correct (structure and data)
            • CompositeSequence file is a tab-delimited file
            • Header item names are correct (unknown items are skipped)
            • All mandatory items are present; header item cardinalities and dependencies are correct
            • Column order is correct (not mandatory)
        • Data/file content checklist:
            • Composite file structure must be correct
            • All mandatory fields are present; field cardinalities are correct
            • Field values are in the expected format; field multiplicity is correct (same as Feature/Reporter)
            • Names in map are reporter or composite sequence names
            • No duplicate CompositeSequences (same names)
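The last two composite checks (map names must refer to known reporters or composite sequences, and composite names must be unique) can be sketched in Python as follows. The function name `check_composites` and the input shape, a list of `(name, mapped_names)` pairs, are hypothetical choices for the example rather than the tool's data model.

```python
def check_composites(reporter_names, composites):
    """Validate composite sequences.

    reporter_names: iterable of reporter names from the Feature/Reporter file.
    composites: list of (composite_name, [mapped names]) pairs.
    Returns a list of error messages in the order the problems are found.
    """
    errors = []
    composite_names = [name for name, _ in composites]
    # A map entry may point at a reporter or at another composite sequence.
    known = set(reporter_names) | set(composite_names)
    seen = set()
    for name, mapped in composites:
        if name in seen:
            errors.append(f"duplicate CompositeSequence: {name}")
        seen.add(name)
        for target in mapped:
            if target not in known:
                errors.append(f"{name}: unknown name in map: {target}")
    return errors
```

Because the map is checked against reporter names, this step presupposes that the Feature/Reporter file has already been validated, as the structure checklist requires.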
    17. 17. Checking lists
        • Header item names are correct
        • All mandatory items are present
        • All mandatory fields are present
        • No duplicate features
        • Duplicate reporters (equal names) must have the same characteristics
        • No duplicate CompositeSequences (same names)
        • Names in map are reporter or composite sequence names
    18. 20. MGED Ontology / DAML+OIL
    19. 21. Approved Databases
    20. 22. User modes
    21. 23. Implementation - technical choices
        • MAGE-stk
        • JAXB
        • Configuration (default parameters)
        Performance: 4000 features: ~10 minutes
    22. 24. Installer - IzPack http://www.izforge.com/izpack/
    23. 25. http://www.ebi.ac.uk/adf/
