a centre of expertise in data curation and preservation




Create and receive scientific data


A talk given at the DCC Digital Curation 101 workshop, illustrating how to curate and manage scientific data by considering the content, syntax and semantics of the data.


  1. Create or Receive Scientific data
     Dr. Frank Gibson and Dr. Phillip Lord
     Frank.Gibson@newcastle.ac.uk, Phillip.Lord@newcastle.ac.uk
     Digital Curation 101, October 6th-10th, 2008, NeSC, Edinburgh
     Funded by: (funder logos not captured in this transcript)
     This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
  2. “In the standard model, one collects data, publishes a paper or papers and then gradually loses the original dataset.” - Geoffrey Bowker
  3. Slide by Cameron Neylon, http://www.slideshare.net/CameronNeylon
  4. Slide by Cameron Neylon, http://www.slideshare.net/CameronNeylon
  5. Slide by Cameron Neylon, http://www.slideshare.net/CameronNeylon
  6. Slide by Cameron Neylon, http://www.slideshare.net/CameronNeylon
  7. If we have a paper, who cares about the data? (image: http://flickr.com/photos/nicmcphee/2756494307/)
  8. A paper = a claim (or claims). The full record that supports that claim should be available for detailed examination and critique. Slide by Cameron Neylon, http://www.slideshare.net/CameronNeylon
  9. Slide by Cameron Neylon, http://www.slideshare.net/CameronNeylon
  10. 1000+ Databases
  11. Biocuration: Databases
  12. Biocuration: Wiki
  13. Slide by Cameron Neylon, http://www.slideshare.net/CameronNeylon
  14. (no text on this slide)
  15. Funders (image: http://flickr.com/photos/luismimunoznajar/2093185804/)
  16. Create or Receive
  17. Curation aims
      • Amenable
      • Preservable
      • Ownable
      • Accessible
      • Citable
  18. Significant Properties of Data
      • Content
      • Syntax
      • Semantics
  19. Content
  20. Publisher, Type, Title, Creator, Source, Identifier, Date, Rights
  21. Simple Dublin Core: Type, Format, Title, Identifier, Creator, Source, Subject, Language, Description, Relation, Publisher, Coverage, Contributor, Rights, Date
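The fifteen Simple Dublin Core elements listed on slide 21 can be written out as a small XML record. The following is a minimal sketch, not something shown in the talk: the `record` wrapper element and the `make_record` helper are hypothetical names chosen for illustration, and the field values are taken from the title slide.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

# The fifteen Simple Dublin Core elements, in lower case.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def make_record(fields):
    """Build a <record> element with one dc:* child per supplied field."""
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Simple Dublin Core elements: {sorted(unknown)}")
    record = ET.Element("record")  # hypothetical wrapper element
    for name in DC_ELEMENTS:  # emit children in a fixed, predictable order
        if name in fields:
            child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
            child.text = fields[name]
    return record


record = make_record({
    "title": "Create or Receive Scientific data",
    "creator": "Frank Gibson",
    "date": "2008-10",
    "rights": "CC BY-NC-SA 2.5 UK: Scotland",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

All Dublin Core elements are optional and repeatable, so a fuller version would accept several values per element (for example, both creators); the sketch keeps one value per element for brevity.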
  22. Content: Domain Specific
  23. Syntax
  24. (no text on this slide)
  25. Choosing a Syntax
      • Openness: is there an open, publicly available specification for the format; are its specifications in the public domain; is it unencrypted?
      • Portability: is the format independent of hardware, operating system, of other software; is it independent of particular institutions, groups, or events; is it in widespread current use; does it contain little or no built-in functionality?
      • Quality: is it robust; simple; highly tested; loss-free?
  26. Semantics
  27. Semantics can be complex
      • One semantic = many words
      • Many words = one semantic
  28. Excel data example – do I need it?
      • Zeeberg et al. BMC Bioinformatics 2004 5:80 doi:10.1186/1471-2105-5-80
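The paper cited on slide 28 reports that Excel can silently turn gene identifiers into dates or floating-point numbers (for example, a symbol such as SEPT2 becoming 2-Sep). A rough sketch of a check for such damage is given below. The regular expressions and the `suspicious_identifiers` helper are assumptions made for illustration; they do not come from the talk or the paper.

```python
import re

# Values that look like spreadsheet auto-conversions rather than identifiers.
# Both patterns are illustrative assumptions, not an exhaustive list.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$"
)
FLOAT_LIKE = re.compile(r"^\d\.\d+E\+\d+$")


def suspicious_identifiers(values):
    """Return the values that match a date-like or float-like pattern."""
    return [v for v in values if DATE_LIKE.match(v) or FLOAT_LIKE.match(v)]


column = ["TP53", "2-Sep", "BRCA1", "1-Dec", "2.31E+13"]
flagged = suspicious_identifiers(column)
print(flagged)
```

A check like this only flags likely damage. The original identifier cannot in general be recovered from the converted value, which is why the semantics need to be preserved at the point of data entry.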
  29. What is fly? Four images, each labelled “Fly”:
      • http://en.wikipedia.org/wiki/Image:Air_india_b747-400_vt-esn_arp.jpg
      • http://en.wikipedia.org/wiki/Image:MuscuDomestica.jpg
      • http://en.wikipedia.org/wiki/Image:Green_Highlander_salmon_fly.jpg
      • http://en.wikipedia.org/wiki/Image:Fly_poster.jpg
  30. Ontology
      • A controlled vocabulary is an association between formal names (identifiers) and their definitions.
      • An ontology is a controlled vocabulary augmented with logical constraints that describe the relationships between its terms.
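The distinction on slide 30 can be illustrated with a toy example: a controlled vocabulary is a lookup from identifiers to definitions, and an ontology adds relationships that a program can reason over. This is a minimal sketch under stated assumptions. The `EX:` identifiers, the definitions and the single `is_a` relation are all made up for illustration and are not drawn from any real ontology.

```python
# Controlled vocabulary: formal identifiers mapped to names and definitions.
# All identifiers and definitions here are invented for illustration.
VOCABULARY = {
    "EX:0001": ("animal", "A living organism that feeds on organic matter."),
    "EX:0002": ("insect", "An animal with six legs and a three-part body."),
    "EX:0003": ("house fly", "A two-winged insect, Musca domestica."),
    "EX:0004": ("fishing fly", "An artificial lure used in fly fishing."),
}

# Ontology: the same vocabulary plus a logical relationship between terms.
# Each entry reads "child is_a parent".
IS_A = {
    "EX:0002": "EX:0001",
    "EX:0003": "EX:0002",
}


def ancestors(term_id):
    """Return every term reachable by following is_a links upwards."""
    found = []
    while term_id in IS_A:
        term_id = IS_A[term_id]
        found.append(term_id)
    return found


def is_a(term_id, candidate_id):
    """True if candidate_id is a (possibly indirect) parent of term_id."""
    return candidate_id in ancestors(term_id)


print(is_a("EX:0003", "EX:0001"))  # is a house fly an animal?
print(is_a("EX:0004", "EX:0001"))  # is a fishing fly an animal?
```

Both kinds of “fly” share a word, but only the identifier tells a program which meaning is intended, and only the relationships let it infer that one of them is an animal.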
  31. Ontologies for Life science
      • Ontologies have emerged for two reasons:
        - Consistent annotation of data
        - To add meaning and understanding that can be interpreted computationally
      • Bio-ontologies are registered on the OBO Foundry
  32. Application of Significant Properties in Proteomics
  33. Minimum Information about a Proteomics Experiment (MIAPE)
      • Sufficiency. The MIAPE guidelines should require sufficient information about a dataset and its experimental context to allow a reader to understand and critically evaluate the interpretation and conclusions, and to support their experimental corroboration.
      • Practicability. Achieving compliance with MIAPE should not be so burdensome as to prohibit its widespread use.
  34. (no text on this slide)
  35. Minimum reporting guidelines
      • Describe content
      • Implementation independent
      • Impacts
        - Publication
        - Syntax
        - Semantics
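Because a minimum reporting guideline describes content and is implementation independent, compliance can be checked against any representation of a report. The sketch below uses a plain dictionary. The item names in `REQUIRED_ITEMS` are hypothetical placeholders and are NOT the actual MIAPE requirements.

```python
# Hypothetical checklist items; these are NOT the actual MIAPE requirements.
REQUIRED_ITEMS = {"sample_description", "gel_type", "protocol", "contact"}


def missing_items(report):
    """Return the required items that are absent or empty in a report."""
    return sorted(item for item in REQUIRED_ITEMS if not report.get(item))


report = {
    "sample_description": "mouse liver lysate",
    "gel_type": "2D",
    "protocol": "",  # present but empty, so still counted as missing
}
print(missing_items(report))
```

The checklist says nothing about file format or vocabulary; those are the separate syntax and semantics decisions the guideline has an impact on.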
  36. Syntax for proteomics
      • The content in MIAPE GE needs to be structured to facilitate
        - dissemination
        - transfer
        - storage
      • A community development process to agree on a syntax
        - building upon the FuGE data model
        - A pre-existing community developed representation of scientific experiments
        - Interoperable
  37. FuGE
      • Model of common components in science investigations, such as materials, data, protocols, equipment and software.
      • Provides a framework for capturing complete laboratory workflows, enabling the integration of pre-existing data formats.
  38. UML/XML/RDBMS
      • UML gives structure (but not syntax)
        - Very abstract, very general
      • XML provides a concrete syntax
        - Meta language is interoperable, checkable, viable and has basic metadata support (language, character coding and so on).
        - Tends toward the verbose. Not (very) searchable for itself.
        - Therefore, transfer and archive format.
      • RDBMS
        - SQL is (sort of) a standard
        - Highly computationally amenable form; very good for searching
        - Conversion from XML is possible, but in a number of ways.
        - Hard work – nice to have an off-the-shelf implementation.
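Slide 38 treats XML as the transfer and archive format and a relational database as the searchable form. One of the many possible conversions is sketched below using only the Python standard library. The XML fragment, its element and attribute names, and the `sample` table are invented for illustration and are not taken from GelML, FuGE or any other real schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A made-up XML fragment standing in for a transferred or archived document.
XML_TEXT = """
<experiment id="exp1">
  <sample id="s1" organism="Mus musculus"/>
  <sample id="s2" organism="Homo sapiens"/>
  <sample id="s3" organism="Mus musculus"/>
</experiment>
"""


def load_samples(xml_text, conn):
    """Parse the XML and copy each <sample> into a relational table."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "CREATE TABLE sample (id TEXT PRIMARY KEY, experiment TEXT, organism TEXT)"
    )
    rows = [
        (s.get("id"), root.get("id"), s.get("organism"))
        for s in root.iter("sample")
    ]
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_samples(XML_TEXT, conn)

# Once loaded, searching is a one-line SQL query.
mouse_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM sample WHERE organism = ? ORDER BY id", ("Mus musculus",)
    )
]
print(mouse_ids)
```

The mapping chosen here (one table, attributes as columns) is only one option, which reflects the slide's point that the conversion is possible "in a number of ways".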
  39. GelML
  40. Semantics for Gels
  41. Semantics for science
  42. Curation of Gel experiments (diagram)
      • Stages: Laboratory; Data entry and transfer; Public repositories
      • Routes: I) GelML data entry tools; II) Direct database submission; III) Automated export of
      • Labels: GelML, MIAPE GE, GelInfoML, MIAPE GI, sepCV
  43. Discoverability and reuse
      • Persistent Identifiers
      • Rights management
  44. Persistent Identifiers
      • A name for a resource which will remain the same regardless of where the resource is located
      • In biology, typically assigned to data upon publication
      • Type of identifier dependent on publication method
      • Description and Representation Information provides more information about persistent identifiers
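The defining property on slide 44, a name that stays the same wherever the resource lives, can be shown with a toy resolver. This is a sketch only: the `Resolver` class, the `example:` identifier and the example.org locations are hypothetical and do not correspond to any real identifier scheme.

```python
class Resolver:
    """Maps persistent identifiers to the current location of a resource."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, location):
        """Assign an identifier once, for example at publication time."""
        if identifier in self._locations:
            raise ValueError(f"identifier already assigned: {identifier}")
        self._locations[identifier] = location

    def relocate(self, identifier, new_location):
        """Record that the resource has moved; the identifier is unchanged."""
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_location

    def resolve(self, identifier):
        """Return the current location for an identifier."""
        return self._locations[identifier]


resolver = Resolver()
resolver.register("example:dataset/42", "http://old.example.org/data/42")
resolver.relocate("example:dataset/42", "http://new.example.org/archive/42")
print(resolver.resolve("example:dataset/42"))
```

Anything that cites `example:dataset/42` keeps working after the move, because only the resolver's internal mapping changed.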
  45. Rights management
      • Difficult to determine
      • Lots of legal issues
      • In biology/bioinformatics tends to be open access
      • Creative Commons
  46. Receiving data for curation
      • Content
      • Syntax
      • Semantics
  47. Route map
      • Who will receive it?
      • What are their policies on Content, Syntax, Semantics?
      • Plan your experiment to conform to Content, Syntax, Semantics
      • Implement experiment to:
        - Collect appropriate Content
        - Structure in appropriate Syntax
        - Ensure Semantics are preserved
      • Curate…
  48. Meta Route Map
      • How to build the map if you don’t have one yet.
  49. Appraise and Select
      • Investigates the evaluation and selection of data for long-term curation and preservation
  50. Acknowledgments
      • The CARMEN project: www.carmen.org.uk
      • The Proteomics Standards Initiative (PSI): http://psidev.info
      • Colleagues at Newcastle University: Phillip Lord, Anil Wipat, Allyson Lister
  51. (no text on this slide)
