In your worst nightmares



How experimental scientists are
doing provenance for themselves
Non-standard/inconsistent data formats
Probably doesn’t backup
Record of data analysis?
Versioning?
Non-standard/inconsistent data formats
Probably doesn’t backup
Record of data analysis?
Versioning?
Uses Excel for data analysis?!?!!
...a typical dataset...
We have...




http://www.flickr.com/photos/schnurrbart/43568532/ CC-BY-SA
But how did we end up here?




http://www.flickr.com/photos/davidmasters/2884480103/ CC-BY-SA
...we used to be good at this...




© Cell Press, Nature Publishing Group, American Chem
Soc, American Soc Microbiology, ...
When it was on paper...




   ...you had to ask for a copy...

       ...and you said so in the paper...
http://www.flickr.com/photos/nbachiyski/2186228572/ CC-BY
But in the online world...
 ...too many people
 ...too many files
 ...too much movement


 ...it’s all too hard isn’t it?
http://www.flickr.com/photos/antjeverena/3368703708/ CC-BY
But all is not lost...
...because even online researchers
      still care about citation
http://twitter.com/mrgunn/statuses/1542572037
                 http://is.gd/tgaz
http://is.gd/thvE




http://is.gd/thwD
Link to information...

...acknowledge source...

...evolving best practice
http://is.gd/thzK




?
http://is.gd/thAA
Expectations of link behaviour

        Granularity of citation

         Evolving best practice


Some technical problems...
Some real research data...
Published data...   http://is.gd/thCK
Published data...   http://is.gd/thEg
Data summary...   http://is.gd/thEX
Original experiment   http://is.gd/thFa
Versioning...   http://is.gd/thGb
Versioning and provenance...

...through linked open data...

...and third party timestamps
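
One way to picture the versioning side of this is a chain of content fingerprints. The sketch below is only an illustration under stated assumptions: the function names (`fingerprint`, `new_version`, `chain_is_intact`) are hypothetical and do not come from the talk, and no timestamping service is called. The digest of each version is simply the kind of value that could be handed to a third party for timestamping.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of one version of a dataset."""
    return hashlib.sha256(data).hexdigest()


def new_version(data: bytes, history: list) -> dict:
    """Append a record for this version, pointing back at the previous one."""
    parent = history[-1]["digest"] if history else None
    record = {"digest": fingerprint(data), "parent": parent}
    history.append(record)
    return record


def chain_is_intact(history: list) -> bool:
    """Check that every record names the digest of the record before it."""
    expected_parent = None
    for record in history:
        if record["parent"] != expected_parent:
            return False
        expected_parent = record["digest"]
    return True
```

Each record depends only on the bytes of the data and the digest of its predecessor, so anyone holding the files can recompute the chain and compare it with digests that were timestamped elsewhere.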
Video



http://is.gd/thMB
URI for every object...

 ...can link in or out

No semantics to links
 (at the moment)
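
As a rough sketch of what "a URI for every object" with untyped links might look like in code, the example below stores plain source-to-target pairs and answers "what links out of this?" and "what links in to this?". The class name `LinkGraph` and any `example.org` URIs used with it are assumptions made for illustration, not the system shown in the talk. Links carry no type, matching the "no semantics" caveat.

```python
from collections import defaultdict


class LinkGraph:
    """Untyped links between objects that are identified only by URI."""

    def __init__(self):
        self._out = defaultdict(set)  # source URI -> target URIs
        self._in = defaultdict(set)   # target URI -> source URIs

    def link(self, source_uri: str, target_uri: str) -> None:
        """Record that source_uri links to target_uri (no link type stored)."""
        self._out[source_uri].add(target_uri)
        self._in[target_uri].add(source_uri)

    def links_out(self, uri: str) -> list:
        """URIs that this object points to, sorted for stable output."""
        return sorted(self._out.get(uri, ()))

    def links_in(self, uri: str) -> list:
        """URIs of objects that point to this one, sorted for stable output."""
        return sorted(self._in.get(uri, ()))
```

Adding semantics later would mean storing a relation name alongside each pair, which is the step the slide flags as not yet done.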
http://is.gd/thVr
Technical solutions...
• Push data to the open web
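• Highly granular URIs...repositories for which “the
  file” is not the atomic concept
• Strong versioning and forking functionality...like
  any halfway decent code repository
• Strong identity management solutions for people,
  projects, organizations
• Tools for linking objects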
Social solutions...
• Use the strong culture of citation in
  community
• Leverage the need of researchers to
  track their own data properly
• A discussion of best practice for citation,
Problems are primarily
 social, not technical...


....technical solutions are
 needed to make it easy
...but the first problem
  is to tell people why
    they should care...
In your worst nightmares: Provenance

A talk given at the "Use Cases for Provenance Workshop" at the eSI on April 20 2009

  1. In your worst nightmares How experimental scientists are doing provenance for themselves
  2. Non-standard/inconsistent data formats Probably doesn’t backup Record of data analysis? Versioning?
  3. Non-standard/inconsistent data formats Probably doesn’t backup Record of data analysis? Versioning? Uses Excel for data analysis?!?!!
  4. ...a typical dataset...
  5. We have... http://www.flickr.com/photos/schnurrbart/43568532/ CC-BY-SA
  6. But how did we end up here? http://www.flickr.com/photos/davidmasters/2884480103/ CC-BY-SA
  7. ...we used to be good at this... © Cell Press, Nature Publishing Group, American Chem Soc, American Soc Microbiology, fair use claimed
  8. When it was on paper... ...you had to ask for a copy... ...and you said so in the paper... http://www.flickr.com/photos/nbachiyski/2186228572/ CC-BY
  9. But in the online world... ...too many people ...too many files ...too much movement ...it’s all too hard isn’t it? http://www.flickr.com/photos/antjeverena/3368703708/ CC-BY
  10. But all is not lost...
  11. ...because even online researchers still care about citation
  12. http://twitter.com/mrgunn/statuses/1542572037 http://is.gd/tgaz
  13. http://is.gd/thvE http://is.gd/thwD
  14. Link to information... ...acknowledge source... ...evolving best practice
  15. http://is.gd/thzK ?
  16. http://is.gd/thAA
  17. Expectations of link behaviour Granularity of citation Evolving best practice Some technical problems....mostly social
  18. Some real research data...
  19. Published data... http://is.gd/thCK
  20. Published data... http://is.gd/thEg
  21. Data summary... http://is.gd/thEX
  22. Original experiment http://is.gd/thFa
  23. Versioning... http://is.gd/thGb
  24. Versioning and provenance... ...through linked open data... ...and third party timestamps
  25. Video http://is.gd/thMB
  26. URI for every object... ...can link in or out No semantics to links (at the moment)
  27. http://is.gd/thVr
  28. Technical solutions... • Push data to the open web • Highly granular URIs...repositories for which “the file” is not the atomic concept • Strong versioning and forking functionality...like any halfway decent code repository • Strong identity management solutions for people, projects, organizations • Tools for linking objects
  29. Social solutions... • Use the strong culture of citation in community • Leverage the need of researchers to track their own data properly • A discussion of best practice for citation,
  30. Problems are primarily social, not technical... ....technical solutions are needed to make it easy
  31. ...but the first problem is to tell people why they should care...