Repositories for Scientific Data: An #animalgarden show (Pecha Kucha) - Peter Murray-Rust

Peter Murray-Rust's Pecha Kucha presentation "Repositories for Scientific Data: An #animalgarden show" which was delivered on Friday 2nd August 2013 at the Repository Fringe 2013.

Published in: Technology, Education


  1. REPOSITORIES FOR SCIENTIFIC DATA An #animalgarden show Peter Murray-Rust, OKFN and University of Cambridge Chuff OWL Moomin AMI Gulliver Sleepless cleanTux UncleSam
  2. I’m AMI, studying biodiversity. I compute phylogenetic trees. Only 4% of computed trees are saved. I’m in a pear tree.
  3. Where can I put my data? Institutional repos don’t work, we’ve tried. WE NEED DOMAIN REPOSITORIES FOR SCIENCE
  4. So how do you manage data? We’re BIG DATA at NASA. We hire data experts.
  5. But I’m a LONG-TAIL scientist!
  6. Australia have a national data service (ANDS). We could use their TARDIS*. Let’s ask the crystallographers. They save their data.
  7. I want to publish this paper. You MUST send ALL the data. The IUCr will check if it’s correct.
  8. It takes years to create vocabularies. Core dictionary (coreCIF) version 2.4.3: _diffrn_ambient_temperature. Definition: The mean temperature in kelvins at which the intensities were measured. Range: 0.0 -> infinity. Type: numb. ID. For humans. For machines: Constraint + type. We need domain vocabularies through inter/national efforts.
  9. PMR group also built a crystal structure repo (Crystaleye). It’s got 200,000 entries. But none from Elsevier, Wiley, Springer.
  10. And NONE of the results are archived. Computational Materials scientists cost 1,000 Million USD / year. PMR wrote software to turn FORTRAN into XML.
  11. PMR and others have started a global effort to create vocabularies. It’s hard and slow work. PMR group built compchem repository Chempound. XML RDF NoSQL SPARQL
  12. Is PMR making progress? Hoping to work with Obama’s 500 M USD “materials genome”.
  13. WE NEED DOMAIN REPOSITORIES FOR BIODIVERSITY
  14. We could use Figshare. As long as it’s Open. Or OKFN’s CKAN.
  15. And we can also do theses! PMR and Ross Mounce will index the whole of published bioscience! 5 years of JISC projects helped.
  16. We’re going to index SPECIES, PLACES, DATES. I’m a baby Buddleja davidii.
  17. OKFN Chuff! I’m an Okapi balloonii
  18. WE NEED DOMAIN REPOSITORIES FOR SCIENCE. Wake up, nearly finished. PechaKucha is knackering.
  19. Chuff. REPOSITORIES FOR SCIENTIFIC DATA An #animalgarden show Peter Murray-Rust, OKFN and University of Cambridge WE NEED DOMAIN REPOSITORIES FOR SCIENCE
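
Slide 8 shows a dictionary entry that pairs a human-readable definition with a machine-checkable type and range. As a rough illustration of why that pairing matters to a repository, here is a minimal Python sketch of checking a deposited value against such an entry. The `DictionaryItem` class and its `validate` method are hypothetical names invented for this sketch, not part of any CIF toolkit; only the item name, definition, and range are taken from the slide. It assumes plain decimal numbers and does not handle CIF values that carry a standard uncertainty in parentheses.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryItem:
    """Hypothetical stand-in for one dictionary entry (not a real CIF API)."""

    name: str          # the ID that machines match on
    definition: str    # the text meant for humans
    minimum: float     # lower bound of the allowed range (inclusive)
    maximum: float     # upper bound of the allowed range

    def validate(self, raw: str) -> float:
        """Parse a raw string as a number and enforce the range constraint."""
        value = float(raw)  # raises ValueError if the text is not a number
        if not math.isfinite(value):
            raise ValueError(f"{self.name}: value must be a finite number")
        if not (self.minimum <= value <= self.maximum):
            raise ValueError(
                f"{self.name}: {value} is outside {self.minimum} -> {self.maximum}"
            )
        return value


# Entry contents copied from the slide; the structure is an assumption.
AMBIENT_TEMPERATURE = DictionaryItem(
    name="_diffrn_ambient_temperature",
    definition=(
        "The mean temperature in kelvins at which the intensities were measured."
    ),
    minimum=0.0,
    maximum=math.inf,
)

if __name__ == "__main__":
    print(AMBIENT_TEMPERATURE.validate("293"))
    try:
        AMBIENT_TEMPERATURE.validate("-5")
    except ValueError as error:
        print("rejected:", error)
```

A check of this kind is only possible once a community has agreed on the item name, its units, and its limits, which is the slow vocabulary work the slides describe.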
