Grab a bucket! It’s raining data!
Photo: http://www.flickr.com/photos/peasap...

the...
Painting: “Cassandra,” Evelyn de Morgan
Photo: http://commons.wikimedia.org/wiki/File:Cassandra1.jpeg
...

I’ve got nothing against
but the reality was...
Photo: http://www.flickr.com/pho...

... blurrier.
goals?
means?
something for nothing?
...

the...
of Data Curation?

What do we know about data?
Photo: http://www.flickr.com/photos/kentbye/2053916246/

There’s a lot of data.
Photo: http://www.flickr.com/photos/noelzialee/2126153623/

Photo: http://www.flickr.com/photos/jonevans/1032687817/
Data are there to be interacted with.

Data are wildly diverse in nature...
... as are their technical environments.
Photo: http://www.flickr.com/pho...

Data are already out there.
Photo: NASA (via http://nasaimages.org/), “Multiwavelength M81”

A lot of data are analog...
... but really want to be digital.
Photo: http://www.flickr.com/photos...

Data are project-based.
http://www.exploringthehyper.net/

Data are sloppy.
Photo: http://www.flickr.com/photos/midorisyu/2622024163/

Data aren’t standardized.
Photo: http://www.flickr.com/photos/mikewade/3463334719/

Our Big Bucket: the digital library

Our other Big Bucket: the institutional repository

Photo: http://www.flickr.com/photos/peasap/655111542/
Impedance mismatches

What do we know about these?
Photo: http://www.flickr.com/photos/schex/193912573/

Carefully built and tended
http://www.collectionscanada.gc.ca/naskapi/index-e.html

Production is a Taylorist’s dream.
Photo: http://www.flickr.com/photos/villeneuve53/1808995620/

when it isn’t a Taylorist’s nightmare.
Photo: http://www.flickr.com/photos/elsie/97542274/

What do we know about these?

We’re caged up inside our institutions.
Photo: http://www.flickr.com/photos/annia316...

Photo: http://commons.wikimedia.org/wiki/File:Black_Ford_Model_T_in_HK.JPG
...

Bring it on; we’ll take anything!
... as long as it’s static and final.
Photo: http://www.flickr.com/p...

Right, anything you’ve got!
... one file at a time.
Photo: http://www.flickr...

Any look and feel...

Any metadata you want!
... as long as it’s key-value pairs.
Photo: http://www.flickr.com/photos/rattodi...

Do anything you want...
... as long as it’s “download.”
Photo: http://www.flickr.com/photos/proc...

Content models
Enough said.

So where does all that leave us?
Photo: http://www.flickr.com/photos/library_of_congress/2162653769/

Photo: http://www.flickr.com/photos/jonevans/1032687817/
We need bigger, better buckets.

Silos are both necessary and unacceptable.
Photo: http://www.flickr.com/photos...

We have a lot of modeling to do. And meta-modeling.
Photo: http://www.flickr.co...

We have a lot of code to write.
Photo: http://www.flickr.com/photos/fienna/170559081/

We can’t code or model in isolation.
Photo: http://www.flickr.com/photos/naus3a01/240614578/

Fedora is the new world. But Fedora must change.
Photo: http://www.flickr.com/photos/mythwhi...

Solr brings it all together
Photo: http://www.flickr.com/photos/chantrybee/2911840052/

... the
Vermeer: the Muse Clio, from “The Allegory of Painting”
...

Thank you!
This presentation is available under a Creative Commons Attribution 3.0 United States license.

Grab a bucket! It's raining data!

Presented at the Access 2009 conference. Grab a bucket! It's raining data! Library data, research data, primary data, mashed-up data, raw data, cooked data, our data, other people's data... But which bucket should we grab? And can we really, truly fit all the data in one bucket? And don't we risk turning data into sludge if we mix it all together in our bucket? Finding a bucket is the easy part. Grappling with data acquisition, modeling, discovery, and reuse is hard. How will we do it? Can we?

  1. Grab a bucket! It’s raining data! Dorothea Salo, University of Wisconsin, Access 2009. Photo: http://www.flickr.com/photos/peasap/655111542/
  2. the... of Open Access. Painting: “Cassandra,” Evelyn de Morgan. Photo: http://commons.wikimedia.org/wiki/File:Cassandra1.jpeg
  3. I’ve got nothing against but the reality was... Photo: http://www.flickr.com/photos/y2bk/528300692/
  4. ... blurrier. goals? means? something for nothing? fit between content and container? fit between user needs and system? and so now, I may be becoming Photo: http://www.flickr.com/photos/jennsstuff/2965783700/
  5. the... of Data Curation?
  6. What do we know about data? Photo: http://www.flickr.com/photos/kentbye/2053916246/
  7. There’s a lot of data. Photo: http://www.flickr.com/photos/noelzialee/2126153623/
  8. Data are there to be interacted with. Photo: http://www.flickr.com/photos/jonevans/1032687817/
  9. Data are wildly diverse in nature... ... as are their technical environments. Photo: http://www.flickr.com/photos/28481088@N00/670258156/
  10. Data are already out there. Photo: NASA (via http://nasaimages.org/), “Multiwavelength M81”
  11. A lot of data are analog... ... but really want to be digital. Photo: http://www.flickr.com/photos/mrbill/3452943573/
  12. Data are project-based. http://www.exploringthehyper.net/
  13. Data are sloppy. Photo: http://www.flickr.com/photos/midorisyu/2622024163/
  14. Data aren’t standardized. Photo: http://www.flickr.com/photos/mikewade/3463334719/
  15. Our Big Bucket: the digital library
  16. Our other Big Bucket: the institutional repository
  17. Impedance mismatches. Photo: http://www.flickr.com/photos/peasap/655111542/
  18. What do we know about these? Photo: http://www.flickr.com/photos/schex/193912573/
  19. Carefully built and tended. http://www.collectionscanada.gc.ca/naskapi/index-e.html
  20. Production is a Taylorist’s dream. Photo: http://www.flickr.com/photos/villeneuve53/1808995620/
  21. when it isn’t a Taylorist’s nightmare. Photo: http://www.flickr.com/photos/elsie/97542274/
  22. What do we know about these?
  23. We’re caged up inside our institutions. Photo: http://www.flickr.com/photos/annia316/115439737/
  24. Any color... Photo: http://commons.wikimedia.org/wiki/File:Black_Ford_Model_T_in_HK.JPG
  25. Bring it on; we’ll take anything! ... as long as it’s static and final. Photo: http://www.flickr.com/photos/orblivio/146691405/
  26. Right, anything you’ve got! ... one file at a time. Photo: http://www.flickr.com/photos/jetalone/39990302/
  27. Any look and feel...
  28. Any metadata you want! ... as long as it’s key-value pairs. Photo: http://www.flickr.com/photos/rattodisabina/2460905893/
  29. Do anything you want... ... as long as it’s “download.” Photo: http://www.flickr.com/photos/procsilas/306417902/
  30. Content models. Enough said.
  31. So where does all that leave us? Photo: http://www.flickr.com/photos/library_of_congress/2162653769/
  32. We need bigger, better buckets. Photo: http://www.flickr.com/photos/jonevans/1032687817/
  33. Silos are both necessary and unacceptable. Photo: http://www.flickr.com/photos/jojakeman/2818910104/
  34. We have a lot of modeling to do. And meta-modeling. Photo: http://www.flickr.com/photos/crobj/727348790/
  35. We have a lot of code to write. Photo: http://www.flickr.com/photos/fienna/170559081/
  36. We can’t code or model in isolation. Photo: http://www.flickr.com/photos/naus3a01/240614578/
  37. Fedora is the new world. But Fedora must change. Photo: http://www.flickr.com/photos/mythwhisper/3361907495/
  38. Solr brings it all together. Photo: http://www.flickr.com/photos/chantrybee/2911840052/
  39. ... the... of Data Curation. Vermeer: the Muse Clio, from “The Allegory of Painting”
  40. Thank you! This presentation is available under a Creative Commons Attribution 3.0 United States license.
