Dan Zambonini and Mike Ellis, hoard.it: Aggregating, displaying and mining object-data without consent

A presentation from Museums and the Web 2009:

A prototype system that aggregates data from museum and related Web sites, including object and event records, was rapidly developed. By screen-scraping the existing pages of 17 Web sites, tens of thousands of data records were collected without any technical agreement, investment or consent from the participating institutions. In this paper, we examine the reasons for and benefits of aggregating this type of data, how our approach differs from other funded projects that have similar aspirations, and the relative strengths and weaknesses of each. An analysis of the data is presented, showing how the aggregate data set varies by assorted parameters, including location and date. Our work is related to the bigger picture of on-line data publishing, such as Semantic Web technologies, and some suggestions are presented as to how the grand vision of the Semantic Web may be achievable without the complexity.

Session: Technology Strategies [Technology]
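
The screen-scraping approach described in the abstract relies on each site rendering its object pages from a consistent template. The following is a minimal sketch of that idea, not the hoard.it implementation: the `TEMPLATE` mapping, the CSS class names, and the `scrape` helper are all hypothetical, and the spider that would discover and fetch pages is left out so the example runs without network access.

```python
from html.parser import HTMLParser

# Hypothetical per-site template: output field -> CSS class that wraps it.
# These class names are assumptions for illustration only.
TEMPLATE = {
    "title": "object-title",
    "creator": "object-maker",
    "date": "object-date",
}

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TemplateScraper(HTMLParser):
    """Collects the text of elements whose class matches a template entry."""

    def __init__(self, template):
        super().__init__()
        self.class_to_field = {cls: field for field, cls in template.items()}
        self.record = {}
        self._field = None  # field currently being captured, if any
        self._depth = 0     # nesting depth inside the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._field is not None:
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.class_to_field:
                self._field = self.class_to_field[cls]
                self._depth = 1
                self.record[self._field] = ""
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._field is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self.record[self._field] = self.record[self._field].strip()
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self.record[self._field] += data


def scrape(html, template=TEMPLATE):
    """Return a dict of the template fields found in one page of HTML."""
    parser = TemplateScraper(template)
    parser.feed(html)
    parser.close()
    return parser.record


page = '<h1 class="object-title">Sextant <em>(brass)</em></h1><p class="object-maker">Unknown</p>'
print(scrape(page))
```

Fields missing from a page are simply absent from the returned record, which is one way partial field coverage across sources could arise.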

  1. 1. hoard.it : Stealing your data Or... “Where is your online value?” Or... “Originality sucks” Dan Zambonini www.boxuk.com Museums and the Web 2009, Indianapolis, April 16
  2. 2. WARNING
  3. 3. WARNING 1. I am playing Devil’s Advocate 2. These are ‘thoughts in progress’
  4. 4. Introduction 1. The hoard.it project 2. Museums and the Web: where’s the value?
  6. 6. 2.5 - 15%
  8. 8. Cross-Collections Projects “Search through the cultural collections of Europe” “explore and comment on collections” “find and explore digital collections from museums” “Discover cultural objects, collections”
  9. 9. Why is this a Problem? 1. Some duplication of effort • £25,000 - £100,000 to put collections online • £1,500 - £6,500 per cross-collection project 2. Potential end-user confusion 3. Usually only include larger institutions 4. Is there really a need?
  10. 10. Our Approach • Use data that already exists • No cost/duplication of effort • No input or changes from museums • Lightweight, open to all • Re-expose the data programmatically • Enable easy re-use
  11. 11. How it works Screen-Scraper + Spider
  14. 14. Difficulties and Limitations • Must have collections online • Must have a consistent template • Slow; not real-time • Technical variations (encoding, standards) • Rudimentary: Flash/Forms a barrier
  15. 15. Difficulties: Normalization • Dates • circa 19th century, 1960s, 2008-01, 1Jan ’52, 2000 BC, 30s, April 4 1934, 04-76, 1783-25-04, 10-11-64, about 200 AD, Victorian, 1100-1150, ... • http://feeds.boxuk.com/convert/date/ • Location • Points of interest, cities, towns, countries, administrative regions, political regions, ancient names, continents, postal codes, co-ordinates, ... • http://developer.yahoo.com/geo/
  16. 16. The Data [Bar chart, one bar per source, scale 0 to 16,000] Sources: Virtual Museum of Canada, Carnegie Museum of Art, Smithsonian NASM, National Museum of Australia, National Portrait Gallery, Imperial War Museum, National Museums of Scotland, Ingenious, Museum of London: E20CL, British Museum, Victoria and Albert Museum, National Maritime Museum, Powerhouse, Science Museum, 24 Hour Museum, Freebase: Events, Wikipedia: List of Painters
  17. 17. The Data 70,000 objects
  18. 18. The Data • URL 100% • Identifier 95% • Title 100% • Description 70% • Image 85% • Creator 50% • Created Date 75% • Copyright 50% • Dimensions 45% • Subject 65% • Location 45% • Materials 65%
  19. 19. Data Mining - Location 65% Europe 15% Asia 14% North America 4% Oceania Percentage of objects from the same continent as museum: • North America: 85% • Europe: 75% • Oceania: 65%
  20. 20. Data Mining - Date/Location [Chart: % of objects by continent of origin (scale 0 to 90) against year (-1000 to 2000); series: Asia, Africa, Europe, Oceania, North America, South America]
  21. 21. Data Mining - Date/Material [Chart: % of objects by material (scale 0 to 40) against year (0 to 2000); series: Clay, Gold, Silver, Stone]
  22. 22. How it has been used • Experiments: http://hoard.it/labs/ • UK Museums on the Web 2008 Hack Day • Who knows...? Photo courtesy of Brian Kelly
  23. 23. How it has been used
  24. 24. Next steps...
  25. 25. Next steps... ABSOLUTELY NOTHING
  26. 26. Do you offer anything? dbPedia, Freebase
  27. 27. What can you offer? • Expertise • Media • The Physical Space • Reputation and Trust • Audience • Voice, Exposure and Influence
  28. 28. What’s changed? “...not all information should flow everywhere; only the meaningful should be transmitted. But in the network economy only signals in real time (or close to it) are truly meaningful. Examine the speed of knowledge in your system. How can it be brought closer to real time? If this requires the cooperation of subcontractors, distant partners, and far-flung customers, so much the better.” Kevin Kelly http://www.kk.org/newrules/blog/2009/04/if-you-are-not-in-real-time-yo.php
  29. 29. What’s changed? [Diagram; recoverable labels: Content, Data, Services]
  30. 30. What’s changed?
  31. 31. What’s changed? EXECUTION not IDEAS
  32. 32. What’s changed? [Diagram; recoverable labels: Quality, Usability, Accessibility, The Right Features, Design]
  33. 33. UK Newspaper Example [Chart, scale 0 to 10] Newspapers: Daily Express, Daily Mail, Daily Mirror, Daily Star, Daily Telegraph, Financial Times, The Guardian, The Independent, The Sun, The Times. Measures: Circulation, Bookmarks, PageRank, Unique Users, Facebook Fans, RSS Subscribers, Twitter Followers, Mentions in 24 hrs
  34. 34. For example • Let your patrons collaborate • Let your patrons run your space • Give local communities a voice • Provide advice and guidance • Collect & distribute niche knowledge • ... • You know better than I do.
  35. 35. What has to change? • A focus on proven user needs • Re-usable services, not more data • Smaller projects • Iterative approaches • A real commitment to the web platform • (At least some) In-house development
  36. 36. How do we get there? • Should web projects generate revenue? • Don’t be afraid of re-inventing the wheel • Demand all projects use/expose APIs that are easy (REST not SOAP/OAI) and publicized • Show early, show often • Annoy funding bodies to support more, smaller, longer (i.e. iterative) ‘boring’ projects, and fewer ‘big, audacious’ projects.
  37. 37. Summary • We stole your data... • But then so are lots of other people... • So produce value elsewhere. • Ideas are harmful: do what’s proven... • But do it brilliantly. • And to do that, we need change.
  39. 39. Thank you www.boxuk.com dan@boxuk.com twitter.com/zambonini
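
Slide 15 lists the kinds of free-text dates that had to be normalized. As a rough illustration of why this is hard, here is a minimal sketch that maps a few of the unambiguous forms to a `(start_year, end_year)` pair and returns `None` for everything else. It is not the conversion service linked on the slide: the function name `normalize_year_range`, the choice of patterns, and the convention of representing BC years as negative numbers are all assumptions made for this example.

```python
import re


def normalize_year_range(text):
    """Return (start_year, end_year) for a free-text date, or None if unsure.

    Assumption: BC years are returned as negative numbers (2000 BC -> -2000).
    Ambiguous inputs such as "04-76", "30s" or "Victorian" return None.
    """
    s = text.strip().lower()
    # Drop a leading approximation marker; the uncertainty itself is discarded.
    s = re.sub(r"^(circa|about|c\.)\s*", "", s)

    # "19th century" -> (1801, 1900)
    m = re.fullmatch(r"(\d{1,2})(st|nd|rd|th) century", s)
    if m:
        century = int(m.group(1))
        return ((century - 1) * 100 + 1, century * 100)

    # "1960s" -> (1960, 1969); two-digit decades like "30s" are not guessed.
    m = re.fullmatch(r"(\d{3}0)s", s)
    if m:
        decade = int(m.group(1))
        return (decade, decade + 9)

    # "1100-1150" -> (1100, 1150)
    m = re.fullmatch(r"(\d{3,4})\s*-\s*(\d{3,4})", s)
    if m:
        return (int(m.group(1)), int(m.group(2)))

    # "2000 bc" / "200 ad"
    m = re.fullmatch(r"(\d{1,4})\s*(bc|ad)", s)
    if m:
        year = int(m.group(1))
        return (-year, -year) if m.group(2) == "bc" else (year, year)

    # Fall back to a single four-digit year anywhere in the text.
    years = re.findall(r"\b\d{4}\b", s)
    if len(years) == 1:
        return (int(years[0]), int(years[0]))

    return None


for sample in ["circa 19th century", "1960s", "2000 BC", "April 4 1934", "04-76"]:
    print(sample, "->", normalize_year_range(sample))
```

Returning `None` rather than guessing keeps ambiguous records out of date-based analysis instead of silently misplacing them.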
