Dan Zambonini and Mike Ellis, hoard.it: Aggregating, displaying and mining object-data without consent

A presentation from Museums and the Web 2009:

A prototype system that allows the aggregation of data from museum and related Web sites, including object and event records, was rapidly developed. By screen-scraping the existing pages of 17 Web sites, tens of thousands of data records were collected without any technical agreement, investment or consent from the participating institutions. In this paper, we examine the reasons for and benefits of aggregating this type of data, how our approach differs from that of other funded projects with similar aspirations, and the relative strengths and weaknesses of each. An analysis of the data is presented, showing how the aggregate data set varies by assorted parameters, including location and date. Our work is related to the bigger picture of on-line data publishing, such as Semantic Web technologies, and some suggestions are presented as to how the grand vision of the Semantic Web may be achievable without the complexity.

Session: Technology Strategies [Technology]

Usage Rights

© All Rights Reserved

Presentation Transcript

  • hoard.it: Stealing your data. Or... “Where is your online value?” Or... “Originality sucks”. Dan Zambonini, www.boxuk.com. Museums and the Web 2009, Indianapolis, April 16
  • WARNING
  • WARNING 1. I am playing Devil’s Advocate 2. These are ‘thoughts in progress’
  • Introduction 1. The hoard.it project 2. Museums and the Web: where’s the value?
  • 2.5 - 15%
  • Cross-Collections Projects “Search through the cultural collections of Europe” “explore and comment on collections” “find and explore digital collections from museums” “Discover cultural objects, collections”
  • Why is this a Problem? 1. Some duplication of effort • £25,000 - £100,000 to put collections online • £1,500 - £6,500 per cross-collection project 2. Potential end-user confusion 3. Usually only include larger institutions 4. Is there really a need?
  • Our Approach • Use data that already exists • No cost/duplication of effort • No input or changes from museums • Lightweight, open to all • Re-expose the data programmatically • Enable easy re-use
  • How it works Screen-Scraper + Spider
  • Difficulties and Limitations • Must have collections online • Must have a consistent template • Slow; not real-time • Technical variations (encoding, standards) • Rudimentary: Flash/Forms a barrier
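The screen-scraping step, and its dependence on a consistent page template, can be illustrated with a minimal sketch. This is not the hoard.it code: the class names in `TEMPLATE`, the `TemplateScraper` class, and the sample page are all invented for illustration, and a real spider would also need to fetch and follow links.

```python
from html.parser import HTMLParser

# Hypothetical per-site template: maps a CSS class used on one site's object
# pages to a normalized field name. These class names are invented.
TEMPLATE = {"obj-title": "title", "obj-maker": "creator", "obj-date": "created_date"}


class TemplateScraper(HTMLParser):
    """Collects the text of elements whose class appears in the template."""

    def __init__(self, template):
        super().__init__()
        self.template = template
        self.record = {}
        self._field = None  # field currently being captured, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.template:
                self._field = self.template[cls]
                self.record[self._field] = ""

    def handle_data(self, data):
        if self._field is not None:
            self.record[self._field] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the capture, so text after
        # nested markup inside a field is dropped.
        self._field = None


def scrape(html, template=TEMPLATE):
    """Return one record (field name -> text) for a single object page."""
    parser = TemplateScraper(template)
    parser.feed(html)
    parser.close()
    return {name: value.strip() for name, value in parser.record.items()}


# Invented sample page standing in for a museum's object record.
PAGE = """
<html><body>
  <h1 class="obj-title">Bronze figure of a horse</h1>
  <p class="obj-maker">Unknown</p>
  <p class="obj-date">circa 19th century</p>
  <div class="nav">Home</div>
</body></html>
"""
record = scrape(PAGE)
print(record)
```

Because the mapping lives in one small dictionary per site, a template change on the museum's side breaks only that dictionary, which is why a consistent template is listed as a requirement.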
  • Difficulties: Normalization • Dates • circa 19th century, 1960s, 2008-01, 1Jan ’52, 2000 BC, 30s, April 4 1934, 04-76, 1783-25-04, 10-11-64, about 200 AD, Victorian, 1100-1150, ... • http://feeds.boxuk.com/convert/date/ • Location • Points of interest, cities, towns, countries, administrative regions, political regions, ancient names, continents, postal codes, co-ordinates, ... • http://developer.yahoo.com/geo/
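To give a feel for the date problem, here is a rough sketch that maps a few of the formats listed above to a (start year, end year) range. It is an assumption-laden illustration, not the logic behind the conversion service named on the slide: the function name `normalize_date` and the rules it applies are invented here, and ambiguous forms are deliberately left unresolved.

```python
import re


def normalize_date(text):
    """Return (start_year, end_year) for a free-text date, or None.

    Negative years stand for BC. Only a handful of patterns are handled.
    """
    s = text.strip().lower()
    s = re.sub(r"^(circa|about|c\.)\s+", "", s)  # drop approximation words

    # "2000 BC", "200 AD"
    m = re.fullmatch(r"(\d{1,4})\s*(bc|ad)", s)
    if m:
        year = int(m.group(1))
        year = -year if m.group(2) == "bc" else year
        return (year, year)

    # "1100-1150"
    m = re.fullmatch(r"(\d{4})\s*-\s*(\d{4})", s)
    if m:
        return (int(m.group(1)), int(m.group(2)))

    # "1960s"
    m = re.fullmatch(r"(\d{3})0s", s)
    if m:
        start = int(m.group(1)) * 10
        return (start, start + 9)

    # "19th century"
    m = re.fullmatch(r"(\d{1,2})(st|nd|rd|th) century", s)
    if m:
        century = int(m.group(1))
        return ((century - 1) * 100 + 1, century * 100)

    # "2008-01", "April 4 1934": fall back to the first four-digit year
    m = re.search(r"\b(\d{4})\b", s)
    if m:
        year = int(m.group(1))
        return (year, year)

    # "30s", "04-76", "10-11-64", "Victorian": too ambiguous for this sketch
    return None


for sample in ["circa 19th century", "1960s", "2000 BC", "about 200 AD",
               "1100-1150", "2008-01", "April 4 1934", "Victorian", "10-11-64"]:
    print(sample, "->", normalize_date(sample))
```

Returning a range rather than a single year keeps "1960s" and "19th century" comparable with exact dates, and returning `None` for forms such as "10-11-64" avoids guessing at day, month and century order.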
  • The Data [bar chart by source, axis 0 to 16,000]: Virtual Museum of Canada, Carnegie Museum of Art, Smithsonian NASM, National Museum of Australia, National Portrait Gallery, Imperial War Museum, National Museums of Scotland, Ingenious, Museum of London: E20CL, British Museum, Victoria and Albert Museum, National Maritime Museum, Powerhouse, Science Museum, 24 Hour Museum, Freebase: Events, Wikipedia: List of Painters. 70,000 objects
  • The Data • URL 100% • Identifier 95% • Title 100% • Description 70% • Image 85% • Creator 50% • Created Date 75% • Copyright 50% • Dimensions 45% • Subject 65% • Location 45% • Materials 65%
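Percentages like these can be derived by counting, for each field, how many scraped records carry a non-empty value. The sketch below assumes records are plain dictionaries; the function name `field_coverage` and the sample records are invented for illustration and are not the hoard.it data.

```python
def field_coverage(records, fields):
    """Return {field: percent of records with a non-empty value}."""
    total = len(records)
    if total == 0:
        return {field: 0 for field in fields}
    return {
        field: round(100 * sum(1 for r in records if r.get(field)) / total)
        for field in fields
    }


# Invented sample records; missing keys and empty strings both count as absent.
RECORDS = [
    {"title": "Teapot", "creator": "Wedgwood", "description": "Glazed earthenware"},
    {"title": "Sextant", "creator": "", "description": "Brass navigation instrument"},
    {"title": "Portrait of a lady", "creator": "Unknown artist"},
    {"title": "Medal", "description": "Campaign medal"},
]
coverage = field_coverage(RECORDS, ["title", "creator", "description"])
print(coverage)
```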
  • Data Mining - Location 65% Europe 15% Asia 14% North America 4% Oceania Percentage of objects from the same continent as museum: • North America: 85% • Europe: 75% • Oceania: 65%
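The "same continent as museum" figure is a simple grouped ratio once each object's place of origin has been resolved to a continent. A minimal sketch, assuming that resolution has already happened; the function name `same_continent_share` and the sample pairs are invented for illustration.

```python
from collections import Counter


def same_continent_share(objects):
    """objects: iterable of (museum_continent, object_continent) pairs.

    Returns {museum_continent: percent of its objects from that continent}.
    Objects whose origin could not be resolved (None) are skipped.
    """
    totals, same = Counter(), Counter()
    for museum, origin in objects:
        if origin is None:
            continue
        totals[museum] += 1
        if museum == origin:
            same[museum] += 1
    return {museum: round(100 * same[museum] / totals[museum]) for museum in totals}


# Invented sample data, not the figures from the slide.
PAIRS = [
    ("Europe", "Europe"), ("Europe", "Asia"), ("Europe", "Europe"),
    ("Europe", "Europe"), ("Oceania", "Oceania"), ("Oceania", "Europe"),
    ("Europe", None),
]
shares = same_continent_share(PAIRS)
print(shares)
```

Skipping unresolved origins matters here, since the slides report location for fewer than half of the records; counting them in the denominator would understate every percentage.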
  • Data Mining - Date/Location [chart: % of objects by continent of origin (0 to 90) against year (-1000 to 2000); series: Asia, Africa, Europe, Oceania, North America, South America]
  • Data Mining - Date/Material [chart: % of objects by material (0 to 40) against year (0 to 2000); series: Clay, Gold, Silver, Stone]
  • How it has been used • Experiments: http://hoard.it/labs/ • UK Museums on the Web 2008 Hack Day • Who knows...? Photo courtesy of Brian Kelly
  • How it has been used
  • Next steps...
  • Next steps... ABSOLUTELY NOTHING
  • Do you offer anything? dbPedia, Freebase
  • What can you offer? • Expertise • Media • The Physical Space • Reputation and Trust • Audience • Voice, Exposure and Influence
  • What’s changed? “...not all information should flow everywhere; only the meaningful should be transmitted. But in the network economy only signals in real time (or close to it) are truly meaningful. Examine the speed of knowledge in your system. How can it be brought closer to real time? If this requires the cooperation of subcontractors, distant partners, and far- flung customers, so much the better.” Kevin Kelly http://www.kk.org/newrules/blog/2009/04/if-you-are-not-in-real-time-yo.php
  • What’s changed? [diagram; labels lost in extraction]
  • What’s changed?
  • What’s changed? EXECUTION not IDEAS
  • What’s changed? [diagram; some labels lost in extraction] Quality, Usability, Accessibility, The right features, Design
  • UK Newspaper Example [chart, scale 0 to 10] Newspapers: Daily Express, Daily Mail, Daily Mirror, Daily Star, Daily Telegraph, Financial Times, The Guardian, The Independent, The Sun, The Times. Measures: Circulation, Unique Users, PageRank, Bookmarks, Facebook Fans, RSS Subscribers, Twitter Followers, Mentions in 24 hrs
  • For example • Let your patrons collaborate • Let your patrons run your space • Give local communities a voice • Provide advice and guidance • Collect & distribute niche knowledge • ... • You know better than I do.
  • What has to change? • A focus on proven user needs • Re-usable services, not more data • Smaller projects • Iterative approaches • A real commitment to the web platform • (At least some) In-house development
  • How do we get there? • Should web projects generate revenue? • Don’t be afraid of re-inventing the wheel • Demand all projects use/expose APIs that are easy (REST not SOAP/OAI) and publicized • Show early, show often • Annoy funding bodies to support more, smaller, longer (i.e. iterative) ‘boring’ projects, and fewer ‘big, audacious’ projects.
  • Summary • We stole your data... • But then so are lots of other people... • So produce value elsewhere. • Ideas are harmful: do what’s proven... • But do it brilliantly. • And to do that, we need change.
  • Thank you www.boxuk.com dan@boxuk.com twitter.com/zambonini