hoard.it : Stealing your data
Or... “Where is your online value?”
Or... “Originality sucks”
Dan Zambonini
www.boxuk.com

M...
WARNING
WARNING
1. I am playing Devil’s Advocate

2. These are ‘thoughts in progress’
Introduction
1. The hoard.it project

2. Museums and the Web:
   where’s the value?
2.5 - 15%
Cross-Collections Projects

  “Search through the cultural collections of Europe”



            “explore and comment on c...
Why is this a Problem?
1. Some duplication of effort
  • £25,000 - £100,000 to put collections online
  • £1,500 - £6,500 ...
Our Approach
• Use data that already exists
   • No cost/duplication of effort
• No input or changes from museums
   • Lig...
How it works
Screen-Scraper + Spider
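
As a rough illustration of the screen-scraper plus spider idea, the sketch below extracts fields from pages that share a consistent template and follows their links breadth-first. It is not the hoard.it implementation: the class names, the `TEMPLATE` mapping, the `museum.example` URLs and the injected `fetch` function are all assumptions made for the example, and the pages are held in memory so nothing touches the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class RecordScraper(HTMLParser):
    """Pulls template-defined fields and outgoing links from one HTML page."""

    def __init__(self, template, base_url):
        super().__init__()
        self.template = template  # hypothetical mapping: CSS class -> field name
        self.base_url = base_url
        self.record = {}
        self.links = []
        self._field = None  # field currently being captured, if any
        self._tag = None    # tag whose end closes the capture

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        if self._field is None:
            for cls in (attrs.get("class") or "").split():
                if cls in self.template:
                    self._field, self._tag = self.template[cls], tag
                    self.record[self._field] = ""
                    break

    def handle_data(self, data):
        if self._field is not None:
            self.record[self._field] += data

    def handle_endtag(self, tag):
        if self._field is not None and tag == self._tag:
            # Collapse runs of whitespace left over from the page layout.
            self.record[self._field] = " ".join(self.record[self._field].split())
            self._field = None


def scrape(html, template, base_url):
    """Return (record, links) for a single page."""
    parser = RecordScraper(template, base_url)
    parser.feed(html)
    parser.close()
    return parser.record, parser.links


def crawl(start_url, fetch, template, limit=100):
    """Breadth-first spider; fetch(url) must return the page HTML as a string."""
    queue, seen, records, fetched = deque([start_url]), {start_url}, [], 0
    while queue and fetched < limit:
        url = queue.popleft()
        record, links = scrape(fetch(url), template, url)
        fetched += 1
        if record:
            records.append({"url": url, **record})
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return records


# Two in-memory pages stand in for a museum site with a consistent template.
PAGES = {
    "http://museum.example/objects/1":
        '<h1 class="object-title">Bronze  <em>figurine</em></h1>'
        '<p class="object-date">circa 19th century</p>'
        '<a href="/objects/2">Next</a>',
    "http://museum.example/objects/2":
        '<h1 class="object-title">Silver cup</h1>'
        '<a href="/objects/1">Previous</a>',
}
TEMPLATE = {"object-title": "title", "object-date": "created_date"}
records = crawl("http://museum.example/objects/1", PAGES.__getitem__, TEMPLATE)
```

Passing `fetch` in as a parameter keeps the sketch testable offline; a real spider would swap in an HTTP client and would need rate limiting and per-site templates.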
Difficulties and Limitations
•   Must have collections online
•   Must have a consistent template
•   Slow; not real-time
...
Difficulties: Normalization
•   Dates
    •   circa 19th century, 1960s, 2008-01, 1Jan ’52, 2000 BC, 30s, April 4 1934,
  ...
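
To make the date problem concrete, here is a minimal sketch that maps some of the date strings from this slide to a `(start_year, end_year)` range and returns `None` when the century cannot be determined. It is not the Box UK conversion service linked from the slide. The function name, the range convention (a century runs from year 1 to year 100, BC years are negative) and the single named period are assumptions chosen for the example.

```python
import re

# Named periods need a lookup table; only one entry is shown here.
NAMED_PERIODS = {"victorian": (1837, 1901)}  # reign of Queen Victoria


def normalize_date(text):
    """Return (start_year, end_year) for a free-text date, or None if unclear."""
    s = text.strip().lower()
    s = re.sub(r"^(circa|about|c\.)\s*", "", s)  # drop approximation words

    if s in NAMED_PERIODS:
        return NAMED_PERIODS[s]

    m = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th) century", s)
    if m:
        century = int(m.group(1))
        return ((century - 1) * 100 + 1, century * 100)

    m = re.fullmatch(r"(\d{3}0)s", s)  # "1960s"; "30s" has no century, so no match
    if m:
        decade = int(m.group(1))
        return (decade, decade + 9)

    m = re.fullmatch(r"(\d{1,4}) (bc|ad)", s)
    if m:
        year = int(m.group(1))
        year = -year if m.group(2) == "bc" else year
        return (year, year)

    m = re.fullmatch(r"(\d{4})-(\d{4})", s)  # explicit range such as "1100-1150"
    if m:
        return (int(m.group(1)), int(m.group(2)))

    # Otherwise accept the string only if it holds exactly one four-digit year.
    years = re.findall(r"(?<!\d)\d{4}(?!\d)", s)
    if len(years) == 1:
        return (int(years[0]), int(years[0]))

    return None  # e.g. "04-76" or "10-11-64": two-digit years are ambiguous
```

Returning `None` for two-digit years is a deliberate choice in this sketch: guessing the century would silently corrupt the aggregate statistics.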
The Data
Virtual Museum of Canada
Carnegie Museum of Art
Smithsonian NASM
National Museum of Aust...
The Data
 • URL            100%
 • Identifier     95%
 • Title          100%
 • Description    70%
 • Image          85%
 ...
Data Mining - Location
65% Europe
15% As...
[Chart: % of objects by continent of origin]
[Chart: % of objects by material]
How it has been used
•   Experiments: http://hoard.it/labs/




•   UK Museums on the
    Web 2008 Hack Day


•   Who know...
How it has been used
Next steps...
Next steps...


 ABSOLUTELY
  NOTHING
Do you offer anything?
dbPedia, Freebase
What can you offer?
•   Expertise
•   Media
•   The Physical Space
•   Reputation and Trust
•   Audience
•   Voice, Exposu...
What’s changed?
“...not all information should flow everywhere; only the
meaningful should be transmitted.

But in the net...
What’s changed?


What’s changed?
What’s changed?

 EXECUTION
    not
   IDEAS
What’s changed?

UK Newspaper Example
For example
•   Let your patrons collaborate
•   Let your patrons run your space
•   Give local communities a voice
•   Pr...
What has to change?
•   A focus on proven user needs
•   Re-usable services, not more data
•   Smaller projects
•   Iterat...
How do we get there?
•   Should web projects generate revenue?
•   Don’t be afraid of re-inventing the wheel
•   Demand al...
Summary
•   We stole your data...
•   But then so are lots of other people...
•   So produce value elsewhere.


•   Ideas ...
Thank you
      www.boxuk.com


      dan@boxuk.com


    twitter.com/zambonini
Dan Zambonini and Mike Ellis, hoard.it: Aggregating, displaying and mining object-data without consent

A presentation from Museums and the Web 2009:

A prototype system that allows the aggregation of data from museum and related Web sites, including object and event records, was rapidly developed. By screen-scraping the existing pages of 17 Web sites, tens of thousands of data records were collected without any technical agreement, investment or consent from the participating institutions. In this paper, we examine the reasons and benefits for aggregating this type of data, how our approach differs to other funded projects that have similar aspirations, and the relative strengths and weaknesses of each. An analysis of the data is presented, showing how the aggregate data set varies by assorted parameters, including location and date. Our work is related to the bigger picture of on-line data publishing, such as Semantic Web technologies, and some suggestions are presented as to how the grand vision of the Semantic Web may be achievable without the complexity.

Session: Technology Strategies [Technology]

  • Transcript of "Dan Zambonini and Mike Ellis, hoard.it: Aggregating, displaying and mining object-data without consent"

    1. 1. hoard.it : Stealing your data Or... “Where is your online value?” Or... “Originality sucks” Dan Zambonini www.boxuk.com Museums and the Web 2009, Indianapolis, April 16
    2. 2. WARNING
    3. 3. WARNING 1. I am playing Devil’s Advocate 2. These are ‘thoughts in progress’
    4. 4. Introduction 1. The hoard.it project 2. Museums and the Web: where’s the value?
    5. 5. Introduction 1. The hoard.it project 2. Museums and the Web: where’s the value?
    6. 6. 2.5 - 15%
    7. 7. 2.5 - 15%
    8. 8. Cross-Collections Projects “Search through the cultural collections of Europe” “explore and comment on collections” “find and explore digital collections from museums” “Discover cultural objects, collections”
    9. 9. Why is this a Problem? 1. Some duplication of effort • £25,000 - £100,000 to put collections online • £1,500 - £6,500 per cross-collection project 2. Potential end-user confusion 3. Usually only include larger institutions 4. Is there really a need?
    10. 10. Our Approach • Use data that already exists • No cost/duplication of effort • No input or changes from museums • Lightweight, open to all • Re-expose the data programmatically • Enable easy re-use
    11. 11. How it works Screen-Scraper + Spider
    12. 12. How it works Screen-Scraper + Spider
    13. 13. How it works Screen-Scraper + Spider
    14. 14. Difficulties and Limitations • Must have collections online • Must have a consistent template • Slow; not real-time • Technical variations (encoding, standards) • Rudimentary: Flash/Forms a barrier
    15. 15. Difficulties: Normalization • Dates • circa 19th century, 1960s, 2008-01, 1Jan ’52, 2000 BC, 30s, April 4 1934, 04-76, 1783-25-04, 10-11-64, about 200 AD, Victorian, 1100-1150, ... • http://feeds.boxuk.com/convert/date/ • Location • Points of interest, cities, towns, countries, administrative regions, political regions, ancient names, continents, postal codes, co-ordinates, ... • http://developer.yahoo.com/geo/
    16. 16. The Data [Bar chart: number of objects per source, axis 0 to 16000] Virtual Museum of Canada, Carnegie Museum of Art, Smithsonian NASM, National Museum of Australia, National Portrait Gallery, Imperial War Museum, National Museums of Scotland, Ingenious, Museum of London: E20CL, British Museum, Victoria and Albert Museum, National Maritime Museum, Powerhouse, Science Museum, 24 Hour Museum, Freebase: Events, Wikipedia: List of Painters
    17. 17. The Data [Bar chart: number of objects per source, axis 0 to 16000] Virtual Museum of Canada, Carnegie Museum of Art, Smithsonian NASM, National Museum of Australia, National Portrait Gallery, Imperial War Museum, National Museums of Scotland, Ingenious, Museum of London: E20CL, British Museum, Victoria and Albert Museum, National Maritime Museum, Powerhouse, Science Museum, 24 Hour Museum, Freebase: Events, Wikipedia: List of Painters. 70,000 objects
    18. 18. The Data • URL 100% • Identifier 95% • Title 100% • Description 70% • Image 85% • Creator 50% • Created Date 75% • Copyright 50% • Dimensions 45% • Subject 65% • Location 45% • Materials 65%
    19. 19. Data Mining - Location 65% Europe 15% Asia 14% North America 4% Oceania Percentage of objects from the same continent as museum: • North America: 85% • Europe: 75% • Oceania: 65%
    20. 20. Data Mining - Date/Location [Chart: % of objects by continent of origin (0 to 90) against Year (-1000 to 2000); series: Asia, Africa, Europe, Oceania, North America, South America]
    21. 21. Data Mining - Date/Material [Chart: % of objects by material (0 to 40) against Year (0 to 2000); series: Clay, Gold, Silver, Stone]
    22. 22. How it has been used • Experiments: http://hoard.it/labs/ • UK Museums on the Web 2008 Hack Day • Who knows...? Photo courtesy of Brian Kelly
    23. 23. How it has been used
    24. 24. Next steps...
    25. 25. Next steps... ABSOLUTELY NOTHING
    26. 26. Do you offer anything? dbPedia, Freebase
    27. 27. What can you offer? • Expertise • Media • The Physical Space • Reputation and Trust • Audience • Voice, Exposure and Influence
    28. 28. What’s changed? “...not all information should flow everywhere; only the meaningful should be transmitted. But in the network economy only signals in real time (or close to it) are truly meaningful. Examine the speed of knowledge in your system. How can it be brought closer to real time? If this requires the cooperation of subcontractors, distant partners, and far-flung customers, so much the better.” Kevin Kelly http://www.kk.org/newrules/blog/2009/04/if-you-are-not-in-real-time-yo.php
    29. 29. What’s changed?
    30. 30. What’s changed?
    31. 31. What’s changed? EXECUTION not IDEAS
    32. 32. What’s changed? [Diagram; legible labels: Quality, Usability, Accessibility, The Right Features, Design]
    33. 33. UK Newspaper Example [Chart, scale 0 to 10. Measures: Circulation, Bookmarks, PageRank, Unique Users, Facebook Fans, RSS Subscribers, Twitter Followers, Mentions in 24 hrs. Newspapers: Daily Express, Daily Mail, Daily Mirror, Daily Star, Daily Telegraph, Financial Times, The Guardian, The Independent, The Sun, The Times]
    34. 34. For example • Let your patrons collaborate • Let your patrons run your space • Give local communities a voice • Provide advice and guidance • Collect & distribute niche knowledge • ... • You know better than I do.
    35. 35. What has to change? • A focus on proven user needs • Re-usable services, not more data • Smaller projects • Iterative approaches • A real commitment to the web platform • (At least some) In-house development
    36. 36. How do we get there? • Should web projects generate revenue? • Don’t be afraid of re-inventing the wheel • Demand all projects use/expose APIs that are easy (REST not SOAP/OAI) and publicized • Show early, show often • Annoy funding bodies to support more, smaller, longer (i.e. iterative) ‘boring’ projects, and fewer ‘big, audacious’ projects.
    37. 37. Summary • We stole your data... • But then so are lots of other people... • So produce value elsewhere. • Ideas are harmful: do what’s proven... • But do it brilliantly. • And to do that, we need change.
    38. 38. Thank you www.boxuk.com dan@boxuk.com twitter.com/zambonini
    39. 39. Thank you www.boxuk.com dan@boxuk.com twitter.com/zambonini
