WebART: Facilitating Scholarly Use of Web Archives (IIPC, Apr. 2013)

Presentation at symposium “Scholarly Access to Web Archives: Progress, Requirements and Challenges”, IIPC, April 25, 2013 (Ljubljana, Slovenia). This presentation discusses the results of the WebART project’s first year, in which different research disciplines joined forces to tackle the issue of scholarly access to Web archives. It introduces WebARTist, a novel Web archive search interface, and discusses the potential of scholarly research using Web archives, as well as current barriers to success, based on the experiences gained during a pilot project.

Published in: Technology, News & Politics

Transcript

  • 1. WebART project: Web Archive Retrieval Tools. Jaap Kamps, Richard Rogers, Arjen de Vries, Paul Doorenbosch, René Voorburg, Victor-Jan Vos, Anat Ben-David, Hugo Huurdeman, Thaer Samar. Image: Flickr, Luc Viatour. IIPC symposium “Scholarly Access to Web Archives”, Ljubljana, April 25, 2013
  • 2. WebART project: Web Archive Retrieval Tools. “Facilitating Scholarly Use of Web Archives”. Jaap Kamps, Richard Rogers, Arjen de Vries, Paul Doorenbosch, René Voorburg, Victor-Jan Vos, Anat Ben-David, Hugo Huurdeman, Thaer Samar. Image: Flickr, Luc Viatour. IIPC symposium “Scholarly Access to Web Archives”, Ljubljana, April 25, 2013
  • 3. What are Web archives for?
  • 4. 2012-2016
  • 5. Thaer Samar (PhD/programmer), Hugo Huurdeman (PhD researcher), Anat Ben-David (postdoc), Arjen de Vries, Jaap Kamps, Richard Rogers, Paul Doorenbosch, René Voorburg, Victor-Jan Vos
  • 6. WebART Goals • Evaluating current curation and selection procedures of Web archives • Gaining insight into current use of Web archives • Developing new methods and tools for research using Web archives
  • 7. KB: Web archive since 2007 • Selective approach • Statistics: 4,000+ websites, 17,000+ harvests, 7+ terabytes • Image: Flickr, koninklijkebibliotheek
  • 8. KB: Web archive since 2007 • Selective approach • Statistics: 4,000+ websites, 17,000+ harvests, 7+ terabytes • Original image: ANP
  • 9. ”Wayback Machine” interface
  • 10. WebARTist (pilot, beta 1) • Initial dataset (corpus) • 432 crawls, 16 months (13.64 GB) • Full-text search engine • KB, CommonCrawl + nu.nl (Dutch news aggregator)
  • 11. WebARTist: Use case • Digital Methods Winter School (Jan. ’13) • Co-design workshop (“Living Lab”) • Researchers & developers • First use of WebARTist
  • 12. Word frequency analysis [chart: word frequency (0 to 800) over time, 17/05/2011 to 06/01/2013]
  • 13. Co-Word Analysis
  • 14. Outlink Analysis [figure: outlink counts per host for three topics: Syria, Sandy, US Election]
  • 15. Geomapping location Wire service
  • 16. Temporal Image Analyses
  • 17. Timeline
  • 18. Use case analysis (1) • DMI Winter School • Analysis types performed: • Word frequency count, Outlink frequency count • (Visual) Co-Word analysis • Geomapping • “Temporal Analysis”
  • 19. Use case analysis (2) • Analysis / visualization: DMI Dorling Map Tool, Gephi, Google Fusion Tables, Google Refine, TimelineJS • Data processing: Excel, Google Spreadsheets
  • 20. Use case analysis (3) • Basic usage statistics WebARTist [bar chart: percentage of queries (0 to 30) using the date filter, site filter, and collection filter]
  • 21. Use case conclusions (1) • Data quality and quantity • Limited dataset, but many analysis types possible (daily news crawls) • Not always clear what’s in & what’s out • crawl settings (e.g. depth), temporal gaps • Data expansion opportunity: • combining datasets (but ...) • e.g. KB, CommonCrawl & IA • Completeness • Inconsistencies
  • 22. Use case conclusions (2) • Search System • Influence of retrieval algorithms & indexing settings • Recall & Precision: precision issues • Feature request: duplicate handling • Interface • How to convey uncertainty? • How to convey advanced technical features? • e.g. advanced query mechanisms
  • 23. Use case conclusions (3) • Users • High demand for export functions (formats) • (Un)familiarity with temporal (archive) search • Trying to utilize “current Web” tools (e.g. link analysis), not applicable to “past Web” • “User search as in (regular) Web search engines” (see also [Costa & Silva ’11])
  • 24. Next steps WebART • New prototype ready (~3TB) • faceted search, thumbnail browsing, site categories & advanced metadata • Formal evaluation of pilot project • Web archive critique • Search system • Research scenarios & use cases
  • 25. Future WebART search tools
  • 26. webarchiving.nl @webart12
  • 27. Summary • Introduction WebART & CATCH • Pilot project • WebARTist • DMI Winter School Use Case • Analysis & Conclusions Use Case • The Future
  • 28. WebART project: Web Archive Retrieval Tools. Jaap Kamps, Richard Rogers, Arjen de Vries, Paul Doorenbosch, René Voorburg, Victor-Jan Vos, Anat Ben-David, Hugo Huurdeman, Thaer Samar. Image: Flickr, Luc Viatour. IIPC symposium “Scholarly Access to Web Archives”, Ljubljana, April 25, 2013
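
The word frequency count and outlink frequency count listed on slide 18 can be illustrated with a minimal Python sketch. This is not the WebART project's code or the WebARTist API: the sample pages, the helper names `word_frequency_by_date` and `outlink_host_counts`, and the regex-based tag stripping and link extraction are all assumptions made for illustration. A real workflow would probably read archived records and use a proper HTML parser.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical archived pages as (crawl date, HTML) pairs. Not real WebART data.
PAGES = [
    ("2012-10-29",
     '<p>Sandy hits New York</p>'
     '<a href="http://www.nhc.noaa.gov/sandy">NOAA</a>'),
    ("2012-10-30",
     '<p>Sandy: storm damage</p>'
     '<a href="http://edition.cnn.com/sandy">CNN</a>'
     '<a href="http://edition.cnn.com/live">live</a>'),
]

HREF_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def word_frequency_by_date(pages, term):
    """Count case-insensitive whole-word matches of `term` per crawl date."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts = Counter()
    for date, html in pages:
        text = TAG_RE.sub(" ", html)  # crude tag stripping, enough for a sketch
        counts[date] += len(pattern.findall(text))
    return dict(counts)


def outlink_host_counts(pages):
    """Count how often each host is linked to across all pages."""
    counts = Counter()
    for _, html in pages:
        for url in HREF_RE.findall(html):
            host = urlparse(url).netloc.lower()
            if host:  # skip relative links, which have no host
                counts[host] += 1
    return counts


if __name__ == "__main__":
    print(word_frequency_by_date(PAGES, "sandy"))
    print(outlink_host_counts(PAGES).most_common())
```

Grouping word counts by crawl date gives the kind of time series plotted on slide 12, and counting linked hosts gives the kind of per-host tallies shown on slide 14. Both helpers return plain dictionaries or counters, so the results could be exported to a spreadsheet or a visualization tool.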
