Telling Stories with Web Archives
Keynote presentation from the Southeast Women in Computing Conference

November 16, 2013
Lake Guntersville State Park, Alabama

  • We have seen a machine-readable list of URIs; can we create this list automatically?
  • Storify is a social network service that lets users create stories or timelines using social media such as Twitter, Facebook, and Instagram. Storify was launched in September 2010 and has been open to the public since April 2011. http://storify.com/nzherald/mu
  • The problem is that Storify operates as bookmarking; it doesn't preserve the links. You have no clue what the person is saying about the link.
  • Which brings up an overview from Wikipedia as the first result
  • But what about replaying the story as it was captured on the news websites? Of the three information needs, this one is unserved.
  • Now let me tell you the story of the Egyptian revolution, using a couple of screenshots that appeared at the time of the revolution.
  • Can we satisfy the information need of rewinding/replaying events as they appeared in the past? How do we integrate web archives into the live web to support storytelling? How do we integrate web archives into the live web for replaying news stories as they were captured? The research aims to integrate the past with the present by automatically creating, identifying, and linking stories culled from the past web that are related to the content of a live web page or a specific event. This raises several questions: Can we leverage the content of social media services to discover stories? Can we extract stories based on user access patterns of the Wayback Machine? Can we associate the names that people give particular events with their datetimes in order to find them in web archives?
  • If we look in different places, we will get different URIs that express different perspectives on the story. Searching two places gives us two results.
  • I bet that everyone here knows the importance of this page. Trust me, this page is very important. I know that the Egyptian revolution started in this group; I was one of the first to join this page, which was created on June 10, 2010. This is one of the most important pages; I know it because I have the background. It is an important page for the story, even though its current state does not represent the story.
  • If we have a time frame specified for the event/story, we will deduplicate the news collections.
  • Handle the duplicates of the news: http://www.bartamaha.com/egypts-mubarak-resigns-after-30-year-rule-42593/
  • http://web.archive.org/web/20110215202830/http://www.uscatholic.org/blog/2011/02/mubarak-steps-down-egypt
  • http://wayback.archive-it.org/2358/20110204123721/http://abcnews.go.com/International/egypt-abc-news-christiane-amanpour-exclusive-interview-president/story?id=12833673
  • Archive-It creates slides for the seed URIs, which does not normally happen with web archive users, as we discovered from the data. Humans exhibit Dip and Dive, while robots exhibit Dip and Skim. Humans also exhibit combinations (slides and dives).

Telling Stories with Web Archives: Presentation Transcript

  • Telling Stories with Web Archives
    Dr. Michele C. Weigle
    Web Sciences and Digital Libraries (WS-DL) Lab, Department of Computer Science, Old Dominion University, Norfolk, VA
    Includes joint work with Dr. Michael L. Nelson and our PhD students, Scott Ainsworth, Yasmin AlNoamany, Ahmed AlSum, Justin Brunelle, Mat Kelly, and Hany SalahEldeen
    Southeast Women in Computing Conference, November 16, 2013
  • Outline
    • What is a web archive?
    • Why are archives important?
    • What's my story?
    • How can we help others tell their stories?
    • Related WS-DL Projects
    Southeast Women in Computing Conference - Nov 16, 2013 #SEWIC2013
  • What is a web archive?
  • What are some web archives?
  • How can I access the archives?
    • MementoFox: http://ws-dl.blogspot.com/2010/03/2010-03-19-mementofox-add-on-released.html
    • Memento for Chrome: http://ws-dl.blogspot.com/2013/10/2013-10-14-right-click-to-past-memento.html
    • http://www.mementoweb.org/
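Tools like MementoFox and Memento for Chrome rely on the Memento protocol (RFC 7089), in which an archive can also publish a TimeMap: a list of a page's mementos in application/link-format. As a minimal sketch (not how either add-on is implemented), the Python below parses such a TimeMap and picks the memento closest to a desired datetime. It assumes well-formed input with quoted parameters; the archive.example URIs are made up.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# One link-format entry: <URI> followed by ; name="value" parameters.
ENTRY_RE = re.compile(r'<([^>]+)>\s*((?:;\s*[\w-]+\s*=\s*"[^"]*"\s*)*)')
PARAM_RE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def parse_timemap(text):
    """Return (URI-M, datetime) pairs for entries whose rel includes 'memento'."""
    mementos = []
    for uri, params in ENTRY_RE.findall(text):
        attrs = dict(PARAM_RE.findall(params))
        if "memento" in attrs.get("rel", "").split() and "datetime" in attrs:
            # Memento datetimes use the HTTP-date format, e.g. "Sat, 12 Feb 2011 00:00:00 GMT".
            mementos.append((uri, parsedate_to_datetime(attrs["datetime"])))
    return mementos


def closest_memento(mementos, target):
    """Pick the memento whose capture datetime is nearest to target (an aware datetime)."""
    return min(mementos, key=lambda m: abs(m[1] - target))


timemap = """<http://example.com/>; rel="original",
<http://archive.example/20110101000000/http://example.com/>; rel="first memento"; datetime="Sat, 01 Jan 2011 00:00:00 GMT",
<http://archive.example/20110212000000/http://example.com/>; rel="memento"; datetime="Sat, 12 Feb 2011 00:00:00 GMT",
<http://archive.example/20130101000000/http://example.com/>; rel="last memento"; datetime="Tue, 01 Jan 2013 00:00:00 GMT"
"""
uri, when = closest_memento(parse_timemap(timemap), datetime(2011, 2, 11, tzinfo=timezone.utc))
print(uri)  # http://archive.example/20110212000000/http://example.com/
```

Note that "first memento" and "last memento" still count as mementos, because `rel` holds a space-separated list of relation types.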
  • Outline
    • What is a web archive?
    • Why are archives important?
    • What's my story?
    • How can we help others tell their stories?
    • Related WS-DL Projects
  • The Web holds our stories
  • But webpages can disappear
    • Average lifespan of a webpage: 50-100 days
    • A year after publication, about 11% of content shared on social media will be gone.
    SalahEldeen and Nelson, "Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?", TPDL 2012
    http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
  • But maybe it's archived
    Ainsworth, AlSum, SalahEldeen, Weigle, and Nelson, "How Much of the Web is Archived?", JCDL 2011
    http://ws-dl.blogspot.com/2011/06/2011-06-23-how-much-of-web-is-archived.html
  • But social media is hard to archive
  • Our Research Group Goals
    • We believe that web archives are valuable cultural resources, and we want everyone to know about them.
    • We want to make it easy for people to bridge the gap between the live web and the archives.
    • We believe that replaying the past is more compelling than reading a summary.
  • vs.
  • Replaying the past can be more compelling than just a summary
  • Outline
    • What is a web archive?
    • Why are archives important?
    • What's my story?
    • How can we help others tell their stories?
    • Related WS-DL Projects
  • What's My Story?
    • As another illustration, I'll tell you a little bit more about myself ...
    • ... using the Internet Archive
  • NLU - 1997
  • UNC-CS - 1997
  • My CS Homepage - 1997
  • CS Student Assoc Pres - 1999
  • Teaching - 2000
  • Finding gems in the archive
  • My Research - 2002
  • Married, Graduated, and Teaching - 2003
  • Faculty Position at Clemson - 2004
  • Clemson missing captures
  • Proof I was there - 2006
  • Faculty Position at ODU - 2006
  • Vehicular Networks - 2006
  • 1st PhD Student Graduated - 2010
  • InfoVis, Work with WS-DL - 2011
  • Telling My Story
    • Going through the archive was a lot of fun.
    • But it wasn't always easy.
    • Today, I might want to incorporate Facebook and Twitter posts in my story. They are not saved at the Internet Archive. =(
    • Let's make this easy to do for everyone.
  • Outline
    • What is a web archive?
    • Why are archives important?
    • What's my story?
    • How can we help others tell their stories?
    • Related WS-DL Projects
  • Project Overview
    • The project forms the PhD work of Yasmin AlNoamany; ideas are in early stages
    • It joins my interests in measurement, web science, and information visualization:
      – measurement: how do people use web archives?
      – web science: how can we analyze web archives to find pages related to live web pages?
      – info vis: how can we present the stories that we have harvested from the archive?
  • How do people use web archives?
    • We obtained a year's worth (2012) of requests to the Internet Archive's Wayback Machine
      – client IPs anonymized
  • How do people use web archives?
    • First, there are a lot of robots (aka bots) that access the archive
      – 10 bot sessions for every 1 human session
      – maybe people don't know about the archive?
    • Typical human sessions are pretty short
      – people aren't spending lots of time in the archive
      – it took me over an hour of walking through the archive to build my story
      – maybe people who do know about the archive aren't using it to build stories?
    AlNoamany, Weigle, and Nelson, "Access Patterns for Robots and Humans in Web Archives", JCDL 2013
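The slide does not spell out how sessions were built or how robots were detected. The sketch below assumes a 30-minute idle timeout per anonymized client and a crude robot test (a user agent containing "bot", "crawler", or "spider", or a request for /robots.txt). Both are illustrative assumptions, not the method of the JCDL 2013 paper, and the log entries are invented.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60                 # assumed idle gap (seconds) that ends a session
BOT_HINTS = ("bot", "crawler", "spider")  # hypothetical user-agent markers


def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Split requests into sessions. Each request is (client_id, unix_time, user_agent, path)."""
    by_client = defaultdict(list)
    for req in requests:
        by_client[req[0]].append(req)
    sessions = []
    for reqs in by_client.values():
        reqs.sort(key=lambda r: r[1])
        current = [reqs[0]]
        for prev, req in zip(reqs, reqs[1:]):
            if req[1] - prev[1] > timeout:  # long idle gap: start a new session
                sessions.append(current)
                current = []
            current.append(req)
        sessions.append(current)
    return sessions


def is_robot(session):
    """Crude heuristic: robots.txt was fetched or the user agent looks like a bot."""
    return any(r[3] == "/robots.txt" or any(h in r[2].lower() for h in BOT_HINTS)
               for r in session)


def robot_human_ratio(requests):
    sessions = sessionize(requests)
    robots = sum(1 for s in sessions if is_robot(s))
    humans = len(sessions) - robots
    return robots / humans if humans else float("inf")


log = [
    ("a", 0, "Mozilla/5.0", "/web/2011/http://example.com/"),
    ("a", 100, "Mozilla/5.0", "/web/2012/http://example.com/"),
    ("a", 3700, "Mozilla/5.0", "/web/2013/http://example.com/"),  # an hour later: new session
    ("b", 50, "ExampleBot/1.0", "/web/*/http://example.com/"),
    ("c", 10, "curl/7.29", "/robots.txt"),
]
print(robot_human_ratio(log))  # 1.0
```

For the session counts reported later in the talk (34,203 robot sessions and 3,431 human sessions), the same ratio is 34,203 / 3,431 ≈ 10, which is where "10 bot sessions for every 1 human session" comes from.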
  • How do people use web archives?
    • 65% of the requested archived pages no longer exist on the live web
    • People use the archive because the pages they are interested in no longer exist
      – like most of the examples from my story
    AlNoamany, AlSum, Weigle, and Nelson, "Who and What Links to the Internet Archive", IJDL, to appear, 2013
  • Helping Others Tell Stories
    • How can we use this information to help people tell stories?
    • How do people tell stories?
    • What tools do they use today?
  • Egyptian Revolution on Storify
  • Bookmarking is not preserving
  • How do people tell stories?
    • There are three levels of information:
      – overview
      – recent events
      – story definition and replay
  • Overview
  • Overview
  • Recent Events
  • Recent Events
  • Story Replay
  • Story Replay: not yet addressed
  • Research Questions: How do we
    • define the time frame of a story?
    • define the individual events that make up a story?
    • identify, evaluate, and select candidate archived web pages to support the events of the story?
    • visualize the resulting story?
  • Define the Time Frame of a Story
    • People remember the name of the story, but not the date
      – Hurricane Katrina: Aug 29, 2005
      – 2011 Egyptian Revolution: Jan 25, 2011
      – Boston Marathon Bombing: April 15, 2013
    • Some stories have no definitive beginning/ending
      – BP Gulf Oil Spill: April 20 to September(?) 2010; effects and court cases still ongoing
      – Egyptian Revolution: which one? (1952, 2011, 2013)
  • Define the Time Frame of a Story
    • Propose candidate times based on user query
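The slide leaves open how candidate times are proposed. One simple possibility, assumed here purely for illustration, is to collect the publication dates of pages matching the user's query and rank months by how many pages fall in them; widely separated peaks would flag an ambiguous name such as "Egyptian Revolution". The dates below are invented.

```python
from collections import Counter
from datetime import date


def candidate_time_frames(doc_dates, k=3):
    """Rank (year, month) bins by how many query-matching documents fall in each."""
    counts = Counter((d.year, d.month) for d in doc_dates)
    return [month for month, _ in counts.most_common(k)]


# Hypothetical publication dates of pages matching the query "Egyptian Revolution".
dates = [date(2011, 1, 25), date(2011, 1, 28), date(2011, 2, 11), date(2011, 2, 11),
         date(2011, 2, 12), date(2013, 6, 30), date(2013, 7, 3), date(2013, 7, 3)]
print(candidate_time_frames(dates))
```

Here the busiest month is February 2011, but a second cluster in mid-2013 also surfaces, so the user would be asked which revolution they mean.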
  • Define a Story's Events
    • Consult hand-crafted timelines
    • User-provided timelines
    • Detect themes in relevant archived web pages
  • Identify Relevant Archived Web Pages
    • Identify "seed URIs" and query the archive for their existence during the appropriate time
      – also query for URIs linked from the seed URIs
    • How to identify seed URIs?
      – Wikipedia
      – news sites
      – social media (tweets, Facebook shares)
      – Storify
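Checking seed URIs (and the pages they link to) against a story's time window can be sketched as follows. To keep the example self-contained, the hypothetical dictionaries `outlinks` and `capture_times` stand in for link extraction and for TimeMap lookups against a real archive; the example.com-style URIs are made up.

```python
from datetime import datetime


def archived_in_window(seed_uris, outlinks, capture_times, start, end):
    """Return seed URIs (and URIs they link to) with at least one capture in [start, end].

    outlinks:      dict URI -> list of URIs linked from it (one hop)
    capture_times: dict URI -> list of capture datetimes (e.g., read from a TimeMap)
    """
    # Seeds first, then their outlinks; dict.fromkeys drops repeats but keeps order.
    candidates = list(dict.fromkeys(
        seed_uris + [link for seed in seed_uris for link in outlinks.get(seed, [])]))
    return [uri for uri in candidates
            if any(start <= t <= end for t in capture_times.get(uri, []))]


seeds = ["http://a.example/", "http://b.example/"]
links = {"http://a.example/": ["http://c.example/", "http://b.example/"]}
captures = {
    "http://a.example/": [datetime(2011, 1, 26)],
    "http://b.example/": [datetime(2010, 6, 1)],
    "http://c.example/": [datetime(2011, 2, 1)],
}
print(archived_in_window(seeds, links, captures,
                         datetime(2011, 1, 25), datetime(2011, 2, 11)))
# ['http://a.example/', 'http://c.example/']
```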
  • Different sources will provide different seed URIs
  • What about social media pages?
  • Create your own Facebook archive
    • May need to allow for user-contributed content
    Kelly, Nelson, and Weigle, "WARCreate and WAIL: WARC, Wayback, and Heritrix Made Easy", Demo at Digital Preservation 2013
    http://ws-dl.blogspot.com/2013/07/2013-07-10-warcreate-and-wail-warc.html
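WARCreate saves the page a user is viewing into a WARC file. The sketch below does not reproduce WARCreate; it only shows the shape of a single WARC/1.0 record, assuming a simple `resource` record that stores the page body (real crawls usually write `response` records that include the HTTP headers). The target URI is hypothetical.

```python
import uuid
from datetime import datetime, timezone


def warc_resource_record(target_uri, payload, content_type="text/html"):
    """Build one WARC/1.0 'resource' record (as bytes) holding payload for target_uri."""
    date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {date}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",  # length of the block in bytes
    ]
    # Header lines end in CRLF, a blank line separates headers from the block,
    # and every record is followed by two CRLFs.
    return "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n" + payload + b"\r\n\r\n"


record = warc_resource_record("https://www.facebook.com/example-page", b"<html></html>")
print(record.decode("utf-8"))
```

Appending records like this one to a file yields a WARC that Wayback-style replay tools can index, which is the idea behind pairing WARCreate with WAIL.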
  • Suppose we found 100 relevant pages for each event in the story (e.g., many copies from BBC, NYTimes, and Fox News)
  • Evaluate Relevant Archived Web Pages
    • Are there duplicate accounts?
    • What is the reputation, bias, or point of view of the source?
    • How well was the page archived?
  • Duplication
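One common way to spot duplicate accounts, such as syndicated copies of the same wire story, is word shingling with Jaccard similarity. This is a generic, swapped-in technique rather than the project's method, and the 3-word shingles and 0.8 threshold are assumptions; the page texts and URIs are invented.

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles from lowercased text (punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dedupe(pages, threshold=0.8):
    """pages: list of (uri, text). Keep the first page of each near-duplicate group."""
    kept = []
    for uri, text in pages:
        sh = shingles(text)
        if all(jaccard(sh, other) < threshold for _, other in kept):
            kept.append((uri, sh))
    return [uri for uri, _ in kept]


pages = [
    ("http://news-a.example/1", "Egypt's Mubarak resigns after 30-year rule"),
    ("http://news-b.example/2", "EGYPT'S MUBARAK RESIGNS AFTER 30-YEAR RULE!"),
    ("http://news-c.example/3", "Protesters celebrate in Tahrir Square"),
]
print(dedupe(pages))  # ['http://news-a.example/1', 'http://news-c.example/3']
```

Keeping the first copy is an arbitrary choice; a real tool might instead keep the copy from the most reputable source or the best-archived memento.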
  • Reputation of Source
  • Quality of Archived Page
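The slide does not say how archiving quality is measured. A rough proxy, assumed here, is the weighted share of a memento's embedded resources (images, stylesheets, scripts) that the archive failed to capture; the per-type weights are hypothetical.

```python
# Hypothetical weights: how much a missing resource of each kind hurts the replay.
WEIGHTS = {"image": 1.0, "stylesheet": 2.0, "script": 1.5}


def damage(resources, weights=WEIGHTS):
    """Weighted share of embedded resources missing from a memento.

    resources: list of (kind, archived) pairs, e.g. ("image", False).
    Returns 0.0 when everything was archived and 1.0 when nothing was.
    """
    total = sum(weights.get(kind, 1.0) for kind, _ in resources)
    missing = sum(weights.get(kind, 1.0) for kind, archived in resources if not archived)
    return missing / total if total else 0.0


print(damage([("image", True), ("image", False), ("stylesheet", True)]))  # 0.25
```

Weighting stylesheets above images reflects the assumption that a missing stylesheet can make a whole page unreadable, while one missing image usually does not.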
  • Select Relevant Archived Web Pages
    • User will select pages to use in the final story
    • But user needs to be presented with some choices
  • Selecting Relevant Pages: Mubarak's Resignation
  • Visualize the Story
    • Provide different interactive visualizations that enable exploring the story easily
    • Provide the user with the ability to modify the story and specify the start and end dates
  • Using Storify
  • Interactive Timeline: Replaying the Story of the Egyptian Revolution
  • Slideshow
    • Different View
  • Research Questions: How do we
    • define the time frame of a story?
    • define the individual events that make up a story?
    • identify, evaluate, and select candidate archived web pages to support the events of the story?
    • visualize the resulting story?
  • Outline
    • What is a web archive?
    • Why are archives important?
    • What's my story?
    • How can we help others tell their stories?
    • Related WS-DL Projects
  • User Access Patterns
    AlNoamany, Weigle, and Nelson, "Access Patterns for Robots and Humans in Web Archives", JCDL 2013
  • Everybody Dips, Humans Dive, Robots Skim
    • Robots (34,203 sessions)
    • Humans (3,431 sessions)
    AlNoamany, Weigle, and Nelson, "Access Patterns for Robots and Humans in Web Archives", JCDL 2013
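The pattern names can be illustrated with a toy classifier. The definitions encoded here are one reading of the terms (Dip: one or two requests; Skim: mostly requests for TimeMaps, i.e. lists of captures; Slide: one page at several archive times; Dive: several pages near one archive time), and the thresholds are hypothetical rather than those of the JCDL 2013 paper.

```python
from datetime import date


def classify_session(requests, near_days=30):
    """Label a session as 'dip', 'skim', 'slide', 'dive', or 'other'.

    requests: list of (kind, uri, archive_date); kind is "timemap" or "memento",
    and archive_date is None for TimeMap requests.
    """
    if len(requests) <= 2:
        return "dip"      # only one or two requests
    if sum(1 for r in requests if r[0] == "timemap") > len(requests) / 2:
        return "skim"     # mostly looking at lists of captures
    mementos = [r for r in requests if r[0] == "memento"]
    uris = {r[1] for r in mementos}
    dates = [r[2] for r in mementos]
    if len(uris) == 1 and len(set(dates)) > 1:
        return "slide"    # one page, several archive times
    if len(uris) > 1 and (max(dates) - min(dates)).days <= near_days:
        return "dive"     # several pages around one archive time
    return "other"


slide = [("memento", "http://example.com/", date(y, 1, 1)) for y in (2009, 2010, 2011)]
dive = [("memento", u, date(2011, 2, 11))
        for u in ("http://a.example/", "http://b.example/", "http://c.example/")]
print(classify_session(slide), classify_session(dive))  # slide dive
```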
  • What domains does each archive hold?
    AlSum, Weigle, Nelson, and Van de Sompel, "Profiling Web Archive Coverage for Top-Level Domain and Content Language", TPDL 2013
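Profiling an archive by top-level domain can be approximated by counting the last label of each archived URI's host. This sketch assumes you already have a sample of URIs held by an archive (the sample below is made up) and, for simplicity, treats co.uk as uk; it is not the sampling method of the TPDL 2013 paper.

```python
from collections import Counter
from urllib.parse import urlsplit


def tld_profile(uris):
    """Return each TLD's share of the sampled URIs, e.g. {'com': 0.5, ...}."""
    tlds = []
    for uri in uris:
        host = urlsplit(uri).hostname or ""
        tlds.append(host.rsplit(".", 1)[-1] if "." in host else host)
    counts = Counter(tlds)
    return {tld: n / len(tlds) for tld, n in counts.items()}


sample = ["http://www.ahram.org.eg/", "http://www.bbc.co.uk/news",
          "http://cnn.com/", "http://foxnews.com/"]
print(tld_profile(sample))  # {'eg': 0.25, 'uk': 0.25, 'com': 0.5}
```

Comparing such profiles across archives hints at which archive to ask first for, say, an Egyptian (.eg) news page.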
  • Sometimes the live web "leaks" into the archive
    (screenshots dated Sept 3, 2008 and 2012)
    http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html
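One way to look for such leaks, assumed here for illustration, is to record every URL the browser fetches while replaying a memento (for example, from the developer tools' network log) and flag any that were not served by the archive. This is not the method from the linked blog post; the default host and the example URLs are assumptions, and archives that serve resources from several hosts would need a list of allowed hosts.

```python
from urllib.parse import urlsplit


def leaked_resources(replay_requests, archive_host="web.archive.org"):
    """Return URLs fetched during replay that were not served by the archive host."""
    return [url for url in replay_requests if urlsplit(url).hostname != archive_host]


replayed = [
    "https://web.archive.org/web/20080903000000/http://example.com/",
    "https://web.archive.org/web/20080903000000/http://example.com/style.css",
    "http://example.com/live-banner.js",  # fetched from the live web: a "zombie" resource
]
print(leaked_resources(replayed))  # ['http://example.com/live-banner.js']
```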
  • ODU's WS-DL Group (map: ODU, "You are here")
  • ODU's WS-DL Group
    • Our recent work has been featured in the popular press
    • We're always looking for more great students!
    Dr. Michele C. Weigle, Old Dominion University, Norfolk, VA
    mweigle@cs.odu.edu, @weiglemc
    http://www.cs.odu.edu/~mweigle/
    http://ws-dl.blogspot.com/