Dr. Michele Weigle gave a presentation on telling stories using web archives. She discussed defining a story's timeline and key events, identifying relevant archived web pages, and visualizing the assembled story. Her research group is exploring how to help others reconstruct personal and historical narratives from archived web content, as pages on the live web often disappear over time.
Telling Stories with Web Archives
1. Telling Stories with Web Archives
Dr. Michele C. Weigle
Web Science and Digital Libraries (WS-DL) Lab
Department of Computer Science
Old Dominion University
Norfolk, VA
Includes joint work with Dr. Michael L. Nelson and our PhD students, Scott Ainsworth, Yasmin AlNoamany, Ahmed AlSum, Justin Brunelle, Mat Kelly, Hany SalahEldeen
Southeast Women in Computing Conference
November 16, 2013
2. Outline
• What is a web archive?
• Why are archives important?
• What's my story?
• How can we help others tell their stories?
• Related WS-DL Projects
Southeast Women in Computing Conference - Nov 16, 2013
#SEWIC2013
3. What is a web archive?
4. What are some web archives?
5. How can I access the archives?
MementoFox
Memento for Chrome
http://www.mementoweb.org/
http://ws-dl.blogspot.com/2010/03/2010-03-19-mementofox-add-on-released.html
http://ws-dl.blogspot.com/2013/10/2013-10-14-right-click-to-past-memento.html
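The browser add-ons listed on this slide rely on the Memento protocol (RFC 7089), in which a client asks for a past version of a page by sending an Accept-Datetime request header. The following is a minimal sketch of building that header, not code from the talk; the function name `memento_request_headers` is hypothetical, and no particular archive endpoint is assumed.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_request_headers(when: datetime) -> dict:
    """Build the Accept-Datetime header used for Memento datetime negotiation.

    `when` should be timezone-aware; a naive datetime is interpreted
    as local time by astimezone().
    """
    # Accept-Datetime carries the desired moment as an HTTP-date in GMT.
    when_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}


# Illustrative date: the day of this talk.
headers = memento_request_headers(datetime(2013, 11, 16, tzinfo=timezone.utc))
print(headers["Accept-Datetime"])
```

These headers would be sent with an ordinary HTTP GET to a Memento TimeGate for the page of interest, which then redirects to the archived copy closest to the requested date.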
6. Outline
• What is a web archive?
• Why are archives important?
• What's my story?
• How can we help others tell their stories?
• Related WS-DL Projects
7. The Web holds our stories
8. But webpages can disappear
• Average lifespan of a webpage - 50-100 days
• A year after publication, about 11% of content shared on social media will be gone.
SalahEldeen and Nelson, "Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?", TPDL 2012
http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
9. But maybe it's archived
Ainsworth, AlSum, SalahEldeen, Weigle, and Nelson, "How Much of the Web is Archived?", JCDL 2011
http://ws-dl.blogspot.com/2011/06/2011-06-23-how-much-of-web-is-archived.html
10. But social media is hard to archive
11. Our Research Group Goals
• We believe that web archives are valuable cultural resources, and we want everyone to know about them.
• We want to make it easy for people to bridge the gap between the live web and the archives.
• We believe that replaying the past is more compelling than reading a summary.
26. Replaying the past can be more compelling than just a summary
27. Outline
• What is a web archive?
• Why are archives important?
• What's my story?
• How can we help others tell their stories?
• Related WS-DL Projects
28. What's My Story?
• As another illustration, I'll tell you a little bit
more about myself ...
• ... using the Internet Archive
39. Proof I was there - 2006
40. Faculty Position at ODU - 2006
41. Vehicular Networks - 2006
42. 1st PhD Student Graduated - 2010
43. InfoVis, Work with WS-DL - 2011
44. Telling My Story
• Going through the archive was a lot of fun.
• But, it wasn't always easy.
• Today, I might want to incorporate Facebook
and Twitter posts in my story. Not saved at
Internet Archive. =(
• Let's make this easy to do for everyone.
45. Outline
• What is a web archive?
• Why are archives important?
• What's my story?
• How can we help others tell their stories?
• Related WS-DL Projects
46. Project Overview
• Project forms the PhD work of Yasmin
AlNoamany, ideas in early stages
• Joins my interests in measurement, web
science, information visualization.
– measurement - how do people use web archives?
– web science - how can we analyze web archives to
find pages related to live web pages?
– info vis - how can we present the stories that we
have harvested from the archive?
47. How do people use web archives?
• We obtained a year's worth (2012) of requests
to the Internet Archive's Wayback Machine
– client IPs anonymized
48. How do people use web archives?
• First, there are a lot of robots (aka bots) who
access the archive
– 10 bot sessions for every 1 human session
– maybe people don't know about the archive?
• Typical human sessions are pretty short
– people aren't spending lots of time in the archive
– it took me over an hour of walking through the archive
to build my story
– maybe people who do know about the archive aren't
using it to build stories?
AlNoamany, Weigle, and Nelson, "Access Patterns for Robots and Humans in Web Archives", JCDL 2013
49. How do people use web archives?
• 65% of the requested archived pages no longer
exist on the live web
• People use the archive because the pages they
are interested in no longer exist
– like most of my examples from my story
AlNoamany, AlSum, Weigle, and Nelson, "Who and What Links to the Internet Archive", IJDL, to appear, 2013
50. Helping Others Tell Stories
• How can we use this information to help
people tell stories?
• How do people tell stories?
• What tools do they use today?
52. Bookmarking is not preserving
53. How do people tell stories?
• There are three levels of information:
– overview
– recent events
– story definition and replay
60. Research Questions
How do we
• define the time frame of a story?
• define the individual events that make up
a story?
• identify, evaluate, and select candidate
archived web pages to support the events
of the story?
• visualize the resulting story?
61. Define the Time Frame of a Story
• People remember the name of the story, but not
the date
– Hurricane Katrina - Aug 29, 2005
– 2011 Egyptian Revolution - Jan 25, 2011
– Boston Marathon Bombing - April 15, 2013
• Some stories have no definitive beginning/ending
– BP Gulf Oil Spill - April 20 - September(?) 2010; effects and court cases still ongoing
– Egyptian Revolution - which one? (1952, 2011, 2013)
62. Define the Time Frame of a Story
• Propose candidate times based on user query
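One simple way to picture "propose candidate times based on user query" is a lookup from story names to dates. The sketch below is an illustration under stated assumptions, not the project's method: the table holds only the examples from the slide above, and the function name and matching rule are hypothetical.

```python
import re

# Dates copied from the slide examples; entries known only by year stay as years.
KNOWN_STORIES = {
    "hurricane katrina": ["2005-08-29"],
    "egyptian revolution": ["1952", "2011-01-25", "2013"],
    "boston marathon bombing": ["2013-04-15"],
    "bp gulf oil spill": ["2010-04-20"],
}


def candidate_times(query):
    """Return candidate dates for stories whose name words all occur in the query."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    years = {w for w in words if re.fullmatch(r"\d{4}", w)}
    candidates = []
    for name, dates in KNOWN_STORIES.items():
        if set(name.split()) <= words:
            candidates.extend(dates)
    # If the query mentions a year, prefer candidates from that year.
    narrowed = [d for d in candidates if d[:4] in years]
    return narrowed or candidates
```

With this table, an ambiguous query such as "Egyptian Revolution" returns all three candidates, while "2011 Egyptian Revolution" narrows to the single 2011 date, which mirrors the "which one?" problem on the slide.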
63. Define a Story's Events
• Consult hand-crafted
timelines
• User-provided timelines
• Detect themes in relevant
archived web pages
64. Identify Relevant Archived Web Pages
• Identify "seed URIs" and query the archive for
their existence during the appropriate time
– also query for URIs linked from the seed URIs
• How to identify seed URIs?
– Wikipedia
– news sites
– social media (tweets, Facebook shares)
– Storify
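As a rough sketch of querying the archive for seed URIs during a story's time frame, the code below builds one Wayback Machine URL per seed URI per day. It assumes the familiar http://web.archive.org/web/<timestamp>/<URI> form with a date-only timestamp; the function name and the example seed URI are hypothetical, and no request is actually sent.

```python
from datetime import date, timedelta

WAYBACK_PREFIX = "http://web.archive.org/web/"


def wayback_urls(seed_uris, start, end, step_days=1):
    """List Wayback Machine URLs for each seed URI between start and end (inclusive)."""
    urls = []
    day = start
    while day <= end:
        stamp = day.strftime("%Y%m%d")  # date-only timestamp
        for uri in seed_uris:
            urls.append(f"{WAYBACK_PREFIX}{stamp}/{uri}")
        day += timedelta(days=step_days)
    return urls


# Hypothetical seed URI and a two-day window starting Jan 25, 2011.
urls = wayback_urls(["http://example.com/"], date(2011, 1, 25), date(2011, 1, 26))
```

The same loop could be repeated over URIs linked from each seed page once those links have been extracted.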
65. Different sources will provide
different seed URIs
66. What about social media pages?
67. Create your own Facebook archive
• May need to
allow for user-contributed
content
Kelly, Nelson, and Weigle, "WARCreate and WAIL: WARC, Wayback, and Heritrix Made Easy," Demo at Digital Preservation 2013.
http://ws-dl.blogspot.com/2013/07/2013-07-10-warcreate-and-wail-warc.html
68. Suppose we found 100 relevant pages
for each event in the story
(for example, many copies from BBC, NYTimes,
and Fox News)
69. Evaluate Relevant Archived Web Pages
• Are there duplicate accounts?
• What is the reputation, bias, or point of view
of the source?
• How well was the page archived?
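The first question, whether there are duplicate accounts, can be approximated by comparing the text of candidate pages. The sketch below groups pages by Jaccard similarity over word shingles; this is a standard near-duplicate technique swapped in for illustration, not necessarily the approach used in this project, and the threshold, function names, and sample texts are assumptions.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in a text (lowercased)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_duplicates(pages, threshold=0.8):
    """Group page URLs whose text is at least `threshold` similar to a group's first page."""
    groups = []  # list of (shingles of first member, [urls])
    for url, text in pages.items():
        s = shingles(text)
        for representative, members in groups:
            if jaccard(s, representative) >= threshold:
                members.append(url)
                break
        else:
            groups.append((s, [url]))
    return [members for _, members in groups]


# Invented sample texts, for illustration only.
sample = {
    "a": "Mubarak resigns after thirty years in power",
    "b": "Mubarak resigns after thirty years in power",
    "c": "Hurricane makes landfall on the Gulf coast today",
}
groups = group_duplicates(sample)
```

A user could then be shown one page per group, leaving the reputation and archive-quality questions to be judged separately.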
72. Quality of Archived Page
73. Select Relevant Archived Web Pages
• User will select pages to use in the final story
• But user needs to be presented with some
choices
75. Visualize the Story
• Provide different interactive visualizations that
enable exploring the story easily
• Provide the user with the ability to modify the
story and specify the start and end dates
79. Research Questions
How do we
• define the time frame of a story?
• define the individual events that make up
a story?
• identify, evaluate, and select candidate
archived web pages to support the events
of the story?
• visualize the resulting story?
80. Outline
• What is a web archive?
• Why are archives important?
• What's my story?
• How can we help others tell their stories?
• Related WS-DL Projects
81. User Access Patterns
AlNoamany, Weigle, and Nelson, "Access Patterns for Robots and Humans in Web Archives", JCDL 2013
82. Everybody Dips, Humans Dive, Robots Skim
Robots (34,203 sessions)
Humans (3,431 sessions)
AlNoamany, Weigle, and Nelson, "Access Patterns for Robots and Humans in Web Archives", JCDL 2013
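As a quick arithmetic check, the session counts on this slide are consistent with the earlier "10 bot sessions for every 1 human session" figure. The variable names below are only for illustration.

```python
# Session counts as shown on the slide.
robot_sessions = 34203
human_sessions = 3431

# Robot sessions per human session.
ratio = robot_sessions / human_sessions

# Fraction of all sessions that come from humans.
human_share = human_sessions / (robot_sessions + human_sessions)

print(f"{ratio:.2f} robot sessions per human session")
print(f"{human_share:.1%} of sessions are human")
```

The ratio comes out just under 10 to 1, so humans account for roughly 9% of the sessions counted here.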
83. What domains does each archive hold?
AlSum, Weigle, Nelson and Van de Sompel, "Profiling Web Archive Coverage for Top-Level Domain and Content Language," TPDL 2013.
84. What domains does each archive hold?
AlSum, Weigle, Nelson and Van de Sompel, "Profiling Web Archive Coverage for Top-Level Domain and Content Language," TPDL 2013.
85. Sometimes the live web "leaks" into
the archive
Sept 3, 2008
2012
http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html
87. ODU's WS-DL Group
• Our recent work has been featured in the popular press
• We're always looking for more great students!
Dr. Michele C. Weigle
Old Dominion University
Norfolk, VA
mweigle@cs.odu.edu
@weiglemc
http://www.cs.odu.edu/~mweigle/
http://ws-dl.blogspot.com/
Editor's Notes
We have seen machine-readable lists of URIs; can we automatically create this list?
Storify is a social network service that lets the user create stories or timelines using social media such as Twitter, Facebook, and Instagram. Storify was launched in September 2010 and has been open to the public since April 2011. http://storify.com/nzherald/mu
The problem is that Storify operates as bookmarking; it doesn't preserve the links. You have no clue what the person is saying about the link.
Which brings up an overview from Wikipedia as the first result.
But what about replaying the story as it was captured on the news web sites? Three information needs. This one is unserved.
Three information needs. This one is unserved. Now let me tell you the story of the Egyptian Revolution, using a couple of screenshots that appeared at the time of the revolution.
Can we satisfy the information need of rewinding/replaying the events as they appeared in the past? How do we integrate web archives into the live web to support storytelling? How do we integrate web archives into the live web for replaying news stories as they were captured? The research aims to integrate the past with the present by automatically creating, identifying, and linking stories culled from the past web that are related to the content of a live web page or a specific event. This raises some questions: Can we leverage the content of social media services to discover stories? Can we extract stories based on user access patterns of the Wayback Machine? Can we associate the names that people give particular events with their datetimes in order to find them in web archives?
If we look in different places, we will get different URIs that express different perspectives on the story. Searching two places gives us two results.
I bet that no one here knows the importance of this page. Trust me, this page is very important. I know that the Egyptian Revolution started in this group; I was one of the first to join this page, which was created on June 10, 2010. This is one of the most important pages. I know it because I have the background, trust me! This is an important page for the story, even though we know that its current state does not represent the story.
If we have a time frame specified for the event/story, we will deduplicate the news collections.
http://www.bartamaha.com/egypts-mubarak-resigns-after-30-year-rule-42593/
Handle the duplicates of the news.
Archive-It creates slides for the seed URIs, which, as we discovered from the data, is not something Web archive users normally do. Humans exhibit Dip and Dive, while robots exhibit Dip and Skim. Combination that humans exhibit (slides and dives).