My talk introducing Web archiving and the Web Science and Digital Libraries Research Group to invited students from India at a summer workshop at Old Dominion University, Norfolk, VA
Introducing Web Archiving and WSDL Research Group
1. Introducing
Web Archiving and
WSDL Research Group
Sawood Alam
Department of Computer Science
Old Dominion University
Norfolk, Virginia - 23529 (USA)
2. About Me
Sawood Alam
Lexical Signature
Web, Digital Library, Web Archiving, Ruby on Rails, PHP, HTML,
CSS, JavaScript, ExtJS, Go, Urdu, RTL, Docker, and Linux.
● BTech, Jamia Millia Islamia, India, 2008
● MSc, Old Dominion University, USA, 2013
● PhD, Old Dominion University, USA, Current
4. Agenda
● Archiving and Web archiving
● Purpose and importance
● Scope of Web archiving
● Issues and challenges
● Tools and techniques
● Memento: Time Travel for the Web
● Archive X-Ray
● Research opportunities in Web archiving
● Our WSDL Research Group
5. What is an Archive?
● Accumulation of historical records
● Long term storage and preservation
● Less frequently used
● Physical or digital
6. What is Web Archiving?
● Periodic snapshots of web pages
● Preserving important events on the Web
● Making archived content accessible
7. Why Do We Care About Archiving?
Web content decays rapidly!
● To preserve the history
● To tell a story
● For evidence
● For backup
● For personal satisfaction
9. Web Archiving Efforts
● Internet Archive
● Archive-It
● Wikipedia
● UK Web Archive
● Various national and non-profit archives
● Film, music and other multimedia archives
● Scholarly archives
● Personal archiving
12. Archive X-Ray!
● How much of the Web is archived?
● Profiling various archive services
● Predicting what they contain
● Routing Memento aggregator queries
25. From: Michael Nelson [mailto:mln@cs.odu.edu]
Sent: Wednesday, December 02, 2015 12:33 PM
To: Jones, Gina
Cc: Rourke, Patrick; Grotke, Abigail
Subject: Re: WebSciDL
Hi Gina, I'll investigate. memgator is software that one my students
wrote, but I suspect the traffic you're seeing is b/c it is deployed in
http://oldweb.today/ can you share the IP addr from where you're
seeing the traffic? I presume the requests are for Memento TimeMaps?
It should not being actually scraping HTML pages.
regards,
Michael
On Wed, 2 Dec 2015, Jones, Gina wrote:
> Hi Michael, we have a slight configuration issue with the current OW
> set up for our webarchives. I think, from looking at the logs, that
> "MemGator:1.0-rc3 <@WebSciDL>" is really causing some issues on
> our wayback.
> Do you know who is running this scraper? It's not part of memento is
> it?
>
> Gina Jones
> Web Archiving Team
> Library of Congress
From: Ilya Kreymer <ikreymer@gmail.com>
Date: Wed, 2 Dec 2015 10:33:56 -0800
Subject: high traffic on oldweb!
To: Herbert Van de Sompel <hvdsomp@gmail.com>, Sawood Alam
<ibnesayeed@gmail.com>
Hi Herbert, Sawood,
Herbert: Perhaps you are lucky that I am not using the LANL
aggregator, as the traffic has gotten really high, and also I was asked
to remove an archive due to the traffic it was causing temporarily..
I am thinking that ability to remove source archives quickly is an
important aspect of an aggregator.
Sawood: Hopefully yours will support something like this so I don't
need to restart the container to change the archivelist ;)
Ilya
Broadcasting is Bad
29. Archive Profile
● High-level summary of an archive
● Predicts presence of mementos
● Provides statistics about the holdings
● Small in size and publicly available
● Easy to update and partially patch
● Useful for Memento query routing and other things
com,cnn)/ {"frequency": 40, "spread": 2}
uk,co,bbc)/ {"frequency": 20, "spread": 1}
com,usatoday)/ {"frequency": 5, "spread": 1}
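
A minimal sketch of how profiles like the sample above could be used to route a Memento lookup instead of broadcasting it to every archive. The archive names, the `PROFILES` data, and the helpers `surt_host_key` and `route` are hypothetical, and the key function is a simplified host-level approximation of the key format shown on the slide, not the actual MemGator or profiling code. Only the presence of a key is checked; the "frequency" and "spread" values are carried along unused.

```python
from urllib.parse import urlsplit


def surt_host_key(url: str) -> str:
    """Reduce a URL to a host-level key such as 'com,cnn)/'.

    Simplified assumption: lowercase the host, drop a leading 'www.',
    reverse the domain labels, and ignore the path entirely.
    """
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return ",".join(reversed(host.split("."))) + ")/"


# Hypothetical profiles for three made-up archives, in the shape of the
# sample records above (key -> statistics).
PROFILES = {
    "archive-a": {
        "com,cnn)/": {"frequency": 40, "spread": 2},
        "uk,co,bbc)/": {"frequency": 20, "spread": 1},
    },
    "archive-b": {
        "com,usatoday)/": {"frequency": 5, "spread": 1},
    },
    "archive-c": {
        "uk,co,bbc)/": {"frequency": 3, "spread": 1},
    },
}


def route(url: str, profiles: dict) -> list:
    """Return the archives whose profile contains the key for this URL."""
    key = surt_host_key(url)
    return sorted(name for name, profile in profiles.items() if key in profile)


if __name__ == "__main__":
    for url in ("http://www.cnn.com/world",
                "http://www.bbc.co.uk/news",
                "http://example.org/"):
        print(url, "->", route(url, PROFILES))
```

With these made-up profiles, a lookup for a cnn.com page is sent to one archive rather than three. A URL with no matching key returns an empty list, at which point an aggregator would have to decide whether to skip the lookup or fall back to broadcasting.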
30. Research Opportunities
● Information retrieval
● Information visualization
● Client and server side archiving
● Archiving dynamic content
● Distributed archiving
● Discovering alternative long-term archiving techniques
● Predicting “important” events on the Web and archiving them in a timely manner
31. Web Science and Digital
Libraries Research Group
ws-dl.cs.odu.edu
ws-dl.blogspot.com
@WebSciDL
github.com/oduwsdl
flickr.com/photos/124419986@N07
38. Sawood Alam
Department of Computer Science
Old Dominion University
Norfolk, Virginia - 23529 (USA)
salam@cs.odu.edu
ibnesayeed@gmail.com
@ibnesayeed
www.cs.odu.edu/~salam