This document summarizes a presentation about using MementoMaps to efficiently route memento lookup requests to appropriate web archives. MementoMaps provide concise summaries of what URIs are held by each archive in order to avoid broadcasting requests to all archives. They can be generated from archive indexes, compacted for size, and published for discovery. Adopting MementoMaps could significantly reduce wasted lookup requests across archives.
Archive Assisted Archival Fixity Verification Framework (Sawood Alam)
The number of public and private web archives has increased, and we implicitly trust content delivered by these archives. Fixity is checked to ensure an archived resource has remained unaltered since the time it was captured. Some web archives do not allow users to access fixity information and, more importantly, even if fixity information is available, it is provided by the same archive from which the archived resources are requested. In this research, we propose two approaches, namely Atomic and Block, to establish and check fixity of archived resources.
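The core of any fixity scheme is comparing a cryptographic digest computed when the resource was captured against one computed at verification time. A minimal sketch of that idea (our own illustration, with hypothetical field names; not the Atomic or Block implementation from the paper):

```python
import hashlib

def fixity_record(uri_m, content, headers):
    """Record a digest over a memento's payload plus selected headers,
    so a later replay of the same memento can be compared against it.
    (Illustrative only; field names are ours, not the paper's.)"""
    return {
        "uri-m": uri_m,
        "sha256": hashlib.sha256(content).hexdigest(),
        "headers": {k: headers[k] for k in ("Content-Type",) if k in headers},
    }

def verify(record, content):
    """Re-fetch the memento later and check the payload hash still matches."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]

body = b"<html>archived page</html>"
rec = fixity_record("https://web.archive.org/web/2016/https://example.com/",
                    body, {"Content-Type": "text/html"})
assert verify(rec, body)
assert not verify(rec, b"<html>tampered page</html>")
```

Publishing such records to archives other than the one serving the memento is what removes the need to trust a single archive about its own content.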
MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing (Sawood Alam)
Topic: Doctoral Dissertation Defense
Title: MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
Student: Sawood Alam
University: Old Dominion University
Date: Friday, December 4, 2020
Impact of HTTP Cookie Violations in Web Archives (Sawood Alam)
Certain HTTP cookies on certain sites can be a source of content bias in archival crawls. Accommodating cookies at crawl time but not utilizing them at replay time may cause cookie violations, resulting in defaced composite mementos that never existed on the live web. To address these issues, we propose that crawlers store cookies with a short expiration time and that archival replay systems account for values in the Vary header along with URIs.
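One way to honor the Vary header at replay time is to key captures by the URI plus the request-header values the response varied on. A hypothetical sketch (the index layout and variant labels are our own, not an actual replay system):

```python
# Index captures by URI plus the request-header values named in the
# response's Vary header, so replay serves the matching variant.
def variant_key(uri, vary_header, request_headers):
    varied = [h.strip().lower() for h in vary_header.split(",") if h.strip()]
    return (uri, tuple(request_headers.get(h, "") for h in sorted(varied)))

index = {}
uri = "https://example.com/feed"

# Crawl time: one capture made with a session cookie, one without.
index[variant_key(uri, "Cookie", {"cookie": "session=abc"})] = "logged-in capture"
index[variant_key(uri, "Cookie", {})] = "anonymous capture"

# Replay time: a client without the cookie gets the anonymous variant
# instead of a defaced composite of both.
assert index[variant_key(uri, "Cookie", {})] == "anonymous capture"
```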
Readying Web Archives to Consume and Leverage Web Bundles (Sawood Alam)
The potential utilization of Web Bundles, an emerging Web technology, in Web archiving; presented by Sawood Alam in Session 8 of the IIPC WAC 2021.
Recording: https://youtu.be/lQX9v9V0FRQ
Supporting Web Archiving via Web Packaging (Sawood Alam)
We describe challenges related to web archiving, replaying archived web resources, and verifying their authenticity. We show that Web Packaging has significant potential to help address these challenges and identify areas in which changes are needed in order to fully realize that potential.
Position Paper: https://arxiv.org/abs/1906.07104
Presented in the Internet Architecture Board's ESCAPE 2019 Workshop (Exploring Synergy between Content Aggregation and the Publisher Ecosystem)
https://www.iab.org/activities/workshops/escape-workshop/
MementoMap Framework for Flexible and Adaptive Web Archive Profiling (Sawood Alam)
In this work we propose MementoMap, a flexible and adaptive framework to efficiently summarize the holdings of a web archive. We describe a simple yet extensible file format suitable for MementoMaps. We use the complete index of arquivo.pt, comprising 5B mementos (archived web pages/files), to understand the nature and shape of its holdings, and generate MementoMaps with varying amounts of detail from its HTML pages that have an HTTP status code of 200 OK. Additionally, we design a single-pass, memory-efficient, and parallelization-friendly algorithm to compact a large MementoMap into a smaller one, along with an in-file binary search method for efficient lookup. We analyze more than three years of MemGator (a Memento aggregator) logs to understand the response behavior of 14 public web archives, and evaluate MementoMaps by measuring their Accuracy using 3.3M unique URIs from those logs. We find that a MementoMap of less than 1.5% Relative Cost (as compared to the comprehensive listing of all the unique original URIs) can correctly identify the presence or absence of 60% of the lookup URIs in the corresponding archive while maintaining 100% Recall (i.e., zero false negatives).
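The in-file binary search can be sketched as follows. The file layout here is a simplified, SURT-like stand-in for the actual MementoMap format; the point is that lookup seeks by byte offset and realigns to line boundaries, so it never loads the whole map into memory:

```python
import io

# Sorted, line-oriented map: "<SURT-like key> <count>" per line (simplified).
DATA = b"""com,cnn)/ 120
com,example)/ 3
org,archive)/ 4500
pt,arquivo)/ 78
"""

def _line_at(f, offset):
    """Return the first complete line starting at or after `offset`."""
    f.seek(offset)
    if offset:
        f.readline()  # discard the (possibly partial) line we landed in
    return f.readline()

def lookup(f, size, key):
    """O(log n) seeks, O(1) memory: bisect on byte offsets."""
    lo, hi = 0, size
    while lo < hi:
        mid = (lo + hi) // 2
        line = _line_at(f, mid)
        if line and line.split(b" ")[0] < key:
            lo = mid + 1
        else:
            hi = mid
    line = _line_at(f, lo)
    if line and line.split(b" ")[0] == key:
        return line.decode().rstrip()
    return None

f = io.BytesIO(DATA)
assert lookup(f, len(DATA), b"org,archive)/") == "org,archive)/ 4500"
assert lookup(f, len(DATA), b"net,example)/") is None
```

The same seek-and-realign trick works on an on-disk file object, which is what makes multi-gigabyte MementoMaps searchable without an external index.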
InterPlanetary Wayback: The Next Step Towards Decentralized Web Archiving (Sawood Alam)
InterPlanetary Wayback (IPWB) facilitates permanence and collaboration in web archives by disseminating the contents of WARC files into the InterPlanetary File System (IPFS) network. IPFS is a peer-to-peer, content-addressable file system that inherently allows deduplication and facilitates opt-in replication. IPWB splits the headers and payloads of WARC response records before disseminating them into IPFS to leverage deduplication, builds a CDXJ index with references to the returned IPFS hashes, and recombines the headers and payloads from IPFS at replay time. We also explore the possibility of an index-free, fully decentralized collaborative web archiving system as the next step.
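The split-and-deduplicate idea can be illustrated with a plain dictionary standing in for IPFS (real IPWB uses IPFS content identifiers and a CDXJ index; the names here are our own):

```python
import hashlib

store = {}  # stands in for IPFS: content hash -> bytes

def put(blob):
    h = hashlib.sha256(blob).hexdigest()
    store[h] = blob  # IPFS would deduplicate identical content automatically
    return h

def index_record(raw_http_response):
    """Split an HTTP response (from a WARC record) into header and payload,
    storing each by content hash."""
    header, _, payload = raw_http_response.partition(b"\r\n\r\n")
    return {"header": put(header), "payload": put(payload)}

def replay(entry):
    """Recombine header and payload at replay time."""
    return store[entry["header"]] + b"\r\n\r\n" + store[entry["payload"]]

resp = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>"
entry1 = index_record(resp)
# A later capture with different headers but an identical payload:
resp2 = b"HTTP/1.1 200 OK\r\nDate: later\r\n\r\n<html>hi</html>"
entry2 = index_record(resp2)
assert entry1["payload"] == entry2["payload"]  # payload stored only once
assert replay(entry1) == resp
```

Because headers change between captures far more often than payloads do, splitting them is what unlocks most of the deduplication.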
The Memento Protocol and Research Issues With Web Archiving (Michael Nelson)
Michael L. Nelson
Old Dominion University
Web Science & Digital Libraries Research Group
www.cs.odu.edu/~mln/
University of Virginia Colloquium
2016-09-12
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages (Michael Nelson)
Michael L. Nelson
Old Dominion University
Web Science & Digital Libraries Research Group
@WebSciDL, @phonedude_mln
With:
ODU: Michele C. Weigle, Mohamed Aturban
Los Alamos National Laboratory: Herbert Van de Sompel, Martin Klein
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript (Michael Nelson)
Justin F. Brunelle
Michele C. Weigle
Michael L. Nelson
Web Science and Digital Libraries Research Group
Old Dominion University
@WebSciDL
IIPC 2016
Reykjavik, Iceland, April 11, 2016
Web Archiving Activities of ODU’s Web Science and Digital Library Research G... (Michael Nelson)
Michael L. Nelson
@phonedude_mln
Michele C. Weigle
@weiglemc
National Symposium on Web Archiving Interoperability
2017-02-21
Many projects joint with LANL
Funding from NSF, IMLS, NEH, and AMF
Keynote talk presented at Web Archiving and Digital Libraries (WADL) 2018
June 6, 2018 - Fort Worth, TX
Michele C. Weigle (@weiglemc)
Web Science and Digital Libraries (WS-DL) Research Group (@WebSciDL)
Old Dominion University
Norfolk, VA
For ocean science researchers, the success of data discovery revolves around the capability to ask complex questions of data centers like BCO-DMO. Our ability to accurately respond depends on our system's capability to understand the question, interpret its relevancy to what we know, and return those results in a way a human can digest.
As the needs for responding to the grand challenges of science become more interdisciplinary, data discovery will become more dependent on information from a variety of sources to enable researchers to reliably access the data they need. To do this effectively, all data and their metadata require context, cooperation and semantic interoperability.
This talk explores the current landscape of data discovery, the questions researchers ask of our software, how bad our software is at responding, and how Linked Data is a viable solution for improving those responses.
Watch this presentation at: https://www.youtube.com/watch?v=wEllMpcNQFg
https://austin2014.drupal.org/session/linked-data-drupal-oceanographic-data-management
http://www.bco-dmo.org
http://lod.bco-dmo.org/sparql
To the Rescue of the Orphans of Scholarly Communication (Martin Klein)
presentation at CNI Spring 2017 meeting
Herbert Van de Sompel
http://orcid.org/0000-0002-0715-6126
Michael L. Nelson
http://orcid.org/0000-0003-3749-8116
Martin Klein
http://orcid.org/0000-0003-0130-2097
Extended version of slides presented at the "404/File Not Found" symposium held at Georgetown University on October 24, 2014 (see http://www.law.georgetown.edu/library/404/). The presentation provides a brief overview of the link/reference rot problem and then discusses three complementary strategies to combat it: proactively capturing web resources that are linked from a seed collection; referencing the captures by means of annotated links; and accessing the captures using Memento infrastructure.
The evolution of the Web should move forward in an upward spiral that cycles between guiding values, engineering, and science. Guiding values should comprise social values as well as system principles that further the stabilization and growth of the Web. Principles I will talk about include social inclusion, connectedness, and fairness. Example efforts improve Web access for the disabled, critically assess Web structures and Web growth, and try to transfer knowledge about previously found patterns of Web growth to analogous cases.
Profiling Web Archival Voids for Memento Routing (Sawood Alam)
Slides of the paper presentation, "Profiling Web Archival Voids for Memento Routing", for JCDL 2021.
Authors: Sawood Alam, Michele C. Weigle, Michael L. Nelson
Preprint: https://arxiv.org/abs/2108.03311
Recording: https://youtu.be/ImJWkndNoS8
MementoMap: An Archive Profile Dissemination Framework (Sawood Alam)
We introduce MementoMap, a framework by which web archives (or third parties) can express and disseminate their holdings as archive profiles. The framework allows arbitrary, flexible, and dynamic levels of detail in its entries to fit the needs of archives of different scales. This enables Memento aggregators to significantly reduce wasted traffic to web archives.
Mathematics & Computer Science Seminar
Emory University
October 2, 2009
Martin Klein & Michael L. Nelson
Department of Computer Science
Old Dominion University
Norfolk VA
Leveraging Wikipedia as a Hub for Data Integration: the Remixing Archival Metadata Project (RAMP)
Timothy A. Thompson, Metadata Librarian (Spanish/Portuguese Specialty), Princeton University Library
Aqua Browser Implementation at Oklahoma State University (youthelectronix)
On Wednesday, November 7th, Dr. Anne Prestamo discussed "AquaBrowser Implementation at Oklahoma State University Library" as part of a program on Next Generation Catalogs held at the University of Massachusetts at Amherst and co-sponsored by the Five Colleges' Librarians Council and Simmons College Graduate School of Library and Information Science (GSLIS).
Search Interfaces on the Web: Querying and Characterizing, PhD dissertation (Denis Shestakov)
Full text of my PhD dissertation, "Search Interfaces on the Web: Querying and Characterizing", defended at the ICT-Building, Turku, Finland, on 12.06.2008.
Thesis contributions:
* New methods for deep Web characterization
* Estimating the scale of a national segment of the Web
* Building a publicly available dataset describing >200 web databases on the Russian Web
* Designing and implementing the I-Crawler, a system for automatic finding and classifying search interfaces
* Technique for recognizing and analyzing JavaScript-rich and non-HTML searchable forms
* Introducing a data model for representing search interfaces and result pages
* New user-friendly and expressive form query language for querying search interfaces and extracting data from result pages
* Designing and implementing a prototype system for querying web databases
* Bibliography with over 110 references to publications in the area of deep Web
Thomas Delerm and Adrien Di Mascio from Logibal will explain the value of web semantics in modern web applications for making the best use of your data.
They'll give the recipes that make Jahia an appropriate CMS for the semantic and linked data web, a.k.a. "Web 3.0".
Online Collections Crawlability for Libraries, Archives, and Museums (mherbison)
The Goal is Crawlability.
Allow and encourage webcrawlers to access everything on your website that you want users to be able to find.
(1) If webcrawlers can’t get to your stuff...
(2) Search engines won’t index your stuff...
(3) Your stuff won’t turn up in users’ web searches...
(4) Users won’t find your stuff!
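Step (1) can be sanity-checked programmatically: confirm that your robots.txt actually admits crawlers to the collection pages and advertises a sitemap. A small sketch using Python's standard urllib.robotparser (the rules and paths below are hypothetical examples):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a collections site: block only the search
# results, allow everything else, and advertise a sitemap for indexing.
robots_txt = """\
User-agent: *
Disallow: /search
Allow: /
Sitemap: https://example.org/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Collection item pages are crawlable; dynamic search pages are not.
assert rp.can_fetch("*", "https://example.org/collections/item/42")
assert not rp.can_fetch("*", "https://example.org/search/advanced")
assert rp.site_maps() == ["https://example.org/sitemap.xml"]
```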
A presentation on mashing up Twitter Annotations with the Semantic Web. June 24, 2010 at the Semantic Technology Conference, San Francisco (SemTech 2010).
CDX Summary: Web Archival Collection Insights (Sawood Alam)
Large web archival collections are often opaque about their holdings. We created an open-source tool, CDX Summary, to generate statistical reports based on URIs, hosts, TLDs, paths, query parameters, status codes, media types, dates, and times present in the CDX index of a collection of WARC files. Our tool also surfaces a configurable number of potentially good random memento samples from the collection for visual inspection, quality assurance, representative thumbnail generation, etc. The tool generates both human- and machine-readable reports with varying levels of detail for different use cases. Furthermore, we implemented a Web Component that can render generated JSON summaries in HTML documents. Early exploration of CDX insights on Wayback Machine collections helped us improve our crawl operations.
Venue: TPDL 2022
Recording: https://www.youtube.com/watch?v=K5i3XShqW6A
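The kind of aggregation such a tool performs can be sketched over simplified CDX-like lines (the four-field layout below is a minimal stand-in for illustration, not the tool's actual input format):

```python
from collections import Counter

# Simplified CDX-like lines: "<SURT> <timestamp> <media type> <status>".
cdx_lines = [
    "com,example)/ 20210102030405 text/html 200",
    "com,example)/about 20210507080910 text/html 200",
    "com,example)/logo.png 20210507080911 image/png 200",
    "org,archive)/ 20220101000000 text/html 301",
]

status, media, tlds = Counter(), Counter(), Counter()
for line in cdx_lines:
    surt, ts, mime, code = line.split()
    status[code] += 1           # status code distribution
    media[mime] += 1            # media type distribution
    tlds[surt.split(",")[0]] += 1  # TLD distribution from the SURT key

assert status["200"] == 3
assert media.most_common(1)[0] == ("text/html", 3)
assert tlds == {"com": 3, "org": 1}
```

A single pass over the index yields every distribution at once, which is why this scales to large collections.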
Video Archiving and Playback in the Wayback Machine (Sawood Alam)
At the Internet Archive (IA) we collect static and dynamic lists of seeds from various sources (like Save Page Now, Wikipedia EventStream, Cloudflare, etc.) for archiving. Some of these seeds include web pages with videos on them. Those URLs are curated based on certain criteria to identify potential videos that should be archived or excluded. Candidate video page URLs for archiving are placed in a queue (currently using Kafka) to be consumed by a separate process.

We maintain a persistent database of videos we have already archived, which is used both for status tracking and as a seen-check system to avoid duplicate downloads of large media files that usually do not change. We use youtube-dl (or one of its forks) to download videos and their metadata. We archive the container HTML page, associated video metadata, any transcriptions, thumbnails, and at least one of the many video files with different resolutions and formats. These pieces are stored in separate WARC records (some with “response” type and others as “metadata”).

Some popular video streaming services do not have static links to embedded video files, which makes it difficult to identify and serve video files corresponding to their container HTML pages on archival replay. To glue related pieces together for replay we are currently using a key-value store, but we are exploring ways to get away without an additional index. We use a custom video player and perform the necessary rewriting in the container HTML page for a more reliable video playback experience.

We create a daily summary of the metadata of videos we have archived and load it into a custom-built Video Archiving Insights dashboard to identify any issues or biases, which serves as a feedback loop for quality assurance and to enhance our curation criteria and archiving strategies. We are always looking for ways to improve the system at scale, as well as means to interoperate.
Recording: youtube.com/watch?v=6MiYKOq_DKo
A brief introduction to the WARC file format used for long-term Web archival preservation. These slides were initially prepared for a guest lecture in the CS 531 Web Server Design (Fall 2018) course at Old Dominion University.
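The anatomy of a WARC response record can be shown by assembling one by hand: a WARC/1.0 version line, named WARC headers, a blank line, the captured HTTP message, and a trailing blank-line terminator. This is a teaching sketch; production code would use a library such as warcio rather than concatenating bytes:

```python
import uuid
from datetime import datetime, timezone

def warc_response_record(target_uri, http_bytes):
    """Build a minimal WARC 'response' record around a raw HTTP response."""
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=response"),
        ("Content-Length", str(len(http_bytes))),  # length of the HTTP block
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers)
    return head.encode() + b"\r\n" + http_bytes + b"\r\n\r\n"

http = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>"
record = warc_response_record("https://example.com/", http)
assert record.startswith(b"WARC/1.0\r\n")
assert b"WARC-Target-URI: https://example.com/" in record
```

Concatenating many such records (optionally gzip-compressed per record) is all a .warc file is.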
MemGator - A Memento Aggregator CLI and Server in Go (Sawood Alam)
MemGator - A Portable Concurrent Memento Aggregator CLI and Server Written in Go.
The corresponding poster can be found at http://www.cs.odu.edu/~salam/presentations/memgator-jcdl16-poster.pdf
Avoiding Zombies in Archival Replay Using ServiceWorker (Sawood Alam)
Live-leakage (zombie resource) is an issue in archival replay of web pages. This work proposes a mechanism to avoid such live-leakage using ServiceWorker. This work was presented in WADL 2017 on June 22 in Toronto, Ontario, Canada.
Client-side Reconstruction of Composite Mementos Using ServiceWorker (Sawood Alam)
Live-leakage (zombie resource) is an issue in archival replay of web pages. This work proposes a mechanism to avoid such live-leakage using ServiceWorker. This work was presented in JCDL 2017 on June 20 in Toronto, Ontario, Canada.
Introducing Web Archiving and WSDL Research Group (Sawood Alam)
My talk introducing Web archiving and the Web Science and Digital Libraries Research Group to invited students from India attending a summer workshop at Old Dominion University, Norfolk, VA.
A talk given to final year B.Tech. Computer Science students at Jamia Millia Islamia, New Delhi, India with the intent of spreading awareness about web archiving and digital preservation and motivating the students for research.
HTTP Mailbox - Asynchronous RESTful Communication (Sawood Alam)
A Thesis Presentation to the Faculty of Old Dominion University in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science
1. Summarize Your Archival Holdings With MementoMap
Sawood Alam, Internet Archive
Michael L. Nelson, Old Dominion University
Michele C. Weigle, Old Dominion University
Daniel Gomes, Arquivo.pt
IIPC Web Archiving Conference, June 16, 2021
#MementoMap
@ibnesayeed
3. Cross-Archive Memento Lookup With MemGator

$ memgator -f cdxj http://si.edu/ | grep -v "^!" | cut -d'/' -f3 | sort | uniq -c | sort -nr
13263 web.archive.org
3590 wayback.archive-it.org
1202 web.archive.bibalex.org
651 webarchive.loc.gov
321 arquivo.pt
32 wayback.vefsafn.is
11 web.archive.org.au
3 archive.is
1 www.webarchive.org.uk
1 swap.stanford.edu
1 perma.cc

$ memgator -f cdxj http://odu.edu/ | grep -v "^!" | cut -d'/' -f3 | sort | uniq -c | sort -nr
3071 web.archive.org
796 wayback.archive-it.org
751 web.archive.bibalex.org
99 webarchive.loc.gov
26 arquivo.pt
2 archive.is
1 wayback.vefsafn.is

Although there are 13k+ mementos in IA, there are also mementos in 10 other public web archives. ODU is less popular, but there are mementos in 7 different web archives.

https://github.com/oduwsdl/MemGator
4. @ibnesayeed
Who Would Have Thought to Look Up odu.edu Mementos in the Icelandic Web Archive?
http://wayback.vefsafn.is/wayback/20100810032449/http://odu.edu/
5. @ibnesayeed
Prevalence of Sample Query URI Sets in Archives

Sample (1M URIs Each)   In Archive-It   In UKWA   In Stanford   Union {AIT, UK, SU}
DMOZ                    4.097%          3.594%    0.034%        7.575%
MementoProxy            4.182%          0.408%    0.046%        4.527%
IAWayback               3.716%          0.519%    0.039%        4.165%
UKWayback               0.108%          0.034%    0.002%        0.134%
Alam et al., “Web Archive Profiling Through CDX Summarization”, IJDL 2016
6. @ibnesayeed
Why Aggregate Small Archives?
● Wayback Machine does not cover everything
● Archives often have unique mementos (small overlap)
● Linguistic and geolocation diversity
● High-quality curated collections
● Restricted resources and private archives
13. @ibnesayeed
MemGator Log Responses From Various Archives
93% of the requests made from MemGator to upstream archives were wasteful.
Only about one third of the requests to the largest web archive (IA) were a hit.
14. @ibnesayeed
Aggregation Is Great, But Broadcasting Is Wasteful
What do we want? Aggregate all archives, large or small.
What’s the problem? Broadcasting is wasteful and problematic.
What’s the solution? Selectively poll archives that are likely to return good results for a lookup URI.
How to identify those? Profile web archives.
How to profile archives? The MementoMap Framework.
Sawood Alam, “MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing”, Doctoral Dissertation, ODU, 2020
15. @ibnesayeed
If Only Archives Could Tell What to Ask Them For
● Websites advertise their holdings using sitemap.xml, why can’t archives?
○ Archives have billions or even trillions of URI-Ms
○ Such exhaustive lists would go stale very quickly
● How about robots.txt?
○ It is compact, but it is an exclusion format; it does not tell what the site has
○ It assumes a single domain, and its patterns are for paths (not the domain name)
● How about well-known URIs?
○ Good for automated discovery of domain-specific metadata resources
● How about combining these ideas?
○ Introducing MementoMap!
19. @ibnesayeed
What is Archived in Arquivo.pt?
What is Accessed from MemGator?
2B URI-Rs that have 1-9 mementos each in Arquivo.pt were never requested from ODU’s MemGator server.
43 URI-Rs were requested thousands of times each, but had zero mementos in Arquivo.pt.
45 URI-Rs had tens of mementos each that were requested hundreds of times.
20. @ibnesayeed
What is Archived in Arquivo.pt?
What is Accessed from MemGator?
Blind spot of a usage-based profile.
Blind spot of a content-based profile.
21. @ibnesayeed
Who Bears the Cost of Bad Routing Decisions?
                                       Actual
                                       Present in the Archive   Not in the Archive
Predicted  Routed to the Archive       True Positive (TP)       False Positive (FP)
           Not Routed to the Archive   False Negative (FN)      True Negative (TN)

FP: Wasteful (infrastructure suffers)
FN: Disuse (users suffer)
22. @ibnesayeed
URI Canonicalization and SURT
https://news.bbc.co.uk/images/Logo.png?width=200&height=80&rotate=90%C2%B0#top
http://www.news.BBC.co.uk/images/Logo.png?width=200&height=80&rotate=90%c2%b0#top
http://www.news.bbc.co.uk/images/Logo.png?rotate=90%c2%B0&width=200&height=80
http://NEWS.BBC.CO.UK:80//images//Logo.png?height=80&width=200&rotate=90%c2%b0#top
Canonicalization: news.bbc.co.uk/images/Logo.png?height=80&rotate=90%C2%B0&width=200
SURT: uk,co,bbc,news,)/images/logo.png?height=80&rotate=90%c2%b0&width=200
24. @ibnesayeed
SURT Representation With Wildcard
Original SURTs did not have wildcards; we introduced them for dynamic profiling.
In practice, the common “http://(” prefix is removed.
25. @ibnesayeed
Shape of URI Key Tree of Arquivo.pt
Alam et al., “MementoMap Framework for Flexible and Adaptive Web Archive Profiling”, JCDL 2019
26. @ibnesayeed
A MementoMap Example
!context ["http://oduwsdl.github.io/contexts/ukvs"]
!id {uri: "http://archive.example.org/"}
!fields {keys: ["surt"], values: ["frequency"]}
!meta {type: "MementoMap", name: "A Test Web Archive", year: 1996}
!meta {updated_at: "2018-09-03T13:27:52Z"}
* 54321/20000
com,* 10000+
org,arxiv)/ 100
org,arxiv)/* 2500~/900
org,arxiv)/pdf/* 0
uk,co,bbc)/images/* 300+/20-
https://github.com/oduwsdl/ORS/blob/master/ukvs.md
Goodbye HmPn/DLim static profiling policies, thanks to our SURT with wildcard.
27. @ibnesayeed
MementoMap
https://github.com/oduwsdl/MementoMap
$ mementomap
Usage: mementomap [-h] {generate,compact,lookup,batchlookup} ...
Positional Arguments:
{generate,compact,lookup,batchlookup}
generate Generate a MementoMap from a sorted file with the
first columns as SURT (e.g., CDX/CDXJ)
compact Compact a large MementoMap file into a small one
lookup Search for a URI/SURT into a MementoMap
batchlookup Search for a list of URIs/SURTs into a MementoMap
Optional Arguments:
-h, --help Show this help message and exit
Alam et al., “MementoMap Framework for Flexible and Adaptive Web Archive Profiling”, JCDL 2019
28. @ibnesayeed
Processed Lines vs. Compacted MementoMap Growth
Input (sorted SURTs):
com,example)/a/1/x
com,example)/a/2
com,example)/a/3
com,example)/b/1
com,example)/b/2
com,example)/c/1

After one compaction pass:
com,example)/a/*
com,example)/b/1
com,example)/b/2
com,example)/c/1

After a further pass:
com,example)/*
Alam et al., “MementoMap Framework for Flexible and Adaptive Web Archive Profiling”, JCDL 2019
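One way to read the example above: whenever a prefix accumulates more distinct children than some threshold, its subtree rolls up into a wildcard. A minimal sketch of a single such pass; the function, its `depth` parameter, and the threshold of 2 are illustrative assumptions, not the actual MementoMap CLI algorithm (which tunes compaction with `--hcf`/`--pcf` factors):

```python
from collections import defaultdict

def compact_once(surts, depth, max_children=2):
    """One compaction pass over sorted SURT keys.

    If a prefix of `depth` path segments has more than `max_children`
    distinct children, replace its whole subtree with "prefix/*".
    """
    children = defaultdict(set)
    for s in surts:
        host, _, path = s.partition(")/")
        segs = path.split("/") if path else []
        if len(segs) > depth:
            prefix = (host + ")/" + "/".join(segs[:depth])).rstrip("/")
            children[prefix].add(segs[depth])
    rolled = {p for p, kids in children.items() if len(kids) > max_children}
    out = []
    for s in surts:
        host, _, path = s.partition(")/")
        segs = path.split("/") if path else []
        prefix = (host + ")/" + "/".join(segs[:depth])).rstrip("/")
        key = prefix + "/*" if prefix in rolled else s
        if key not in out:  # emit each wildcard only once
            out.append(key)
    return out
```

Running this at decreasing depths reproduces the two stages shown on the slide.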
29. @ibnesayeed
MementoMap Generation, Compaction, and Lookup
Alam et al., “MementoMap Framework for Flexible and Adaptive Web Archive Profiling”, JCDL 2019
A 1.5% relative cost yields 60% accuracy: Arquivo.pt can save 60% of wasted traffic by publishing a 119MB summary file!
30. @ibnesayeed
Why Profile Archival Voids?
$ curl -I https://web.archive.org/web/https://quora.com/
HTTP/1.1 403 FORBIDDEN
Server: nginx/1.15.8
Date: Wed, 02 Dec 2020 20:39:33 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Server-Timing: captures_list;dur=0.150497
X-App-Server: wwwb-app58
X-ts: 403
The Internet Archive has many “*.com” domains, but it may not want to capture or replay some.
Alam et al., “Profiling Web Archival Voids for Memento Routing”, JCDL 2021
31. @ibnesayeed
Archival Voids Profiles Reduce False Positives
Holdings index:
org,arxiv)/abs/a 40
org,arxiv)/abs/b 23
org,arxiv)/abs/c 17
org,arxiv)/format/a 15
org,arxiv)/format/b 20
org,arxiv)/format/c 10
org,arxiv)/search/a 30
...

Compacted profile:
org,arxiv)/abs/* 80
org,arxiv)/format/* 45
org,arxiv)/search/* 60

Fully compacted profile:
org,arxiv)/* 185

Lookups that become false positives:
org,arxiv)/abs/d (against the compacted profile)
org,arxiv)/pdf/a, org,arxiv)/pdf/b, org,arxiv)/pdf/c (against the fully compacted profile)

Adding an archival voids entry eliminates the “pdf” false positives:
org,arxiv)/* 185
org,arxiv)/pdf/* 0

How about summarizing frequently accessed URIs an archive does not hold?
Alam et al., “Profiling Web Archival Voids for Memento Routing”, JCDL 2021
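The example above implies a lookup rule: the most specific matching key wins, and a frequency of 0 (a void entry) tells the aggregator not to poll the archive. A minimal sketch, assuming the MementoMap is held as a dict of SURT keys (possibly ending in `*`) to frequencies; this is an illustration of the routing logic, not the MementoMap CLI's lookup implementation:

```python
def lookup(mementomap, surt_key):
    """Return the most specific matching (key, frequency) pair, or None.

    A frequency of 0 marks an archival void: do not route the request.
    """
    best = None
    for key, freq in mementomap.items():
        if key.endswith("*"):
            # Wildcard key: prefix match; longer keys are more specific.
            if surt_key.startswith(key[:-1]):
                if best is None or len(key) > len(best[0]):
                    best = (key, freq)
        elif key == surt_key:
            return (key, freq)  # exact match is always most specific
    return best
```

With the slide's profile, `org,arxiv)/abs/d` matches `org,arxiv)/*` (route), while `org,arxiv)/pdf/a` matches the more specific void `org,arxiv)/pdf/* 0` (do not route).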
32. @ibnesayeed
404-Only Frequencies and Request Savings
An archival voids profile of 2.4k URIs, each accessed hundreds of times or more, could have saved about 8.4% of wasted requests.
Alam et al., “Profiling Web Archival Voids for Memento Routing”, JCDL 2021
33. @ibnesayeed
Archival Voids Recommendations
● Keep archival voids profiles separate from archival holdings
● Update them often
● Use specific keys, and only with high confidence
● Profile only resources that are in high demand
● Archives themselves are better sources of truth than external observers
Alam et al., “Profiling Web Archival Voids for Memento Routing”, JCDL 2021
34. @ibnesayeed
Dissemination and Discovery Methods
Well-known URI:
GET /.well-known/mementomap HTTP/1.1
Host: arquivo.pt

Link Header:
Link: <https://arquivo.pt/path/to/mementomap.ukvs>; rel="mementomap"

Link HTML Element:
<link href="https://arquivo.pt/path/to/mementomap.ukvs" rel="mementomap">
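A client discovering a MementoMap via the Link header needs only a small amount of parsing. A minimal sketch for the single-link case shown above; the function name is illustrative, the `mementomap` link relation is the one proposed in this talk, and a full RFC 8288 parser would also handle commas inside URIs and additional parameters:

```python
import re

def mementomap_link(link_header, rel="mementomap"):
    """Extract the target URI of the given link relation from a Link header.

    Minimal RFC 8288-style parsing: splits on commas, so it assumes the
    target URIs themselves contain no commas.
    """
    for part in link_header.split(","):
        m = re.search(r'<([^>]+)>\s*;\s*rel="?([^";]+)"?', part)
        if m and m.group(2) == rel:
            return m.group(1)
    return None
```

In practice a client might try the well-known URI first and fall back to the Link header or HTML `<link>` element.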
35. @ibnesayeed
MementoMap Adoption Path
● PWA, UKWA, and NLA have shown interest
● PyWB archival replay system is open for implementation
● MemGator and LANL’s Time Travel service are interested
● Big web archives can start with publishing archival voids
○ No need to profile IA
● Archives with access restrictions can have multiple MementoMaps
● Third parties can create and publish MementoMaps of the rest of the archives while they catch up
● Coexist with the ongoing IIPC-funded Bloom filters project
36. @ibnesayeed
MementoMap Call for Adoption
🕮
MementoMap Framework (Doctoral Dissertation)
https://digitalcommons.odu.edu/computerscience_etds/129/
Unified Key Value Store (UKVS)
https://github.com/oduwsdl/ORS/blob/master/ukvs.md
⚙
MementoMap CLI
https://github.com/oduwsdl/MementoMap
MemGator
https://github.com/oduwsdl/MemGator
$ mementomap generate --hcf=4.0 --pcf=2.0 index.cdx[j] mementomap.ukvs
# Provide a sorted list of SURTs on STDIN if not using a CDX[J] index
$ scp mementomap.ukvs ${WEBHOST}:${WEBROOT}/.well-known/mementomap
# Preferably, compress the file and allow content negotiation
✉
Email: sawood@archive.org
Twitter: @ibnesayeed
IIPC Slack: #mementomap