Slides of the paper presentation "Profiling Web Archival Voids for Memento Routing" for JCDL 2021.
Authors: Sawood Alam, Michele C. Weigle, Michael L. Nelson
Preprint: https://arxiv.org/abs/2108.03311
Recording: https://youtu.be/ImJWkndNoS8
InterPlanetary Wayback: The Next Step Towards Decentralized Web Archiving by Sawood Alam
InterPlanetary Wayback (IPWB) facilitates permanence and collaboration in web archives by disseminating the contents of WARC files into the InterPlanetary File System (IPFS) network. IPFS is a peer-to-peer content-addressable file system that inherently allows deduplication and facilitates opt-in replication. IPWB splits the header and payload of WARC response records before disseminating them into IPFS to leverage the deduplication, builds a CDXJ index with references to the returned IPFS hashes, and combines the header and payload from IPFS at the time of replay. We also explore the possibility of an index-free, fully decentralized collaborative web archiving system as the next step.
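As a rough illustration of the indexing step described above, here is a minimal Python sketch: it splits a WARC response record into its HTTP header and payload, stores each in IPFS, and emits a CDXJ-style line pointing at the two hashes. The ipfs_add() helper is a hypothetical stand-in for a real IPFS client call, and the CDXJ field names approximate IPWB's format rather than reproduce it authoritatively.

import hashlib
import json

def ipfs_add(data: bytes) -> str:
    # Hypothetical stand-in for a real IPFS client call; a real CID would
    # come from an IPFS daemon. Here we fake one from a SHA-256 digest so
    # the sketch runs standalone.
    return "Qm" + hashlib.sha256(data).hexdigest()[:44]

def index_warc_response(surt_uri: str, datetime14: str, http_bytes: bytes) -> str:
    # The HTTP message inside a WARC response record separates header and
    # payload with a blank line (CRLF CRLF). Storing them as two IPFS
    # objects lets identical headers or payloads deduplicate naturally.
    header, _, payload = http_bytes.partition(b"\r\n\r\n")
    locator = f"urn:ipfs/{ipfs_add(header)}/{ipfs_add(payload)}"
    return f"{surt_uri} {datetime14} {json.dumps({'locator': locator})}"

print(index_warc_response("com,example)/", "20160305192247",
                          b"HTTP/1.1 200 OK\r\n\r\n<html></html>"))

At replay time the process runs in reverse: look up the CDXJ line, fetch both hashes from IPFS, and stitch header and payload back into an HTTP response.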
MementoMap Framework for Flexible and Adaptive Web Archive Profiling by Sawood Alam
In this work we propose MementoMap, a flexible and adaptive framework to efficiently summarize the holdings of a web archive. We describe a simple, yet extensible, file format suitable for MementoMap. We used the complete index of Arquivo.pt, comprising 5B mementos (archived web pages/files), to understand the nature and shape of its holdings. We generated MementoMaps with varying amounts of detail from its HTML pages that have an HTTP status code of 200 OK. Additionally, we designed a single-pass, memory-efficient, and parallelization-friendly algorithm to compact a large MementoMap into a small one, and an in-file binary search method for efficient lookup. We analyzed more than three years of MemGator (a Memento aggregator) logs to understand the response behavior of 14 public web archives. We evaluated MementoMaps by measuring their Accuracy using 3.3M unique URIs from MemGator logs. We found that a MementoMap of less than 1.5% Relative Cost (as compared to the comprehensive listing of all the unique original URIs) can correctly identify the presence or absence of 60% of the lookup URIs in the corresponding archive while maintaining 100% Recall (i.e., zero false negatives).
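The in-file binary search mentioned above is a standard seek-based technique; a minimal Python sketch follows, assuming a plain-text MementoMap whose lines are sorted by a space-delimited key. It illustrates the idea and is not MementoMap's reference implementation: the binary phase narrows the byte range, then a short linear scan finds the exact line.

def lookup(path: str, key: str):
    # Binary-search a sorted text file without loading it into memory:
    # seek to the middle byte, skip the partial line we landed in, and
    # compare the key of the next full line to narrow the byte range.
    with open(path, "rb") as f:
        f.seek(0, 2)
        lo, hi = 0, f.tell()
        while hi - lo > 4096:          # stop early; scan the final window
            mid = (lo + hi) // 2
            f.seek(mid)
            f.readline()               # realign to the next line boundary
            line = f.readline().decode()
            if not line or line.split(" ", 1)[0] >= key:
                hi = mid
            else:
                lo = mid
        f.seek(lo)
        if lo:
            f.readline()               # realign to the next line start
        for raw in f:
            k = raw.decode().split(" ", 1)[0]
            if k == key:
                return raw.decode().rstrip("\n")
            if k > key:                # sorted file: we passed the key
                break
    return None

Because each probe costs one seek and one short read, lookups over a multi-gigabyte MementoMap stay within tens of disk reads.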
Archive Assisted Archival Fixity Verification Framework by Sawood Alam
The number of public and private web archives has increased, and we implicitly trust content delivered by these archives. Fixity is checked to ensure an archived resource has remained unaltered since the time it was captured. Some web archives do not allow users to access fixity information and, more importantly, even if fixity information is available, it is provided by the same archive from which the archived resources are requested. In this research, we propose two approaches, namely Atomic and Block, to establish and check fixity of archived resources.
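To make the idea concrete, here is a hedged sketch of what a fixity record might contain, assuming we simply hash the raw archived response and (in the Block approach) batch many such records into one document whose digest can be published elsewhere; the function names and JSON shape are illustrative, not the paper's actual format.

import hashlib
import json

def fixity_record(uri_m: str, raw_bytes: bytes) -> dict:
    # Hash the memento as delivered. Publishing this digest somewhere
    # *other than* the hosting archive lets a client later detect whether
    # the archived resource was altered after capture.
    return {"memento": uri_m, "sha256": hashlib.sha256(raw_bytes).hexdigest()}

def block_digest(records: list) -> str:
    # Block approach: aggregate many per-resource records into one
    # canonical document and fingerprint that, amortizing dissemination
    # cost; the Atomic approach would publish each record individually.
    blob = json.dumps(records, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

rec = fixity_record("https://web.archive.org/web/2021/https://example.com/", b"...")
print(block_digest([rec]))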
MementoMap: An Archive Profile Dissemination Framework by Sawood Alam
We introduce MementoMap, a framework to express and disseminate the holdings of web archives (archive profiles), by the archives themselves or by third parties. The framework allows arbitrary, flexible, and dynamic levels of detail in its entries to fit the needs of archives of different scales. This enables Memento aggregators to significantly reduce wasted traffic to web archives.
Introducing Web Archiving and WSDL Research Group by Sawood Alam
My talk introducing Web Archiving and the Web Science and Digital Libraries Research Group to students invited from India for a summer workshop at Old Dominion University, Norfolk, VA.
The Memento Protocol and Research Issues With Web Archiving by Michael Nelson
Michael L. Nelson
Old Dominion University
Web Science & Digital Libraries Research Group
www.cs.odu.edu/~mln/
University of Virginia Colloquium
2016-09-12
MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing by Sawood Alam
Topic: Doctoral Dissertation Defense
Title: MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
Student: Sawood Alam
University: Old Dominion University
Date: Friday, December 4, 2020
Profiling Web Archive Coverage for Top-Level Domain and Content Language by Michael Nelson
Profiling Web Archive Coverage for Top-Level Domain and Content Language
Ahmed AlSum, Michele C. Weigle, Michael L. Nelson,
Herbert Van de Sompel
TPDL 2013
September 22-26, 2013
Valletta, Malta
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavaScript by Michael Nelson
Justin F. Brunelle
Michele C. Weigle
Michael L. Nelson
Web Science and Digital Libraries Research Group
Old Dominion University
@WebSciDL
IIPC 2016
Reykjavik, Iceland, April 11, 2016
Impact of HTTP Cookie Violations in Web Archives by Sawood Alam
Certain HTTP Cookies on certain sites can be a source of content bias in archival crawls. Accommodating Cookies at crawl time, but not utilizing them at replay time, may cause cookie violations, resulting in defaced composite mementos that never existed on the live web. To address these issues, we propose that crawlers store Cookies with short expiration times and that archival replay systems account for values in the Vary header along with URIs.
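As a minimal sketch of the replay-side half of this proposal (illustrative data structures, not a real replay system's API), the lookup key can incorporate the request-header values named by the stored response's Vary header, so that cookie-dependent captures replay against matching cookie state:

def replay_key(uri: str, stored_vary: str, request_headers: dict) -> tuple:
    # Extend the index key with (name, value) pairs for every request
    # header named in the archived response's Vary header.
    varied = sorted(
        (name.strip().lower(), request_headers.get(name.strip().lower(), ""))
        for name in stored_vary.split(",") if name.strip()
    )
    return (uri, *varied)

# Two captures that varied on Cookie get distinct keys, so replay can
# serve the memento that matches the client's cookie state:
k_en = replay_key("https://example.com/", "Cookie", {"cookie": "lang=en"})
k_pt = replay_key("https://example.com/", "Cookie", {"cookie": "lang=pt"})
assert k_en != k_pt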
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool by Michael Nelson
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Justin F. Brunelle
Michael L. Nelson
Lyudmila Balakireva
Robert Sanderson
Herbert Van de Sompel
TPDL 2013, September 24, 2013
I presented this talk on creating RSS feeds at the European Innovative Users Group (EIUG) 2010 conference held at Aston University, 15-16 June 2010.
I describe a method of exporting and reusing metadata held in the Innovative Millennium LMS that enables reuse of the data and its presentation as an RSS feed, in this case new books lists. This is achieved using Free / Open Source software.
I explain how the process can be generalised to the export of other bibliographic data, for example exporting reading list information to a VLE (BlackBoard) as XML, or presenting lists of e-resources on a Web site using a PHP front-end.
Extracting Data from Historical Documents: Crowdsourcing Annotations on Wikisource by Gaurav Vaidya
Extracting Data from Historical Documents: Crowdsourcing Annotations on Wikisource by Gaurav Vaidya, based on a paper by Andrea Thomer, Gaurav Vaidya*, Robert Guralnick, David Bloom and Laura Russell. Presented November 8, 2012 (http://www.mcn.edu/2012/extracting-data-historical-documents-crowdsourcing-annotations-wikisource)
Find out more at http://bit.ly/jhfnblog
Evolutionary & Swarm Computing for the Semantic Web by Ankit Solanki
The Semantic Web will be the next big thing in the world of the internet. This presentation talks about various approaches that can be used to query the underlying triple store that holds all the information.
Have you ever been stuck downloading a module you already have? Have you ever been stuck without internet, unable to npm install when all the modules you need are stored in your coworkers' computers in the same LAN? Well no more! With the IPFS companion for npm, you get (a) distributed discovery: install modules seamlessly from any other computer you can reach; (b) cryptographic versioning: never install the same version twice; (c) free deduplication: don't download or store the same things multiple times.
npm-over-ipfs uses IPFS (the InterPlanetary Filesystem), a new file distribution protocol. IPFS is like Git meets BitTorrent; it is a perfect fit for Node.js modules, as it enables devs to keep local caches, work offline or in LANs, and use modules present in nearby machines.
A Framework for Aggregating Private and Public Web Archives by jcdl2018
Mat Kelly, Michael L. Nelson, and Michele C. Weigle
Old Dominion University
Web Science & Digital Libraries Research Group {mkelly, mln, mweigle}@cs.odu.edu @machawk1 • @WebSciDL
#jcdl2018
Readying Web Archives to Consume and Leverage Web Bundles by Sawood Alam
Potential utilization of Web Bundles, an emerging Web technology, in Web archiving; presented at the IIPC WAC 2021 in Session 8 by Sawood Alam.
Recording: https://youtu.be/lQX9v9V0FRQ
Aqua Browser Implementation at Oklahoma State University by youthelectronix
On Wednesday, November 7th, Dr. Anne Prestamo discussed "AquaBrowser Implementation at Oklahoma State University Library" as part of a program on Next Generation Catalogs held at the University of Massachusetts at Amherst and co-sponsored by the Five Colleges' Librarians Council and Simmons College Graduate School of Library and Information Science (GSLIS).
Leveraging Wikipedia as a Hub for Data Integration: the Remixing Archival Metadata Project (RAMP)
Timothy A. Thompson, Metadata Librarian (Spanish/Portuguese Specialty), Princeton University Library
LANL Research Library
March 12, 2009
Martin Klein & Michael L. Nelson
Department of Computer Science
Old Dominion University
Norfolk VA
www.cs.odu.edu/~{mklein,mln}
Descriptive Standards and Applications in Memory Institutions by E. Murphy
This presentation is for a group class project completed in the spring 2011 semester. The project examined metadata practices in 2 memory institutions as well as the current best practices for creating interoperable metadata.
Search Interfaces on the Web: Querying and Characterizing, PhD dissertation by Denis Shestakov
Full text of my PhD dissertation titled "Search Interfaces on the Web: Querying and Characterizing", defended at the ICT Building, Turku, Finland, on 12 June 2008.
Thesis contributions:
* New methods for deep Web characterization
* Estimating the scale of a national segment of the Web
* Building a publicly available dataset describing >200 web databases on the Russian Web
* Designing and implementing the I-Crawler, a system for automatic finding and classifying search interfaces
* Technique for recognizing and analyzing JavaScript-rich and non-HTML searchable forms
* Introducing a data model for representing search interfaces and result pages
* New user-friendly and expressive form query language for querying search interfaces and extracting data from result pages
* Designing and implementing a prototype system for querying web databases
* Bibliography with over 110 references to publications in the area of deep Web
Usage of Linked Data: Introduction and Application Scenarios by the EUCLID project
This presentation introduces the main principles of Linked Data, the underlying technologies and background standards. It provides basic knowledge for how data can be published over the Web, how it can be queried, and what are the possible use cases and benefits. As an example, we use the development of a music portal (based on the MusicBrainz dataset), which facilitates access to a wide range of information and multimedia resources relating to music.
The promise of Linked Data is not that it is another way of aggregating data. For too long have library data been trapped within data-silos only accessible through obscure protocols. Why is access to library data still an issue? Letting everyone access and link to library data lets anyone build the next killer app. LIBRIS, the Swedish Union Catalogue is, since the beginning of this year, available as Linked Data. We discuss how and why. -- Martin Malmsten & Anders Söderbäck, National Library of Sweden
CDX Summary: Web Archival Collection Insights by Sawood Alam
Large web archival collections are often opaque about their holdings. We created an open-source tool called CDX Summary to generate statistical reports based on URIs, hosts, TLDs, paths, query parameters, status codes, media types, date and time, etc., present in the CDX index of a collection of WARC files. Our tool also surfaces a configurable number of potentially good random memento samples from the collection for visual inspection, quality assurance, representative thumbnail generation, etc. The tool generates both human- and machine-readable reports with varying levels of detail for different use cases. Furthermore, we implemented a Web Component that can render generated JSON summaries in HTML documents. Early exploration of CDX insights on Wayback Machine collections helped us improve our crawl operations.
Venue: TPDL 2022
Recording: https://www.youtube.com/watch?v=K5i3XShqW6A
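To illustrate the kind of aggregation involved (a toy stand-in, not CDX Summary itself), the snippet below tallies status codes, media types, and TLDs from a classic 11-field CDX file, where the SURT key is field 1, the media type field 4, and the status code field 5:

from collections import Counter

def summarize(cdx_path: str) -> dict:
    status, media, tlds = Counter(), Counter(), Counter()
    with open(cdx_path) as f:
        for line in f:
            if line.startswith(" CDX"):       # header line of a classic CDX file
                continue
            fields = line.split()
            if len(fields) < 5:
                continue
            surt = fields[0]                  # e.g., "org,example)/path"
            media[fields[3]] += 1             # media (MIME) type
            status[fields[4]] += 1            # HTTP status code
            tlds[surt.split(",", 1)[0]] += 1  # TLD is the first SURT label
    return {"status": status, "media": media, "tld": tlds}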
Video Archiving and Playback in the Wayback Machine by Sawood Alam
At the Internet Archive (IA) we collect static and dynamic lists of seeds from various sources (like Save Page Now, Wikipedia EventStream, Cloudflare, etc.) for archiving. Some of these seeds include web pages with videos on them. Those URLs are curated based on certain criteria to identify potential videos that should be archived or excluded. Candidate video page URLs for archiving are placed in a queue (currently using Kafka) to be consumed by a separate process. We maintain a persistent database of videos we have already archived, which is used both for status tracking and as a seen-check system to avoid duplicate downloads of large media files that usually do not change.

We use youtube-dl (or one of its forks) to download videos and their metadata. We archive the container HTML page, associated video metadata, any transcriptions, thumbnails, and at least one of the many video files with different resolutions and formats. These pieces are stored in separate WARC records (some with "response" type and others as "metadata"). Some popular video streaming services do not have static links to embedded video files, which makes it difficult to identify and serve video files corresponding to their container HTML pages on archival replay. To glue related pieces together for replay we are currently using a key-value store, but we are exploring ways to get by without an additional index. We use a custom video player and perform the necessary rewriting in the container HTML page for a more reliable video playback experience.

We create a daily summary of the metadata of videos that we have archived and load it into a custom-built Video Archiving Insights dashboard to identify any issues or biases, which is utilized as a feedback loop for quality assurance and to enhance our curation criteria and archiving strategies. We are always looking for ways to improve the system at scale as well as means to interoperate.
Recording: youtube.com/watch?v=6MiYKOq_DKo
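The seen-check step described above can be sketched with a small SQLite table consulted before any download is attempted; download_video() is a hypothetical placeholder for the youtube-dl invocation, and the schema is illustrative:

import sqlite3

def download_video(url: str) -> None:
    # Hypothetical placeholder for invoking youtube-dl (or a fork) on `url`.
    print("would download:", url)

con = sqlite3.connect("seen.db")
con.execute("CREATE TABLE IF NOT EXISTS seen (url TEXT PRIMARY KEY, status TEXT)")

def archive_if_unseen(url: str) -> None:
    # Large media files rarely change, so skip anything we already hold;
    # the same table doubles as a status tracker for in-flight downloads.
    if con.execute("SELECT 1 FROM seen WHERE url = ?", (url,)).fetchone():
        return
    con.execute("INSERT INTO seen VALUES (?, ?)", (url, "queued"))
    con.commit()
    download_video(url)

archive_if_unseen("https://example.com/watch?v=abc123")  # downloads once
archive_if_unseen("https://example.com/watch?v=abc123")  # no-op thereafter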
Supporting Web Archiving via Web Packaging by Sawood Alam
We describe challenges related to web archiving, replaying archived web resources, and verifying their authenticity. We show that Web Packaging has significant potential to help address these challenges and identify areas in which changes are needed in order to fully realize that potential.
Position Paper: https://arxiv.org/abs/1906.07104
Presented in the Internet Architecture Board's ESCAPE 2019 Workshop (Exploring Synergy between Content Aggregation and the Publisher Ecosystem)
https://www.iab.org/activities/workshops/escape-workshop/
A brief introduction to WARC File Format used for long-term Web Archival preservation. These slides were initially prepared to give a guest lecture in the CS 531 Web Server Design (Fall 2018) course at Old Dominion University.
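For readers new to the format, a WARC file is a sequence of records, each with its own headers followed by a captured HTTP message; below is a skeletal response record per the WARC 1.0 specification (record ID and byte counts abbreviated, field values illustrative):

WARC/1.0
WARC-Type: response
WARC-Target-URI: https://example.com/
WARC-Date: 2018-10-01T12:00:00Z
WARC-Record-ID: <urn:uuid:...>
Content-Type: application/http; msgtype=response
Content-Length: ...

HTTP/1.1 200 OK
Content-Type: text/html

<html>...</html>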
MemGator - A Memento Aggregator CLI and Server in Go by Sawood Alam
MemGator - A Portable Concurrent Memento Aggregator CLI and Server Written in Go.
The corresponding poster can be found at http://www.cs.odu.edu/~salam/presentations/memgator-jcdl16-poster.pdf
Avoiding Zombies in Archival Replay Using ServiceWorker by Sawood Alam
Live-leakage (zombie resource) is an issue in archival replay of web pages. This work proposes a mechanism to avoid such live-leakage using ServiceWorker. This work was presented in WADL 2017 on June 22 in Toronto, Ontario, Canada.
Client-side Reconstruction of Composite Mementos Using ServiceWorker by Sawood Alam
Live-leakage (zombie resource) is an issue in archival replay of web pages. This work proposes a mechanism to avoid such live-leakage using ServiceWorker. This work was presented in JCDL 2017 on June 20 in Toronto, Ontario, Canada.
A talk given to final year B.Tech. Computer Science students at Jamia Millia Islamia, New Delhi, India with the intent of spreading awareness about web archiving and digital preservation and motivating the students for research.
HTTP Mailbox - Asynchronous RESTful Communication by Sawood Alam
A Thesis Presentation to the Faculty of Old Dominion University in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science
Seminar on U.V. Spectroscopy by SAMIR PANDA
Spectroscopy is a branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that can measure the amount of light absorbed by the analyte.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN by Sérgio Sacani
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Multi-source connectivity as the driver of solar wind variability in the heliosphere by Sérgio Sacani
The ambient solar wind that fills the heliosphere originates from multiple sources in the solar corona and is highly structured. It is often described as high-speed, relatively homogeneous, plasma streams from coronal holes and slow-speed, highly variable, streams whose source regions are under debate. A key goal of ESA/NASA's Solar Orbiter mission is to identify solar wind sources and understand what drives the complexity seen in the heliosphere. By combining magnetic field modelling and spectroscopic techniques with high-resolution observations and measurements, we show that the solar wind variability detected in situ by Solar Orbiter in March 2022 is driven by spatio-temporal changes in the magnetic connectivity to multiple sources in the solar atmosphere. The magnetic field footpoints connected to the spacecraft moved from the boundaries of a coronal hole to one active region (12961) and then across to another region (12957). This is reflected in the in situ measurements, which show the transition from fast to highly Alfvénic then to slow solar wind that is disrupted by the arrival of a coronal mass ejection. Our results describe solar wind variability at 0.5 au but are applicable to near-Earth observatories.
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep... by University of Maribor
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Often maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
1. Sawood Alam
Internet Archive, San Francisco, CA
Michael L. Nelson and Michele C. Weigle
Old Dominion University, Norfolk, VA
Profiling Web Archival Voids for Memento Routing
#MementoMap
#ArchivalVoids
@ibnesayeed
@WebSciDL
Supported in part by NSF Grant IIS-1526700
JCDL '21, September 27-30, 2021, Urbana-Champaign, Illinois
https://arxiv.org/abs/2108.03311
2. @ibnesayeed | @WebSciDL
MemGator Log Responses From Various Archives

93% of the requests made from MemGator to upstream archives were wasteful.
Only about one third of the requests to the largest web archive (IA) were a hit.
3. @ibnesayeed | @WebSciDL
Who Bears the Cost of Bad Routing Decisions?

                                      Actual
Predicted                   Present in the Archive   Not in the Archive
Routed to the Archive       True Positive (TP)       False Positive (FP)
Not Routed to the Archive   False Negative (FN)      True Negative (TN)

FP: Wasteful (Infrastructure suffers)
FN: Disuse (Users suffer)
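Reading the costs off the matrix, a tiny worked example (counts made up to echo the 93% figure from slide 2, not taken from the paper):

tp, fp = 70, 930        # requests routed to an archive: hits vs. misses
fn, tn = 5, 8995        # requests not routed: missed holdings vs. correct skips

wasteful = fp / (tp + fp)  # FP share of routed requests: infrastructure pays
disuse   = fn / (tp + fn)  # FN share of actual holdings: users pay
print(f"wasteful: {wasteful:.0%}, disuse: {disuse:.0%}")  # wasteful: 93%, disuse: 7%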
4. @ibnesayeed | @WebSciDL
What is Archived in Arquivo.pt? What is Accessed from MemGator?

2B URI-Rs that have 1-9 mementos each in Arquivo.pt were never requested from ODU's MemGator server.
43 URI-Rs were requested thousands of times each, but had zero mementos in Arquivo.pt.
45 URI-Rs had tens of mementos each that were requested hundreds of times.
5. @ibnesayeed | @WebSciDL
What is Archived in Arquivo.pt? What is Accessed from MemGator?

Blind spot of a usage-based profile
Blind spot of a content-based profile
6. @ibnesayeed | @WebSciDL
MementoMap of Archival Holdings Profile

Alam et al., "MementoMap Framework for Flexible and Adaptive Web Archive Profiling", JCDL 2019
https://arxiv.org/abs/1905.12607

1.5% Relative Cost yields 60% Accuracy.
Arquivo.pt can save 60% wasted traffic by publishing a 119 MB summary file of their 1.8 TB of CDX files (containing 5B mementos of 2B URI-Rs).
7. @ibnesayeed | @WebSciDL
Why Profile Archival Voids?

$ curl -I https://web.archive.org/web/https://quora.com/
HTTP/1.1 403 FORBIDDEN
Server: nginx/1.15.8
Date: Wed, 02 Dec 2020 20:39:33 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Server-Timing: captures_list;dur=0.150497
X-App-Server: wwwb-app58
X-ts: 403

The Internet Archive has many "*.com" domains, but it may not want to capture or replay some of them.
8. @ibnesayeed | @WebSciDL
Sources Accessing TimeMaps

LANL's usage-based Archival Holdings profiling reduced requests significantly.
Profiling Archival Voids would improve it even further.
12. @ibnesayeed | @WebSciDL
Most Frequently Accessed URIs

Most of the traffic to "fccn.pt" originated from UptimeRobot and always returned a "404 Not Found" response.
13. @ibnesayeed | @WebSciDL
404-Only Frequencies and Request Savings

An archival voids profile of 2.4k URIs that were accessed hundreds of times each or more could have saved about 8.4% of wasted requests.
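A hedged sketch of how such a voids profile could be derived from aggregator logs, assuming a simple iterable of (URI, status) pairs; the threshold and record shape are illustrative, not the paper's exact method:

from collections import defaultdict

def voids_from_log(pairs, threshold=100):
    """pairs: iterable of (uri, http_status) tuples from aggregator logs."""
    hits, misses = defaultdict(int), defaultdict(int)
    for uri, status in pairs:
        (misses if status == 404 else hits)[uri] += 1
    # Profile only 404-ever URIs that are in high demand; anything the
    # archive ever served is a holding, not a void.
    return sorted(u for u, n in misses.items() if n >= threshold and u not in hits)

log = [("https://quora.com/", 404)] * 150 + [("https://example.com/", 200)]
print(voids_from_log(log))  # ['https://quora.com/']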
14. @ibnesayeed | @WebSciDL
Archival Voids Recommendations

● Keep Archival Voids profiles separate from Archival Holdings
● Update often
● Use specific keys, and only with high confidence
● Profile only resources that are in high demand
● Archives themselves are better sources of truth than external observers (illustrative profile entries follow below)
https://arxiv.org/abs/2108.03311
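To make the recommendations concrete, the lines below sketch what Archival Voids entries might look like in MementoMap's CDXJ-like line format, using a frequency of zero to assert absence; the SURT-prefixed keys and JSON field are illustrative assumptions based on the MementoMap papers, not normative examples from this deck:

com,quora)/* {"frequency": 0}
pt,fccn)/* {"frequency": 0}

An aggregator that trusts such a profile can skip routing lookups for matching URIs to that archive, cutting exactly the kind of 404-only traffic quantified on slide 13.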