A brief introduction to WARC File Format used for long-term Web Archival preservation. These slides were initially prepared to give a guest lecture in the CS 531 Web Server Design (Fall 2018) course at Old Dominion University.
This presentation provides an overview of the Memento "Time Travel for the Web" framework that is aligned with the stable version of the Memento protocol, specified in RFC 7089.
This introduction to Drupal 6 was presented to the Chicago Web Professionals meetup as the third in a series of CMS introductions (following WordPress and Joomla)
This presentation provides an overview of the Memento "Time Travel for the Web" framework that is aligned with the stable version of the Memento protocol, specified in RFC 7089.
This introduction to Drupal 6 was presented to the Chicago Web Professionals meetup as the third in a series of CMS introductions (following WordPress and Joomla)
Linux 4.x Tracing Tools: Using BPF SuperpowersBrendan Gregg
Talk for USENIX LISA 2016 by Brendan Gregg.
"Linux 4.x Tracing Tools: Using BPF Superpowers
The Linux 4.x series heralds a new era of Linux performance analysis, with the long-awaited integration of a programmable tracer: Enhanced BPF (eBPF). Formally the Berkeley Packet Filter, BPF has been enhanced in Linux to provide system tracing capabilities, and integrates with dynamic tracing (kprobes and uprobes) and static tracing (tracepoints and USDT). This has allowed dozens of new observability tools to be developed so far: for example, measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. Tracing superpowers have finally arrived.
In this talk I'll show you how to use BPF in the Linux 4.x series, and I'll summarize the different tools and front ends available, with a focus on iovisor bcc. bcc is an open source project to provide a Python front end for BPF, and comes with dozens of new observability tools (many of which I developed). These tools include new BPF versions of old classics, and many new tools, including: execsnoop, opensnoop, funccount, trace, biosnoop, bitesize, ext4slower, ext4dist, tcpconnect, tcpretrans, runqlat, offcputime, offwaketime, and many more. I'll also summarize use cases and some long-standing issues that can now be solved, and how we are using these capabilities at Netflix."
Broken benchmarks, misleading metrics, and terrible tools. This talk will help you navigate the treacherous waters of Linux performance tools, touring common problems with system tools, metrics, statistics, visualizations, measurement overhead, and benchmarks. You might discover that tools you have been using for years, are in fact, misleading, dangerous, or broken.
The speaker, Brendan Gregg, has given many talks on tools that work, including giving the Linux PerformanceTools talk originally at SCALE. This is an anti-version of that talk, to focus on broken tools and metrics instead of the working ones. Metrics can be misleading, and counters can be counter-intuitive! This talk will include advice for verifying new performance tools, understanding how they work, and using them successfully.
Video: https://www.youtube.com/watch?v=JRFNIKUROPE . Talk for linux.conf.au 2017 (LCA2017) by Brendan Gregg, about Linux enhanced BPF (eBPF). Abstract:
A world of new capabilities is emerging for the Linux 4.x series, thanks to enhancements that have been included in Linux for to Berkeley Packet Filter (BPF): an in-kernel virtual machine that can execute user space-defined programs. It is finding uses for security auditing and enforcement, enhancing networking (including eXpress Data Path), and performance observability and troubleshooting. Many new open source tools that have been written in the past 12 months for performance analysis that use BPF. Tracing superpowers have finally arrived for Linux!
For its use with tracing, BPF provides the programmable capabilities to the existing tracing frameworks: kprobes, uprobes, and tracepoints. In particular, BPF allows timestamps to be recorded and compared from custom events, allowing latency to be studied in many new places: kernel and application internals. It also allows data to be efficiently summarized in-kernel, including as histograms. This has allowed dozens of new observability tools to be developed so far, including measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more.
This talk will summarize BPF capabilities and use cases so far, and then focus on its use to enhance Linux tracing, especially with the open source bcc collection. bcc includes BPF versions of old classics, and many new tools, including execsnoop, opensnoop, funcccount, ext4slower, and more (many of which I developed). Perhaps you'd like to develop new tools, or use the existing tools to find performance wins large and small, especially when instrumenting areas that previously had zero visibility. I'll also summarize how we intend to use these new capabilities to enhance systems analysis at Netflix.
YOW2018 Cloud Performance Root Cause Analysis at NetflixBrendan Gregg
Keynote by Brendan Gregg for YOW! 2018. Video: https://www.youtube.com/watch?v=03EC8uA30Pw . Description: "At Netflix, improving the performance of our cloud means happier customers and lower costs, and involves root cause
analysis of applications, runtimes, operating systems, and hypervisors, in an environment of 150k cloud instances
that undergo numerous production changes each week. Apart from the developers who regularly optimize their own code
, we also have a dedicated performance team to help with any issue across the cloud, and to build tooling to aid in
this analysis. In this session we will summarize the Netflix environment, procedures, and tools we use and build t
o do root cause analysis on cloud performance issues. The analysis performed may be cloud-wide, using self-service
GUIs such as our open source Atlas tool, or focused on individual instances, and use our open source Vector tool, f
lame graphs, Java debuggers, and tooling that uses Linux perf, ftrace, and bcc/eBPF. You can use these open source
tools in the same way to find performance wins in your own environment."
Kernel Recipes 2019 - Faster IO through io_uringAnne Nicolas
Since the dawn of time, Linux has had to make do with inferior IO interfaces. Native Linux AIO supports only a niche application class (O_DIRECT), and even for that use case, it’s far too slow for modern storage. This talk will detail io_uring, a modern IO interface for Linux, that’s both fully featured and performant.
Jens Axboe
[KubeCon EU 2021] Introduction and Deep Dive Into ContainerdAkihiro Suda
Join containerd maintainers and reviewers in a combined introduction and deep dive session. They will discuss the overview and the recent updates of containerd as well as how it is being used by Kubernetes, Docker and other container-based systems. The brief introduction about its architecture and service design will be included. The talk will also deep dive into how to leverage contained by extending and customizing it for your use case with low-level plugins like remote snapshotters, as well as by implementing your own containerd client. Upcoming features and recent discussion in containerd community will also be covered.
- - -
https://kccnceu2021.sched.com/event/iE6v/introduction-and-deep-dive-into-containerd-kohei-tokunaga-akihiro-suda-ntt-corporation?iframe=no
Introduction to WordPress, Installation of WordPress, Overview of WordPress Dashboard, Post, Pages, Categories, Tags, User Roles, Comments, Careers in WordPress
Linux 4.x Tracing Tools: Using BPF SuperpowersBrendan Gregg
Talk for USENIX LISA 2016 by Brendan Gregg.
"Linux 4.x Tracing Tools: Using BPF Superpowers
The Linux 4.x series heralds a new era of Linux performance analysis, with the long-awaited integration of a programmable tracer: Enhanced BPF (eBPF). Formally the Berkeley Packet Filter, BPF has been enhanced in Linux to provide system tracing capabilities, and integrates with dynamic tracing (kprobes and uprobes) and static tracing (tracepoints and USDT). This has allowed dozens of new observability tools to be developed so far: for example, measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. Tracing superpowers have finally arrived.
In this talk I'll show you how to use BPF in the Linux 4.x series, and I'll summarize the different tools and front ends available, with a focus on iovisor bcc. bcc is an open source project to provide a Python front end for BPF, and comes with dozens of new observability tools (many of which I developed). These tools include new BPF versions of old classics, and many new tools, including: execsnoop, opensnoop, funccount, trace, biosnoop, bitesize, ext4slower, ext4dist, tcpconnect, tcpretrans, runqlat, offcputime, offwaketime, and many more. I'll also summarize use cases and some long-standing issues that can now be solved, and how we are using these capabilities at Netflix."
Broken benchmarks, misleading metrics, and terrible tools. This talk will help you navigate the treacherous waters of Linux performance tools, touring common problems with system tools, metrics, statistics, visualizations, measurement overhead, and benchmarks. You might discover that tools you have been using for years, are in fact, misleading, dangerous, or broken.
The speaker, Brendan Gregg, has given many talks on tools that work, including giving the Linux PerformanceTools talk originally at SCALE. This is an anti-version of that talk, to focus on broken tools and metrics instead of the working ones. Metrics can be misleading, and counters can be counter-intuitive! This talk will include advice for verifying new performance tools, understanding how they work, and using them successfully.
Video: https://www.youtube.com/watch?v=JRFNIKUROPE . Talk for linux.conf.au 2017 (LCA2017) by Brendan Gregg, about Linux enhanced BPF (eBPF). Abstract:
A world of new capabilities is emerging for the Linux 4.x series, thanks to enhancements that have been included in Linux for to Berkeley Packet Filter (BPF): an in-kernel virtual machine that can execute user space-defined programs. It is finding uses for security auditing and enforcement, enhancing networking (including eXpress Data Path), and performance observability and troubleshooting. Many new open source tools that have been written in the past 12 months for performance analysis that use BPF. Tracing superpowers have finally arrived for Linux!
For its use with tracing, BPF provides the programmable capabilities to the existing tracing frameworks: kprobes, uprobes, and tracepoints. In particular, BPF allows timestamps to be recorded and compared from custom events, allowing latency to be studied in many new places: kernel and application internals. It also allows data to be efficiently summarized in-kernel, including as histograms. This has allowed dozens of new observability tools to be developed so far, including measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more.
This talk will summarize BPF capabilities and use cases so far, and then focus on its use to enhance Linux tracing, especially with the open source bcc collection. bcc includes BPF versions of old classics, and many new tools, including execsnoop, opensnoop, funcccount, ext4slower, and more (many of which I developed). Perhaps you'd like to develop new tools, or use the existing tools to find performance wins large and small, especially when instrumenting areas that previously had zero visibility. I'll also summarize how we intend to use these new capabilities to enhance systems analysis at Netflix.
YOW2018 Cloud Performance Root Cause Analysis at NetflixBrendan Gregg
Keynote by Brendan Gregg for YOW! 2018. Video: https://www.youtube.com/watch?v=03EC8uA30Pw . Description: "At Netflix, improving the performance of our cloud means happier customers and lower costs, and involves root cause
analysis of applications, runtimes, operating systems, and hypervisors, in an environment of 150k cloud instances
that undergo numerous production changes each week. Apart from the developers who regularly optimize their own code
, we also have a dedicated performance team to help with any issue across the cloud, and to build tooling to aid in
this analysis. In this session we will summarize the Netflix environment, procedures, and tools we use and build t
o do root cause analysis on cloud performance issues. The analysis performed may be cloud-wide, using self-service
GUIs such as our open source Atlas tool, or focused on individual instances, and use our open source Vector tool, f
lame graphs, Java debuggers, and tooling that uses Linux perf, ftrace, and bcc/eBPF. You can use these open source
tools in the same way to find performance wins in your own environment."
Kernel Recipes 2019 - Faster IO through io_uringAnne Nicolas
Since the dawn of time, Linux has had to make do with inferior IO interfaces. Native Linux AIO supports only a niche application class (O_DIRECT), and even for that use case, it’s far too slow for modern storage. This talk will detail io_uring, a modern IO interface for Linux, that’s both fully featured and performant.
Jens Axboe
[KubeCon EU 2021] Introduction and Deep Dive Into ContainerdAkihiro Suda
Join containerd maintainers and reviewers in a combined introduction and deep dive session. They will discuss the overview and the recent updates of containerd as well as how it is being used by Kubernetes, Docker and other container-based systems. The brief introduction about its architecture and service design will be included. The talk will also deep dive into how to leverage contained by extending and customizing it for your use case with low-level plugins like remote snapshotters, as well as by implementing your own containerd client. Upcoming features and recent discussion in containerd community will also be covered.
- - -
https://kccnceu2021.sched.com/event/iE6v/introduction-and-deep-dive-into-containerd-kohei-tokunaga-akihiro-suda-ntt-corporation?iframe=no
Introduction to WordPress, Installation of WordPress, Overview of WordPress Dashboard, Post, Pages, Categories, Tags, User Roles, Comments, Careers in WordPress
Quick overview on the journey of storage w.r.t containers and emerging trends in container storage. A look at the requirements and architectural patterns of the new storage startups. Presented at the Docker meetup#24 at #GoJekIndia
View IT operations as a flow of data (Sources of Truth) thru work-cells (automation processes) to deliver value to the customer.
There should be only one source of truth for every piece of configuration data.
Device configurations are poor source of truth.
Presented at 3|SHARE's EVOLVE'14 - The Adobe Experience Manager Community Summit on Tuesday November 18th, 2014 at the Hard Rock Hotel in San Diego, CA. evolve14.com
Web scale infrastructures with kubernetes and flannelpurpleocean
La capacità di rispondere in poche frazioni di secondo alle richieste degli utenti - indipendentemente dal loro numero - è un fattore determinante per il successo dei servizi sul web. Secondo Amazon, bastano 100 millisecondi di latenza nella risposta per generare una perdita economica di circa l'1% sul
fatturato [1]. In base alle statistiche di Google AdWords, inoltre, il 2015 ha sancito l’ufficiale superamento del numero di interazioni mobile rispetto a quelle desktop [2], con la conseguente riduzione della durata media delle sessioni di navigazione web.
In uno scenario di questo tipo, la razionalizzazione dell’utilizzo delle risorse hardware e la capacità di scalare rispetto al numero di utenti sono fattori determinanti per il successo del business.
In questo talk racconteremo la nostra esperienza di migrazione di soluzioni e-commerce di tipo enterprise in Magento da un’architettura basata su VM tradizionali ad una di tipo software-defined basata su Kubernetes, Flannel e Docker. Discuteremo, quindi, delle reali difficoltà da noi incontrate nel porting su container di soluzioni in produzione e daremo evidenza di come, alla fine di questo lungo viaggio, i nostri sforzi siano stati concretamente premiati dall’aumento di resilienza, affidabilità e automazione della soluzione finale.
A supporto della conversazione, mostreremo i risultati dei benchmark da noi condotti per valutare la scalabilità della nuova architettura presentando delle evidenze delle reali capacità di Kubernetes come strumento di orchestrazione di servizi erogati in Docker container.
Concluderemo l’intervento presentando il nostro progetto di distribuzione geografica dei nodi master di Kubernetes facendo uso di reti SD-WAN per garantire performance e continuità di servizio della soluzione.
Less and faster – Cache tips for WordPress developersSeravo
Otto Kekäläinen, the code-loving CEO of Seravo held a webinar on May 12, 2020, that focused on the cache: what should a WordPress developer know and which are the best practices to follow?
CDX Summary: Web Archival Collection InsightsSawood Alam
Large web archival collections are often opaque about their holdings. We created an open-source tool called, CDX Summary, to generate statistical reports based on URIs, hosts, TLDs, paths, query parameters, status codes, media types, date and time, etc. present in the CDX index of a collection of WARC files. Our tool also surfaces a configurable number of potentially good random memento samples from the collection for visual inspection, quality assurance, representative thumbnails generation, etc. The tool generates both human and machine readable reports with varying levels of details for different use cases. Furthermore, we implemented a Web Component that can render generated JSON summaries in HTML documents. Early exploration of CDX insights on Wayback Machine collections helped us improve our crawl operations.
Venue: TPDL 2022
Recording: https://www.youtube.com/watch?v=K5i3XShqW6A
Video Archiving and Playback in the Wayback MachineSawood Alam
At the Internet Archive (IA) we collect static and dynamic lists of seeds from various sources (like Save Page Now, Wikipedia EventStream, Cloudflare, etc.) for archiving. Some of these seeds include web pages with videos on them. Those URLs are curated based on certain criteria to identify potential videos that should be archived or excluded. Candidate video page URLs for archiving are placed in a queue (currently using Kafka) to be consumed by a separate process. We maintain a persistent database of videos we have already archived, which is used both for status tracking as well as a seen-check system to avoid duplicate downloads of large media files that usually do not change. We use youtube-dl (or one of its forks) to download videos and their metadata. We archive the container HTML page, associated video metadata, any transcriptions, thumbnails, and at least one of the many video files with different resolutions and formats. These pieces are stored in separate WARC records (some with “response” type and others as “metadata”). Some popular video streaming services do not have static links to embed video files, which makes it difficult to identify and serve video files corresponding to their container HTML pages on archival replay. To glue related pieces together for replay we are currently using a key-value store, but exploring ways to get away with an additional index. We are using a custom video player and perform necessary rewriting in the container HTML page for a more reliable video playback experience. We create a daily summary of metadata of videos that we have archived and load it in a custom-built Video Archiving Insights dashboard to identify any issues or biases, which are utilized as a feedback loop for quality assurance and to enhance our curation criteria and archiving strategies. We are always looking forward to ways to improve the system that works at scale as well as means to interoperate.
Recording: youtube.com/watch?v=6MiYKOq_DKo
Profiling Web Archival Voids for Memento RoutingSawood Alam
Slides of the paper presentation, "Profiling Web Archival Voids for Memento Routing", for JCDL 2021.
Authors: Sawood Alam, Michele C. Weigle, Michael L. Nelson
Preprint: https://arxiv.org/abs/2108.03311
Recording: https://youtu.be/ImJWkndNoS8
Readying Web Archives to Consume and Leverage Web BundlesSawood Alam
Potential utilization of the emerging Web technology, Web Bundles, in Web archiving, presented at the IIPC WAC 2021 in Session 8 by Sawood Alam.
Recording: https://youtu.be/lQX9v9V0FRQ
MementoMap: A Web Archive Profiling Framework for Efficient Memento RoutingSawood Alam
Topic: Doctoral Dissertation Defense
Title: MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
Student: Sawood Alam
University: Old Dominion University
Date: Friday, December 4, 2020
Supporting Web Archiving via Web PackagingSawood Alam
We describe challenges related to web archiving, replaying archived web resources, and verifying their authenticity. We show that Web Packaging has significant potential to help address these challenges and identify areas in which changes are needed in order to fully realize that potential.
Position Paper: https://arxiv.org/abs/1906.07104
Presented in the Internet Architecture Board's ESCAPE 2019 Workshop (Exploring Synergy between Content Aggregation and the Publisher Ecosystem)
https://www.iab.org/activities/workshops/escape-workshop/
MementoMap: An Archive Profile Dissemination FrameworkSawood Alam
We introduce MementoMap, a framework to express and disseminate holdings of web archives (archive profiles) by themselves or third parties. The framework allows arbitrary, flexible, and dynamic levels of details in its entries that fit the needs of archives of different scales. This enables Memento aggregators to significantly reduce wasted traffic to web archives.
Impact of HTTP Cookie Violations in Web ArchivesSawood Alam
Certain HTTP Cookies on certain sites can be a source of content bias in archival crawls. Accommodating Cookies at crawl time, but not utilizing them at replay time may cause cookie violations, resulting in defaced composite mementos that never existed on the live web. To address these issues, we propose that crawlers store Cookies with short expiration time and archival replay systems account for values in the Vary header along with URIs.
Archive Assisted Archival Fixity Verification FrameworkSawood Alam
The number of public and private web archives has increased, and we implicitly trust content delivered by these archives. Fixity is checked to ensure an archived resource has remained unaltered since the time it was captured. Some web archives do not allow users to access fixity information and, more importantly, even if fixity information is available, it is provided by the same archive from which the archived resources are requested. In this research, we propose two approaches, namely Atomic and Block, to establish and check fixity of archived resources.
MementoMap Framework for Flexible and Adaptive Web Archive ProfilingSawood Alam
In this work we propose MementoMap, a flexible and adaptive framework to efficiently summarize holdings of a web archive. We described a simple, yet extensible, file format suitable for MementoMap. We used the complete index of the arquivo.pt comprising 5B mementos (archived web pages/files) to understand the nature and shape of its holdings. We generated MementoMaps with varying amount of detail from its HTML pages that have an HTTP status code of 200 OK. Additionally, we designed a single-pass, memory-efficient, and parallelization-friendly algorithm to compact a large MementoMap into a small one and an in-file binary search method for efficient lookup. We analyzed more than three years of MemGator (a Memento aggregator) logs to understand the response behavior of 14 public web archives. We evaluated MementoMaps by measuring their Accuracy using 3.3M unique URIs from MemGator logs. We found that a MementoMap of less than 1.5% Relative Cost (as compared to the comprehensive listing of all the unique original URIs) can correctly identify the presence or absence of 60% of the lookup URIs in the corresponding archive while maintaining 100% Recall (i.e., zero false negatives).
InterPlanetary Wayback: The Next Step Towards Decentralized Web ArchivingSawood Alam
InterPlanetary Wayback (IPWB) facilitates permanence and collaboration in web archives by disseminating the contents of WARC files into the InterPlanetary File System (IPFS) network. IPFS is a peer-to-peer content-addressable file system that inherently allows deduplication and facilitates opt-in replication. IPWB splits the header and payload of WARC response records before disseminating into IPFS to leverage the deduplication, builds a CDXJ index with references to the IPFS hashes returns, and combines the headers and payload from IPFS at the time of replay. We also explore the possibility of an index-free, fully decentralized collaborative web archiving system as the next step.
MemGator - A Memento Aggregator CLI and Server in GoSawood Alam
MemGator - A Portable Concurrent Memento Aggregator CLI and Server Written in Go.
The corresponding poster can be found at http://www.cs.odu.edu/~salam/presentations/memgator-jcdl16-poster.pdf
Avoiding Zombies in Archival Replay Using ServiceWorkerSawood Alam
Live-leakage (zombie resource) is an issue in archival replay of web pages. This work proposes a mechanism to avoid such live-leakage using ServiceWorker. This work was presented in WADL 2017 on June 22 in Toronto, Ontario, Canada.
Client-side Reconstruction of Composite Mementos Using ServiceWorkerSawood Alam
Live-leakage (zombie resource) is an issue in archival replay of web pages. This work proposes a mechanism to avoid such live-leakage using ServiceWorker. This work was presented in JCDL 2017 on June 20 in Toronto, Ontario, Canada.
Introducing Web Archiving and WSDL Research GroupSawood Alam
My talk to introduce Web Archiving and the Web Science and Digital Libraries Research Group to some invited students from India for a summer workshop in Old Dominion University, Norfolk, VA
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024APNIC
Ellisha Heppner, Grant Management Lead, presented an update on APNIC Foundation to the PNG DNS Forum held from 6 to 10 May, 2024 in Port Moresby, Papua New Guinea.
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptxBrad Spiegel Macon GA
Brad Spiegel Macon GA’s journey exemplifies the profound impact that one individual can have on their community. Through his unwavering dedication to digital inclusion, he’s not only bridging the gap in Macon but also setting an example for others to follow.
1.Wireless Communication System_Wireless communication is a broad term that i...JeyaPerumal1
Wireless communication involves the transmission of information over a distance without the help of wires, cables or any other forms of electrical conductors.
Wireless communication is a broad term that incorporates all procedures and forms of connecting and communicating between two or more devices using a wireless signal through wireless communication technologies and devices.
Features of Wireless Communication
The evolution of wireless technology has brought many advancements with its effective features.
The transmitted distance can be anywhere between a few meters (for example, a television's remote control) and thousands of kilometers (for example, radio communication).
Wireless communication can be used for cellular telephony, wireless access to the internet, wireless home networking, and so on.
This 7-second Brain Wave Ritual Attracts Money To You.!nirahealhty
Discover the power of a simple 7-second brain wave ritual that can attract wealth and abundance into your life. By tapping into specific brain frequencies, this technique helps you manifest financial success effortlessly. Ready to transform your financial future? Try this powerful ritual and start attracting money today!
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesSanjeev Rampal
Talk presented at Kubernetes Community Day, New York, May 2024.
Technical summary of Multi-Cluster Kubernetes Networking architectures with focus on 4 key topics.
1) Key patterns for Multi-cluster architectures
2) Architectural comparison of several OSS/ CNCF projects to address these patterns
3) Evolution trends for the APIs of these projects
4) Some design recommendations & guidelines for adopting/ deploying these solutions.
# Internet Security: Safeguarding Your Digital World
In the contemporary digital age, the internet is a cornerstone of our daily lives. It connects us to vast amounts of information, provides platforms for communication, enables commerce, and offers endless entertainment. However, with these conveniences come significant security challenges. Internet security is essential to protect our digital identities, sensitive data, and overall online experience. This comprehensive guide explores the multifaceted world of internet security, providing insights into its importance, common threats, and effective strategies to safeguard your digital world.
## Understanding Internet Security
Internet security encompasses the measures and protocols used to protect information, devices, and networks from unauthorized access, attacks, and damage. It involves a wide range of practices designed to safeguard data confidentiality, integrity, and availability. Effective internet security is crucial for individuals, businesses, and governments alike, as cyber threats continue to evolve in complexity and scale.
### Key Components of Internet Security
1. **Confidentiality**: Ensuring that information is accessible only to those authorized to access it.
2. **Integrity**: Protecting information from being altered or tampered with by unauthorized parties.
3. **Availability**: Ensuring that authorized users have reliable access to information and resources when needed.
## Common Internet Security Threats
Cyber threats are numerous and constantly evolving. Understanding these threats is the first step in protecting against them. Some of the most common internet security threats include:
### Malware
Malware, or malicious software, is designed to harm, exploit, or otherwise compromise a device, network, or service. Common types of malware include:
- **Viruses**: Programs that attach themselves to legitimate software and replicate, spreading to other programs and files.
- **Worms**: Standalone malware that replicates itself to spread to other computers.
- **Trojan Horses**: Malicious software disguised as legitimate software.
- **Ransomware**: Malware that encrypts a user's files and demands a ransom for the decryption key.
- **Spyware**: Software that secretly monitors and collects user information.
### Phishing
Phishing is a social engineering attack that aims to steal sensitive information such as usernames, passwords, and credit card details. Attackers often masquerade as trusted entities in email or other communication channels, tricking victims into providing their information.
### Man-in-the-Middle (MitM) Attacks
MitM attacks occur when an attacker intercepts and potentially alters communication between two parties without their knowledge. This can lead to the unauthorized acquisition of sensitive information.
### Denial-of-Service (DoS) and Distributed Denial-of-Service (DDoS) Attacks
1. Sawood Alam
Web Science and Digital Libraries Research Group
Old Dominion University
Norfolk, Virginia, USA
@ibnesayeed
CS 531 Web Server Design
November 28, 2018
The Web ARChive (WARC)
File Format
2. Web ARChive (WARC): ISO 28500 File Format
2@ibnesayeed
https://github.com/iipc/warc-specifications
4. HTTP Response vs. WARC Record
4
HTTP headers
Payload
WARC headers
@ibnesayeed
5. Why WARC and not Plain Filesystem?
5@ibnesayeed
● Number of inodes
● Name collision
● Deduplication
● Rich metadata
● Optimized for long-term Web preservation
12. WARC with warcio
12@ibnesayeed
from warcio.capture_http import capture_http
import requests
with capture_http('example.warc.gz'):
requests.get('https://example.com/')
from warcio.archiveiterator import ArchiveIterator
with open('example.warc.gz', 'rb') as stream:
for record in ArchiveIterator(stream):
if record.rec_type == 'response':
print(record.rec_headers.get_header('WARC-Target-URI'))
Write a WARC file
Read from a WARC file
14. WebPackage: Similar, but not the same!
14@ibnesayeed
● Package a group of related HTTP requests and responses to transmit and store together
● Optionally sign messages to allow third parties to store and deliver asynchronously
● Make browsers verify signed packages using origins’ valid certificates
● Differences from WARC
○ Binary instead of textual
○ Not suitable for long-term preservation due to signing that would eventually expire
https://github.com/WICG/webpackage
15. Conclusions
15@ibnesayeed
● Web ARChive (WARC) is a well-supported and evolving ISO standard data format
● It is a text-based HTTP Message-like wrapper format
● It can store arbitrary number of HTTP request/response messages (and various other data types)
along with a rich set of metadata
● Optimized for long-term Web preservation
https://github.com/iipc/warc-specifications