Understanding and Mitigating the Security Risks of Content Inclusion in Web Browsers
1. Understanding and Mitigating the Security
Risks of Content Inclusion in Web Browsers
Sajjad Arshad
PhD Thesis Defense
Friday, 12 April 2019
2. Content Inclusion on the Web
➔ Websites include various types of content to create interactive user interfaces
◆ JavaScript
◆ Cascading Style Sheets (CSS)
➔ 93% of the most popular websites include JavaScript from external sources
◆ JavaScript libraries are hosted on fast content delivery networks (CDNs)
◆ Integration with advertising networks, analytics frameworks, and social media
➔ Browser extensions enhance browsers with additional capabilities
◆ Fine-grained filtering of content
◆ Access cross-domain content
◆ Perform network requests
7. Security Risks
➔ Malvertising by third-party content
◆ Launch drive-by downloads
◆ Redirect visitors to phishing sites
◆ Generate fraudulent clicks on ads
➔ Ad(vertisement) injection by browser extensions
◆ Divert revenue from content publishers
◆ Harm the reputation of the publisher from the user’s perspective
◆ Expose users to malware and phishing
➔ Style injection by relative path overwrite (RPO)
◆ Sniffing users’ browsing histories
◆ Exfiltrate secrets from the website
8. Thesis Contributions
In this thesis, I investigate the feasibility and effectiveness of novel
approaches to understand and mitigate the security risks of content
inclusion for website publishers as well as their users, and I show that
these techniques are complementary to existing defenses.
➔ Detection of Malicious Third-Party Content Inclusions ⇒ Excision
➔ Identifying Ad Injection in Browser Extensions ⇒ OriginTracer
➔ Analysis of Style Injection by Relative Path Overwrite (RPO)
11. Content Security Policy
Content-Security-Policy: default-src 'self'; script-src js.trusted.com
➔ Access control policy sent by web apps, enforced by browsers
➔ Whitelist of allowed origins to load resources from
➔ ISPs and browser extensions modify CSP rules
➔ Non-trivial to deploy
◆ Ad syndication or real-time ad auctions
◆ Arbitrary third-party resource inclusion by browser extensions
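As a sketch, a policy like the one above can be assembled programmatically before being sent as the `Content-Security-Policy` response header. The `build_csp` helper is our name, and `js.trusted.com` is the slide's placeholder origin, not a real deployment.

```python
# Minimal sketch: serialize a directive -> sources mapping into the
# value of a Content-Security-Policy response header.

def build_csp(directives):
    """Serialize {directive: [sources]} into a CSP header value."""
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )

policy = build_csp({
    "default-src": ["'self'"],
    "script-src": ["js.trusted.com"],  # the slide's placeholder origin
})
print(policy)
# -> default-src 'self'; script-src js.trusted.com
```

A web application would send this string as the `Content-Security-Policy` header on every HTML response; the browser then refuses to load scripts from any origin not listed under `script-src`.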
15. Inclusion Sequence Classification
Goal: Given trained models, label inclusion sequences as either
benign or malicious
➔ Hidden Markov Models (HMM)
◆ Model inter-dependencies between resources in the sequence
◆ Trained one HMM for the benign class and one for the malicious class
◆ 20 states that are fully connected
➔ Training an HMM is computationally expensive, but computing the
likelihood of a sequence is very efficient
◆ Good choice for real-time classification
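The two-model scoring scheme above can be sketched with a hand-rolled forward algorithm. The toy two-state models and all probabilities below are invented for illustration; the actual system uses 20-state HMMs over the feature vectors described next.

```python
# Sketch of HMM-based sequence classification: score a sequence under a
# benign and a malicious model, and pick the higher-likelihood class.

def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) via the forward algorithm; states are indices."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)

# Toy two-state models over a binary observation alphabet {0, 1}:
# (start probabilities, transition matrix, emission matrix).
benign    = ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.8, 0.2]])
malicious = ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.2, 0.8], [0.1, 0.9]])

def classify(obs):
    return ("malicious"
            if forward_likelihood(obs, *malicious) > forward_likelihood(obs, *benign)
            else "benign")

print(classify([1, 1, 1, 1]))  # -> malicious
print(classify([0, 0, 0, 0]))  # -> benign
```

Scoring a sequence is a single linear pass, which is what makes the trained models cheap enough for the real-time classification mentioned above.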
16. Classification Features
R0 → R1 → … → Rn ⇒ [F0, …, F23] → [F0, …, F23] → … → [F0, …, F23]
➔ Convert the inclusion sequence into sequence of feature vectors
➔ 12 feature types from three categories (DNS, String, Role)
◆ Compute individual and relative features for each type (24 features)
◆ Continuous features are normalized to [0, 1] and discretized
➔ Continuous relative feature values are computed by comparing the
individual value of the resource to its parent's individual value
◆ less, equal, or more
◆ none for the root resource
17. DNS Features
➔ Domain Level
◆ www.google.com has level 2
◆ Divide by maximum allowed levels (126)
➔ Alexa Ranking
◆ Divide the ranking by 1M
➔ Top-level Domain (TLD)
➔ Host Type
21. Role Features
➔ Three binary features
◆ Ad network
◆ Content delivery network (CDN)
◆ URL shortening service
➔ Manually compiled list
22. Implementation
➔ Modifications to Chromium browser
◆ Blink
◆ Extension Engine
➔ ~1,000 SLoC (C++) and several lines of JavaScript
➔ Tracking the start and termination of JavaScript execution
➔ Tracking content script injection and execution
➔ Tracking network requests
➔ Callbacks registered for events and timers
23. Data Collection
➔ Alexa Top 200K from June 2014 to May 2015
➔ Alexa Top 20 shopping sites with 292 ad-injecting extensions for one
week, June 16–22, 2015
➔ Anti-cloaking
➔ Anti-fingerprinting
24. Building Labeled Dataset
➔ Trained classifiers using VirusTotal as ground truth
◆ A host is labeled malicious if it is reported by at least 3 of the 62 URL scanners
25. Detection Results
➔ 10-fold cross-validation
◆ FP ⇒ 0.59%
◆ FN ⇒ 6.61%
➔ Features effectiveness
◆ D ⇒ DNS
◆ S ⇒ String
◆ R ⇒ Role
26. Comparison with URL Scanners
➔ Compared detection results on new data from June 1 to July 14, 2015
➔ Found 89 suspicious hosts that were likely dedicated redirectors
◆ 44% were recently registered in 2015
◆ 23% no longer resolve to an IP address
➔ Detected 177 new malicious hosts later reported in VT
◆ 78% of the malicious hosts were not reported during the first week
28. Performance
➔ Automatically visited the Alexa Top 1K with the original and modified
Chromium browsers 10 times
➔ Installed 5 popular Chrome extensions
◆ Adblock Plus, Google Translate, Google Dictionary, Evernote
WebClipper, and Tampermonkey
➔ Average 12.2% page latency overhead
➔ 3.2 seconds delay on browser startup time
29. Usability
➔ 10 students who self-reported as expert Internet users
➔ Each participant explored 50 random websites from Alexa Top 500
◆ Excluded websites requiring a login or involving sensitive subject matter
➔ Out of 5,129 web pages visited:
◆ 31 malicious inclusions
◆ 83 errors (mostly high latency resource loads)
➔ No broken extension was reported
33. Motivation
➔ Centralized dynamic analysis is non-trivial
◆ Extensions can hide their behavior during analysis and trigger ad injection later
➔ Third-party content injection or modification is quite common
➔ Non-trivial to distinguish between wanted and unwanted behavior
Users are best positioned to make this judgment
34. OriginTracer
OriginTracer adds fine-grained content provenance tracking to the
web browser
➔ Provenance tracked at level of individual DOM elements
➔ Indicates origins contributing to content injection and modification
➔ Trustworthy communication of this information to the user
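The element-level provenance model above can be sketched abstractly. OriginTracer itself is implemented inside Chromium in C++, so the class and method names below are ours, purely for illustration.

```python
# Abstract sketch of per-element provenance tracking: each DOM node
# carries the set of origins that created or modified it, so injected
# content can later be flagged and indicated to the user.

class Node:
    def __init__(self, tag, origin):
        self.tag = tag
        self.provenance = {origin}   # origin that created the node
        self.children = []

    def append_child(self, child, origin):
        child.provenance.add(origin)  # origin performing the insertion
        self.children.append(child)

page = Node("body", "publisher.example")
ad = Node("div", "extension://ad-injector")
page.append_child(ad, "extension://ad-injector")

# A provenance indicator can flag nodes whose provenance includes any
# non-publisher origin:
injected = [c for c in page.children
            if c.provenance - {"publisher.example"}]
print(len(injected))  # -> 1
```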
38. Implementation
➔ Modifications to Chromium browser
➔ ~900 SLoC (C++), several lines of JavaScript
➔ Mediates DOM APIs for node creation and modification
➔ Mediates node insertion through document writes
➔ Callbacks registered for events and timers
39. User Study Setup
➔ Study population: 80 students of varying technical sophistication
➔ Participants exposed to six Chromium instances (unmodified and
modified), each with an ad-injecting extension installed
◆ Auto Zoom, Alpha Finder, X-Notifier, Candy Zapper, uTorrent,
Gethoneybadger
➔ Participants were asked to visit three retail websites
◆ Amazon, Walmart, Alibaba
41. Susceptibility to Ad Injection
Are users generally willing to click on the advertisements presented
to them?
42. Ability to Identify Injected Ads
Do content provenance indicators assist users in recognizing
injected advertisements?
43. Usability of Content Provenance
Would users be willing to adopt a provenance tracking system to
identify injected advertisements?
44. Performance
➔ Configured an unmodified Chromium and OriginTracer instance to
visit the Alexa Top 1K
◆ Broad spectrum of static and dynamic content on most-used websites
◆ Browsers configured with five benign extensions
➔ Average 10.5% browsing latency overhead
➔ No impact on browser start-up time
45. Usability
➔ Separate user study with 13 students of varying technical backgrounds
➔ Asked participants to browse 50 websites out of Alexa Top 500
➔ Asked users to report errors
◆ Type I: browser crash, page doesn’t load, etc.
◆ Type II: abnormal load time, page appearance not as expected
➔ Out of almost 2K URLs, two Type I and 27 Type II errors were
reported
➔ No broken extensions were reported
47. Relative Path Overwrite (RPO)
➔ Browser’s interpretation of a URL may differ from the web server’s
◆ Browsers basically treat URLs as file system-like paths
◆ However, URL may not correspond to an actual server-side file system structure,
or web server may internally rewrite parts of the URL
➔ RPO exploits the semantic disconnect between browsers and web servers in
interpreting relative paths ⇒ path confusion
◆ Injects style (CSS) instead of script (JS)
◆ Turns a simple text injection vulnerability into a style sink
◆ “Self-reference”: Attacked document uses “itself” as stylesheet
➔ Threat model of RPO resembles that of XSS
◆ e.g., steal sensitive information
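The browser side of this path confusion can be reproduced with Python's `urllib.parse.urljoin`, which follows the standard relative URL resolution browsers use. The `example.com` URLs and the `PAYLOAD` segment are hypothetical.

```python
from urllib.parse import urljoin

# How a browser resolves the relative stylesheet path ../style.css.

# Intended page URL:
print(urljoin("http://example.com/app/page", "../style.css"))
# -> http://example.com/style.css

# Path-confused URL: an attacker appends an extra segment, which a
# framework using path parameters may ignore while still serving the
# same page. The browser, treating the URL as a filesystem-like path,
# now resolves the stylesheet reference one level deeper:
print(urljoin("http://example.com/app/page/PAYLOAD", "../style.css"))
# -> http://example.com/app/style.css
```

The same document is served in both cases, but the stylesheet request goes to a different URL, which is the semantic disconnect RPO exploits.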
56. Scriptless (Style-based) Attacks
➔ Script injection is NOT always possible
◆ Input sanitization
◆ Browser-based XSS filters
◆ Content Security Policy (CSP)
➔ Successful attacks are possible by injecting CSS
◆ Exfiltrating credit card number and CSRF tokens (Heiderich et al., CCS 2012)
● CSS attribute accessor, content property, animation features, media queries
➔ Attacks typically consist of a payload & an injection technique
➔ Our work is not concerned about the payload
◆ Focus on how to inject (“transport”) the payload
57. Preconditions for Successful Attack
1. Relative stylesheet path (no base tag) ⇒ Candidate Identification
2. Crafted URL causes style reflection in server response ⇒
Vulnerability Detection
3. Browser loads and interprets injected style directives ⇒ Exploitability
Detection
58. Candidate Identification
➔ Common Crawl: extract pages with relative stylesheet path
◆ August 2016: >1.6B documents
◆ 203 M pages on nearly 6 M sites
➔ Filter 1: Alexa Top 1 million ranking
◆ 141 M pages on 223 K sites
➔ Filter 2: Group URLs using the same template
◆ Test only one random URL from each group
◆ 137 M pages on 222 K sites
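The template-grouping step might look roughly like this. The exact templating rules used in the study are not stated on the slide, so the digit-collapsing heuristic and `example.com` URLs below are assumptions for illustration.

```python
import re
from urllib.parse import urlsplit

# Rough sketch of "group URLs using the same template": URLs differing
# only in numeric path segments or query values collapse into one
# template, and only one member per group needs to be tested.

def template(url):
    parts = urlsplit(url)
    path = re.sub(r"\d+", "{n}", parts.path)          # collapse numbers
    keys = "&".join(sorted(                            # keep query keys only
        k for k, _, _ in (p.partition("=") for p in parts.query.split("&") if p)
    ))
    return f"{parts.netloc}{path}?{keys}"

urls = [
    "http://example.com/article/123?id=5",
    "http://example.com/article/456?id=9",
    "http://example.com/about",
]
groups = {}
for u in urls:
    groups.setdefault(template(u), []).append(u)
print(len(groups))  # -> 2
```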
59. Vulnerability Detection
➔ Developed a lightweight crawler based on Python Requests API
◆ Randomly selects one representative URL from each group
◆ Mutates the URL according to a number of path confusion techniques
● PAYLOAD ⇒ %0A{}body{background:NONCE}
◆ Requests the mutated URL
◆ Ignores the response if it contains <base> tag
◆ Extracts all relative stylesheet paths and expands them using the mutated URL
◆ Requests each expanded stylesheet URL to find the injected payload in the response
◆ Page would be vulnerable if at least one stylesheet’s response reflects the requested URL,
referrer URL, parameters, or cookie
➔ Path confusion techniques
◆ Path Parameter, Encoded Path, Encoded Query, Cookie
◆ We assume the page references the relative stylesheet path ../style.css
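The mutate-and-expand step (here using the Path Parameter technique) can be sketched with stdlib URL handling. The nonce, payload string, and helper names are illustrative rather than the crawler's actual code, but the resolution via `urljoin` mirrors what a browser does:

```python
from urllib.parse import urljoin

# Illustrative marker; the real crawler used a random nonce per request.
NONCE = "deadbeef"
PAYLOAD = "%0A{}body{background:" + NONCE + "}"

def mutate_path_parameter(url: str, depth: int = 2) -> str:
    """Append the payload as `depth` extra path segments, so at least one
    segment survives the ../ in the relative stylesheet reference."""
    return url.rstrip("/") + ("/" + PAYLOAD) * depth + "/"

def expand_stylesheet(mutated_url: str, relative_href: str) -> str:
    """Resolve the relative stylesheet path against the mutated page URL,
    exactly as a browser would."""
    return urljoin(mutated_url, relative_href)

page = "https://example.com/index.php"
mutated = mutate_path_parameter(page)
css_url = expand_stylesheet(mutated, "../style.css")
# The "stylesheet" URL still points into index.php's path info and still
# carries the payload, so a URL-reflecting page echoes it back as CSS.
```

If the response fetched from `css_url` contains the nonce, the page reflects the injected style and is flagged as vulnerable.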
60. Path Confusion - Path Parameter
➔ Some web frameworks (e.g., PHP, ASP, JSP) accept the input parameters as
a directory-like string following the name of the script in the URL
➔ The crawler simply appends the payload as a subdirectory at the end of the URL
➔ CSS injection occurs if the page reflects the page URL or referrer in the response
61. Path Confusion - Path Parameter
➔ Different web frameworks handle path parameters differently, which is why we
distinguish a few additional cases
◆ Parameters are separated by slashes in PHP/ASP and by semicolons in JSP
62. Path Confusion - Encoded Path
➔ This targets web servers such as IIS that decode encoded slashes in the URL
for directory traversal, whereas web browsers DO NOT
➔ Use %2F, an encoded version of ‘/’, to inject our payload into the URL
➔ The canonicalized URL is equal to the original page URL
➔ CSS injection occurs if the page reflects page URL or referrer in the response
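The browser/server disagreement over %2F can be demonstrated with the stdlib: `urljoin` (browser-style resolution) leaves %2F opaque, while `unquote` models an IIS-style server decoding it. The URLs are hypothetical:

```python
from urllib.parse import urljoin, unquote

# Browser view: %2F is an ordinary path character, so "app%2Fpage" is a
# single segment and ../ climbs above the "x" directory only.
browser_view = urljoin("https://example.com/app%2Fpage/x/", "../style.css")

# Server view (IIS-style decoding): the same bytes decode to a nested path,
# so the "stylesheet" request can land back on the vulnerable page.
server_view = unquote(browser_view)
```

The two views resolve to different resources, which is precisely the path confusion this technique exploits.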
63. Path Confusion - Encoded Query
➔ We replace the URL query delimiter ‘?’ with its encoded version %3F so that
web browsers DO NOT interpret it as such
➔ We inject the payload into every value of the query string
➔ CSS injection happens if the page reflects either the URL, referrer, or any of
the query values in the HTML response
64. Path Confusion - Cookie
➔ Stylesheets referenced by a relative path are located in the same origin
◆ Cookies are sent when requesting the stylesheet
➔ CSS injection may be possible if:
◆ Attacker can create new cookies or tamper with existing ones, and
◆ The page reflects cookie values in the response
➔ Payload is injected into each cookie value
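A minimal sketch of the cookie-injection step, assuming the attacker can rewrite the victim's cookie header (e.g., from a subdomain); the cookie names and payload are illustrative:

```python
from http.cookies import SimpleCookie

NONCE = "deadbeef"
PAYLOAD = "%0A{}body{background:" + NONCE + "}"

def inject_into_cookies(cookie_header: str) -> str:
    """Append the CSS payload to every cookie value; if the page reflects
    any cookie value, the stylesheet response will contain the payload."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    return "; ".join(f"{name}={morsel.value}{PAYLOAD}"
                     for name, morsel in jar.items())

tainted = inject_into_cookies("sid=abc; theme=dark")
```

Because the relative stylesheet request is same-origin, the browser attaches these tainted cookies automatically when fetching the "stylesheet".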
65. Exploitability Detection
➔ Verify whether the reflected CSS in the response is evaluated by the browser
◆ Built a crawler based on Google Chrome (and an extension for tainting cookies)
➔ Visit mutated vulnerable pages to check whether injected style directives are interpreted
◆ PAYLOAD ⇒ %0A}}}]]]{}body{background-image:url(/NONCE.gif)}
◆ Style is interpreted if the injected image URL is seen in network traffic
➔ Reflected CSS is not always interpreted by the browser
◆ Use of modern document types ⇒ browser doesn’t render page in quirks mode
➔ Overriding document types in Internet Explorer (IE)
◆ Load the page inside an iframe in Internet Explorer
◆ Used Fiddler for tainting cookies and recording HTTP requests/responses
◆ Turn on Compatibility View by setting “X-UA-Compatible” to “IE=EmulateIE7” via <meta> tag in
the parent page
66. Limitations
➔ We only looked for reflected, not stored, injection of style directives
➔ Manual analysis of a site might reveal more opportunities for style injection
that our crawler fails to detect automatically
➔ We did not analyze the vulnerability of pages not in Common Crawl
◆ We do not cover all sites, and we do not fully cover all pages within a site
➔ Results presented in this paper should be seen as a lower bound
➔ Our methodology can be applied to individual sites to analyze more pages
68. Alexa Ranking
➔ Six out of the ten largest sites are
represented in our candidate set
➔ Candidate set contains a higher
fraction of the largest sites and a
lower fraction of the smaller sites
➔ Our results better represent the most popular sites, which receive the most
visitors and thus the most potential victims of RPO attacks
69. Relative Stylesheet Paths
➔ CDF of total and maximum number
of relative stylesheets per web page
and site, respectively
➔ 63.1% of the pages contain multiple
relative paths
◆ Increases the chances of finding a
successful RPO and style injection
point
70. Vulnerability Analysis
➔ A page is vulnerable if its response:
◆ Reflects the injected CSS
◆ Does not contain a base tag
➔ 1.2% of pages are vulnerable to at
least one of the injection techniques
➔ 5.4% of sites contain at least one
vulnerable page
➔ Path parameter is the most effective
technique against pages
➔ One third of the sites in Alexa Top 10,
8-10% in the Top 100K, and 4.9% in
100K-1M are vulnerable
71. Base Tag
➔ A correctly configured base tag can prevent path confusion
➔ Base tag was removed after path confusion in 603 pages on 60 sites
➔ Internet Explorer fetches two URLs for each stylesheet
◆ One expanded according to the base URL specified in the tag
◆ One expanded using the page URL as the base
➔ Internet Explorer always applied the “confused” stylesheet, even when the
one based on the base tag URL loaded faster
72. Quirks Mode Doc. Types
➔ Browsers interpret resources with a non-CSS content type as stylesheets when
the page specifies a non-standard document type (or none at all)
➔ Total of 4,318 distinct doc. types
➔ Roughly a third result in quirks mode
◆ 1,271 (29.4%) force all the browsers
into quirks mode
◆ 1,378 (31.9%) cause at least one
browser to use quirks mode
➔ Internet Explorer’s framing trick forced
4,248 (98.4%) into quirks mode
73. Standardized Doc. Types
➔ ~1K doc. types result in quirks mode
➔ ~3K doc. types cause standards mode
➔ But the number of standardized doc. types is several orders of magnitude smaller
◆ Only about 10 standards- and quirks-mode doc. types are widely used across sites
◆ The majority are not standardized
◆ They differ from the standardized ones only by small variations such as forgotten
spaces or misspellings
➔ 9.6% of pages use quirks mode
➔ 32.2% of sites contain at least one page
rendered in quirks mode
74. Exploitability Analysis
➔ Vulnerable pages that were exploitable
◆ 2.9% in Chrome
◆ 14.5% in Internet Explorer
● 5x more than in Chrome
➔ 6 highest-ranked sites were not exploitable
◆ Between 1.6% and 3.2% of sites in the
remaining buckets were exploitable
➔ IE is more effective except for the cookie technique
◆ The IE crawl was conducted one month later
◆ Anti-framing techniques
◆ Anti-MIME-sniffing header
75. Anti-Framing
1. X-Frame-Options response header (DENY, SAMEORIGIN, or ALLOW-FROM)
◆ 4,999 vulnerable pages on 391 sites used it correctly and prevented the attack
◆ However, 1,900 pages on 34 sites provided incorrect values for this header
● Out of which, 552 pages on 2 sites were exploited in Internet Explorer
2. frame-ancestors directive in Content Security Policy (not supported in IE)
◆ A whitelist of origins allowed to load the current page in a frame
3. Use JavaScript code to prevent framing of a page
◆ i.e., redirecting the top frame if the page is not the top window itself
◆ However, attackers can use the HTML5 sandbox attribute in the iframe tag and omit the
allow-top-navigation directive to render JavaScript frame-busting code ineffective
We did not attempt to bypass these anti-framing techniques, which means that even
more vulnerable pages could likely be exploited in practice
76. MIME Sniffing
➔ Many sites contain misconfigured content types
◆ Browsers attempt to infer the type based on the request context or file extension
● MIME sniffing, especially in quirks mode
➔ Setting X-Content-Type-Options: nosniff in the response header blocks the
request if the requested type is:
◆ "style" and the MIME type is not "text/css", or
◆ "script" and the MIME type is not a JavaScript type (e.g., "application/javascript")
➔ Only Firefox, Internet Explorer, and Edge respected this header at the time
◆ Chrome started supporting the header in January 2018
◆ IE blocked our injected CSS payload when nosniff was set, even with the framing trick
➔ 96,618 vulnerable pages across 232 sites had a nosniff response header
◆ 23 pages across 10 sites were exploitable in Chrome but not in Internet Explorer
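The nosniff blocking rule described above can be expressed as a small predicate. This is a simplification: the real check in browsers accepts the full list of JavaScript MIME types, not just the two shown here:

```python
def blocked_by_nosniff(destination: str, content_type: str) -> bool:
    """Return True if a nosniff-enabled browser should block the response:
    style requests must be text/css; script requests must be a JS type."""
    essence = content_type.split(";")[0].strip().lower()
    if destination == "style":
        return essence != "text/css"
    if destination == "script":
        return essence not in {"application/javascript", "text/javascript"}
    return False

# A reflected HTML page served as a "stylesheet" is blocked under nosniff,
# which is exactly what stopped the injected CSS payload in IE.
```

This also explains the asymmetric results: with nosniff set, the payload was blocked in IE but still interpreted in (pre-2018) Chrome.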
77. Content Management Systems
➔ Many exploitable pages appeared to belong to well-known CMSes
◆ Since CMSes are installed on thousands of sites, fixing an RPO vulnerability is impactful
➔ Detected 23 CMSes (Wappalyzer + manually)
◆ 41,288 pages across 1,589 sites
➔ Installed the latest versions (or used the online demos)
➔ Detected 4 exploitable CMSes
◆ 1 declared no document type
◆ 1 used a quirks mode document type
◆ 2 were exploited in IE using framing trick
◆ 40,255 pages across 1,197 sites (nearly 32k sites world-wide)
➔ Weaknesses were reported to the vendors
78. Mitigation Techniques
➔ Avoid path confusion
◆ Use only absolute (or root-relative) paths or <base> tag
➔ Avoid style injection
◆ Input sanitization (non-trivial)
● More targeted RPO attack variants can reference existing files
➔ Prevent browsers from interpreting stylesheets that have syntax errors or lack a “text/css” content type
◆ Specify a modern document type: <!doctype html>
◆ Disable content type sniffing: X-Content-Type-Options
➔ Prevent Internet Explorer trick
◆ Disallow framing: X-Frame-Options
◆ Turn off compatibility view: X-UA-Compatible (IE=Edge)
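The header-based mitigations above can be checked mechanically. The audit helper below is hypothetical (not a tool from the thesis), and the X-Frame-Options check is simplistic since DENY or ALLOW-FROM may also be appropriate:

```python
RECOMMENDED = {
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "SAMEORIGIN",   # DENY is equally valid for many sites
    "X-UA-Compatible": "IE=Edge",
}

def missing_mitigations(headers: dict, html: str) -> list:
    """List which of the mitigations above a response is missing."""
    missing = [name for name, value in RECOMMENDED.items()
               if headers.get(name, "").lower() != value.lower()]
    # A modern doctype keeps browsers out of quirks mode.
    if not html.lstrip().lower().startswith("<!doctype html>"):
        missing.append("modern doctype")
    return missing
```

Running such a check against every template of a site covers the browser-side preconditions; avoiding relative stylesheet paths removes the vulnerability at the root.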
79. Conclusion
➔ Excision
◆ Allows for preemptive blocking with moderate performance overhead
◆ Detected malicious hosts before appearing in the blacklists
➔ OriginTracer
◆ Allows users to make fine-grained trust decisions
◆ Evaluation shows it can be performed in an efficient and effective way
➔ RPO
◆ Style-based attacks require different countermeasures than XSS
◆ Easy-to-use and effective countermeasures exist to mitigate the attack
80. Thesis Publications
➔ Third-party Content Inclusion ⇒ Excision
◆ Sajjad Arshad, Amin Kharraz, William Robertson, “Include Me Out: In-Browser
Detection of Malicious Third-Party Content Inclusions”, Financial Cryptography and Data
Security (FC), 2016
➔ Ad Injection ⇒ OriginTracer
◆ Sajjad Arshad, Amin Kharraz, William Robertson, “Identifying Extension-based Ad
Injection via Fine-grained Web Content Provenance”, Research in Attacks, Intrusions
and Defenses (RAID), 2016
➔ Relative Path Overwrite (RPO)
◆ Sajjad Arshad, Seyed Ali Mirheidari, Tobias Lauinger, Bruno Crispo, Engin Kirda, William
Robertson, “Large-Scale Analysis of Style Injection by Relative Path Overwrite”, The
Web Conference (WWW), 2018
81. Deep Crawling
➔ The Inclusion Tree crawler has been evolving
◆ Written in Node.js
◆ Uses Chrome Remote Debugging protocol
◆ Publicly available (https://github.com/sajjadium/DeepCrawling)
➔ Web Security
◆ Tobias Lauinger, Abdelberi Chaabane, Sajjad Arshad, William Robertson, Christo
Wilson, Engin Kirda, “Thou Shalt Not Depend on Me: Analysing the Use of
Outdated JavaScript Libraries on the Web”, Network and Distributed System
Security Symposium (NDSS), 2017
82. Deep Crawling
➔ Tracking and Privacy
◆ Muhammad Ahmad Bashir, Sajjad Arshad, William Robertson, Christo Wilson,
“Tracing Information Flows Between Ad Exchanges Using Retargeted Ads”,
USENIX Security Symposium, 2016
◆ Muhammad Ahmad Bashir, Sajjad Arshad, Christo Wilson, “Recommended For
You: A First Look at Content Recommendation Networks”, ACM Internet
Measurement Conference (IMC), 2016
◆ Muhammad Ahmad Bashir, Sajjad Arshad, Engin Kirda, William Robertson,
Christo Wilson, “How Tracking Companies Circumvented Ad Blockers Using
WebSockets”, ACM Internet Measurement Conference (IMC), 2018
83. Other Works
➔ Malware Detection
◆ Amin Kharraz, Sajjad Arshad, Collin Mulliner, William Robertson, Engin Kirda,
“UNVEIL: A Large-Scale, Automated Approach to Detecting Ransomware”,
USENIX Security Symposium, 2016
➔ Binary Analysis
◆ Reza Mirzazade Farkhani, Saman Jafari, Sajjad Arshad, William Robertson, Engin
Kirda, Hamed Okhravi, “On the Effectiveness of Type-based Control Flow
Integrity”, Annual Computer Security Applications Conference (ACSAC), 2018