SlideShare a Scribd company logo
Simulating Realistic Web Traffic using NS-3
Charles Carrington Scott
Hampton University
Mentor: Walid Younes, PhD, Network Simulation Group
Abstract
Introduction
Results
Figure 2: httparchive.org snapshot (Apple.com)
Ø  The Network Simulation Group at Lawrence Livermore National Lab is working to build a
realistic model of web traffic between a users machine and the web server. This will be
performed by using a tool called NS-3. This tool is an open source software that allows you
to create and simulate networking simulation on an extensible platform. This large platform
is critical in order to fully simulate the many types of web traffic that exist. This module will
increase the realism in ns-3 by making it easy to represent traffic between the user and
servers, without actually representing actual content bytes.
Ø  Understanding how applications will execute in a network context, characterizing the
potential impact of these network threats, and predicting an effective defensive strategy are
critical capabilities we hope to achieve with the help of ns-3.
Ø  One important aspect to network simulation of this nature is understanding the many types
of requests that are sent to and back from a web server. Many of these requests happen to
be redirects to other web pages. Some of these redirects can be ads or even phishing
attacks. Taking a look at these proportions between the two will help aid in the
understanding of realistic web traffic.
Ø  The helpful resources that were used in order to collect the necessary information regarding each
website include: alexa.com, httparchive.org and sitereview.bluecoat.com.
•  alexa.com provides extensive data about the most popular websites from a sample of millions
of Internet users using over 25,000 different browser extensions. This tool was critical in terms
of the most popular sites that should be analyzed.
•  httparchive.org collects and permanently stores the Web’s digitized content. This source
provides a complete breakdown of many if not most websites
•  sitereview.bluecoat.com categorizes each website according to the nature of the site (i.e.
Technology, News/Media, and Shopping)
Ø  The histogram in Figure 4 shows an example of Category 20’s Redirects. During research it is found
that all categories tested resembles this model. It can be concluded there is a high percentage of
very low redirect counts in each of the tested categories.
Ø  Figure’s 5 and 6 shows the probability distribution between Redirects to Requests. This distribution
was found by dividing each websites Redirects to Requests. Using this model above we can
forecast the amount of Redirects each category may posses to Requests.Ø  To understand web traffic you must understand the types of requests that can exist in a topology
(see Figure 1). When a user clicks on a website a HTML document is fetched along with hundreds
of requests sent to and back from a web server to eventually populate what the user sees on that
specific webpage. This HTML document acts as a shell to the website in order so that requests can
complete the webpage. An issue that arises within these requests is that many redirects lie in a
significant fraction of websites. Redirects bring on a whole new issue as far as simulation is
concerned.
The diagrams presented in this section were created based on the data extracted from downloadable
databases according to the Python script. The data is divided between a number of different
categories of websites according to sitereview.bluecoat.com.
This research is far from over but the data presented shows that some things can be said in terms of
requests and redirects within we traffic. Below is a summary of the findings:
Ø  Most websites do not contain redirects
Ø  At least 10% of the websites within each category will have redirects
Ø  The produced models can provide an idea as to how many redirects will be performed for a given
amount of requests
Ø  We can determine the average number of requests
Future work will consist of taking a more in depth look into the relationships between website
categories and the proportions of redirects to requests. Also, taking a deeper look as to what these
redirects consist of (i.e. malicious sites, website ads). Essentially a Monte Carlo simulation will be
performed (on Figure’s 4 & 5) which will aid in producing a simulated model that will closely resemble
the realistic models produced here.
Ø  These redirects that seem to lie in
websites somewhat hide the true nature
of what these websites consist of. Thus
making it that much harder to reproduce
real web traffic simulations within our
modules. Taking a look at the proportions
between requests to redirects in websites
is the first step into tackling this tricky
aspect of web simulation. Breaking down
websites into certain categories (i.e.
Technology, News/Media, Shopping etc.)
will aid in the analysis of the data.
Creating models of this collected data will
give us a good breakdown of the amount
of redirects that exists in today’s web
traffic requests.
Ø  These tools were used in the collecting and the analyzing of this data. Next step, is parsing through
the thousands of websites using the recourses above from conveniently accessible downloadable
databases (i.e. httparchive.org). This was done by creating a simple Python script to parse through
the data while extracting out the number of redirects and request for each website.
Ø  The data is then plotted to aid in the understanding between the direct relationship between
categories of websites as referred from sitereview.bluecoat.com
Methods Results (Cont.)
Figure 1: Topology Example
Discussion/Future Work
0
200
400
600
800
1000
1200
1 3 5 7 9 11 13151719 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75
# of Redirects
Category 20: Entertainment
Series2
Series1
0
20
40
60
80
100
120
140
160
180
200
1
12
23
34
45
56
67
78
89
100
111
122
133
144
155
166
177
188
199
210
221
232
243
254
265
276
287
298
309
336
379
431
Redirect/Requests
Category 9: Business/Economy
Series1
Figure 4: # of Redirects Occurrences
Figure 5: Redirects/Requests probability distribution
Seri
0
20
40
60
80
100
120
140
160
180
1
13
25
37
49
61
73
85
97
109
121
133
145
157
169
181
194
206
218
230
242
254
266
278
290
302
314
326
338
350
362
374
386
398
410
422
435
Redirects /Requests
Category 43: News/Media
Figure 6: Redirects/Requests probability distribution
Figure 3: httparchive.org snapshot (CNN.com)
This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.	

 IM Review and Release Number:

More Related Content

What's hot

Bayesian networks for data integration in the absence of foreign keys
Bayesian networks for data integration in the absence of foreign keysBayesian networks for data integration in the absence of foreign keys
Bayesian networks for data integration in the absence of foreign keys
Shakas Technologies
 
Poster Abstracts
Poster AbstractsPoster Abstracts
Poster Abstracts
butest
 
Usability Review of Mashup Tools
Usability Review of Mashup ToolsUsability Review of Mashup Tools
Usability Review of Mashup Tools
Tanya Ahmed
 
INVESTIGATING SIGNIFICANT CHANGES IN USERS’ INTEREST ON WEB TRAVERSAL PATTERNS
INVESTIGATING SIGNIFICANT CHANGES IN USERS’ INTEREST ON WEB TRAVERSAL PATTERNSINVESTIGATING SIGNIFICANT CHANGES IN USERS’ INTEREST ON WEB TRAVERSAL PATTERNS
INVESTIGATING SIGNIFICANT CHANGES IN USERS’ INTEREST ON WEB TRAVERSAL PATTERNS
IJCI JOURNAL
 
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
IJCNCJournal
 
Social Network Analysis with Spark
Social Network Analysis with SparkSocial Network Analysis with Spark
Social Network Analysis with Spark
Ghulam Imaduddin
 
A Comparative Analysis of Different Feature Set on the Performance of Differe...
A Comparative Analysis of Different Feature Set on the Performance of Differe...A Comparative Analysis of Different Feature Set on the Performance of Differe...
A Comparative Analysis of Different Feature Set on the Performance of Differe...
gerogepatton
 
No More Half Fast: Improving US Broadband Download Speed. Georgetown Universi...
No More Half Fast: Improving US Broadband Download Speed. Georgetown Universi...No More Half Fast: Improving US Broadband Download Speed. Georgetown Universi...
No More Half Fast: Improving US Broadband Download Speed. Georgetown Universi...
Brittne Kakulla, Ph.D.
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
Seminar Report Mine
Seminar Report MineSeminar Report Mine
Seminar Report Mine
sachin narang
 
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNINGDETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
ijcsit
 
Social Data Mining
Social Data MiningSocial Data Mining
Social Data Mining
Mahesh Meniya
 
Individual project 2.20
Individual project 2.20Individual project 2.20
Individual project 2.20
Monisha100
 
Summary of Paper : Taxonomy of websearch by Broder
Summary of Paper : Taxonomy of websearch by BroderSummary of Paper : Taxonomy of websearch by Broder
Summary of Paper : Taxonomy of websearch by Broder
Bhavesh Singh
 
Preventing private information inference attacks on social networks
Preventing private information inference attacks on social networksPreventing private information inference attacks on social networks
Preventing private information inference attacks on social networks
JPINFOTECH JAYAPRAKASH
 
Data mining on Social Media
Data mining on Social MediaData mining on Social Media
Data mining on Social Media
home
 
FAKE NEWS DETECTION PPT
FAKE NEWS DETECTION PPT FAKE NEWS DETECTION PPT
FAKE NEWS DETECTION PPT
VaishaliSrigadhi
 
Flyer for Dr. Chirag Shah Speaker
Flyer for Dr. Chirag Shah SpeakerFlyer for Dr. Chirag Shah Speaker
Flyer for Dr. Chirag Shah Speaker
Laura A. Murray, MSIS, MBA
 
A survey of web metrics
A survey of web metricsA survey of web metrics
A survey of web metrics
unyil96
 
EMBERS AutoGSR: Automated Coding of Civil Unrest Events
EMBERS AutoGSR: Automated Coding of Civil Unrest EventsEMBERS AutoGSR: Automated Coding of Civil Unrest Events
EMBERS AutoGSR: Automated Coding of Civil Unrest Events
Parang Saraf
 

What's hot (20)

Bayesian networks for data integration in the absence of foreign keys
Bayesian networks for data integration in the absence of foreign keysBayesian networks for data integration in the absence of foreign keys
Bayesian networks for data integration in the absence of foreign keys
 
Poster Abstracts
Poster AbstractsPoster Abstracts
Poster Abstracts
 
Usability Review of Mashup Tools
Usability Review of Mashup ToolsUsability Review of Mashup Tools
Usability Review of Mashup Tools
 
INVESTIGATING SIGNIFICANT CHANGES IN USERS’ INTEREST ON WEB TRAVERSAL PATTERNS
INVESTIGATING SIGNIFICANT CHANGES IN USERS’ INTEREST ON WEB TRAVERSAL PATTERNSINVESTIGATING SIGNIFICANT CHANGES IN USERS’ INTEREST ON WEB TRAVERSAL PATTERNS
INVESTIGATING SIGNIFICANT CHANGES IN USERS’ INTEREST ON WEB TRAVERSAL PATTERNS
 
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
 
Social Network Analysis with Spark
Social Network Analysis with SparkSocial Network Analysis with Spark
Social Network Analysis with Spark
 
A Comparative Analysis of Different Feature Set on the Performance of Differe...
A Comparative Analysis of Different Feature Set on the Performance of Differe...A Comparative Analysis of Different Feature Set on the Performance of Differe...
A Comparative Analysis of Different Feature Set on the Performance of Differe...
 
No More Half Fast: Improving US Broadband Download Speed. Georgetown Universi...
No More Half Fast: Improving US Broadband Download Speed. Georgetown Universi...No More Half Fast: Improving US Broadband Download Speed. Georgetown Universi...
No More Half Fast: Improving US Broadband Download Speed. Georgetown Universi...
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Seminar Report Mine
Seminar Report MineSeminar Report Mine
Seminar Report Mine
 
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNINGDETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
 
Social Data Mining
Social Data MiningSocial Data Mining
Social Data Mining
 
Individual project 2.20
Individual project 2.20Individual project 2.20
Individual project 2.20
 
Summary of Paper : Taxonomy of websearch by Broder
Summary of Paper : Taxonomy of websearch by BroderSummary of Paper : Taxonomy of websearch by Broder
Summary of Paper : Taxonomy of websearch by Broder
 
Preventing private information inference attacks on social networks
Preventing private information inference attacks on social networksPreventing private information inference attacks on social networks
Preventing private information inference attacks on social networks
 
Data mining on Social Media
Data mining on Social MediaData mining on Social Media
Data mining on Social Media
 
FAKE NEWS DETECTION PPT
FAKE NEWS DETECTION PPT FAKE NEWS DETECTION PPT
FAKE NEWS DETECTION PPT
 
Flyer for Dr. Chirag Shah Speaker
Flyer for Dr. Chirag Shah SpeakerFlyer for Dr. Chirag Shah Speaker
Flyer for Dr. Chirag Shah Speaker
 
A survey of web metrics
A survey of web metricsA survey of web metrics
A survey of web metrics
 
EMBERS AutoGSR: Automated Coding of Civil Unrest Events
EMBERS AutoGSR: Automated Coding of Civil Unrest EventsEMBERS AutoGSR: Automated Coding of Civil Unrest Events
EMBERS AutoGSR: Automated Coding of Civil Unrest Events
 

Similar to CyberDefPos_Scott

Intelligent Web Crawling (WI-IAT 2013 Tutorial)
Intelligent Web Crawling (WI-IAT 2013 Tutorial)Intelligent Web Crawling (WI-IAT 2013 Tutorial)
Intelligent Web Crawling (WI-IAT 2013 Tutorial)
Denis Shestakov
 
LyonALMProposal20041018.doc
LyonALMProposal20041018.docLyonALMProposal20041018.doc
LyonALMProposal20041018.doc
butest
 
LyonALMProposal20041018.doc
LyonALMProposal20041018.docLyonALMProposal20041018.doc
LyonALMProposal20041018.doc
butest
 
M phil-computer-science-data-mining-projects
M phil-computer-science-data-mining-projectsM phil-computer-science-data-mining-projects
M phil-computer-science-data-mining-projects
Vijay Karan
 
M.Phil Computer Science Data Mining Projects
M.Phil Computer Science Data Mining ProjectsM.Phil Computer Science Data Mining Projects
M.Phil Computer Science Data Mining Projects
Vijay Karan
 
M.E Computer Science Data Mining Projects
M.E Computer Science Data Mining ProjectsM.E Computer Science Data Mining Projects
M.E Computer Science Data Mining Projects
Vijay Karan
 
H0314450
H0314450H0314450
H0314450
iosrjournals
 
Data preparation for mining world wide web browsing patterns (1999)
Data preparation for mining world wide web browsing patterns (1999)Data preparation for mining world wide web browsing patterns (1999)
Data preparation for mining world wide web browsing patterns (1999)
OUM SAOKOSAL
 
International conference On Computer Science And technology
International conference On Computer Science And technologyInternational conference On Computer Science And technology
International conference On Computer Science And technology
anchalsinghdm
 
Effective Performance of Information Retrieval on Web by Using Web Crawling  
Effective Performance of Information Retrieval on Web by Using Web Crawling  Effective Performance of Information Retrieval on Web by Using Web Crawling  
Effective Performance of Information Retrieval on Web by Using Web Crawling  
dannyijwest
 
STUDY OF DEEP WEB AND A NEW FORM BASED CRAWLING TECHNIQUE
STUDY OF DEEP WEB AND A NEW FORM BASED CRAWLING TECHNIQUESTUDY OF DEEP WEB AND A NEW FORM BASED CRAWLING TECHNIQUE
STUDY OF DEEP WEB AND A NEW FORM BASED CRAWLING TECHNIQUE
IAEME Publication
 
A Trinity Construction for Web Extraction Using Efficient Algorithm
A Trinity Construction for Web Extraction Using Efficient AlgorithmA Trinity Construction for Web Extraction Using Efficient Algorithm
A Trinity Construction for Web Extraction Using Efficient Algorithm
IOSR Journals
 
H017124652
H017124652H017124652
H017124652
IOSR Journals
 
Web usage Mining Based on Request Dependency Graph
Web usage Mining Based on Request Dependency GraphWeb usage Mining Based on Request Dependency Graph
Web usage Mining Based on Request Dependency Graph
IRJET Journal
 
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A ReviewIRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
IRJET Journal
 
Performance of Real Time Web Traffic Analysis Using Feed Forward Neural Netw...
Performance of Real Time Web Traffic Analysis Using Feed  Forward Neural Netw...Performance of Real Time Web Traffic Analysis Using Feed  Forward Neural Netw...
Performance of Real Time Web Traffic Analysis Using Feed Forward Neural Netw...
IOSR Journals
 
C03406021027
C03406021027C03406021027
C03406021027
theijes
 
Odam an optimized distributed association rule mining algorithm (synopsis)
Odam an optimized distributed association rule mining algorithm (synopsis)Odam an optimized distributed association rule mining algorithm (synopsis)
Odam an optimized distributed association rule mining algorithm (synopsis)
Mumbai Academisc
 
DESIGN AND IMPLEMENTATION OF CARPOOL DATA ACQUISITION PROGRAM BASED ON WEB CR...
DESIGN AND IMPLEMENTATION OF CARPOOL DATA ACQUISITION PROGRAM BASED ON WEB CR...DESIGN AND IMPLEMENTATION OF CARPOOL DATA ACQUISITION PROGRAM BASED ON WEB CR...
DESIGN AND IMPLEMENTATION OF CARPOOL DATA ACQUISITION PROGRAM BASED ON WEB CR...
ijmech
 
Design and Implementation of Carpool Data Acquisition Program Based on Web Cr...
Design and Implementation of Carpool Data Acquisition Program Based on Web Cr...Design and Implementation of Carpool Data Acquisition Program Based on Web Cr...
Design and Implementation of Carpool Data Acquisition Program Based on Web Cr...
ijmech
 

Similar to CyberDefPos_Scott (20)

Intelligent Web Crawling (WI-IAT 2013 Tutorial)
Intelligent Web Crawling (WI-IAT 2013 Tutorial)Intelligent Web Crawling (WI-IAT 2013 Tutorial)
Intelligent Web Crawling (WI-IAT 2013 Tutorial)
 
LyonALMProposal20041018.doc
LyonALMProposal20041018.docLyonALMProposal20041018.doc
LyonALMProposal20041018.doc
 
LyonALMProposal20041018.doc
LyonALMProposal20041018.docLyonALMProposal20041018.doc
LyonALMProposal20041018.doc
 
M phil-computer-science-data-mining-projects
M phil-computer-science-data-mining-projectsM phil-computer-science-data-mining-projects
M phil-computer-science-data-mining-projects
 
M.Phil Computer Science Data Mining Projects
M.Phil Computer Science Data Mining ProjectsM.Phil Computer Science Data Mining Projects
M.Phil Computer Science Data Mining Projects
 
M.E Computer Science Data Mining Projects
M.E Computer Science Data Mining ProjectsM.E Computer Science Data Mining Projects
M.E Computer Science Data Mining Projects
 
H0314450
H0314450H0314450
H0314450
 
Data preparation for mining world wide web browsing patterns (1999)
Data preparation for mining world wide web browsing patterns (1999)Data preparation for mining world wide web browsing patterns (1999)
Data preparation for mining world wide web browsing patterns (1999)
 
International conference On Computer Science And technology
International conference On Computer Science And technologyInternational conference On Computer Science And technology
International conference On Computer Science And technology
 
Effective Performance of Information Retrieval on Web by Using Web Crawling  
Effective Performance of Information Retrieval on Web by Using Web Crawling  Effective Performance of Information Retrieval on Web by Using Web Crawling  
Effective Performance of Information Retrieval on Web by Using Web Crawling  
 
STUDY OF DEEP WEB AND A NEW FORM BASED CRAWLING TECHNIQUE
STUDY OF DEEP WEB AND A NEW FORM BASED CRAWLING TECHNIQUESTUDY OF DEEP WEB AND A NEW FORM BASED CRAWLING TECHNIQUE
STUDY OF DEEP WEB AND A NEW FORM BASED CRAWLING TECHNIQUE
 
A Trinity Construction for Web Extraction Using Efficient Algorithm
A Trinity Construction for Web Extraction Using Efficient AlgorithmA Trinity Construction for Web Extraction Using Efficient Algorithm
A Trinity Construction for Web Extraction Using Efficient Algorithm
 
H017124652
H017124652H017124652
H017124652
 
Web usage Mining Based on Request Dependency Graph
Web usage Mining Based on Request Dependency GraphWeb usage Mining Based on Request Dependency Graph
Web usage Mining Based on Request Dependency Graph
 
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A ReviewIRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
IRJET-Multi -Stage Smart Deep Web Crawling Systems: A Review
 
Performance of Real Time Web Traffic Analysis Using Feed Forward Neural Netw...
Performance of Real Time Web Traffic Analysis Using Feed  Forward Neural Netw...Performance of Real Time Web Traffic Analysis Using Feed  Forward Neural Netw...
Performance of Real Time Web Traffic Analysis Using Feed Forward Neural Netw...
 
C03406021027
C03406021027C03406021027
C03406021027
 
Odam an optimized distributed association rule mining algorithm (synopsis)
Odam an optimized distributed association rule mining algorithm (synopsis)Odam an optimized distributed association rule mining algorithm (synopsis)
Odam an optimized distributed association rule mining algorithm (synopsis)
 
DESIGN AND IMPLEMENTATION OF CARPOOL DATA ACQUISITION PROGRAM BASED ON WEB CR...
DESIGN AND IMPLEMENTATION OF CARPOOL DATA ACQUISITION PROGRAM BASED ON WEB CR...DESIGN AND IMPLEMENTATION OF CARPOOL DATA ACQUISITION PROGRAM BASED ON WEB CR...
DESIGN AND IMPLEMENTATION OF CARPOOL DATA ACQUISITION PROGRAM BASED ON WEB CR...
 
Design and Implementation of Carpool Data Acquisition Program Based on Web Cr...
Design and Implementation of Carpool Data Acquisition Program Based on Web Cr...Design and Implementation of Carpool Data Acquisition Program Based on Web Cr...
Design and Implementation of Carpool Data Acquisition Program Based on Web Cr...
 

CyberDefPos_Scott

  • 1. Simulating Realistic Web Traffic using NS-3 Charles Carrington Scott Hampton University Mentor: Walid Younes, PhD, Network Simulation Group Abstract Introduction Results Figure 2: httparchive.org snapshot (Apple.com) Ø  The Network Simulation Group at Lawrence Livermore National Lab is working to build a realistic model of web traffic between a users machine and the web server. This will be performed by using a tool called NS-3. This tool is an open source software that allows you to create and simulate networking simulation on an extensible platform. This large platform is critical in order to fully simulate the many types of web traffic that exist. This module will increase the realism in ns-3 by making it easy to represent traffic between the user and servers, without actually representing actual content bytes. Ø  Understanding how applications will execute in a network context, characterizing the potential impact of these network threats, and predicting an effective defensive strategy are critical capabilities we hope to achieve with the help of ns-3. Ø  One important aspect to network simulation of this nature is understanding the many types of requests that are sent to and back from a web server. Many of these requests happen to be redirects to other web pages. Some of these redirects can be ads or even phishing attacks. Taking a look at these proportions between the two will help aid in the understanding of realistic web traffic. Ø  The helpful resources that were used in order to collect the necessary information regarding each website include: alexa.com, httparchive.org and sitereview.bluecoat.com. •  alexa.com provides extensive data about the most popular websites from a sample of millions of Internet users using over 25,000 different browser extensions. This tool was critical in terms of the most popular sites that should be analyzed. •  httparchive.org collects and permanently stores the Web’s digitized content. This source provides a complete breakdown of many if not most websites •  sitereview.bluecoat.com categorizes each website according to the nature of the site (i.e. Technology, News/Media, and Shopping) Ø  The histogram in Figure 4 shows an example of Category 20’s Redirects. During research it is found that all categories tested resembles this model. It can be concluded there is a high percentage of very low redirect counts in each of the tested categories. Ø  Figure’s 5 and 6 shows the probability distribution between Redirects to Requests. This distribution was found by dividing each websites Redirects to Requests. Using this model above we can forecast the amount of Redirects each category may posses to Requests.Ø  To understand web traffic you must understand the types of requests that can exist in a topology (see Figure 1). When a user clicks on a website a HTML document is fetched along with hundreds of requests sent to and back from a web server to eventually populate what the user sees on that specific webpage. This HTML document acts as a shell to the website in order so that requests can complete the webpage. An issue that arises within these requests is that many redirects lie in a significant fraction of websites. Redirects bring on a whole new issue as far as simulation is concerned. The diagrams presented in this section were created based on the data extracted from downloadable databases according to the Python script. The data is divided between a number of different categories of websites according to sitereview.bluecoat.com. This research is far from over but the data presented shows that some things can be said in terms of requests and redirects within we traffic. Below is a summary of the findings: Ø  Most websites do not contain redirects Ø  At least 10% of the websites within each category will have redirects Ø  The produced models can provide an idea as to how many redirects will be performed for a given amount of requests Ø  We can determine the average number of requests Future work will consist of taking a more in depth look into the relationships between website categories and the proportions of redirects to requests. Also, taking a deeper look as to what these redirects consist of (i.e. malicious sites, website ads). Essentially a Monte Carlo simulation will be performed (on Figure’s 4 & 5) which will aid in producing a simulated model that will closely resemble the realistic models produced here. Ø  These redirects that seem to lie in websites somewhat hide the true nature of what these websites consist of. Thus making it that much harder to reproduce real web traffic simulations within our modules. Taking a look at the proportions between requests to redirects in websites is the first step into tackling this tricky aspect of web simulation. Breaking down websites into certain categories (i.e. Technology, News/Media, Shopping etc.) will aid in the analysis of the data. Creating models of this collected data will give us a good breakdown of the amount of redirects that exists in today’s web traffic requests. Ø  These tools were used in the collecting and the analyzing of this data. Next step, is parsing through the thousands of websites using the recourses above from conveniently accessible downloadable databases (i.e. httparchive.org). This was done by creating a simple Python script to parse through the data while extracting out the number of redirects and request for each website. Ø  The data is then plotted to aid in the understanding between the direct relationship between categories of websites as referred from sitereview.bluecoat.com Methods Results (Cont.) Figure 1: Topology Example Discussion/Future Work 0 200 400 600 800 1000 1200 1 3 5 7 9 11 13151719 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 # of Redirects Category 20: Entertainment Series2 Series1 0 20 40 60 80 100 120 140 160 180 200 1 12 23 34 45 56 67 78 89 100 111 122 133 144 155 166 177 188 199 210 221 232 243 254 265 276 287 298 309 336 379 431 Redirect/Requests Category 9: Business/Economy Series1 Figure 4: # of Redirects Occurrences Figure 5: Redirects/Requests probability distribution Seri 0 20 40 60 80 100 120 140 160 180 1 13 25 37 49 61 73 85 97 109 121 133 145 157 169 181 194 206 218 230 242 254 266 278 290 302 314 326 338 350 362 374 386 398 410 422 435 Redirects /Requests Category 43: News/Media Figure 6: Redirects/Requests probability distribution Figure 3: httparchive.org snapshot (CNN.com) This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. IM Review and Release Number: