Machine Learning Approach in Web Proxy Cache Replacement
Sivaraj Nimishan
2011/CSC/016
Supervisor: Sriskandarajah Shriparen
Web Proxy Caching
• Web proxy caching is a solution for improving the performance of Web-based systems.
Cache Replacements
• In proxy cache replacement, the proxy cache must effectively decide which objects are worth caching and which should be replaced by other objects.
• LRU: The least recently used objects are removed first.
• LFU: The least frequently used objects are removed first.
• LFU-DA: A dynamic aging factor is incorporated into LFU.
• GDSF: Size, cost of fetching, and a dynamic aging factor are integrated with frequency.
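As an illustration of the LRU policy described above, here is a minimal sketch in Python. The class name `LRUCache`, the `request` method, and the capacity counted in objects rather than bytes are all assumptions made for this example; this is not Squid's implementation.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache keyed by URL; capacity is counted in objects, not bytes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # oldest entry first, newest last

    def request(self, url):
        """Return True on a cache hit, False on a miss (the object is then cached)."""
        if url in self.store:
            self.store.move_to_end(url)  # mark as most recently used
            return True
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict the least recently used object
        self.store[url] = True
        return False


# Hypothetical request stream; URLs are shortened to single letters.
cache = LRUCache(capacity=2)
hits = [cache.request(u) for u in ["a", "b", "a", "c", "b"]]
print(hits)
```

With a capacity of two objects, the request for "c" evicts "b" because "a" was referenced more recently, so the final request for "b" is a miss.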
Squid
Cache replacement policies:
• LRU: The LRU policy keeps recently referenced objects.
• heap GDSF: The heap GDSF policy optimizes object hit rate by keeping smaller popular objects in cache.
• heap LFUDA: The heap LFUDA policy keeps popular objects in cache regardless of their size.
• heap LRU: The LRU policy implemented using a heap.
Squid log format:
timestamp, response time, client address, status codes, size, request method, URL, client identity, hierarchy code, content type
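The log fields listed above can be read with a short parser. The sketch below assumes Squid's native access.log layout, where the ten fields are separated by whitespace and appear in the order listed. The names `LogEntry` and `parse_line` are hypothetical, and the last three fields of the sample line are made up for illustration.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    time: float        # Unix timestamp of the request
    duration: int      # response time in milliseconds
    client: str        # client address
    result_code: str   # Squid result code and HTTP status, e.g. TCP_MISS/200
    size: int          # bytes delivered to the client
    method: str        # request method
    url: str
    ident: str         # client identity ("-" when unavailable)
    hierarchy: str     # hierarchy code and peer
    content_type: str


def parse_line(line: str) -> LogEntry:
    """Split one whitespace-separated access.log line into its ten fields."""
    f = line.split()
    if len(f) != 10:
        raise ValueError(f"expected 10 fields, got {len(f)}")
    return LogEntry(float(f[0]), int(f[1]), f[2], f[3], int(f[4]),
                    f[5], f[6], f[7], f[8], f[9])


# Hypothetical sample line: the last three fields are illustrative only.
sample = ("1335442301.000 379 127.0.0.1 TCP_MISS/200 1609 GET "
          "http://www.opencalais.com/robots.txt - DIRECT/192.0.2.1 text/plain")
entry = parse_line(sample)
print(entry.result_code, entry.size)
```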
Machine Learning
• Support Vector Machine
• Decision tree
Data collection
Billion Triples Challenge 2012 Dataset
The dataset was crawled during May/June 2012. Several seed sets were collected from multiple sources.
• Datahub: A data ecosystem for individuals, teams and people.
• DBpedia: A crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web.
• Freebase: A community-curated database of well-known people, places, and things.
• Rest: The seed set for the Rest crawl contained all other URIs involved in a relation in DBpedia.
• Timbl: The Timbl crawl consisted of Tim Berners-Lee's Friend of a Friend (FOAF) project (2 files).
Preprocessing
Data Set   Size       From                         To
Datahub    136.8 MB   [Thu Apr 26 20:07:13 2012]   [Fri Apr 27 16:20:16 2012]
DBpedia    170.3 MB   [Tue May 1 07:46:29 2012]    [Fri Apr 27 21:19:02 2012]
Freebase   123.6 MB   [Fri Apr 27 07:18:03 2012]   [Mon Apr 30 12:31:49 2012]
Rest       32 MB      [Mon Apr 30 13:34:06 2012]   [Mon Apr 30 18:46:04 2012]
Timbl 1    138.5 MB   [Sat May 5 21:05:02 2012]    [Tue May 8 07:50:56 2012]
Timbl 2    179.5 MB   [Tue May 15 20:29:22 2012]   [Wed May 23 04:53:27 2012]
Preprocessing... successful entries with status code 200
Data Set   Requests   Cacheable requests   %
Datahub    398547     181850               45.63 %
DBpedia    1382090    537038               38.86 %
Freebase   333956     145010               43.42 %
Rest       71972      18942                26.32 %
Timbl 1    889591     323451               36.36 %
Timbl 2    1675106    680952               40.65 %
Total      4751262    1887243              39.72 %
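The filtering step can be sketched as follows, assuming that an entry counts as cacheable exactly when its HTTP status is 200. The slide only states that successful entries with status code 200 were kept, so any further criteria are unknown. The function names and the sample result codes are hypothetical; the Datahub counts are taken from the table.

```python
def is_cacheable(result_code: str) -> bool:
    """Assumption: an entry is kept when its HTTP status is 200."""
    _, _, status = result_code.partition("/")
    return status == "200"


def cacheable_percentage(cacheable: int, total: int) -> float:
    """Share of cacheable requests among all requests, in percent."""
    return 100.0 * cacheable / total


# Hypothetical result codes in the Squid "CODE/status" form.
codes = ["TCP_MISS/200", "TCP_MISS/404", "TCP_HIT/200", "TCP_MISS/302"]
kept = [c for c in codes if is_cacheable(c)]

# Datahub row of the table: 181850 cacheable out of 398547 requests.
datahub_pct = cacheable_percentage(181850, 398547)
print(kept, round(datahub_pct, 2))
```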
Preprocessing...
• Sliding window length (SWL): 30 minutes (Romano and ElAarag)
• The target attribute is obtained with a backward-looking sliding window:
\[
\text{Target attribute} =
\begin{cases}
1 & \text{if the object is revisited within the sliding window} \\
0 & \text{otherwise}
\end{cases}
\]
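One possible reading of the backward-looking window is sketched below: a request is labelled 1 when the same URL was requested within the preceding 30 minutes, and 0 otherwise. This interpretation, the function name `label_requests`, and the sample trace are assumptions for illustration; the study's own labelling code is not shown in the slides.

```python
SWL_SECONDS = 30 * 60  # sliding window length of 30 minutes


def label_requests(requests, window=SWL_SECONDS):
    """requests: list of (timestamp, url) pairs sorted by timestamp.

    Returns one 0/1 label per request: 1 if the same URL was requested
    within the preceding `window` seconds, 0 otherwise.
    """
    last_seen = {}
    labels = []
    for ts, url in requests:
        prev = last_seen.get(url)
        labels.append(1 if prev is not None and ts - prev <= window else 0)
        last_seen[url] = ts
    return labels


# Hypothetical trace: timestamps in seconds, URLs shortened to letters.
trace = [(0, "a"), (600, "b"), (1200, "a"), (4000, "a"), (4100, "b")]
labels = label_requests(trace)
print(labels)
```

Only the third request is labelled 1: "a" was seen 1200 seconds earlier, inside the 1800-second window, while the later repeats fall outside it.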
Attributes    Values
time          1335442301
duration      379
client        127.0.0.1
result_code   TCP_MISS/200
size          1609
method        GET
URL           http://www.opencalais.com/robots.txt
A Perl command is used to convert the Unix timestamp to a human-readable timestamp:
tail access.log | perl -p -e 's/^([0-9]*)/"[".localtime($1)."]"/e'
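For comparison, the same conversion can be sketched in Python. This example formats in UTC so that the result is reproducible, whereas the Perl command uses the machine's local time zone. The function name `to_readable` is hypothetical.

```python
from datetime import datetime, timezone


def to_readable(unix_ts: float) -> str:
    """Format a Unix timestamp in the bracketed style used above, in UTC."""
    moment = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return moment.strftime("[%a %b %d %H:%M:%S %Y]")


# Timestamp taken from the sample log entry.
stamp = to_readable(1335442301)
print(stamp)
```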
Preprocessing...
[Figure: preprocessing pipeline with the elements access.log, connection.java, Labelinsert.java, InsertMongoDB.java, mongoexport, access.cs]
Methodology
Performance Measure
Hit ratio is the measure most widely used in evaluating the performance of Web caching. It is defined as the percentage of requests that can be satisfied by the cache.
\[
\text{Hit Ratio} = \frac{\text{Number of hits}}{\text{Cacheable requests}} \times 100
\]
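The definition can be checked with a small calculation. The function name `hit_ratio` is hypothetical; the counts are the Datahub2 figures from the first results table, which reports 83.13.

```python
def hit_ratio(hits: int, requests: int) -> float:
    """Percentage of requests that were satisfied by the cache."""
    if requests == 0:
        return 0.0  # avoid division by zero on an empty trace
    return 100.0 * hits / requests


# Datahub2: 45357 hits out of 54557 requests.
ratio = hit_ratio(45357, 54557)
print(f"{ratio:.4f}")
```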
Machine Learner
• WSO2 Machine Learner is a product that helps to manage and explore data, build machine learning models by analyzing the data with machine learning algorithms, compare and manage the generated models, and predict using the built models.
• Apache Spark is a fast and general engine for large-scale data processing.
• Easy graphical user interface for human-friendly viewing.
• To run ML: <PRODUCT_HOME>/bin/wso2server.sh
• Access the ML UI from a Web browser using the following URL: https://<ML_HOST>:<ML_PORT>/ml
SVM parameters
Iterations: 100
Learning Rate: 0.001
SGD Data Fraction: 1
Reg Type: L1
Reg Parameter: 0.001
Decision tree parameters
Max Depth: 30
Max Bins: depends on unique features
Impurity: gini/entropy
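To show how the SVM parameters listed above interact, here is a simplified sketch of a linear SVM trained by subgradient descent on the hinge loss with an L1 penalty. It is an illustration under stated assumptions, not the WSO2 Machine Learner or Spark implementation: there is no bias term, the step size is constant, every sample is used in each step (a data fraction of 1), and the function names and toy data are hypothetical.

```python
def train_linear_svm(xs, ys, iterations=100, learning_rate=0.001, reg_param=0.001):
    """Full-batch subgradient descent on the hinge loss with an L1 penalty.

    xs: list of feature vectors, ys: labels in {-1, +1}.
    """
    dim = len(xs[0])
    n = len(xs)
    w = [0.0] * dim
    shrink = learning_rate * reg_param
    for _ in range(iterations):
        grad = [0.0] * dim
        for x, y in zip(xs, ys):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin < 1:  # sample violates the margin, hinge loss is active
                for j in range(dim):
                    grad[j] -= y * x[j] / n
        for j in range(dim):
            w[j] -= learning_rate * grad[j]
            # L1 regularization: shrink each weight toward zero
            magnitude = max(abs(w[j]) - shrink, 0.0)
            w[j] = magnitude if w[j] >= 0 else -magnitude
    return w


def predict(w, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Hypothetical toy data: two linearly separable clusters.
xs = [[2, 2], [3, 3], [-2, -2], [-3, -3]]
ys = [1, 1, -1, -1]
w = train_linear_svm(xs, ys)
print(predict(w, [3, 1]), predict(w, [-1, -4]))
```

In a caching setting the two classes would correspond to the target attribute values 1 and 0, mapped to +1 and -1.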
Data set   Total requests   Number of hits   Hit ratio (%)
Datahub2   54557            45357            83.13
DBpedia    181114           105883           58.46
Freebase   43507            32527            74.76
Rest       5685             4428             77.88
Timbl      97039            42390            43.68
Timbl2     204288           135149           66.15

Data set   Total requests   Number of hits   Hit ratio (%)
Datahub2   54557            25470            46.68
DBpedia    181114           118418           65.38
Freebase   43507            26359            60.58
Rest       5685             1519             26.71
Timbl      97039            58243            60.02
Timbl2     204288           96822            47.39
Conclusion
Data Set   Requests   Cacheable requests   Hit Ratio (%)
Datahub    398547     181850               83.13
DBpedia    1382090    537038               65.38
Freebase   333956     145010               74.76
Rest       71972      18942                77.88
Timbl 1    889591     323451               60.02
Timbl 2    1675106    680952               66.15
• In this study, SVM and decision tree approaches were trained on proxy log files to classify the contents of the Web proxy cache.
• The hit ratio was calculated from the classification decisions made by the trained SVM and the trained decision tree.
• The performance of Web caching can be improved using supervised machine learning.
• Classifiers can be utilized to improve the hit ratio of traditional Web caching policies.
References
S. Romano and H. ElAarag, "A neural network proxy cache replacement strategy and its implementation in the Squid proxy server", Neural Computing & Applications, Vol. 20, No. 1 (2011), pp. 59-78.
A. I. Vakali, "LRU-based algorithms for Web Cache Replacement".
W. Ali, S. Sulaiman, and N. Ahmad, "Performance Improvement of Least-Recently Used Policy in Web Proxy Cache Replacement Using Supervised Machine Learning", Int. J. Advance. Soft Comput. Appl., Vol. 6, No. 1 (2014).
Introducing Machine Learner, https://docs.wso2.com/display/ML100/Introducing+Machine+Learner
Squid: Optimising Web Delivery, http://www.squid-cache.org/

Editor's Notes

  1. The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response.
  2. Seed data is a collection of information that is used for training, for testing, or as a template.
  3. S. Romano and H. ElAarag, "A neural network proxy cache replacement strategy and its implementation in the Squid proxy server", Vol. 20, No. 1 (2011), pp. 59-78. A 30-minute sliding window can increase the training performance when using large training datasets. The idea is to use information about a Web object requested in the past to predict whether that Web object will be revisited within the sliding window.
  4. High write load: MongoDB by default favors a high insert rate, handles highly diverse data types, and helps manage applications more efficiently.
  5. Representational state transfer (REST) is the software architectural style of the World Wide Web. DAS: Data Analytics Server.
  6. Stochastic gradient descent: the simplest method to solve optimization problems, i.e., an optimization method for minimizing an objective function. The learning rate is the step size in gradient descent. "Gini" is used to minimize misclassification; "entropy" is used for exploratory analysis. These differ less than 2% of the time. "Gini" will tend to find the largest class, and "entropy" tends to find groups of classes that make up ~50% of the data.
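The two impurity measures mentioned in the last note can be compared on a small example. This sketch uses the standard textbook definitions of Gini impurity and entropy; the function names and the sample node are hypothetical.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class proportions, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


# Hypothetical tree node holding three objects of class 1 and one of class 0.
node = [1, 1, 1, 0]
print(gini(node), round(entropy(node), 4))
```

Both measures are zero for a pure node and largest for an evenly split one, which is why they usually lead to similar splits.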