AutoBLG: Automatic URL Blacklist Generator Using Search Space Expansion and Filters
Bo Sun(1), Mitsuaki Akiyama(2), Takeshi Yagi(2), Mitsuhiro Hatada(1), Tatsuya Mori(1)
(1) Waseda University
(2) NTT Secure Platform Laboratories
IEEE ISCC 2015
Background(1)
•  The estimated number of drive-by-download attacks is 4.3 million per day
[Pie chart: the number of web-based attacks, split into drive-by-download attacks and other attacks; segments labelled 93% and 7%]
Background(2)
•  What is a drive-by-download attack?
[Diagram: the user clicks on a landing page URL, which leads to an exploit URL that exploits vulnerabilities; malware is then downloaded automatically from a malware download URL]
Background(3)
•  What is a URL blacklist?
[Diagram: the redirection chain landing page URL → exploit URL → malware download URL is matched against a URL blacklist containing the landing page URL, exploit URL and malware download URL; matching URLs are blocked]
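The matching step in the diagram can be sketched as an exact lookup of normalized URLs. This is a minimal illustration, not AutoBLG's code. The normalization rules are assumptions: lowercase the scheme and host, and drop default ports, user info and fragments. The names `normalize_url` and `UrlBlacklist` are hypothetical, and the `.example` URLs are made up.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different spellings hit the same entry (assumed rules)."""
    parts = urlsplit(url.strip())          # urlsplit lowercases the scheme
    host = parts.hostname or ""            # .hostname is already lowercased
    port = parts.port
    # Keep an explicit port only when it differs from the scheme's default.
    netloc = host if port in (None, DEFAULT_PORTS.get(parts.scheme)) else f"{host}:{port}"
    # Drop user info and the fragment (fragments are never sent to the server).
    return urlunsplit((parts.scheme, netloc, parts.path or "/", parts.query, ""))


class UrlBlacklist:
    """Hypothetical exact-match URL blacklist over normalized URLs."""

    def __init__(self, urls):
        self._entries = {normalize_url(u) for u in urls}

    def is_blocked(self, url: str) -> bool:
        return normalize_url(url) in self._entries

    def first_blocked(self, chain):
        """Return the first URL in a redirection chain that is blacklisted, else None."""
        for url in chain:
            if self.is_blocked(url):
                return url
        return None


blacklist = UrlBlacklist(["http://exploit.example/kit.php", "http://dl.example/payload.exe"])
chain = ["http://landing.example/", "HTTP://Exploit.example:80/kit.php#x", "http://dl.example/payload.exe"]
print(blacklist.first_blocked(chain))  # prints HTTP://Exploit.example:80/kit.php#x
```

Exact matching is what makes a blacklist cheap to query. It is also why a blacklist misses any URL it has never seen.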
Background(4)
•  However, a URL blacklist cannot cope with previously unseen malicious URLs
•  To keep a URL blacklist effective, its URLs must be kept up to date
→ We need to collect fresh malicious URLs
Background(5)
[Diagram: a web client honeypot scanning the wild Internet, which contains 30 trillion unique URLs]
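The Performance Evaluation reports more than 100 hours to verify 59,394 URLs without filtration. Extrapolating that timing to 30 trillion URLs shows why exhaustive scanning is out of reach. This back-of-the-envelope sketch assumes that verification time grows linearly with the number of URLs. The `honeypots` parallelism parameter is hypothetical.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365 * 24 * SECONDS_PER_HOUR


def seconds_per_url(total_hours: float, n_urls: int) -> float:
    """Average verification time per URL, assuming time grows linearly with URL count."""
    return total_hours * SECONDS_PER_HOUR / n_urls


def years_to_scan(n_urls: float, sec_per_url: float, honeypots: int = 1) -> float:
    """Wall-clock years to scan n_urls with `honeypots` instances working in parallel (hypothetical)."""
    return n_urls * sec_per_url / honeypots / SECONDS_PER_YEAR


rate = seconds_per_url(100, 59_394)      # "more than 100 hours", so this is a lower bound
web_years = years_to_scan(30e12, rate)   # 30 trillion unique URLs
print(f"{rate:.2f} s per URL, {web_years:,.0f} years on one honeypot")
print(f"{years_to_scan(30e12, rate, honeypots=1000):,.0f} years on 1,000 honeypots")
```

The estimate comes to millions of years on one honeypot. Even with a thousand honeypots working in parallel it stays in the thousands of years, which is why the search space has to be narrowed before verification.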
Goal
•  Our main objective is to accelerate the process of generating a URL blacklist automatically.
Idea
[Diagram: Input: existing malicious URLs → search space expansion → filter (machine learning) for reduction → Output: new malicious URLs]
AutoBLG Framework
•  Three primary components:
  - URL Expansion
  - URL Filtration
  - URL Verification
Images from: http://www.itguyswa.com.au/free-antivirus-protection/, https://www.virustotal.com/ja/, http://www.soumu.go.jp/main_content/000174846.pdf
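The three components form a linear pipeline. The sketch below only shows how their outputs feed into each other. The class `AutoBlgPipeline` and its stage signatures are hypothetical. Each stage is passed in as a function so that a real implementation can replace it.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class AutoBlgPipeline:
    """Hypothetical wiring of the three AutoBLG components; each stage is injected."""
    expand: Callable[[Set[str]], Set[str]]                 # URL Expansion: seeds -> larger candidate set
    filter_top: Callable[[Set[str], Set[str]], List[str]]  # URL Filtration: (candidates, seeds) -> shortlist
    verify: Callable[[str], bool]                          # URL Verification: True if malicious/suspicious

    def run(self, seeds: Set[str]) -> List[str]:
        candidates = self.expand(seeds) - seeds          # only previously unseen URLs go forward
        shortlist = self.filter_top(candidates, seeds)   # the expensive check runs only on this list
        return [url for url in shortlist if self.verify(url)]


# Toy stages standing in for the real components.
pipeline = AutoBlgPipeline(
    expand=lambda seeds: seeds | {"http://a.example/x", "http://b.example/y", "http://c.example/z"},
    filter_top=lambda cands, seeds: sorted(cands)[:2],
    verify=lambda url: url.endswith("/x"),
)
print(pipeline.run({"http://seed.example/"}))
```

Keeping the filter between expansion and verification is the point of the design. Only the shortlist reaches the slow dynamic analysis.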
URL Expansion(1): Seed
[Pipeline: Seed → Pre-processing → Passive DNS Database → Search Engine → Web Crawler]
Example seed URLs: http://2339XXX.net/main, http://auth.veXXXXX.com
URL Expansion(2): Pre-processing
Example IP addresses obtained in this step: 11X.5X.1XX.XX4, 2X.XXX.X99.X2
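The slide shows IP addresses as the output of pre-processing. This sketch assumes a reading the slide does not confirm: the hostnames of the seed URLs are resolved to IP addresses, which can then be looked up in the passive DNS database. The resolver is injected so the sketch can be tested offline. `preprocess` and `system_resolver` are hypothetical names, and `socket.gethostbyname_ex` returns only IPv4 addresses.

```python
import socket
from typing import Callable, Dict, Iterable, List, Set
from urllib.parse import urlsplit


def system_resolver(host: str) -> List[str]:
    """Resolve a hostname to IPv4 addresses with the OS resolver; [] on failure."""
    try:
        return socket.gethostbyname_ex(host)[2]
    except (socket.gaierror, socket.herror, UnicodeError):
        return []


def preprocess(seed_urls: Iterable[str],
               resolve: Callable[[str], List[str]] = system_resolver) -> Dict[str, Set[str]]:
    """Map each IP address to the seed hostnames that resolved to it."""
    ip_to_hosts: Dict[str, Set[str]] = {}
    for url in seed_urls:
        host = urlsplit(url.strip()).hostname
        if not host:
            continue  # skip malformed seeds
        for ip in resolve(host):
            ip_to_hosts.setdefault(ip, set()).add(host)
    return ip_to_hosts


# Offline example with a fake resolver (hypothetical hosts and documentation IPs).
fake_dns = {"seed1.example": ["192.0.2.10"], "seed2.example": ["192.0.2.10", "198.51.100.7"]}
result = preprocess(["http://seed1.example/main", "http://SEED2.example", "not a url"],
                    resolve=lambda h: fake_dns.get(h, []))
print(result)
```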
URL Expansion(3): Passive DNS Database
Example domains found in this step: sediscoXXXXXX.gruXXX.com, vorXXXXXXX.zdjecXXXki.com
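A passive DNS database records which domain names have been observed resolving to which IP addresses. Looking up the seed IPs can therefore reveal other domains hosted on the same servers. In this sketch an in-memory dictionary stands in for a real passive DNS service, whose query API the slides do not describe. The `max_per_ip` cut-off for heavily shared hosting IPs is a hypothetical heuristic.

```python
from collections import defaultdict
from typing import Iterable, Optional, Set, Tuple


class InMemoryPassiveDNS:
    """Toy stand-in for a passive DNS database of observed (domain, IP) pairs."""

    def __init__(self, records: Iterable[Tuple[str, str]]):
        self._by_ip = defaultdict(set)
        for domain, ip in records:
            self._by_ip[ip].add(domain.lower().rstrip("."))

    def domains_for_ip(self, ip: str) -> Set[str]:
        # .get() does not insert missing keys into the defaultdict.
        return set(self._by_ip.get(ip, ()))


def expand_by_ip(pdns: InMemoryPassiveDNS, seed_ips: Iterable[str],
                 known_domains: Set[str], max_per_ip: Optional[int] = None) -> Set[str]:
    """Domains that shared an IP with a seed but are not yet known."""
    found: Set[str] = set()
    for ip in seed_ips:
        candidates = pdns.domains_for_ip(ip) - known_domains
        if max_per_ip is not None and len(candidates) > max_per_ip:
            continue  # hypothetical guard: skip large shared-hosting IPs that mostly add noise
        found |= candidates
    return found


pdns = InMemoryPassiveDNS([("a.example.", "192.0.2.10"), ("B.example", "192.0.2.10"),
                           ("c.example", "198.51.100.7"), ("d.example", "203.0.113.5")])
print(expand_by_ip(pdns, ["192.0.2.10", "198.51.100.7"], {"a.example"}))
```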
URL Expansion(4): Search Engine and Web Crawler
Example URLs found in this step: http://100XXXXXwebcam.bXXX.pl/island-XXX-wXX.html, http://100XXXXXwebcam.bXXX.pl/isteam-XXXX.html
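Once new domains are known, their pages still have to be enumerated. The slide does not detail the search engine step, so this sketch covers only a breadth-first crawler that stays on the same host. The injected `fetch` function is hypothetical. A real crawler would also need timeouts, politeness rules and defences against the cloaking techniques listed under the limitations.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) URLs from link-bearing tags."""
    LINK_ATTRS = {"a": "href", "iframe": "src", "frame": "src", "script": "src"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = self.LINK_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if urlsplit(absolute).scheme in ("http", "https"):
                    self.links.append(absolute)


def crawl_domain(start_url, fetch, max_pages=50):
    """Breadth-first crawl limited to the start URL's host; fetch(url) returns HTML or None."""
    host = urlsplit(start_url).hostname
    seen, queue, visited = {start_url}, deque([start_url]), []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page
        visited.append(url)
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            if urlsplit(link).hostname == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


pages = {
    "http://site.example/": '<a href="/a.html">a</a><iframe src="http://other.example/x"></iframe>'
                            '<a href="b.html#top">b</a>',
    "http://site.example/a.html": '<a href="/">home</a>',
    "http://site.example/b.html": "",
}
print(crawl_domain("http://site.example/", pages.get))
```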
URL Filtration
[Diagram: existing malicious URLs and unknown URLs → HTML features → similarity search with Bayesian sets]
Image from http://www.primalsecurity.net/0xc-python-tutorial-python-malware/
URL Verification
•  Three tools for verifying drive-by-download attacks:
  - Web client honeypot (Marionette)
  - Antivirus software
  - VirusTotal online service
Performance Evaluation
•  Number of URLs obtained by URL Expansion: 59,394
•  Processing time without URL Filtration: more than 100 hours
•  Processing time with URL Filtration: approximately 6 hours
→ Adopting a high-performance filter accelerates the generation of blacklist URLs
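The figures on this slide, together with the 600 extracted URLs reported in Results(2), can be checked against the 99% reduction claimed in the Summary. This small calculation uses only numbers from the slides. Because 100 hours is a lower bound ("more than 100 hours"), the computed speed-up is a lower bound too.

```python
def reduction(before: int, after: int) -> float:
    """Fraction of URLs removed before the expensive verification step."""
    return 1 - after / before


def speedup(hours_without: float, hours_with: float) -> float:
    """How many times faster the run is with filtration."""
    return hours_without / hours_with


cut = reduction(59_394, 600)   # 59,394 expanded URLs -> 600 extracted URLs
gain = speedup(100, 6)         # "more than 100 hours" vs "approximately 6 hours"
print(f"{cut:.1%} fewer URLs to verify, at least {gain:.1f}x faster")
```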
Results(1)
Web client honeypot: 1.16%   Antivirus software: 3.8%   VirusTotal: 16.5%
•  Web client honeypot: definitely malicious
  - the pages redirected to exploit web pages
•  Antivirus software: highly suspicious
  - the pages contained several HTTP objects detected by the antivirus checkers (malicious JavaScript or executable malware)
•  VirusTotal: suspicious
  - these URLs need further manual inspection
Results(2)
•  Some URLs were identified by multiple tools
•  After eliminating duplicates, 106 of the 600 extracted URLs were detected as malicious or suspicious
•  Of these 106 URLs, seven are completely new URLs that had not been listed in VirusTotal
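Counting the malicious or suspicious URLs means merging per-tool detections without double counting, then removing URLs that are already listed. This is a minimal set-based sketch with toy data. The tool names used as dictionary keys and the `already_listed` set are hypothetical inputs.

```python
from typing import Dict, Iterable, Set, Tuple


def summarize_detections(extracted: Iterable[str],
                         detections: Dict[str, Set[str]],
                         already_listed: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Union per-tool detections (dropping duplicates) and split off URLs not yet listed."""
    extracted_set = set(extracted)
    flagged = set().union(*detections.values()) & extracted_set
    new = flagged - already_listed
    return flagged, new


extracted = {"http://a.example/", "http://b.example/", "http://c.example/", "http://d.example/"}
detections = {
    "honeypot": {"http://a.example/"},
    "antivirus": {"http://a.example/", "http://b.example/"},
    "virustotal": {"http://b.example/", "http://c.example/"},
}
already_listed = {"http://a.example/", "http://b.example/"}
flagged, new = summarize_detections(extracted, detections, already_listed)
print(len(flagged), sorted(new))
```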
  
Limitation and future work
•  Search engine
  - Limitation: only the top-50 search results are obtained
  - Future work: accelerate the web search engine process
•  Web crawler
  - Limitation: evaded by cloaking techniques
  - Future work: develop more sophisticated tools
•  Query pattern
  - Limitation: several malicious URLs are missed
  - Future work: increase the number of query patterns
•  URL verification
  - Limitation: only two versions of the browser or plug-in are covered
  - Future work: adopt a low-interaction honeypot
•  Online operation
  - Limitation: not fully online because of the URL Expansion part
  - Future work: pipeline the URL expansion step
Summary
•  We have proposed the AutoBLG framework, which is
  - light-weight
  - able to find new, previously unknown drive-by-download URLs
  - able to find other suspicious URLs that need further analysis
•  Key ideas
  - the use of search space expansion and filters
•  We proposed a high-performance filter
  - it reduced the number of URLs to be investigated with dynamic analysis systems by 99%
  - while still finding new URLs that had not been listed in a widely used URL reputation system
Thank you for listening
URL Filtration(1): Feature Extraction
HTML feature → difference from previous works
•  The number of elements with a small area → frameset tags (border, frameborder, framespacing)
•  The number of suspicious words in the script's content → strings such as shellcode, shcode
•  The number of URLs with a different domain → only URLs with a different domain are counted
•  The number of iframe and frame tags → same
•  The number of hidden elements
•  The number of meta refresh tags
•  The number of out-of-place elements
•  The number of embed and object tags
•  The presence of unescape behavior
•  The number of setTimeout functions
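Several of these features can be computed in a single pass with Python's standard `html.parser`. This is a simplified sketch, not the authors' extractor. The following are assumptions: the pixel threshold for a small area, treating borderless frameset tags as small-area elements, comparing hostnames to detect different-domain URLs, and the suspicious-word list. The out-of-place-elements feature is omitted.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

SUSPICIOUS_WORDS = ("shellcode", "shcode")  # examples from the slide; a real list would be longer
SMALL_AREA_PX = 2                            # hypothetical pixel threshold for a "small area"


def _pixels(value):
    """Leading integer of a width/height attribute such as '1' or '0px', else None."""
    match = re.match(r"\s*(\d+)", value or "")
    return int(match.group(1)) if match else None


class HtmlFeatureExtractor(HTMLParser):
    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlsplit(page_url).hostname
        self.in_script = False
        self.script_chunks = []
        self.counts = dict.fromkeys(
            ["small_area", "iframe_frame", "hidden", "meta_refresh", "embed_object", "foreign_urls"], 0)

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}
        if tag in ("iframe", "frame"):
            self.counts["iframe_frame"] += 1
        if tag in ("embed", "object"):
            self.counts["embed_object"] += 1
        if tag == "meta" and a.get("http-equiv", "").lower() == "refresh":
            self.counts["meta_refresh"] += 1
        style = a.get("style", "").replace(" ", "").lower()
        if "hidden" in a or "display:none" in style or "visibility:hidden" in style:
            self.counts["hidden"] += 1
        # Assumption: a frameset whose border/frameborder/framespacing is 0 counts as small-area.
        borderless_frameset = tag == "frameset" and any(
            a.get(k) == "0" for k in ("border", "frameborder", "framespacing"))
        sizes = (_pixels(a.get("width")), _pixels(a.get("height")))
        if borderless_frameset or any(v is not None and v <= SMALL_AREA_PX for v in sizes):
            self.counts["small_area"] += 1
        for name in ("src", "href"):
            if a.get(name):
                host = urlsplit(urljoin(self.page_url, a[name])).hostname
                if host and host != self.page_host:  # hostname comparison (assumption)
                    self.counts["foreign_urls"] += 1
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.script_chunks.append(data)

    def features(self):
        script = "".join(self.script_chunks)
        lowered = script.lower()
        return {
            **self.counts,
            "suspicious_words": sum(lowered.count(w) for w in SUSPICIOUS_WORDS),
            "set_timeout": len(re.findall(r"\bsetTimeout\s*\(", script)),
            "unescape": bool(re.search(r"\bunescape\s*\(", script)),
        }


def extract_features(html, page_url):
    parser = HtmlFeatureExtractor(page_url)
    parser.feed(html)
    parser.close()
    return parser.features()


page = """<iframe src="http://evil.example/x" width="1" height="1"></iframe>
<div style="display: none">x</div>
<meta http-equiv="Refresh" content="0;url=/next">
<embed src="movie.swf">
<script>var shellcode = unescape("%u9090"); setTimeout(go, 10); setTimeout (go, 20);</script>"""
feats = extract_features(page, "http://site.example/index.html")
print(feats)
```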
URL Filtration(2): Similarity Search
Similarity search: Bayesian sets
•  Analogy with Google Sets: from the web space, the query Toyota, Nissan, Honda returns related items such as BMW, Ford, Audi, Mitsubishi, Mazda and Volkswagen
•  In AutoBLG: from all unknown URLs, using several existing malicious URLs as the query (malicious URLs created with the same exploit kit), the scores of all URLs are output in descending order. The higher the score, the more likely the URL is malicious.
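Bayesian Sets (Ghahramani and Heller) scores each item by how much more probable it is under a model fitted to the query set than under the background. For binary features with Beta priors, the log score is linear in the features, so ranking stays cheap even for tens of thousands of URLs. This sketch makes two choices of its own, neither taken from the slides: the HTML feature counts are binarized into presence bits, and the priors are set from the feature means with `kappa = 2`.

```python
import math
from typing import List, Sequence


def binarize(rows: Sequence[Sequence[float]]) -> List[List[int]]:
    """Turn count features into presence bits (assumption: binary Bayesian Sets model)."""
    return [[1 if v > 0 else 0 for v in row] for row in rows]


def bayesian_sets_scores(X: Sequence[Sequence[int]], query: Sequence[int],
                         kappa: float = 2.0, eps: float = 1e-6) -> List[float]:
    """Log Bayesian Sets score of every row of binary matrix X given query row indices."""
    n_items, n_feats, N = len(X), len(X[0]), len(query)
    c, q = 0.0, []
    for j in range(n_feats):
        # Beta(alpha, beta) prior centred on the feature's mean over all items.
        mean = sum(row[j] for row in X) / n_items
        mean = min(max(mean, eps), 1 - eps)   # avoid log(0) for constant features
        alpha, beta = kappa * mean, kappa * (1 - mean)
        ones = sum(X[i][j] for i in query)
        alpha_post, beta_post = alpha + ones, beta + N - ones
        c += (math.log(alpha + beta) - math.log(alpha + beta + N)
              + math.log(beta_post) - math.log(beta))
        q.append(math.log(alpha_post) - math.log(alpha)
                 - math.log(beta_post) + math.log(beta))
    # log score(x) = c + sum_j q_j * x_j, i.e. linear in the binary features.
    return [c + sum(qj * xj for qj, xj in zip(q, row)) for row in X]


def rank(scores: Sequence[float]) -> List[int]:
    """Item indices in descending score order (higher = more similar to the query)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


counts = [[2, 1, 0], [3, 4, 0], [1, 1, 1], [0, 0, 5], [0, 0, 0]]  # toy feature counts
X = binarize(counts)
scores = bayesian_sets_scores(X, query=[0, 1])
print(rank(scores))
```

In AutoBLG terms, the query rows would be existing malicious URLs built with the same exploit kit, and the remaining rows would be the unknown URLs from URL Expansion.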
  
The range of the experiments
[Diagram: the ranges of the three components covered by the Preliminary Experiment and by the Performance Evaluation]
•  Steps in URL Expansion: commercial blacklist, pre-processing, passive DNS database, search engine, web crawler
•  Steps in URL Filtration: feature extraction, similarity search
•  Tools in URL Verification: web client honeypot, antivirus software, VirusTotal
Preliminary Experiment
[Figure: number of malicious URLs (0 to 3) among the top-K URLs, with K from 10^0 to 10^3 on a log scale, for Query Pattern 1 and Query Pattern 2]
•  Experiment data
  - Number of benign URLs: 10,000
  - Number of malicious URLs: 6
•  Experiment result
  - Each of the two query patterns identifies three different malicious URLs within its top 300 scores; together they extract all six malicious URLs
  - We therefore adopted the top 300 scores as the threshold for URL filtration.
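The procedure of the preliminary experiment can be sketched as follows: score all URLs once per query pattern, keep each pattern's top 300, and count how many known malicious URLs appear. `top_k`, `hits_at_k` and `select_for_verification` are hypothetical names. Removing URLs that appear in both patterns' top lists is an assumption.

```python
import heapq
from typing import Dict, Iterable, List, Set


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """The k highest-scoring URLs for one query pattern."""
    return heapq.nlargest(k, scores, key=scores.get)


def hits_at_k(scores: Dict[str, float], malicious: Set[str], k: int) -> int:
    """How many known-malicious URLs appear in the top k (the y-axis of the plot)."""
    return len(set(top_k(scores, k)) & malicious)


def select_for_verification(per_pattern: Iterable[Dict[str, float]], k: int = 300) -> List[str]:
    """Union of each pattern's top-k, keeping first-seen order and dropping repeats."""
    selected, seen = [], set()
    for scores in per_pattern:
        for url in top_k(scores, k):
            if url not in seen:
                seen.add(url)
                selected.append(url)
    return selected


pattern1 = {"u1": 0.9, "u2": 0.1, "u3": 0.5}   # toy scores from query pattern 1
pattern2 = {"u1": 0.2, "u2": 0.8, "u3": 0.3}   # toy scores from query pattern 2
print(select_for_verification([pattern1, pattern2], k=2))
```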
  
