This document proposes AutoBLG, a framework for automatically generating URL blacklists. It has three main components:
1) URL Expansion uses known malicious URLs as seeds and expands the search space through preprocessing, passive DNS databases, search engines, and web crawlers.
2) URL Filtration prunes the expanded URL set using machine-learning classifiers trained on HTML features, together with similarity searches.
3) URL Verification checks the filtered URLs for drive-by downloads using a honeypot, antivirus software, and VirusTotal.
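The three stages form a funnel: expansion widens the candidate set, filtration narrows it cheaply, and only the survivors reach the expensive verification step. The Python sketch below illustrates that data flow only. It rests on loudly stated assumptions: `passive_dns`, `search_index`, `toy_score`, and `checkers` are hypothetical in-memory stand-ins, not AutoBLG's actual data sources, classifiers, or verification tools. The hidden-iframe rule is a toy substitute for the trained HTML-feature classifier, and the similarity search is omitted.

```python
from typing import Callable, Mapping
from urllib.parse import urlsplit

# Hypothetical sketch of the AutoBLG data flow. Every data source below is a toy
# in-memory stand-in, not AutoBLG's real passive DNS, search engine, or crawler.


def expand(seeds: list[str],
           passive_dns: Mapping[str, list[str]],
           search_index: Mapping[str, list[str]]) -> set[str]:
    """Stage 1: grow the candidate set outward from known-malicious seed URLs."""
    candidates = set(seeds)
    for url in seeds:
        host = urlsplit(url).hostname or ""
        # Stand-in for passive DNS: domains seen sharing infrastructure with host.
        related = {host, *passive_dns.get(host, [])}
        for domain in related:
            # Stand-in for search engines/crawlers: URLs known under each domain.
            candidates.update(search_index.get(domain, []))
    return candidates


def filtrate(candidates: set[str],
             score: Callable[[str], float],
             threshold: float = 0.5) -> set[str]:
    """Stage 2: keep only URLs whose classifier score meets the threshold."""
    return {url for url in candidates if score(url) >= threshold}


def verify(filtered: set[str],
           checkers: list[Callable[[str], bool]]) -> set[str]:
    """Stage 3: confirm a URL if any checker flags a drive-by download."""
    return {url for url in filtered if any(check(url) for check in checkers)}


# --- Toy usage (all URLs and data are made up) ---
seeds = ["http://bad.example/exploit/index.php"]
passive_dns = {"bad.example": ["evil.example"]}
search_index = {
    "bad.example": ["http://bad.example/exploit/index.php",
                    "http://bad.example/about.html"],
    "evil.example": ["http://evil.example/kit/land.php",
                     "http://evil.example/blog/post1.html"],
}
fake_html = {
    "http://bad.example/exploit/index.php": '<iframe src="x" width="0"></iframe>',
    "http://bad.example/about.html": "<p>About us</p>",
    "http://evil.example/kit/land.php": '<iframe src="y" height="0"></iframe>',
    "http://evil.example/blog/post1.html": "<p>Hello</p>",
}


def toy_score(url: str) -> float:
    # Toy rule standing in for a trained HTML-feature classifier.
    return 1.0 if "<iframe" in fake_html.get(url, "") else 0.0


# Stand-in for honeypot / antivirus / VirusTotal verdicts.
confirmed_by_stub = {"http://bad.example/exploit/index.php",
                     "http://evil.example/kit/land.php"}
checkers = [lambda url: url in confirmed_by_stub]

candidates = expand(seeds, passive_dns, search_index)
filtered = filtrate(candidates, toy_score)
verified = verify(filtered, checkers)
new_urls = verified - set(seeds)  # confirmed URLs not already in the seed list
print(len(candidates), len(filtered), sorted(new_urls))
# prints: 4 2 ['http://evil.example/kit/land.php']
```

Only the URLs that survive filtration are passed to `verify`, which is where the savings in verification cost come from. URLs that are confirmed but absent from the seed list correspond to newly discovered blacklist entries.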
The framework reduced the number of URLs requiring verification by 99% while still discovering new malicious URLs not yet listed in existing blacklists.
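Assuming the reduction is measured relative to the size of the expanded candidate set (the exact baseline is not stated here), it can be written as follows. Here $U_{\text{expanded}}$ is the output of URL Expansion and $U_{\text{filtered}}$ is the output of URL Filtration.

```latex
R = 1 - \frac{|U_{\text{filtered}}|}{|U_{\text{expanded}}|}
```

For a purely hypothetical run in which expansion yields 100,000 URLs and filtration keeps 1,000, $R = 1 - 1000/100000 = 0.99$, which is a 99% reduction.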