Published on

  • Be the first to comment

  • Be the first to like this

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide


  1. 1. KNOW-HOW Squid proxy server Implementing a home proxy server with Squid SAFE HARBOR A proxy server provides safer and more efficient surfing. Although commercial proxy solutions are available, all you really need is Linux and an old PC in the attic. and integrates easily with an iptables BY GEERT VAN PAMEL firewall. In my case, the Squid proxy server and the iptables firewall worked together to protect my network from I have had a home network for several unwanted popup ads, and block danger- intruders and dangerous HTML. You’ll years. I started with a router using ous URLs. find many useful discussions of firewalls Windows XP with ICS (Internet Con- A Squid proxy server filters Web traffic in books, magazines, and Websites. (See nection Sharing) and one multi-homed and caches frequently accessed files. A [1] and [2], for example.) The Squid Ethernet card. The main disadvantages proxy server limits Internet bandwidth proxy server, on the other hand, is not as were instability, low performance, and a usage, speeds up Web total lack of security. Troubleshooting access, and lets you Table 1: Recommended Hardware was totally impossible. Firewall configu- filter URLs. Centrally Necessary Components Specifics ration was at the mercy of inexperienced blocking advertise- Intel Pentium II CPU, or higher - users, who clicked randomly at security ments and dangerous Why not a spare Alpha Server? 350 MHz settings as if they were playing Russian downloads is cost 80 - 100 MB memory minimum more is better roulette. effective and transpar- 1 or more IDE disks (reuse 2 old disks: 1 GB I finally turned to Linux and set up an ent for the end user. system SW + swap & 3 GB for cache + /home disk) 4 GB minimum iptables firewall on a Pentium II com- Squid is a high per- 2 Ethernet cards, minihub, fast Ethernet modem, 100 Mbit/s if wireless router or hub possible puter acting as a router. The firewall sys- formance implementa- CDROM, DVD reader software is tem would keep the attackers off my net- tion of a free Open- mostly distri- work and log incoming and outgoing Source, full-featured buted via DVD traffic. Along with the iptables firewall, I proxy caching server. Use only normal straight LAN cables [no need for modem and also set up a Squid proxy server to Squid provides exten- cross cables] minihub cross improve Internet performance, filter out sive access controls themselves! 48 ISSUE 60 NOVEMBER 2005 W W W. L I N U X - M A G A Z I N E . C O M
  2. 2. Squid proxy server KNOW-HOW well documented, especially for small some of the important settings in the cache_mgr sysman home networks like mine. In this article, squid.conf file. I will show you how to set up Squid. First of all, you can prevent certain dns_nameservers metadata related to your configuration dns_testnames Getting Started from reaching the external world when fqdncache_size 1024 The first step is to find the necessary you surf the Web: hardware. Figure 1 depicts the network http_port 80 configuration of the Pentium II computer vi /etc/squid/squid.conf icp_port 0 I used as a firewall and proxy server. ... This firewall system should operate with anonymize_headers deny U http_port is the port used by the proxy minimal human intervention, so after From Server Via User-Agent server. You can choose anything, as long the system is configured, you’ll want to forwarded_for off as the configuration does not conflict disconnect the mouse, keyboard, and strip_query_terms on with other ports on your router. A com- video screen. You may need to adjust the mon choice is 8080 or 80. The Squid BIOS settings so that the computer will Note that you cannot anonymize Referer default, 3128, is difficult to remember. boot without a keyboard. The goal is to and WWW-Authenticate because other- We are not using cp_port, so we set it be able to put the whole system in the wise authentication and access control to 0. This setting synchronizes proxy attic, where you won’t hear it or trip mechanisms won’t work. servers. over it. From the minihub shown in Fig- forwarded_for off means that the IP With log_mime_hdrs on, you can ure 1, you can come “downstairs” to the address of the proxy server will not be make mime headers visible in the access. home network using standard UTP cable sent externally. log file. or a wireless connection. Table 1 shows With strip_query_terms on, you do not recommended hardware for the firewall log URL parameters after the ?. When Avoid Disk Contention machine. this parameter is set to off, the full URL Squid needs to store its cache some- Assuming your firewall is working, is logged in the Squid log files. This fea- where on the hard disk. The cache is a the next step is to set up Squid. Squid is ture can help with debugging the Squid tree of directories. With the cache_dir available from the Internet at [3] or one filters, but it can also violate privacy option in the squid.conf file, you can of its mirrors [4] as tar.gz (compile from rules. specify configuration settings such as the sources). You can easily install it using The next settings identify the Squid following: one of the following commands: host, the (internal) domain where the • disk I/O mechanism – aufs machine is operating, and the username • location of the squid cache on the disk rpm -i /cdrom/RedHat/RPMS/U of whoever is responsible for the server. – /var/cache/squid squid-2.4.STABLE7-4.i386.rpmU Note the dot in front of the domain. Fur- • amount of disk space that can be used # Red Hat 8 ther on, you find the name of the local by the proxy server – 2.5 GB DNS caching server, and the number of • number of main directories – 16 rpm -i /cdrom/Fedora/RPMS/U domain names to cache into the Squid • subdirectories – 256 squid-2.5.STABLE6-3.i386.rpm U server. For instance: # Fedora Core 3 visible_hostname squid cache_dir aufs U rpm -i /cdrom/.../U append_domain /var/cache/squid 2500 16 256 squid-2.5.STABLE6-6.i586.rpmU # SuSE 9.2 Internet At this writing, the current stable Squid version is 2.5. Configuring Squid Once Squid is installed, you’ll need to configure it. Squid has one central configuration file. Every time this file changes, the configuration must be reloaded with the command /sbin/init. d/squid reload. You can edit the configuration file with a text editor. You’ll find a detailed description of the settings inside the squid.conf file, although the discussion Local Network is sometimes very technical and difficult to understand. This section summarizes Figure 1: Ethernet basic LAN configuration. W W W. L I N U X - M A G A Z I N E . C O M ISSUE 60 NOVEMBER 2005 49
  3. 3. KNOW-HOW Squid proxy server The disk access method options are as maximum_object_sizeU follows: _in_memory 2048 KB Table 2: ACL Guidelines • ufs – classic disk access (too much I/O • the order of the rules is important can slow down the Squid server) Log Format Specification • first list all the deny rules • aufs – asynchronous UFS with threads, You can choose between Squid log for- • the first matching rule is executed • the rest of the rules are ignored less risk of disk contention mat and standard web server log format • the last rule should be an allow all • diskd – diskd daemon, avoiding disk using the parameter emulate_httpd_log. contention but using more memory When the parameter is set to on, stan- UFS is the classic UNIX file system I/O. dard web log format is used; if the pages, the more likely the page is to be We recommend using aufs to avoid I/O parameter is set to off, you get more cached. Because your own ISP is more bottlenecks. (When you use aufs, you details with the Squid format. See [7] for remote, the ISP is less likely to be cach- have fewer processes.) more on analyzing Squid log files. ing its competitor’s contents… # ls -ld /var/cache/squid Proxy Hierarchy cache_peer proxy.tiscali.beU lrwxrwxrwx 1 root rootU The Squid proxy can work in a hierarchi- parent 3128 3130 U 19 Nov 22 00:42 U cal way. If you want to avoid the parent no-query default /var/cache/squid -> U proxy for some destinations, you can cache_peer_domain U /volset/cache/squid allow a direct lookup. The browser will still use your local proxy! I suggest you keep the standard file loca- no-query means that you do not use, or tion for the squid cache /var/cache/ acl direct-domain U cannot use, ICP (the Internet Caching squid, then create a symbolic link to the dstdomain Protocol), see [8]. You can obtain the real cache directory. If you move the always_direct allow U same functionality using regular expres- cache to another disk for performance or direct-domain sions, but this gives you more freedom. capacity reasons, you only have to mod- ify the symbolic link. acl direct-path urlpath_regexU cache_peer proxy.tiscali.beU The disk space is distributed among -i "/etc/squid/direct-path.reg" parent 3128 3130 U all directories. You would normally look always_direct allow direct-path no-query default for even distribution across all directo- acl tiscali-proxy U ries, but in practice, some variation in Some ISPs allow you to use their proxy dstdom_regex -i U the distribution is acceptable. More com- server to visit their own pages even if$ plex setups using multiple disks are pos- you are not a customer. This can help cache_peer_access U sible, but for home use, one directory you speed up your visits to their pages. allow U structure is sufficient. The closer the proxy to the original tiscali-proxy Cache Replacement Listing 1: Blocking Unwanted Pages The proxy server uses an LRU (Least 01 acl block-ip dst "/etc/squid/block-ip.reg" Recently Used) algorithm. Detailed stud- ies by HP Laboratories [6] have revealed 02 deny_info filter_spam block-ip that an LRU algorithm is not always an 03 http_access deny block-ip intelligent choice. The GDSF setting 04 keeps small popular objects in cache, 05 acl block-hosts dstdom_regex -i "/etc/squid/block-hosts.reg" while removing bigger and lesser used objects, thus increasing the overall effi- 06 deny_info filter_spam block-hosts ciency. 07 http_access deny block-hosts 08 cache_replacement_policyU 09 acl noblock-url url_regex -i "/etc/squid/noblock-url.reg" heap GDSF 10 http_access allow noblock-url Safe_ports memory_replacement_policyU heap GDSF 11 12 acl block-path urlpath_regex -i "/etc/squid/block-path.reg" Big objects requested only once can 13 deny_info filter_spam block-path flush out a lot of smaller objects, there- 14 http_access deny block-path fore you’d better limit the maximum object size for the cache: 15 16 acl block-url url_regex -i "/etc/squid/block-url.reg" cache_mem 20 MB 17 deny_info filter_spam block-url maximum_object_sizeU 18 http_access deny block-url 16384 KB 50 ISSUE 60 NOVEMBER 2005 W W W. L I N U X - M A G A Z I N E . C O M
  4. 4. Squid proxy server KNOW-HOW • day & hour Listing 2: Making a Page always_direct allow U • browser type Invisible local-domain • username 01 vi /etc/squid/errors/filter_spam Listing 1 shows examples of commands 02 ... acl localnet-dst dst U that block unwanted pages. 03 <script language="JavaScript" The script in Listing 2 will make type="text/javascript"> always_direct allow U unwanted pages invisible: localnet-dst Whenever Squid executes the deny_ 04 <!-- info tag, it sends the file /etc/squid/ 05 window.status="Filter " + Filtering with Squid errors/filter_spam to the browser instead document.location; //.pathname; The preceding sections introduced of the real Web page… effectively filter- 06 // --> some important Squid configura- ing away the unwanted object. The trail- 07 </script> tion settings. You have already ing <!-- hides any other Squid error learned earlier in this article that messages in the body of the text. 08 <noscript><plaintext><!-- ACLs (Access Control Lists) can be Squid allows you to block content by used for allowing direct access to IP subnet. For instance, you can block The ACL could also include a regular pages without using the parent proxy. In sites with explicit sexual content; you expression (regex for short) with the this section, I’ll show you how to use could use whois [9] to help you identify URL using an url_regex construct. ACLs for more fine-grained access con- the subnets, then enter the subnets in For Squid, regular expressions can be trol. the /etc/squid/block-ip.reg file: specified immediately, or they can be in Table 2 provides some guidelines for a file name between double quotes, in creating ACL lists. It is a very good idea vi /etc/squid/block-ip.reg which case the file should contain one to only allow what-you-see-is-what-you- ... regex expression per line – no empty get (WYSIWYG) surfing. If you do not lines. The -i (ignore case) means that want to see certain pages or frames, then case-insensitive comparisons are used. you can automatically block the corre- If you are configuring a system with sponding URLs for those pages on the multiple proxies, you can specify a proxy server. round-robin to speed up page lookups You can filter on: and minimize the delay when one of the • domains of client or server servers is not available. Remember that • IP subnets of client or server To block advertisements or sex sites by most browsers issue parallel connections • URL path domain name, you can list regular when obtaining all the elements from a • Full URL including parameters expressions describing the sites in the single page. If you use multiple proxy • keywords file /etc/squid/block-hosts.reg, as shown servers to obtain these elements, your • ports in Listing 3. response time might be better. • protocols: HTTP, FTP It is also a good idea to block certain • methods: GET, POST, HEAD, CON- file types. For instance, you do not want cache_peer U NECT to allow .exe files, since these are some- parent 8080 7 U no-query round-robin Listing 3: Blocking by Domain Name cache_peer U 01 vi /etc/squid/block-hosts.reg 15 ^tracking. parent 8080 7 U 02 ... 16 no-query round-robin ... 03 ^a. 17 cache_peer 04 ^ad. 18$ parent 8080 7 U 05 ^adfarm. 19$ no-query round-robin 06 ^ads. 20$ FTP files are normally downloaded just 07 ^ads1. 21 ^metrics. once, so will not normally want to cache 08 ^al. 22 .metriweb...$ them, except when downloading repeat- edly. Also, local pages are not normally 09 ^as. 23 .metriweb....$ cached, since they already reside on 10$ 24 your network : 11 ^ss. 25$ 12 ^sa. 26$ acl FTP proto FTP always_direct allow FTP 13 ^sc. 27 side6 14 ^sm6. 28 acl local-domain dstdomain U W W W. L I N U X - M A G A Z I N E . C O M ISSUE 60 NOVEMBER 2005 51
  5. 5. KNOW-HOW Squid proxy server Listing 4: Blocking by Path or Extension 01 vi /etc/squid/block-path.reg 11 .hlp(?.*)?$ 21 .p[ir]f(?.*)?$ 02 ... 12 .hta(?.*)?$ 22 .reg(?.*)?$ 03 .ad[ep](?.*)?$ 13 .in[fs](?.*)?$ 23 .sc[frt](?.*)?$ 04 .ba[st](?.*)?$ 14 .isp(?.*)?$ 24 .sh[bs](?.*)?$ 05 .chm(?.*)?$ 15 .lnk(?.*)?$ 25 .url(?.*)?$ 06 .cmd(?.*)?$ 16 .md[abetwz](?.*)? 26 .vb([e])?(?.*)?$ 07 .com(?.*)?$ 17 .ms[cpt](?.*)?$ 27 .vir(?.*)?$ 08 .cpl(?.*)?$ 18 .nch(?.*)?$ 28 .wm[sz](?.*)?$ 09 .crt(?.*)?$ 19 .ops(?.*)?$ 29 .ws[cfh](?.*)?$ 10 .dbx(?.*)?$ 20 .pcd(?.*)?$ times executable zip files that install acl SSL_ports port 443 563 INFO software. Squid lets you block files by acl SSL_ports port 1863 U [1] Presentation for the HP-Interex user path, filename, or file extension, as # Microsoft Messenger group in Belgium on 17/03/2005 about shown in Listing 4. acl SSL_ports port 6346-6353 U “Implementing a home Router, Fire- Squid also lets you filter for regular # Limewire wall, Proxy server, and DNS Caching expressions used in the URL. Server using Linux” http://users. Of course, your filter may occasionally http_access allow U router/ turn up a false positive. You can add reg- CONNECT SSL_ports [2] Firewalls: http://www.linux-magazine. ular expressions for URLs you specifi- http_access deny U com/issue/40/Checkpoint_FW1_Fire- cally don’t want to block to /etc/squid/ CONNECT wall_Builder.pdf http://www. noblock-url.reg. Do not allow others to misuse your IPtables_Firewalling.pdf vi /etc/squid/noblock-url.reg cache! You only want your cache to be [3] About Squid in general: http://www. ... used by your own intranet. Users on the http://squid-docs. ^ external Internet should not be able to access your cache: html#AEN1685 http://www. You can find an up-to-date version of those configuration files at [11] acl localhost src U [4] Squid mirror sites: Protect your Ports acl localnet-src src U For security reasons, you should disable all ports and only allow well known web [5] Suse 9.2 Professional – DVD software ports using the syntax shown in Listing 5. http_access deny !localnet-src distribution http://www.linux-maga- The same can be done for connected ports. You can allow SSL ports when Allowing All the Rest DVD.pdf connected, and deny them otherwise. To allow only the protocols and the [6] For more information about the GDSF Remember that the normal HTTP proto- methods that you want: and LFUDA cache replacement poli- col is not connected. The client and the cies see: browser always establish a new connec- acl allow-proto proto HTTP techreports/1999/HPL-1999-69.html tion for every page visit. http_access deny !allow-proto techreports/98/HPL-98-173.html [7] Reporting and analysing Squid log Listing 5: Protecting Ports files: 01 acl Safe_ports port 80 # http issue/36/Charly_Column.pdf 02 acl Safe_ports port 21 # ftp [8] ICP – Internet Caching Protocol: http:// 03 acl Safe_ports port 2020 # BeOne Radio Protocol 04 acl Safe_ports port 2002 # Local server [9] The whois database: http://www.ripe. 05 acl Safe_ports port 8044 # Tiscali net/db/other-whois.html [10] About Regular Expressions: http:// 06 acl Safe_ports port 8080 # Turboline port scan 07 acl Safe_ports port 8081 # Prentice Hall module-re.html 08 [11] Example configuration files for Squid: 09 # Deny requests to unknown ports geertivp/pub/squid 10 http_access deny !Safe_ports 52 ISSUE 60 NOVEMBER 2005 W W W. L I N U X - M A G A Z I N E . C O M
  6. 6. Squid proxy server KNOW-HOW acl allow-method U cd /sbin Slides are available from [1] giving more method GET POST ln -s /etc/init.d details about the setup of the iptables http_access deny U firewall, the router, the DNS caching !allow-method When you have finished the configura- server, the DHCP server, and the NTP tion, use setup (Fedora), yast2 (SuSE), or server. The last rule should be an allow-all, an equivalent tool to activate the Squid If you are looking for better perfor- since the previous rule was a deny… service. Remember to reload the server mance, safer surfing, and a way to block when you change a file. access to dangerous Web content, try http_access allow all putting a Squid proxy server in your /sbin/init.d/squid reload attic. For the cost conscious among you: Remember after changing parameters to a Pentium II router consumes about 11 always restart the Squid server with the If anything does not work as expected, kWh/week. You should balance this following command: you can look for a reason in the cache expense against the increased security log file /var/log/squid/cache.log. and reduced headaches of operating /sbin/init.d/squid reload your own firewall with Squid proxy Conclusions caching. ■ For SuSE the /sbin/init.d folder is stan- This article is the result of a presentation dard. For Fedora, create a symbolic link: for the Belgium HP-Interex organization. Geert Van Pamel has worked as a project manager at Belgacom in Bel- gium since 1997. He has been a Enforce Parental Control and Block Spyware THE AUTHOR member of DECUS since 1985 and a You should configure your iptables fire- blocking HTTP outgoing traffic. board member of HP-Interex since wall so that it blocks all outgoing HTTP Spyware programs mostly use the HTTP 2002. He learned UNIX on a PDP traffic unless the proxy server is used. protocol (remember: port 80) for outgo- system in 1982, and he currently Since the proxy server is on the local net- ing connections, but it seems that works with Linux in a mixed envi- work, it allows all incoming requests from Spyware hardly ever uses the proxy ronment with other servers such as local browser clients. (because Spyware authors are too lazy to Tru64 UNIX, HP-UX, OpenVMS, Any attempt to bypass the Squid filters is inspect the rconfiguration). Spyware is NonStop Tandem, SUN, and NCR blocked by the FORWARD firewall rule thus blocked by the firewall rules. Teradata. ADVERTISEMENT W W W. L I N U X - M A G A Z I N E . C O M ISSUE 60 NOVEMBER 2005 53