Project News
● pfSense 2.3.3-p1 is out!
– A few beneficial improvements, security updates (OpenSSL, cURL)
– https://www.netgate.com/blog/pfsense-2-3-3-p1-release-now-available.html
● pfSense Blog moved to Netgate:
– https://www.netgate.com/blog/
– Primary reason was so we could get rid of WordPress, due to security concerns
– Now uses a static site generated by Jekyll
● Coreboot (“BIOS”) upgrade for many SG-device owners (2220, 2440, 4860, 8860, XG-2758)
– If you purchased one of these devices, you should have received an e-mail about the update
– Workaround for Intel C2000 Errata AVR.58
● Disables SERIRQ to prevent indeterminate interrupt behavior on systems that do not have an external pull-up resistor on the SERIRQ pin.
– A package is now available to handle it automatically (install via System > Package Manager)
● Training in Europe – https://netgate.com/training/
– Will include a preview of our upcoming remote device management platform
– Paris, FR - 21-22 Sep, 2017
– London, UK - 27-28 Sep, 2017
– Frankfurt, DE - 4-5 Oct, 2017
– Saint Petersburg, RU - 11-12 Oct, 2017
● Documentation for Let’s Encrypt ACME package
– https://doc.pfsense.org/index.php/ACME_package
About this Hangout
● Components of a High Availability Cluster
● Prerequisites
● Configuration of a cluster from default config
● Testing
● Troubleshooting
● Upgrading
Cluster Overview
● Actual layout: Internet > WAN Switch > Primary and Secondary > LAN Switch, with the Sync Interface connecting the two nodes
– Primary WAN: 198.51.100.201, 2001:db8::201
– Secondary WAN: 198.51.100.202, 2001:db8::202
– Shared WAN CARP IP: 198.51.100.200, 2001:db8::200
– Primary LAN: 192.168.1.2, 2001:db8:1::2
– Secondary LAN: 192.168.1.3, 2001:db8:1::3
– Shared LAN CARP IP: 192.168.1.1, 2001:db8:1::1
– Primary Sync: 172.16.1.2, 2001:db8:1:1::2
– Secondary Sync: 172.16.1.3, 2001:db8:1:1::3
● Logical layout: Internet > WAN Switch > Cluster > LAN Switch
– WAN HA IP Address: 198.51.100.200, 2001:db8::200
– LAN HA IP Address: 192.168.1.1, 2001:db8:1::1
HA Components
● Logically, the two nodes become a single unit
from the perspective of the network
● IP Address Redundancy (CARP)
– Traffic to/from cluster should use VIP addresses
● Configuration Sync (XMLRPC)
– Keeps the node configurations similar
● State Sync (pfsync)
– Shares state information between all nodes
IP Address Redundancy (CARP)
● CARP VIPs are shared between cluster nodes
● Works similarly to VRRP, can conflict with VRRP/HSRP
● Heartbeats transmitted on all interfaces containing CARP VIPs (NOT SYNC IF!)
– Interval is approx. 1 second (base) plus skew/256 of a second (skew)
– If a secondary node stops receiving heartbeats or they are too slow, it will take over as master
– Skew adds time/slowness, secondary must use a higher skew (e.g. 100) than the primary (e.g. 0)
– Active/Passive only, no Active/Active
● Traffic to cluster must route to CARP VIP address(es)
– Routed inbound traffic, VPNs, port forwards, local gateway, DNS
– Exceptions
● Only access firewall GUI/SSH by interface IP addresses, not VIP!
● Monitoring systems can check each node individually
● Traffic from cluster should originate from CARP VIPs
– Outbound NAT, VPNs
● If an interface containing a VIP loses link, that node will automatically demote itself due to the problem (temporarily adds 240 to the skew)
● On pfSense, preemptive failover is enabled by default so if an interface triggers a demotion, all VIPs are demoted to trigger a complete failover
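The heartbeat timing described on this slide can be sketched in a few lines of Python. This is a simplified model for illustration, not the CARP implementation: the names `carp_advert_interval` and `DEMOTION_PENALTY` are made up for this example, and the 240 figure is the one quoted on the slide.

```python
# Simplified model of CARP heartbeat timing, for illustration only.
DEMOTION_PENALTY = 240  # skew added while a node is demoted (figure from the slide)


def carp_advert_interval(base: int, skew: int) -> float:
    """Return seconds between heartbeats: base seconds plus skew/256 of a second."""
    if base < 1 or skew < 0:
        raise ValueError("base must be >= 1 and skew must be >= 0")
    return base + skew / 256


primary = carp_advert_interval(base=1, skew=0)
secondary = carp_advert_interval(base=1, skew=100)
demoted_primary = carp_advert_interval(base=1, skew=0 + DEMOTION_PENALTY)

# The node with the shortest interval is the one that should hold MASTER.
print(primary, secondary, demoted_primary)
```

With the example values, the healthy primary advertises every 1.0 s and the secondary every 1.390625 s, so the primary stays master. Once demoted, the primary slows to 1.9375 s, which is slower than the secondary, so the secondary takes over.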
Configuration Sync (XMLRPC)
● Communicates via the Sync interface
● Copies some settings from primary to secondary on
save
● Does not sync interfaces or System > Advanced,
System > General, or most packages.
● Will sync rules, NAT, aliases, VPNs, many other
areas
● Not strictly required for HA, but makes the job much
easier
State Sync (pfsync)
● Communicates via the Sync interface
● State inserts, deletes, and updates exchanged between nodes
● State table on both nodes should be identical (or nearly so)
● If-bound states mean physical interface assignments must be identical
(or LAGGs to mask differences)
● When primary fails, connections continue to flow through secondary
since the state is already present
● Requires use of CARP VIPs with NAT, no traffic direct to/from node
● HA can work without it, but connections will be disrupted during
failover
● Not currently compatible with Limiters
Assumptions
● Two firewalls running pfSense software
– Could use more, but that isn't covered here and does not offer a
significant advantage
● Firewalls are at (or near) a default configuration
– Conversion to HA from an existing install is possible, but can be
tricky. See the July 2016 hangout for details
● Firewalls have identical interfaces assigned in identical order
● Firewalls have a non-conflicting configuration
– Different IP addresses in the same networks
– DHCP off on secondary until it is properly configured for HA
Pick a Sync Interface
● One interface will interconnect between units for XMLRPC and
pfsync traffic
● Name it the “Sync” interface
● Do not call it a “CARP interface”: it carries no CARP/heartbeat traffic
and does not factor into failover directly!
● Can consume a significant amount of bandwidth for state
synchronization, especially in environments with lots of state churn
● Technically optional but highly recommended
– If no physical interface is available, use a VLAN
– If no VLAN is possible, it could share a secure internal interface, but that
can still be dangerous/insecure
● pfsync has no authentication, any device on the segment could insert state data
Interface Assignments
● Nodes must have the same number of interfaces and they must be
assigned in identical order.
● The order of interfaces on Interfaces > Assignments must be
identical!
● For state synchronization to work, interfaces must be the same type
as well (e.g. igbX, emX, etc)
– If that is not possible, interfaces could be added to LAGG instances so the
assigned interfaces can match (e.g. lagg0 on both, rather than igb0 and em0)
● If the interface order does not match…
– Areas such as firewall rules will appear to sync to the “wrong” interface on
other nodes
– Rules or other settings might “disappear” or otherwise break/misbehave
(really ending up on the wrong interface)
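A quick way to reason about the ordering rule is to compare the two assignment lists slot by slot. The sketch below assumes the assignments have already been written down by hand as ordered `(name, device)` pairs; the function name `assignment_mismatches` is hypothetical, and nothing here reads a real pfSense configuration. It uses the strict reading of "identical", so any difference in device name is reported.

```python
from itertools import zip_longest


def assignment_mismatches(primary, secondary):
    """Compare two ordered lists of (name, device) pairs, one list per node."""
    problems = []
    for slot, (pri, sec) in enumerate(zip_longest(primary, secondary)):
        if pri is None or sec is None:
            problems.append(f"slot {slot}: assigned on only one node")
        elif pri[0] != sec[0]:
            problems.append(f"slot {slot}: name {pri[0]!r} vs {sec[0]!r}")
        elif pri[1] != sec[1]:
            problems.append(f"slot {slot}: device {pri[1]!r} vs {sec[1]!r}")
    return problems


# Example data: same order and names, but different NIC drivers on each node.
primary = [("WAN", "igb0"), ("LAN", "igb1"), ("SYNC", "igb2")]
secondary = [("WAN", "em0"), ("LAN", "em1"), ("SYNC", "em2")]
print(assignment_mismatches(primary, secondary))
```

In this example every slot is flagged because the devices differ. If both nodes assigned `lagg0`, `lagg1`, and `lagg2` instead, the list would come back empty.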
IP Address Requirements
● CARP requires a static IP address WAN for full functionality
– DHCP or PPPoE WAN may work in some cases, but not seamless failover
– For IPv6, static addressing is a hard requirement; DHCPv6 is not feasible
● Ideally, three IP addresses per subnet per address family (except Sync)
– One IP address of each family type per node (e.g. one IPv4, one IPv6), plus at least one CARP VIP of each family for each interface
– Each WAN should be a /29 or larger for IPv4; a standard IPv6 /64 is sufficient, but nothing smaller than a /126
– Sync interface has no CARP and thus does not need an additional IP address, can be IPv4 only
● Single IPv4 address CARP is possible but not generally recommended
– For WANs, it means only the active master may communicate out for gateway monitoring, updates, package installs, etc.
● OK for secondary or additional WANs so long as the firewalls do not need to reach outbound individually on that circuit
● If only an IPv4 /30 is possible on WAN, feel free to use it (with the above caveats)
– LANs generally do not have IP address shortages so it can work there, but could break DHCP failover
● All nodes MUST connect to all WANs identically, it is not feasible to have the primary on one ISP and the secondary on a different ISP
– See the July 2016 hangout for Multi-WAN HA
● Reminder: IPv6 clusters must have separate routed subnets for each interface
– Local subnets can all be under one larger subnet (e.g. LAN, DMZ, etc all under a /60 routed in via WAN)
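The /29 recommendation for IPv4 WANs follows from simple address arithmetic, which the sketch below works through with Python's standard `ipaddress` module. The function names and the assumption of exactly one upstream gateway address per WAN subnet are illustrative choices, not something stated on the slide.

```python
import ipaddress


def addresses_left_for_cluster(cidr: str, upstream_gateways: int = 1) -> int:
    """IPv4 only: host addresses left after network, broadcast, and ISP gateway."""
    net = ipaddress.ip_network(cidr, strict=False)
    hosts = net.num_addresses - 2  # network and broadcast addresses are unusable
    return max(hosts - upstream_gateways, 0)


def supports_full_carp(cidr: str, nodes: int = 2, vips: int = 1) -> bool:
    """True when every node and at least one CARP VIP can have its own address."""
    return addresses_left_for_cluster(cidr) >= nodes + vips


for wan in ("198.51.100.200/29", "198.51.100.200/30"):
    print(wan, addresses_left_for_cluster(wan), supports_full_carp(wan))
```

Under these assumptions a /29 leaves five addresses, enough for two nodes plus a VIP with room for extra VIPs, while a /30 leaves only one, which is the single-address CARP case described above.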
Check VHID/VRID Usage
● CARP/VRRP/HSRP use similar mechanisms
● MAC address of CARP VIP is determined by VHID
– 00:00:5e:00:01:<VHID in hex>
– See https://docs.google.com/spreadsheets/d/17CqR6iAAXHXfU0h4uatzuY0w7k0ijy22dES6joLdTL0/edit?usp=sharing
● Overlapping IDs will cause MAC conflicts among other issues
● To find if any are in use...
– Diag > Packet Capture, set Protocol to CARP
– Capture for several seconds minimum
– If any packets are observed, check VHID/VRID (may need to load capture in
Wireshark)
– Note the used ID(s) and avoid using them with CARP VIPs
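The MAC address rule above is mechanical enough to express as a short helper, which can be handy when matching MAC addresses seen in a packet capture back to an ID. This is a sketch: `carp_vip_mac` and `conflicting_ids` are hypothetical helper names, and the lists of observed and planned IDs are example data.

```python
def carp_vip_mac(vhid: int) -> str:
    """Build the CARP VIP MAC address: 00:00:5e:00:01:<VHID in hex>."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    return f"00:00:5e:00:01:{vhid:02x}"


def conflicting_ids(observed, planned):
    """Return the planned IDs that were already seen on the segment."""
    return sorted(set(observed) & set(planned))


print(carp_vip_mac(200))                      # 00:00:5e:00:01:c8
print(conflicting_ids([1, 10, 51], [1, 200]))  # [1]
```

In the example, ID 1 was already observed on the wire, so a different VHID should be picked for that VIP.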
Basic Setup Pre-requisites
● Give each node a unique hostname
– Ex: fw-a/fw-b, fw-pri/fw-sec, fw-1/fw-2
● Adjust IP addresses on each node so they do not directly conflict
– Ex: Primary LAN to 192.168.1.2, Secondary to .3
● DHCP must be disabled on the secondary until it is configured for HA
● GUI must be running same protocol and port on both nodes
– Ex: HTTPS on port 443
● The sync account (e.g. admin) cannot be disabled and must have the same password on both nodes
– On 2.4, any user can be used for synchronization, provided it has the “System - HA node sync” privilege
● Both nodes must have a static IP address WAN configured in the same WAN subnet with a proper gateway and so on
● Both nodes must have DNS configured properly either using the DNS Resolver with forwarding disabled, or by having DNS servers set under System > General Setup
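Several of these prerequisites are simple comparisons between the two nodes, so they can be written down as a checklist function. The sketch below is purely illustrative: the `Node` record and its fields are invented for this example and are filled in by hand, not read from a firewall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hand-entered facts about one firewall (hypothetical structure)."""
    hostname: str
    lan_ip: str
    gui_scheme: str
    gui_port: int


def prerequisite_problems(a: Node, b: Node) -> list:
    """Return a list of prerequisite violations between two nodes."""
    problems = []
    if a.hostname == b.hostname:
        problems.append("hostnames must be unique")
    if a.lan_ip == b.lan_ip:
        problems.append("LAN IP addresses conflict")
    if (a.gui_scheme, a.gui_port) != (b.gui_scheme, b.gui_port):
        problems.append("GUI protocol/port must match")
    return problems


fw_a = Node("fw-a", "192.168.1.2", "https", 443)
fw_b = Node("fw-b", "192.168.1.3", "https", 443)
print(prerequisite_problems(fw_a, fw_b))  # []
```

An empty list means the pair passes these three checks. The remaining prerequisites (DHCP state, sync account, WAN and DNS settings) still need to be verified in the GUI.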
Switch Setup
● CARP uses multicast, so the switch cannot block, filter, limit, or otherwise
interfere with multicast
– IGMP snooping on some switches can conflict and may need to be disabled
● Nearly all CARP status problems, such as dual master scenarios, are due to
switch or other layer 2 issues
● The switch must, at least:
– Allow Multicast traffic to be sent and received by the firewall without interference on
ports using CARP VIPs.
– Allow traffic to be sent and received by the firewall using multiple MAC addresses
– Allow the CARP VIP MAC address to move between ports
● Virtual/Hypervisor switches often have issues with one or more of the above
and require adjustments such as enabling Promiscuous Mode, Forged
Transmits, and allowing MAC address changes
Configuration!
● Reminder: Keep secondary disconnected from any network until it has a basic
non-conflicting interface configuration
– Otherwise it could cause problems with DHCP, IP address conflicts, etc.
● Setup Sync interface
● Configure pfsync
● Configure XMLRPC
● Add CARP VIPs
● Setup Manual Outbound NAT
● Setup DHCP
● Setup DHCPv6 / Router Advertisements
● VPNs, other services
● Adding more Interfaces
Setup Sync Interface
● Interface config
– Enable the interface, name it Sync
– Set for a static IPv4 address in the chosen sync subnet
● Ex: Primary as 172.16.1.2/24, secondary as .3
– IPv6 is optional here, can be used for XMLRPC sync
– Do not check Block Private Networks or Bogons
● Add firewall rules for sync
– On Primary:
● Add rule to pass TCP/443 to the Sync address for GUI
● Add rule to pass pfsync from Sync net to any.
● Optionally add a rule to pass ICMP echoreq to/from Sync net
– On Secondary:
● Add rule to pass any protocol from Sync net to any
● Rule is different so it's obvious when it has been replaced by XMLRPC sync!
Configure pfsync
● Enable on BOTH nodes!
● Navigate to System > High Avail Sync
● State Synchronization Settings (pfsync) section
● Check Synchronize States
● Set Synchronize Interface to Sync
● Set the pfsync Synchronize Peer IP to the IP address of the other
node
– Ex: On the Primary, enter the IP address of the secondary, 172.16.1.3, and
vice versa.
– Technically this setting is optional. Without it set, state sync is sent via
multicast rather than unicast. With only two nodes, unicast is more reliable
● Click Save
Configure XMLRPC
● Enable only on the Primary node!
● Navigate to System > High Avail Sync
● Configuration Synchronization Settings (XMLRPC Sync) section
● Set Synchronize Config to IP to the Sync interface IP address on the secondary node
(e.g. 172.16.1.3)
● Set Remote System Username to admin
– On 2.4+, any user will work so long as they are admin or have the “System - HA node sync” privilege
● Note that this account MUST exist on the secondary for sync to function! It is easier to use admin for now and
change it after.
– On 2.3.x and earlier, admin is the only user that will work
● Set Remote System Password to the sync account password
● Check the boxes for each area to synchronize
● Click Save
● On the secondary, check and see if the rule changes on the Sync interface carried over
● From here on, do not make any changes on the secondary in an area that will sync!
Add CARP VIPs
● Navigate to Firewall > Virtual IPs on Primary node
● Add a CARP type VIP for each interface except Sync
● Skew on primary should be 0/1, secondary will end up higher via XMLRPC sync which adjusts the skew
when copying
● Example:
– WAN CARP VIP 198.51.100.200/24, VHID Group 200, random password/confirm, Base=1, Skew=0
– LAN CARP VIP 192.168.1.1/24, VHID Group 1, random password/confirm, Base=1, Skew=0
– Subnet mask must match the interface subnet!
● If a VIP is sensitive to latency (e.g. Secondary is in another building), try moving Base up by 1 until stability
is achieved
● Check Status > CARP
– If CARP shows disabled on either node, enable it
– Primary should show MASTER, Secondary BACKUP
● Repeat for more VIPs as needed
● If there will be many VIPs in a single interface + subnet, use IP alias VIPs w/CARP VIP as their parent
interface
– Reduces CARP advertisements, and they switch as a group instead of individually
● Do NOT add CARP VIPs to interfaces that will be down/disabled! This will cause the firewall to demote
itself, believing it has a problem.
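The "subnet mask must match the interface subnet" rule is a common source of mistakes, and it is easy to check before typing values into the GUI. The following sketch uses Python's standard `ipaddress` module; the function name `vip_fits_interface` is hypothetical and the addresses are the example values from this slide.

```python
import ipaddress


def vip_fits_interface(vip: str, interface: str) -> bool:
    """True when the VIP sits in the interface's subnet with the same mask."""
    vip_if = ipaddress.ip_interface(vip)
    node_if = ipaddress.ip_interface(interface)
    same_subnet = vip_if.network == node_if.network
    distinct_address = vip_if.ip != node_if.ip
    return same_subnet and distinct_address


print(vip_fits_interface("192.168.1.1/24", "192.168.1.2/24"))  # True
print(vip_fits_interface("192.168.1.1/32", "192.168.1.2/24"))  # False
```

The second call fails because a /32 mask on the VIP does not match the /24 on the interface, even though the address itself lies inside the LAN subnet.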
Setup NAT
● Navigate to Firewall > NAT, Outbound tab on Primary node
● Change Mode to Manual or Hybrid
● In hybrid mode:
– Add new rules to translate from LAN(s) source
– Set the Translation to the CARP WAN VIP
● In manual mode:
– Edit each rule for a local interface (e.g. LAN)
– Set the Translation to the CARP WAN VIP
● DO NOT SET A SOURCE OF “ANY” on the NAT rules!
● An RFC1918 alias helps here for source (192.168.0.0/16, 172.16.0.0/12, 10.0.0.0/8)
● After adding/editing all rules, click Apply Changes
● If/when additional interfaces are added in the future, rules must be added manually!
● Add port forwards, 1:1 NAT if needed. May need more WAN VIPs if using multiple IP addresses
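To illustrate what the RFC1918 alias covers, the sketch below tests whether a proposed outbound NAT source falls inside one of the three private blocks listed above. It uses Python's standard `ipaddress` module; the function name `is_rfc1918_source` is hypothetical, and the check is only a planning aid, not something pfSense runs.

```python
import ipaddress

# The three private IPv4 blocks named on the slide.
RFC1918 = [
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("10.0.0.0/8"),
]


def is_rfc1918_source(cidr: str) -> bool:
    """True when the whole source network lies inside an RFC1918 block."""
    net = ipaddress.ip_network(cidr, strict=False)
    if net.version != 4:
        return False  # RFC1918 only defines IPv4 ranges
    return any(net.subnet_of(block) for block in RFC1918)


print(is_rfc1918_source("192.168.1.0/24"))   # True
print(is_rfc1918_source("198.51.100.0/24"))  # False
```

A source that passes this check is a specific private network, which is the kind of source the NAT rules should use instead of “any”.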
Setup DHCP (IPv4)
● Navigate to Services > DHCP Server, LAN tab on Primary
● Set DNS Server to the LAN CARP VIP
● Set Gateway to the LAN CARP VIP
● Set the Failover Peer IP to the actual LAN IP address of the
secondary node, e.g. 192.168.1.3
● Click Save
● Repeat for additional local interfaces if necessary
● Gateway must be the CARP VIP, and DNS as well if using the firewall
for DNS
● Navigate to Status > DHCP Leases, check pool status, should be
“normal”/“normal”
Setup DHCPv6 / RA
● DHCPv6 has no concept of failover, so setup is tricky
– DHCPv6 & RA settings will not sync due to this
– There is no formal spec/RFC yet, only a draft with no implementation
● Navigate to Services > DHCPv6 Server & RA on Primary node
● Two main options:
– Set RA to Unmanaged (SLAAC) and let clients determine their own addresses
– Set RA to Managed + DHCPv6 independently using separate local pools
● e.g. Pri: x:x:x:x::1:0000-x:x:x:x::1:FFFF / Sec: x:x:x:x::2:0000-x:x:x:x::2:FFFF
● Gateway is handled by router advertisements, two choices there as well
– On both, bind to CARP VIP, use Normal router priority
● radvd will start/stop with CARP status (preferred method)
– Bind to LAN, set primary to High priority, set secondary to Low
● Set DNS to LAN CARP VIP in RA and DHCPv6 (if used)
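The separate-pool approach for Managed + DHCPv6 can be sketched with Python's standard `ipaddress` module. The function below follows the `x:x:x:x::1:0000` to `x:x:x:x::1:FFFF` pattern from the example above; the name `dhcpv6_pool`, the node numbering (1 for primary, 2 for secondary), and the use of the LAN prefix 2001:db8:1::/64 are assumptions made for illustration.

```python
import ipaddress


def dhcpv6_pool(prefix: str, node_number: int):
    """Return (start, end) of a 65536-address pool unique to one node."""
    net = ipaddress.ip_network(prefix)
    if net.version != 6 or net.prefixlen != 64:
        raise ValueError("expected an IPv6 /64 prefix")
    if not 1 <= node_number <= 0xFFFF:
        raise ValueError("node_number must be between 1 and 65535")
    # Place the node number in the second-to-last 16-bit group.
    start = net.network_address + (node_number << 16)
    end = start + 0xFFFF
    return start, end


for name, number in (("Primary", 1), ("Secondary", 2)):
    start, end = dhcpv6_pool("2001:db8:1::/64", number)
    print(f"{name}: {start} - {end}")
```

For the example prefix this gives 2001:db8:1::1:0 through 2001:db8:1::1:ffff on the primary and 2001:db8:1::2:0 through 2001:db8:1::2:ffff on the secondary, so the two independent servers never hand out the same address.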
VPNs, Additional Interfaces
● If the firewalls are using the default DNS (DNS Resolver, Unbound) you must visit Services > DNS
Resolver and press Save at least once, otherwise local clients cannot use the CARP VIP for DNS
resolution.
● For VPNs and other local services, if they require binding to only one IP address, set the Interface to a
CARP VIP (e.g. IPsec, OpenVPN)
● Support in packages varies but some have support for CARP VIPs, XMLRPC, or CARP status detection
● ACME package
– Install it only on the primary
– Make a cert with SAN entries for both hosts individually and the CARP VIP all in a single cert
– Cert will sync to secondary automatically via XMLRPC since certificates already sync
● When adding a new local interface:
– Assign the interface on both nodes identically
– Enable the interface on both nodes, using different IP addresses within the same subnet
– Add a CARP VIP inside the new subnet (Primary node only)
– Add firewall rules (Primary node only)
– Add Manual Outbound NAT for a source of the new subnet, utilizing the CARP VIP for translation (Primary node
only)
– Configure the DHCP/DHCPv6/RA server for the new subnet, utilizing the CARP VIP for DNS and Gateway roles
(Optional, Primary node only)
Testing
● Verify that a client on the LAN can pass through the cluster (ping / browse to Internet host)
● Verify XMLRPC by checking if a setting syncs, and via Status > Filter Reload, Force Config
Sync
● Verify CARP by checking Status > CARP
– If any VIPs show as INIT, then an interface is down. Fix the interface or remove the VIP.
– Don’t be tempted by the “Reset Demotion Factor” button, it’s not a permanent fix.
● Verify state sync by checking pfsync nodes on Status > CARP and contents of Diag > States
● Testing Failover:
– Status > CARP, enter maintenance mode or disable CARP on primary
– Check status on secondary, should now be MASTER
– Test LAN client connectivity, DHCP, etc
– Enable CARP on Primary
– Retest connectivity
● Downloading a file, streaming audio, or streaming video will most likely continue uninterrupted.
VoIP-based phone calls may have a slight disruption as they are not buffered like the others.
Troubleshooting
● Review the config
● Check CARP status, if VIP is INIT, check interface link, edit/save/apply VIP
● Check for conflicting VHIDs
● Check subnet mask on CARP VIPs
● Switch/L2 issues
– Ensure boxes are on the correct switch/VLAN/L2
– Try to ping between the nodes on the affected interface
– Ensure switches are properly trunking, if applicable
– Try another switch (especially if using a modem/CPE switch)
– Disable IGMP snooping, broadcast/multicast storm control, etc.
– For vswitches, check promiscuous mode, forged transmits, MAC changes
● Check system logs, firewall logs, notices
Upgrading
● Review changelog/blog/upgrade guide
– https://doc.pfsense.org/index.php/Upgrade_Guide
– https://doc.pfsense.org/index.php/Redundant_Firewalls_Upgrade_Guide
● Take a backup
– Cannot stress this enough
● If the cluster is running an older (2.2.x or before) version, disable XMLRPC on primary
● Upgrade secondary
● Check secondary
– Check logs, status pages, and so on to ensure everything looks OK
● Switch CARP to maintenance mode on primary (sticks across reboots)
● Test secondary, ensure proper connectivity and service function
– If something fails, simply switch out of maintenance mode on the primary then repair the secondary
● Upgrade primary
● Exit maintenance mode on primary
● Test again
● If XMLRPC was disabled, enable it again, then test sync
Conclusion
● July 2016 hangout in the archive has some
advanced HA topics:
– Multi-WAN with HA
– Converting an existing single firewall to HA cluster
● For IPv6 basics, see the July 2015 hangout
● Questions?
● Ideas for hangout topics? Post on forum,
comment on the blog posts, Reddit, etc