High Availability
Using CARP, XMLRPC, and pfsync
June 2015 Hangout
Jim Pingle
Project Notes
● pfSense 2.2.3 is out!
– Lots of beneficial improvements, security enhancements
– Fix for FS corruption on power loss or crash.
– https://blog.pfsense.org/?p=1810
● Hangouts, Book, AutoConfigBackup for 10 hosts now available for 1yr at no
cost for those registering SG series hardware and C2758
● Book is being actively updated, new online HTML format, downloads in
PDF, ePub, Mobi still available
● SG-8860 1U now shipping
– Same as the existing SG-8860 but in a 1U chassis
– Similar to C2758, but with 6x Intel 1Gb network ports
● SG-2220 expected to begin shipping soon
● SG-4860 1U also coming soon
About this Hangout
● A lot to cover, so may move fast
● Components of a High Availability Cluster
● Prerequisites
● Configuration of a cluster from default config
● Testing
● Troubleshooting
● Upgrading
Cluster Overview
(Diagram: Internet > WAN Switch > Primary and Secondary nodes > LAN Switch,
with a Sync Interface linking the two nodes)
● Primary: WAN 198.51.100.201, LAN 192.168.1.2, Sync 172.16.1.2
● Secondary: WAN 198.51.100.202, LAN 192.168.1.3, Sync 172.16.1.3
● Shared WAN CARP IP: 198.51.100.200
● Shared LAN CARP IP: 192.168.1.1
HA Components
● IP Address Redundancy (CARP)
● Configuration Sync (XMLRPC)
● State Sync (pfsync)
IP Address Redundancy (CARP)
● CARP VIPs are shared between cluster nodes
● Works similarly to VRRP
● Heartbeats transmitted on interfaces with CARP VIPs
– Approx. 1 per second (base) + fraction of a sec (skew)
– If a secondary node stops receiving heartbeats or they are too slow, it will take over as
master
– Active/Passive only, no Active/Active
● Traffic to cluster should route to CARP VIPs
– Routed inbound traffic, VPNs, port forwards, local gateway, DNS
– Exception: Only access firewall GUI/SSH by interface IP addresses, not VIP!
● Traffic from cluster should originate from CARP VIPs
– Outbound NAT, VPNs
● Can conflict with VRRP/HSRP
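The heartbeat timing above can be illustrated with a minimal Python sketch. It assumes the conventional CARP formula, where the advertisement interval is base seconds plus skew/256 seconds. The function name and the secondary skew of 100 are hypothetical values chosen for illustration, not pfSense code.

```python
def carp_advert_interval(base: int, skew: int) -> float:
    """Approximate seconds between CARP heartbeats for one VIP.

    Assumes the conventional CARP timing: base seconds plus skew/256 seconds.
    """
    if base < 1:
        raise ValueError("base must be at least 1 second")
    if skew < 0:
        raise ValueError("skew cannot be negative")
    return base + skew / 256


# The node with the lower skew advertises slightly more often, so it is
# preferred as MASTER while both nodes are healthy.
primary = carp_advert_interval(base=1, skew=0)
secondary = carp_advert_interval(base=1, skew=100)  # illustrative skew
print(f"primary: {primary:.3f}s, secondary: {secondary:.3f}s")
```

Raising Base lengthens the interval on every node, which is why it can help when the link between nodes has noticeable latency.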
Configuration Sync (XMLRPC)
● Communicates via the Sync interface
● Copies settings from primary to secondary on save
● Does not sync interfaces or System > Advanced,
System > General, or most packages.
● Will sync rules, NAT, aliases, VPNs, many other
areas
● Not strictly required for HA, but makes the job much
easier
State Sync (pfsync)
● Communicates via the Sync interface
● State inserts, deletes, etc exchanged between nodes
● State tables on both nodes should be nearly, if not completely, identical
● If-bound states mean physical interface assignments must be
identical (or LAGGs)
● When primary fails, connections continue to flow through
secondary
● Requires use of CARP VIPs with NAT, no traffic direct to/from node
● HA can work without it, but connections will be disrupted during
failover
Assumptions
● Two pfSense firewalls
– Could use more, but that isn't covered here and does
not offer a significant advantage
● Devices are at (or near) a default configuration
– Conversion to CARP from an existing install is
possible, but can be tricky. May cover some other
time.
● Devices have identical interfaces assigned in
identical order
Pick a Sync Interface
● One interface will interconnect between units for XMLRPC and
pfsync traffic
● Name it the “Sync” interface
● Don't call it a “CARP interface”: it carries no CARP traffic and does not
factor into failover directly
● Can consume a non-trivial amount of bandwidth for state
synchronization, especially in environments with lots of state churn
● Technically optional but highly recommended
– If no physical interface is available, use a VLAN
– If no VLAN is possible, it could share a secure internal interface, but that
can still be dangerous/insecure
Interface Assignments
● Nodes must have the same number of interfaces and they must
be assigned in identical order.
● The order of interfaces on Interfaces > (assign) must be identical
● For state synchronization to work, interfaces should be the same
type as well (e.g. igbX, emX, etc)
– If that is not possible, interfaces could be added to LAGG instances so
the assigned interfaces can match (e.g. lagg0 on both, rather than igb0
and em0)
● If the interface order does not match, then areas such as firewall
rules can sync to the “wrong” interface on other nodes, among
other issues
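One way to sanity-check the requirement above is to write down the assignments from Interfaces > (assign) on each node and compare them position by position. The sketch below is a hypothetical helper working on hand-entered lists; it does not read anything from pfSense itself.

```python
def find_assignment_mismatches(primary, secondary):
    """Return human-readable differences between two assignment lists.

    Each argument is an ordered list of (assigned_name, device) tuples,
    e.g. [("wan", "igb0"), ("lan", "igb1"), ("opt1", "igb2")].
    """
    problems = []
    if len(primary) != len(secondary):
        problems.append(
            f"interface count differs: {len(primary)} vs {len(secondary)}"
        )
    # Compare entries in order; a swap shows up as two mismatched positions.
    for index, (pri, sec) in enumerate(zip(primary, secondary)):
        if pri != sec:
            problems.append(f"position {index}: primary {pri} != secondary {sec}")
    return problems


# Hand-entered example data (hypothetical hardware).
primary_node = [("wan", "igb0"), ("lan", "igb1"), ("opt1", "igb2")]
secondary_node = [("wan", "igb0"), ("lan", "igb2"), ("opt1", "igb1")]
for problem in find_assignment_mismatches(primary_node, secondary_node):
    print(problem)
```

An empty result means the two lists match in count, order, and device name.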
IP Address Requirements
● Ideally, three IP addresses per subnet (except Sync)
– One IP address per node, plus at least one CARP VIP for each interface
– Each WAN should be a /29 or larger
– Sync interface has no CARP and thus does not need an additional IP address
● Single IP address CARP is possible on 2.2+, but not generally recommended
– For WANs, it means only the active master may communicate out for gateway
monitoring, updates, package installs, etc.
– LANs generally do not have IP address shortages but it can work fine there
– If only a /30 is possible on WAN, feel free to use it (with the above caveats)
● CARP requires a static IP WAN for full functionality
– DHCP or PPPoE WAN may work in some cases, but not seamless failover
● All nodes MUST connect to the same WANs identically; it is not feasible to have
the primary on one ISP and the secondary on a different ISP
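The subnet sizing above comes down to simple arithmetic, sketched here with Python's standard `ipaddress` module. The function names are hypothetical, and the count assumes an IPv4 subnet where the network and broadcast addresses are unusable and the upstream gateway takes one address on a WAN.

```python
import ipaddress


def usable_hosts(cidr: str) -> int:
    """Usable host addresses, excluding network and broadcast addresses."""
    network = ipaddress.ip_network(cidr, strict=False)
    return max(network.num_addresses - 2, 0)


def fits_ha_pair(cidr: str, vips: int = 1, upstream_gateways: int = 1) -> bool:
    """True if the subnet holds two node IPs, the CARP VIPs, and any gateway."""
    needed = 2 + vips + upstream_gateways
    return usable_hosts(cidr) >= needed


print(fits_ha_pair("198.51.100.0/29"))  # 6 usable, 4 needed
print(fits_ha_pair("198.51.100.0/30"))  # 2 usable, 4 needed
print(fits_ha_pair("192.168.1.0/24", upstream_gateways=0))  # LAN: no upstream gateway
```

A /30 fails this check, which is the case where single IP address CARP (with its caveats) is the fallback.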
Check VHID/VRID Usage
● CARP/VRRP/HSRP use similar mechanisms
● MAC address of CARP VIP is determined by VHID
● Overlapping IDs can cause MAC conflicts among other
issues
● To find if any are in use...
– Diag > Packet Capture, set for CARP/VRRP
– Capture for several seconds minimum
– If any packets are observed, check VHID/VRID (may need to load
capture in Wireshark)
– Note the used ID(s) and avoid using them with CARP VIPs
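To see why overlapping IDs collide, note that for IPv4 both CARP and VRRP use a virtual MAC address of the form 00:00:5e:00:01:XX, where XX is the VHID/VRID in hexadecimal. The helper names and the observed IDs below are hypothetical; the observed list stands in for whatever a packet capture turns up.

```python
def carp_vip_mac(vhid: int) -> str:
    """Virtual MAC address used by an IPv4 CARP VIP with the given VHID."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    return f"00:00:5e:00:01:{vhid:02x}"


def conflicting_ids(planned, observed):
    """IDs that are both planned for CARP VIPs and already seen on the segment."""
    return sorted(set(planned) & set(observed))


planned_vhids = [1, 200]   # the VHIDs used in this deck's examples
observed_ids = [10, 200]   # hypothetical IDs found in a packet capture
for vhid in conflicting_ids(planned_vhids, observed_ids):
    print(f"VHID {vhid} already in use, MAC {carp_vip_mac(vhid)} would conflict")
```

Since the ID is the only variable part of the MAC address, two unrelated devices on the same layer 2 segment using the same ID end up claiming the same MAC.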
Basic Setup Pre-requisites
● Give each node a unique hostname
– Ex: fw-a/fw-b, fw-pri/fw-sec, Rocket/Groot, Batman/Robin, Pinky/Brain...
● Adjust IP addresses so they do not directly conflict
– Ex: Primary LAN to 192.168.1.2, Secondary to .3
● GUI must be running same protocol and port on both nodes
– Ex: HTTPS on port 443
● The admin account cannot be disabled and must have the same
password on both nodes
● Both nodes must have a static IP address WAN configured in the same
WAN subnet with a proper gateway and so on
● Both nodes must have DNS configured properly under System > General
Setup
Switch Setup
● CARP uses multicast, so the switch cannot block, filter, limit, or otherwise
interfere with multicast
– IGMP snooping on some switches can conflict and may need to be disabled
● Nearly all CARP status problems, such as dual master scenarios, are due to
switch or other layer 2 issues
● The switch must, at least:
– Allow Multicast traffic to be sent and received without interference on ports using
CARP VIPs.
– Allow traffic to be sent and received using multiple MAC addresses
– Allow the CARP VIP MAC address to move between ports
● Virtual/Hypervisor switches often have issues with one or more of the above
and require adjustments such as enabling Promiscuous Mode, Forged
Transmits, and allowing MAC address changes
Configuration!
● Setup Sync interface
● Configure pfsync
● Configure XMLRPC
● Add CARP VIPs
● Setup Manual Outbound NAT
● Setup DHCP
● VPNs, other services
● Adding more Interfaces
Setup Sync Interface
● Interface config
– Enable the interface, name it Sync
– Set for a static IP address in the chosen sync subnet
● Ex: Primary as 172.16.1.2/24, secondary as .3
– Do not check block Private Networks or Bogons
● Add firewall rules for sync
– On Primary:
● Add rule to pass TCP/443 to the Sync address for GUI
● Add rule to pass pfsync from Sync net to any.
● Optionally add a rule to pass ICMP echo to/from Sync net
– On Secondary:
● Add rule to pass any proto from Sync net to any
● Rule is different so it's obvious when it has been replaced by config sync!
Configure pfsync
● System > High Avail Sync
● Enable on BOTH nodes
● Check Synchronize States
● Set Synchronize Interface to Sync
● Set the pfsync Synchronize Peer IP to the IP address of the
other node
– Ex: On the Primary, enter the IP address of the secondary, 172.16.1.3,
and vice versa.
– Technically this setting is optional. Without it set, state sync is sent via
multicast rather than unicast. With only two nodes, unicast can be more
reliable
● Click Save
Configure XMLRPC
● System > High Avail Sync
● Enable only on the Primary node!
● Set Synchronize Config to IP to the secondary node's Sync interface IP
(e.g. 172.16.1.3)
● Set Remote System Username to admin
– Must use admin, no other user will work
● Set Remote System Password to the admin password
● Check the boxes for each area to synchronize
● Click Save
● On the secondary, check and see if the rule changes on the Sync interface
carried over
● From here on, do not make any changes on the secondary in an area that
will sync
Add CARP VIPs
● Firewall > Virtual IPs
● Add a CARP type VIP for each interface except Sync
● Example:
– WAN CARP VIP 198.51.100.200/24, VHID=200, random password,
Base=1, Skew=0
– LAN CARP VIP 192.168.1.1/24, VHID=1, random password, Base=1, Skew=0
– Subnet mask must match the interface subnet!
● If a VIP is sensitive to latency (e.g. Secondary is in another building), try
moving Base up by 1 until stability is achieved
● Check Status > CARP
– If CARP shows disabled on either node, enable it
– Primary should show MASTER, Secondary BACKUP
● Repeat for more VIPs as needed
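The subnet mask rule above is easy to get wrong, for example by entering a /32 on the VIP. A minimal check with Python's standard `ipaddress` module might look like the following; the function name is hypothetical and the addresses are the example values from this deck.

```python
import ipaddress


def vip_matches_interface(vip_cidr: str, interface_cidr: str) -> bool:
    """True if the VIP shares the interface's subnet and mask but not its IP."""
    vip = ipaddress.ip_interface(vip_cidr)
    iface = ipaddress.ip_interface(interface_cidr)
    return vip.network == iface.network and vip.ip != iface.ip


# WAN VIP against the primary's WAN address: same /24, so it passes.
print(vip_matches_interface("198.51.100.200/24", "198.51.100.201/24"))
# A /32 mask on the VIP does not match the interface subnet, so it fails.
print(vip_matches_interface("198.51.100.200/32", "198.51.100.201/24"))
```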
Setup NAT
● Firewall > NAT, Outbound tab
● Change Mode to Manual
● Edit each rule for a local interface (e.g. LAN)
● Set the Translation to the CARP WAN VIP
● After editing all rules, click Apply Changes
● If/when additional interfaces are added in the future,
rules must be added manually!
● Add port forwards, 1:1 NAT if needed. May need more
WAN VIPs if using multiple IP addresses
Setup DHCP
● Services > DHCP Server, LAN tab on Primary
● Set DNS Server to the LAN CARP VIP
● Set Gateway to the LAN CARP VIP
● Set the Failover Peer IP to the actual LAN IP address of the
secondary node, e.g. 192.168.1.3
● Click Save
● Repeat for additional local interfaces if necessary
● Gateway must be the CARP VIP, and DNS as well if using the firewall
for DNS
● Status > DHCP Leases, check pool status, should be
“normal”/“normal”
VPNs, Additional Interfaces
● If using the default DNS (DNS Resolver, Unbound) you must visit Services > DNS Resolver
and press Save at least once, otherwise local clients cannot use the CARP VIP for DNS
resolution.
● For VPNs and other local services, if they require binding to only one IP address, set the
Interface to a CARP VIP (e.g. IPsec, OpenVPN)
● Support in packages varies but some have support for CARP VIPs, XMLRPC, or CARP status
detection
● When adding a new local interface:
– Assign the interface on both nodes identically
– Enable the interface on both nodes, using different IP addresses within the same subnet
– Add a CARP VIP inside the new subnet (Primary node only)
– Add firewall rules (Primary node only)
– Add Manual Outbound NAT for a source of the new subnet, utilizing the CARP VIP for translation
(Primary node only)
– Configure the DHCP server for the new subnet, utilizing the CARP VIP for DNS and Gateway roles
(Optional, Primary node only)
Testing
● Verify that a client on the LAN can pass through the cluster
● Verify XMLRPC by checking if a setting syncs, and via Status > Filter Reload, Force
Config Sync
● Verify CARP by checking Status > CARP
● Verify state sync by checking pfsync nodes on Status > CARP and contents of Diag
> States
● Testing Failover:
– Status > CARP, disable CARP on primary
– Check status on secondary, should now be MASTER
– Test LAN client connectivity, DHCP, etc
– Enable CARP on Primary
– Retest connectivity
● Downloading a file, streaming audio, or streaming video will most likely continue
uninterrupted. VoIP-based phone calls may have a slight disruption as they are not
buffered like the others.
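To put a number on the disruption during a failover test, a LAN client can probe a TCP service through the cluster while CARP is disabled on the primary. This is a rough sketch using only the Python standard library; the function names, the probed host and port, and the timing values are all assumptions to adapt to the environment.

```python
import socket
import time


def probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def longest_outage(samples) -> float:
    """Longest gap, in seconds, from a first failed probe to the next success.

    samples is a time-ordered iterable of (timestamp_seconds, ok) pairs.
    An outage that never recovers within the samples is not counted.
    """
    longest = 0.0
    outage_start = None
    for timestamp, ok in samples:
        if not ok and outage_start is None:
            outage_start = timestamp
        elif ok and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    return longest


def watch(host: str, port: int, duration: float = 60.0, interval: float = 0.5):
    """Probe repeatedly for duration seconds and return the collected samples."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), probe(host, port)))
        time.sleep(interval)
    return samples
```

For example, start `longest_outage(watch("192.168.1.1", 53))` on a client, disable CARP on the primary, and read the result once the watch window ends. Each new probe opens a fresh connection, so this measures reachability rather than whether existing states survived.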
Troubleshooting
● Review the config
● Check CARP status; if a VIP is INIT, check interface link, then edit/save/apply the VIP
● Check for conflicting VHIDs
● Check subnet mask on CARP VIPs
● Switch/L2 issues
– Ensure boxes are on the correct switch/VLAN/L2
– Try to ping between the nodes on the affected interface
– Ensure switches are properly trunking, if applicable
– Try another switch (especially if using a modem/CPE switch)
– Disable IGMP snooping, broadcast/multicast storm control, etc.
● Check system logs, firewall logs, notices
Upgrading
● Review changelog/blog/upgrade guide
● Take a backup
– Cannot stress this enough
● Upgrade secondary
● Test secondary
● Switch CARP to maintenance mode on primary
● Upgrade primary
● Exit maintenance mode on primary
● Test again
Conclusion
● Questions?
● Ideas for hangout topics? Post on forum,
comment on the blog posts, Reddit, etc
