2017 - LISA - LinkedIn's Distributed Firewall (DFW)
How LinkedIn scaled their network horizontally by leveraging Distributed Firewall and a spine/leaf network infrastructure.

1. Distributed Firewall (DFW)
Mike Svoboda
Sr. Staff Engineer, Production Infrastructure Engineering
LinkedIn: https://www.linkedin.com/in/mikesvoboda/
2. Agenda for today's discussion
• Slides 5-8: Problem 1: Moving machines around in the datacenter to create a DMZ
• Slides 11-29: Problem 2: Horizontal vs. vertical network design
• Slides 30-40: What is Distributed Firewall?
• Slide 42: References
• Q/A session
3. What Motivated LinkedIn to create DFW?
4. Problem 1: Moving machines around in the datacenter to create a DMZ
5. Script Kiddie Hacking: Easy network attack vectors
• Port scanning – What is the remote device responding to?
• Enumeration – Gather information about services running on the target machine
• Data extraction – Pull as much valuable information from the remote service as possible
6. Wake up call!
PHYSICALLY MOVING MACHINES IN THE DATACENTER DOESN'T SCALE!
• Providing additional layers of network security to an application requires either physically moving machines around in the datacenter or rewiring network cables to create DMZs.
• DFW complements existing network firewall ACL systems; it does not replace them.
• DFW is an additional layer of security in our infrastructure to complement existing systems.
How can we respond? Move the machines into a DMZ behind a network firewall, limiting network connectivity?
7. Production Network Security
TREAT THE PRODUCTION NETWORK AS IF IT'S THE PUBLIC INTERNET.
• Milton in the finance department clicked on a bad email attachment and now has malware on his workstation. Thanks, Milton, appreciate that.
• Milton's workstation resides inside the internal office network, which has the ability to connect to application resources on Staging, Q/A, or Production servers.
• Milton is one employee out of thousands.
8. Production Network Security
TREAT THE PRODUCTION NETWORK AS IF IT'S THE PUBLIC INTERNET.
• The hacker who has control of Milton's machine was able to exploit one application out of thousands, and now has full production network access.
• The hacker can take their time analyzing various production services, probing what responds to API calls.
• What are the details behind the Equifax leak(s)?
9. Problem 2: Horizontal vs. Vertical Network Design
10. The Vertical Network Architecture
• Big iron switches deployed at the entry point of the datacenter with uplink access to LinkedIn's internal networks.
• More big iron switches at the second and third tiers of the network.
• This image is a logical representation: at minimum 1k servers, upwards of 5k.
DATACENTER CLUSTERS PER ENVIRONMENT
11. The Vertical Network Architecture
• Each packet between environments has to flow through thousands of rules before hitting a match.
• The firewall admin has to fit the entire security model into their brain. This is error prone and difficult to update.
• TCAM tables are stored in hardware silicon. We're limited on the complexity that can be enforced.
• Hardware ASICs are fast, but expensive! Deploying big iron costs millions of dollars!
DATACENTER CLUSTERS PER ENVIRONMENT
12. The Vertical Network Architecture
• Traffic shifts become problematic, as not all ACLs exist in every CRT.
• TCAM tables can only support the complexity of the environment they host, not all "PROD" ACLs. A CRT could support the "PROD1" logical implementation of linkedin.com, but not the "PROD2" and "PROD3" application fabrics.
• The human cost of hand-maintaining per-application CRT ACLs rises exponentially.
MULTIPLE CLUSTERS PER DATACENTER
13. The Horizontal Network Architecture
• Instead of scaling vertically, scale horizontally using interconnected pods. Offer multiple paths for machines to communicate with each other.
• Allow datacenter engineering to maximize resources.
• The "cluster" is too large a unit of deployment. Sometimes we need to add capacity to an environment down to the cabinet level.
BUILD PODS INSTEAD OF CLUSTERS
14. Present: Altair Design
[Diagram: Pods 1..64, each with ToR 1..32 and Leaf 1..4, interconnected through Spine 1..32 (ToR / Leaf / Spine tiers)]
• True 5-stage Clos architecture (maximum path length: 5 chipsets, to minimize latency)
• Moved complexity from big boxes to our advantage, where we can manage and control!
• Single SKU - same chipset - uniform I/O design (bandwidth, latency, and buffering)
• Dedicated control plane, OAM, and CPU for each ASIC
15. Non-Blocking Parallel Fabrics
[Diagram: servers attach to ToRs, and each ToR uplinks into four parallel fabrics (Fabric 1..4)]
16. 5 Stage Clos
[Slides 16-21: diagram build-up of the fabric: ToR 1..2048, Leaf 1..256, Spine 1..128, Fabric 1..4]
22. ~2400 switches to support ~100,000 bare metal servers
23. Tier 1: ToR - Top of the Rack
• Broadcom Tomahawk 32x 100G
• 10/25/50/100G attachment; regular server attachment 10G
• Each cabinet: 96 dense compute units
• Half cabinet (leaf-zone): 48x 10G ports for servers + 4 uplinks of 50G
• Full cabinet: 2x single-ToR zones: 48 + 48 = 96 servers
[Diagram: Project Falco: server, ToR, leaf, and spine tiers]
24. Tier 2: Leaf
• Broadcom Tomahawk 32x 100G
• Non-blocking topology: 32x downlinks of 50G to serve 32 ToRs; 32x uplinks of 50G to provide 1:1 over-subscription
[Diagram: Project Falco: server, ToR, leaf, and spine tiers]
25. Tier 3: Spine
• Broadcom Tomahawk 32x 100G
• Non-blocking topology: 64 downlinks to provide 1:1 over-subscription, serving 64 pods (each pod: 32 ToRs)
• 100,000 servers: each pod holds approximately 1,550 compute units
[Diagram: Project Falco: server, ToR, leaf, and spine tiers]
26. Simplifying the picture
27. Simplifying the picture
[Diagram: Fabric 1..4, Spine 1, Leaf 129..132, ToR 1025, Leaf 1..4, ToR 1]
28. Where do we put the firewall in this architecture?
• Since we've scaled the network horizontally, there's no "choke point" like we had with the vertical network architecture.
• We want to be able to mix and match security zones in the same rack to maximize space and power.
• We want a customized security profile, down to the per-server or per-container (network namespace) level, that is unique to the deployed applications.
• Reject any requests from less trusted zones to anything in PROD by default, unless ACLs have been defined.
29. What is Distributed Firewall (DFW)?
30. What is DFW?
• Software Defined Networking (SDN).
• The applications deployed to the machine / container create a unique security profile.
• Deny incoming by default. Allow all loopback. Allow all outbound. (A minimal sketch of this baseline follows this slide.)
• Whitelist incoming application ports to accept connections from the same security zone.
• Cross-security-zone communication requires human-created ACLs based on our Topology application deployment system.
• As deployment actions happen across the datacenter, host-based firewalls detect these conditions and update their rulesets accordingly.
• The underlying firewall implementation is irrelevant. Currently using iptables and nftables on Linux, but this could expand to ipf, pf, Windows, etc.
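A minimal sketch of that per-host baseline, assuming an iptables backend and a hypothetical PROD_ALL ipset holding the local security zone's netblocks; the set name and port 11016 are illustrative, not the actual DFW rule layout:

```sh
# Baseline: allow loopback and all outbound, accept return traffic,
# whitelist an application port from the same security zone, reject the rest.
iptables -P OUTPUT ACCEPT
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT    # return traffic for outbound connections
iptables -A INPUT -p tcp --dport 11016 -m set --match-set PROD_ALL src -j ACCEPT
iptables -A INPUT -j REJECT --reject-with icmp-port-unreachable           # reject, never silently drop
```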
31. Advantages of DFW
• Fully distributed. More network I/O throughput, more CPU horsepower, and it scales linearly.
• Datacenter space is fully utilized and the physical network is flattened. The logical network is quite different.
• The VLANs the top-of-rack switch exposes determine the security zones the attached machines belong to, not the massive vertical network cluster. Multiple security zones are co-located in the same rack. New security zones are trivial to create.
• Only expose the network ports defined in our CMDB application deployment system.
• Further limit accessibility to those ports by consuming the application call graph to identify legitimate upstream consumers.
• Able to canary / ramp ACL changes down to the per-host or per-container level; no big-bang modifications required.
32. Advantages of DFW
• Each node contains a small subset of rules vs. the CRT network firewall containing tens of thousands.
• Authorized users can modify the firewall on demand without disabling it.
• Communicate keep-alive executions and notify if a machine stops executing DFW (hardware failure, etc.).
• ACL complexity is localized to the service that requires it.
33. New Business Capabilities
• Pre-security zone. Functionality that only host-based firewalls could provide (sketched after this slide):
  • Blackhole: Stop an application listening on port 11016 from taking any traffic, or block specific upstream consumers.
  • QoS: sshd and ZooKeeper network traffic should get priority over Apache Kafka network I/O.
  • Pinhole: Based on the call graph, only allow upstream consumers to access my application on port 11016.
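A hedged sketch of how the blackhole and pinhole behaviors could be expressed with ipset-backed iptables rules. The set names (DFW_BLACKHOLE_TCP, DFW_UPSTREAM_11016) and the consumer netblock are hypothetical; port 11016 is the example from the slide.

```sh
# Blackhole: any destination port added to this (normally empty) set is rejected outright.
ipset -exist create DFW_BLACKHOLE_TCP bitmap:port range 0-65535
iptables -I INPUT -p tcp -m set --match-set DFW_BLACKHOLE_TCP dst \
         -j REJECT --reject-with icmp-port-unreachable
ipset add DFW_BLACKHOLE_TCP 11016            # stop the app on 11016 from taking any traffic

# Pinhole: only upstream consumers from the call graph may reach port 11016.
ipset -exist create DFW_UPSTREAM_11016 hash:net family inet
ipset -exist add DFW_UPSTREAM_11016 10.20.30.0/24   # hypothetical upstream consumer netblock
iptables -A INPUT -p tcp --dport 11016 \
         -m set --match-set DFW_UPSTREAM_11016 src -j ACCEPT
```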
34. New Business Capabilities
• Decommission datacenters in a controlled manner.
• Allow authorized users to keep applications online, with DFW rejecting all inbound / outbound application traffic. Allow SSH / sudo / infrastructure services to stay online.
• Conntrackd data exposed.
• IPv6 support comes for free! Using ipset list:sets, every rule in DFW is written referencing the IPv4 and IPv6 addresses / netblocks in parallel. As the company shifts from IPv4 to IPv6 and new AAAA records come online, DFW automatically inserts these addresses and the firewalls permit the IPv6 traffic. (See the sketch after this slide.)
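A sketch of the list:set pattern described above, with hypothetical set names and example netblocks: one IPv4 and one IPv6 member set sit behind a single parent set, so the IPv4 and IPv6 rules can be written identically.

```sh
# Parallel IPv4 / IPv6 member sets referenced through one list:set.
ipset -exist create PROD_V4  hash:net family inet
ipset -exist create PROD_V6  hash:net family inet6
ipset -exist create PROD_ALL list:set
ipset -exist add PROD_ALL PROD_V4
ipset -exist add PROD_ALL PROD_V6

ipset -exist add PROD_V4 10.0.0.0/8          # illustrative IPv4 netblock
ipset -exist add PROD_V6 2001:db8::/32       # new AAAA space lands here as it comes online

# The v4 and v6 rules are written identically, as the slide describes.
iptables  -A INPUT -m set --match-set PROD_ALL src -j ACCEPT
ip6tables -A INPUT -m set --match-set PROD_ALL src -j ACCEPT
```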
35. ACLDB
• Centralized database that is fed from sources of truth by scraping CMDB, and delivers JSON data containers to each machine.
• JSON containers land on machines via automated file transfers. (A hypothetical example of their shape follows this slide.)
• Intra-security-zone communication (what can communicate inside PROD?)
• Inter-security-zone communication (what is allowed to reach into PROD?)
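To make the "JSON data container" idea concrete, here is a hypothetical shape such a container might take once it lands on a host. The field names are purely illustrative, not the actual ACLDB schema.

```sh
cat > /tmp/acldb-example.json <<'EOF'
{
  "security_zone": "PROD",
  "intra_zone": { "tcp_ports": [11016] },
  "inter_zone": [
    { "source_zone": "ZONE1", "source_netblocks": ["10.1.2.0/24"], "tcp_ports": [11016] }
  ]
}
EOF
# A DFW-style generator could pull the whitelisted intra-zone ports out with jq:
jq -r '.intra_zone.tcp_ports[]' /tmp/acldb-example.json
```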
36. High Level Architecture
• Only inbound traffic is filtered. All loopback / outbound traffic is always immediately passed.
• Network security is enforced by filtering inbound traffic at the known destination.
• DFW rejects traffic; we do not drop traffic. The source host knows it has been rejected via an ICMP port unreachable message.
• Build safeguards. Don't firewall off 30k machines and become unrecoverable without pulling power to the whole datacenter. (One possible safeguard pattern is sketched after this slide.)
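The slide does not say which safeguard mechanism is used; one common pattern (an assumption, not necessarily what DFW does) is a commit/confirm rollback, where the previous known-good ruleset is restored automatically unless the change is confirmed. The /run/dfw paths are hypothetical.

```sh
# Snapshot the current ruleset, schedule an automatic rollback, then apply the new one.
iptables-save > /run/dfw/previous.rules
echo "iptables-restore < /run/dfw/previous.rules" | at now + 5 minutes
iptables-restore < /run/dfw/intended.rules

# If connectivity (e.g. SSH) survives, cancel the pending rollback job.
# (Assumes no other at jobs are queued on this host.)
atrm "$(atq | awk 'NR==1 {print $1}')"
```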
37. High Level Architecture
• Pre-security zone: the functionality referenced on the "New Business Capabilities" slides.
• Security zone: mimic the existing network firewalls, allowing PROD → PROD communication. Rules are written as "accept from any" because we jump into a new iptables chain once the source machine is known to reside in PROD netblocks. (See the chain sketch after this slide.)
• Post-security zone: inter-security-zone rules maintained in ACLDB. "Allow 5x machines in ZONE1 to hit 10x machines in PROD…"
• The rules placed in /etc/sysconfig/iptables and /etc/sysconfig/ip6tables are identical, since they reference list:set ipsets, which in turn reference the necessary IPv4 and IPv6 sub-ipsets.
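A sketch of the "jump into a new chain once the source is in PROD netblocks" idea, reusing the hypothetical PROD_ALL list:set from earlier; the chain name is also hypothetical. Inside the zone chain, per-port rules can indeed be written as "accept from any".

```sh
# Classify by security zone first, then whitelist ports inside the zone chain.
iptables -N DFW_PROD_ZONE
iptables -A INPUT -m set --match-set PROD_ALL src -j DFW_PROD_ZONE

# Within the PROD zone chain the source is already known to be PROD: accept from any.
iptables -A DFW_PROD_ZONE -p tcp --dport 11016 -j ACCEPT
iptables -A DFW_PROD_ZONE -j RETURN      # fall through to inter-zone / reject rules
```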
38. DFW is stateless. Precompute the ruleset on every execution
• Every execution of DFW builds the iptables / ipset configuration from scratch and compares it to the live state in the kernel.
• The current state of iptables / ipsets does not matter. Users could flush the ruleset, reboot, add or delete entries, destroy or create ipsets. We use auditd to monitor setsockopt() system calls for unexpected rule insertions.
• On the next execution, DFW converges from whatever the current state is to the intended state, either on schedule or on discovery of setsockopt() calls. (A convergence sketch follows this slide.)
• Debugging is simple. A firewall issue after a DFW execution is never a "previous state" issue; the current state needs a behavior change for things to work.
• Whitelist network ports: is the source machine connecting to me from my security zone, or do I need to add a rule in ACLDB to permit the traffic?
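A simplified sketch of that converge-from-scratch loop: generate the intended ruleset, diff it against the live kernel state, and apply atomically on divergence. The file paths and audit key are hypothetical, and a real comparison would need to normalize iptables-save's comments and counters.

```sh
# Precompute the intended configuration (generation step omitted here), then converge.
iptables-save > /run/dfw/live.rules
if ! diff -q /run/dfw/live.rules /run/dfw/intended.rules >/dev/null; then
    ipset -exist restore < /run/dfw/intended.ipsets   # sets must exist before rules reference them
    iptables-restore     < /run/dfw/intended.rules    # atomic replacement of the whole ruleset
fi

# Audit out-of-band firewall changes (setsockopt) so the next run can be triggered early.
auditctl -a always,exit -F arch=b64 -S setsockopt -k dfw-oob
```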
39. Work with the humans, not against them
• Since automation is constantly enforcing its known good state, we need to plan for emergency situations where authorized users have to modify the firewall on demand. (A sketch of this pattern follows this slide.)
• Example 1: An authorized user needs to whitelist a network port ASAP to stop an outage.
  The authorized user adds a destination network port to a specific ipset, which immediately starts whitelisting that traffic within the same security zone (PROD → PROD, port 9000). This allows time to register the network port with the application in our CMDB application deployment system. DFW cleans this ipset automatically.
• Example 2: An authorized user wants to blackhole an application without stopping / shutting it down.
  Shutting down an application destroys its in-memory state, which could be useful for developers to debug. Adding destination port 9000 to this ipset allows the application to remain online but reject all incoming requests.
• Example 3: Deployment actions.
  Chicken and egg: DFW depends on the application deployment system to determine the mapping of applications to servers. At deployment time, an ipset gets modified to immediately whitelist the traffic. DFW cleans this ipset.
IPTABLES RULES REFERENCE TYPICALLY EMPTY IPSETS, EXPECTING HUMAN INPUT.
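A sketch of the "typically empty ipset, expecting human input" pattern from Example 1: a standing rule references an empty port set, and an authorized user adds port 9000 during an incident. The set names are hypothetical, and DFW would later clean the set automatically as the slide describes.

```sh
# Standing rule: whitelist any port in this (normally empty) set for same-zone sources.
ipset -exist create DFW_EMERGENCY_TCP bitmap:port range 0-65535
iptables -A INPUT -p tcp -m set --match-set DFW_EMERGENCY_TCP dst \
         -m set --match-set PROD_ALL src -j ACCEPT

# Incident response: an authorized user whitelists port 9000 immediately (PROD -> PROD),
# buying time to register the port properly in the CMDB deployment system.
ipset add DFW_EMERGENCY_TCP 9000
```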
40. References:
• Altair Network Design: https://www.slideshare.net/shawnzandi/linkedin-openfabric-project-interop-2017
• Eng blog post on Altair: https://engineering.linkedin.com/blog/2016/03/project-altair--the-evolution-of-linkedins-data-center-network
• Programmable Data Center: https://engineering.linkedin.com/blog/2017/03/linkedin_s-approach-to-a-self-defined-programmable-data-center
• Facebook's Spine and Leaf: https://code.facebook.com/posts/360346274145943/introducing-data-center-fabric-the-next-generation-facebook-data-center-network/
• Facebook's Spine and Leaf: https://www.youtube.com/watch?v=mLEawo6OzFM
• Milton from Office Space: http://www.imdb.com/title/tt0151804/
41. Q/A session
Production-ready implementation / demo of the technology:
• Zener: https://www.zener.io/lisa17
• BOF: Distributed, Software Defined Security in the Modern Data Center. Thursday, November 2, 9:00 pm–10:00 pm, Marina Room
• LinkedIn: https://www.linkedin.com/in/mikesvoboda
