A New Approach to Building Data Centers. MetaFabric Demonstration

Slides for a talk given at the Juniper New Network Day conference, 01.01.2014.

Speaker: Ivan Lysogor, Senior Network Engineer at Juniper Networks.

A video recording of this talk from the conference's online broadcast is available here: http://www.youtube.com/watch?v=yBXWI8YyKss&hd=1


  1. METAFABRIC ARCHITECTURE. Ivan Lysogor, Systems Engineer
  2. Copyright © 2013 Juniper Networks, Inc. INTRODUCING THE METAFABRIC ARCHITECTURE
     Diagram: physical and virtual (VM) resources spread across my on-premises data center, my hosted service provider, my managed service provider, and my cloud service provider.
     SIMPLE. OPEN. SMART.
  3. METAFABRIC ARCHITECTURE PILLARS
     • Simple: easy to deploy & use
     • Open: maximize flexibility
     • Smart: save time, improve performance
  4. METAFABRIC ARCHITECTURE PORTFOLIO
     • Switching: flexible building blocks; simple switching fabrics
     • Routing: universal data center gateways
     • Management: smart automation and orchestration tools
     • SDN: simple and flexible SDN capabilities
     • Data Center Security: adaptive security to counter data center threats
     • Solutions & Services: reference architectures and professional services
  5. METAFABRIC REFERENCE ARCHITECTURE
     • Validated and tested designs
     • Version 1.0: virtualized (VMware) enterprise data center with key partners (IBM, EMC, F5)
     • Reduce risk, accelerate customer adoption
  6. QFX5100 DEPLOYMENT OPTIONS
     Diagram labels: Virtual Chassis Fabric (up to 20 members); Spine-Leaf; Virtual Chassis (up to 10 members); QFabric; Managed as a Single Switch; Layer 3 Fabric; Up to 128 members.
  7. QFX5100 PLATFORM
     • 1.5 GHz dual-core Intel Sandy Bridge x86 CPU
     • 8 GB memory, 2 x 16 GB SSD
     • Innovated Junos software architecture
     • Redundant, hot-swappable AC or DC power supply
     • Redundant, hot-swappable fan tray
     • AFI (FRU to port side) or AFO (port to FRU side) airflow
     • Beacon LED, no LCD panel
     • L2/L3 line-rate forwarding
     • 10GbE/40GbE and FCoE
     • Feature-rich Junos, full L2/L3 protocols, MPLS
     Port configurations shown: 48 x 1/10GbE + 6 x 40GbE; 24 x 40GbE with Slot 1 and Slot 2; 96 x 1/10GbE + 8 x 40GbE; 4 x 40GbE QSFP module.
     Timeline labels: Q4 2013, Q1 2014.
  8. ADVANCED JUNOS SOFTWARE ARCHITECTURE
     Provides the foundation for advanced functions:
     • ISSU (In-Service Software Upgrade)
     • Other Juniper applications for additional services in a single switch
     • Third-party applications
     • Can bring up the system much faster
     Diagram: Linux kernel (CentOS) host with a network bridge and KVM, running an active Junos VM, a standby Junos VM, third-party applications, and Juniper apps.
  9. ISSU (IN-SERVICE SOFTWARE UPGRADE)
     • The master Junos VM controls the hardware (PFE and FRUs) on the system
     • The master issues the upgrade command
     • The system launches a new Junos VM with the new image as backup
     • All state is synchronized to the new backup Junos
     • The PFE is detached from the current master, then attached to the backup Junos (hot move)
     • The PFE control component in the new master controls forwarding
     • The new backup VM is stopped
     Diagram: master VM and backup VM (each with PFE control, master/backup election, and other Junos processes) connected through a software bridge on the host OS; PFE hardware with a partition for PFE warm boot; other hardware.
  10. INSIGHT TECHNOLOGY
      Hotspots and microbursts impact application performance:
      • Not visible with traditional counters
      • Network operations are blindfolded
      Buffer utilization monitoring and reporting:
      • Captures microburst events that exceed defined thresholds
      • Adjustable sampling intervals
      • Reports microburst events instantaneously via CLI, syslog, log file (human-readable format), or streaming (JavaScript Object Notation (JSON), CSV, TSV formats)
      Diagram: queue depth or queue latency plotted over time, with a high threshold, a low threshold, and a microburst marked.
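
The slide names a high and a low threshold but does not describe the detection logic. The sketch below assumes simple hysteresis: a burst opens when the sampled queue depth rises above the high threshold and closes when it falls below the low threshold. The function name, sample format, and threshold values are hypothetical and are not taken from Junos.

```python
from typing import Iterable, List, Tuple


def detect_microbursts(
    samples: Iterable[Tuple[float, int]], high: int, low: int
) -> List[Tuple[float, float, int]]:
    """Return (start_time, end_time, peak_depth) for every completed burst.

    Assumption (not from the slide): a burst starts when depth > high and
    ends at the first later sample with depth < low.
    """
    events = []
    start = None  # time the current burst opened, or None if idle
    peak = 0
    for t, depth in samples:
        if start is None:
            if depth > high:
                start, peak = t, depth
        else:
            peak = max(peak, depth)
            if depth < low:
                events.append((start, t, peak))
                start = None
    return events


# Hypothetical (time, queue depth) samples.
samples = [(0, 10), (1, 90), (2, 120), (3, 60), (4, 20), (5, 95), (6, 10)]
bursts = detect_microbursts(samples, high=80, low=30)
```

Using two thresholds rather than one keeps a queue that hovers around a single limit from generating a stream of separate events.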
  11. UNIFIED FORWARDING TABLE
      • Flexibly allocate L2 MAC, L3 host, and LPM (Longest Prefix Match) resources from a single pool
      • The L3 host table holds /32 IPv4 or /128 IPv6 routes
      • The LPM table holds any routes not handled by the L3 host table
      • Forwarding table size is optimized for the deployment scenario
      • System resources are used efficiently
      Diagram: one UFT (Unified Forwarding Table) pool, L2 MAC + L3 Host + LPM, partitioned in different proportions.
  12. UNIFIED FORWARDING TABLE
      UFT (Unified Forwarding Table) profiles, L2 MAC + L3 Host + LPM:
      Profile   Name                       L2 MAC   L3 Host   LPM
      1         l2-heavy-one               288K     16K       16K
      2         l2-heavy-two               224K     80K       16K
      3         l2-heavy-three (default)   160K     144K      16K
      4         l3-heavy                   96K      208K      16K
      5         LPM-heavy*                 32K      16K       128K
      *under test, may come after FRS
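
As an illustration of how the profile table might be used for planning, the sketch below lists the profiles whose capacities cover a given requirement. The capacities (in thousands of entries) are copied from this slide; the dictionary layout and the function name `fitting_profiles` are hypothetical and are not part of Junos.

```python
# Capacities in K entries, as listed on the slide above.
UFT_PROFILES = {
    "l2-heavy-one":   {"mac": 288, "l3_host": 16,  "lpm": 16},
    "l2-heavy-two":   {"mac": 224, "l3_host": 80,  "lpm": 16},
    "l2-heavy-three": {"mac": 160, "l3_host": 144, "lpm": 16},
    "l3-heavy":       {"mac": 96,  "l3_host": 208, "lpm": 16},
    "lpm-heavy":      {"mac": 32,  "l3_host": 16,  "lpm": 128},
}


def fitting_profiles(mac_k: int, l3_host_k: int, lpm_k: int) -> list:
    """Return the names of profiles that cover all three requirements."""
    return [
        name
        for name, cap in UFT_PROFILES.items()
        if cap["mac"] >= mac_k
        and cap["l3_host"] >= l3_host_k
        and cap["lpm"] >= lpm_k
    ]


# Example: 100K MAC addresses, 100K host routes, 10K LPM routes.
candidates = fitting_profiles(100, 100, 10)
```

In the slide's figures, the first four profiles all add up to 304K MAC plus host entries, which shows the trade-off: every host entry gained is a MAC entry given up.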
  13. AUTOMATION*
      Simple network architecture
      Network automation:
      • Zero-touch provisioning
      • Ops/event scripts
      • Python
      • Network Director API
      Data center automation:
      • VMware
      • Puppet, Chef
      • OpenStack
      • CloudStack
      *Not all features will be available at FRS
  14. JUNOS ENHANCED AUTOMATION IMAGE
      • The Junos Enhanced Automation image provides increased flexibility to our large data center customers
      • VeriExec is disabled on Junos Flex, which lets customers run unsigned binaries on the QFX5100
      • Ability to run Python/Ruby with custom libraries such as Collectd, Ganglia, Monit, etc.
      • Puppet and Chef are packaged with Junos Flex to help MSDCs automate configuration
  15. VIRTUAL CHASSIS FABRIC
  16. VCF ESSENTIALS
      • Single device to manage
      • Accessible from any member of the fabric
      • In-band virtual backplane to enable Junos LC-RE communications
      • Multi-path forwarding
      Diagram: physical and logical views of the fabric; Node #1 through Node #16, with one node marked Active and one marked Backup; leaf label "1RU, 48 SFP+ & 1 QIC".
  17. VCF BUILDING BLOCKS
      • VCF 10/40GE spine nodes: QFX5100-24Q (40GE), QFX5100-48S (10GE)
      • VCF 1/10/40GE leaf nodes: EX4300 (1GE), QFX5100-48S (10GE), QFX3500 (10GE), QFX3600 (40GE), QFX5100-24Q (40GE)
  18. VCF BUILDING BLOCKS - COMPATIBILITY MATRIX
      Scales to 20 members.
      Platform       VCF spine node   VCF leaf node
      QFX5100-24Q    ✓                ✓
      QFX5100-48S    ✓                ✓
      QFX5100-96S    ✓                ✓
      QFX3500        ✗                ✓
      QFX3600        ✗                ✓
      EX4300         ✗                ✓
  19. VCF SCALE
      Fabric type: All QFX5100; Mixed
      Spine: QFX5100-24Q; QFX5100-24Q; QFX5100-48S
      Leaf: QFX5100-48S, QFX5100-24Q, QFX5100-96S; QFX5100-48S, QFX5100-24Q, QFX5100-96S, QFX3500 & QFX3600, EX4300; EX4300
      Scale: QFX5100 scale; lowest common scale
      QFX5100 forwarding profiles (CLI help):
      root@opus# set chassis forwarding-options ?
      Possible completions:
        l2-profile-one     MAC: 288K  L3-host: 16K   LPM: 16K
        l2-profile-three   MAC: 160K  L3-host: 88K   LPM: 16K
        l2-profile-two     MAC: 224K  L3-host: 56K   LPM: 16K
        l3-profile         MAC: 96K   L3-host: 120K  LPM: 16K
        lpm-profile        MAC: 32K   L3-host: 16K   LPM: 128K
      QFX3500/3600 scale: L2 MAC 128K; L3 host 8K; L3 LPM 16K; L3 multicast 4K
      IPv6 scale = IPv4 LPM/4
      EX4300 scale: L2 MAC 64K; L3 host 32K; L3 LPM 16K; L3 multicast 16K
  20. DEPLOYMENT FLEXIBILITY
      10/40G spine nodes & 1/10/40G leaf nodes; 1GE, 10GE & 40GE all in one fabric.
      POD            Spine node     Leaf nodes
      10G POD        QFX5100-24Q    QFX5100-48S, QFX5100-24Q, QFX5100-96S, QFX3500 & QFX3600
      1/10/40G POD   QFX5100-24Q    QFX5100-48S, QFX5100-24Q, QFX5100-96S, QFX3500 & QFX3600, EX4300
      1G POD         QFX5100-48S    EX4300
  21. OPERATIONAL SIMPLICITY - PLUG ‘N’ PLAY
      member 1 {
          role routing-engine;
          serial-number SERIALNUM1;
      }
      member 2 {
          role routing-engine;
          serial-number SERIALNUM2;
      }
      member 3 {
          role routing-engine;
          serial-number SERIALNUM3;
      }
      member 4 {
          role routing-engine;
          serial-number SERIALNUM4;
      }
      • Spine nodes & leaf nodes are auto-provisioned
      • A factory-default device will join the fabric
      • A non-factory-default device will not join the fabric
      • Configuration and image synchronization
      Diagram labels: "1RU, 48 SFP+ & 1 QIC"; "Non-Factory Default or 3rd Party".
  22. HA - RESILIENT CONTROL & DATA PLANE
      Control plane redundancy:
      • Quaternary RE (routing engine) redundancy
      • Resilient in-band control plane
      • GRES, NSR, NSB
      Data plane redundancy:
      • Server multi-homing
      • Active-active uplink forwarding
      • Uplink redundancy
      Diagram: redundant routing engines marked Active, Hot-Backup, and Backup; virtual servers (VMs behind a vSwitch) multi-homed to "1RU, 48 SFP+ & 1 QIC" leaf nodes.
  23. FORWARDING PLANE (SMART TRUNKS)
      Automatic fabric trunks:
      • Fabric links are automatically aggregated into trunks (LAGs)
      Fabric trunk types:
      • Next Hop (NH) trunks: from the local node to its direct neighbors
      • Remote Destination (RD) trunks: from the local node to a remote destination PFE
      Weights are based on the path bandwidth ratio (instead of the NH link bandwidth ratio) to avoid fabric congestion.
      Diagram: switches SW 1 through SW 16 interconnected by links L1 through L16; leaf label "1RU, 48 SFP+ & 1 QIC".
  24. HA - HITLESS UPGRADE WITH ISSU
      Today:
      • Upgrade one rack/node at a time
      • Applications run on half bandwidth
      • Long maintenance window
      Hitless upgrade using single-switch VCF:
      • Upgrade multiple racks at a time
      • Applications run on full bandwidth
      • Shorter maintenance window
      • Does not require hardware redundancy
  25. VCF ARCHITECTURE PROVIDES
      • Predictable application performance
      • Deterministic latency
      • Resilient multi-path
      • High bisectional bandwidth
      • Smart leafs (local switching)
      • Network ports on spine switches
      • Mixed 1/10/40G fabric
      • Integrated control plane
      • Integrated RE
      • GRES/NSR/NSB
      • Plug-and-play fabric
      • Analytics on fabric ports
      Diagram: virtual servers (VMs behind a vSwitch), bare metal, services GW, and WAN/core attached to the fabric; leaf label "1RU, 48 SFP+ & 1 QIC".
  26. NG DC INTERCONNECT - EVPN
  27. VM MOBILITY TRAFFIC OPTIMIZATION
      Two diagrams, "Scenario without VMTO" and "Scenario with VMTO enabled": in each, DC1 and DC2 share VLAN 10 across a private MPLS WAN.
  28. VPLS DEPLOYMENT OPTIONS WITH MX - TODAY
      • More than one VPLS device, VPLS-controlled: active-standby per VLAN
      • More than one VPLS device, MC-LAG-controlled: active-standby on the LAN, per VLAN
      • One VPLS device: active forwarding through all links of the LAG
      Diagram labels: NAT, FW, LB, IPSec, SRX, Switch, MX Series, LAG, MC-LAG, VC, IP/MPLS; links marked A (active) and S (standby).
  29. DCI WITH VPLS AND VRRP
      Topology: Server 1 (20.20.20.100/24, VLAN 20) in DC 1; Server 2 (10.10.10.100/24, VLAN 10) in DC 2; Server 3 (10.10.10.200/24, VLAN 10) in DC 3; all connected over a private MPLS WAN. The default gateway (DG) 10.10.10.1 runs VRRP with one active and three standby routers.
      Task: Server 3 in Data Center 3 needs to send packets to Server 1 in Data Center 1.
      Problem: Server 3’s active Default Gateway for VLAN 10 is in Data Center 2.
      Effect:
      1. Traffic must travel via Layer 2 from Data Center 3 to Data Center 2 to reach VLAN 10’s active Default Gateway.
      2. The packet must reach the Default Gateway in order to be routed towards Data Center 1.
      This results in duplicate traffic on WAN links and suboptimal routing, hence the “Egress Trombone Effect.”
  30. EVPN REQUIREMENTS (ON TOP OF VPLS)
      EVPN provides standards-based VLAN extension over a shared IP/MPLS network. http://datatracker.ietf.org/doc/draft-ietf-l2vpn-evpn/?include_text=1
      • All-active multi-homing: all available paths should be used (CE-PE, PE-PE)
      • Better control over MAC learning: MAC learning happens in the control plane
      • ARP/ND flooding minimization: proxy ARP support
      • L3 egress traffic forwarding optimization: usage of the Default Gateway Extended Community
      • L3 ingress traffic forwarding optimization: automatic advertisement of host routes into the L3 VPN
  31. EVPN: NO EGRESS TROMBONE EFFECT
      Topology: Server 1 (20.20.20.100/24, VLAN 20) in DC 1; Server 2 (10.10.10.100/24, VLAN 10) in DC 2; Server 3 (10.10.10.200/24, VLAN 10) in DC 3; all connected over a private MPLS WAN. The default gateway (DG) 10.10.10.1 is an active RVI on all four routers.
      Task: Server 3 in Datacenter 3 needs to send packets to Server 1 in Datacenter 1.
      Solution: Virtualize and distribute the Default Gateway so it is active on every router that participates in the VLAN.
      Effect:
      1. Egress packets can be sent to any router on VLAN 10, allowing the routing to be done in the local datacenter. This eliminates the “Egress Trombone Effect” and creates the most optimal forwarding path for the inter-DC traffic.
  32. EVPN TEST TOPOLOGY
  33. SUPPORTED CE-PE TOPOLOGY
      • Do not try to configure MC-LAG on the PEs
      • Do not try to configure a single LAG towards two PEs
      Diagram: CE (qfabric) connected to PE1 (MX240-3) and PE2 (MX240-4), which face MPLS; callouts for the supported CE-PE config, the PE1/PE2 config, and the CE config.
  34. HOW TO PREVENT DUPLICATE COPIES ON MULTI-HOMED SEGMENTS?
      A Designated Forwarder (DF) is elected for each EVI or for the entire Ethernet Segment. The DF is responsible for forwarding BUM (broadcast, unknown unicast, multicast) traffic.
      Diagram: CE1 attached over a LAG to PE1 and PE2; PE3 and CE2 on the far side of the MPLS network.
  35. EVI LOAD BALANCING
      By default, all CE links are actively used for traffic forwarding. Half of the EVIs have PE1 as DF and the other half have PE2 as DF.
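
The slides do not say how the DF is chosen. One way to get the half-and-half split described above is the default "service carving" procedure of RFC 7432: sort the PE addresses attached to the segment, then pick the PE at index (VLAN ID mod number of PEs). The sketch below assumes that procedure; the PE addresses and the function name are hypothetical.

```python
import ipaddress


def elect_df(pe_addresses, vlan_id):
    """Pick the DF for one VLAN using modulo-based service carving.

    Assumption: PEs are ordered by ascending IP address and the PE with
    ordinal (vlan_id mod N) becomes DF, as in the RFC 7432 default.
    """
    ordered = sorted(pe_addresses, key=ipaddress.ip_address)
    return ordered[vlan_id % len(ordered)]


# Hypothetical loopback addresses for PE1 and PE2.
PE1, PE2 = "192.0.2.1", "192.0.2.2"
df_by_vlan = {vlan: elect_df([PE2, PE1], vlan) for vlan in range(100, 200)}
pe1_share = sum(1 for df in df_by_vlan.values() if df == PE1)
```

With two PEs, even VLAN IDs land on the lower address and odd ones on the higher, so a contiguous range of VLANs is shared evenly between them.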
  36. VM EGRESS TRAFFIC OPTIMIZATION
      EVPN advantages over VPLS:
      • No need for VRRP, multi-homing VPLS, or MC-LAG (less machinery and fewer protocol dependencies)
      • The IRB within the EVPN VRF is configured on all PEs with the same IP address (copy and paste the IRB config on all PEs)
      • Each PE has a mapping between the default GW IP and the MACs of all PEs
      • If a VM moves from DC1 to DC2, it continues to use the “old” MAC address of the PE located in DC1. However, both PEs in DC2 forward traffic destined to this MAC locally.
      Output callouts: IRB MAC on MX240-4; IRB MAC on MX480-3; IRB MAC on MX480-4.
  37. EVPN ROUTE TYPE 2: MAC ADVERTISEMENT ROUTE
      If you need to decode pcaps with EVPN NLRIs, you can use the dissector I put into the Wireshark Git repository: https://code.wireshark.org/review/#/c/296/
  38. WITHOUT VMTO: INGRESS TROMBONE EFFECT
      Topology: Server 1 (20.20.20.100/24, VLAN 20) in DC 1; Server 2 (10.10.10.100/24, VLAN 10) in DC 2; Server 3 (10.10.10.200/24, VLAN 10) in DC 3; all connected over a private MPLS WAN.
      Task: Server 1 in Datacenter 1 needs to send packets to Server 3 in Datacenter 3.
      Problem: Datacenter 1’s edge router prefers the path to Datacenter 2 for the 10.10.10.0/24 subnet. It has no knowledge of individual host IPs.
      Effect:
      1. Traffic from Server 1 is first routed across the WAN to Datacenter 2 due to a lower-cost route for the 10.10.10.0/24 subnet.
      2. Then the edge router in Datacenter 2 sends the packet via Layer 2 to Datacenter 3.
      DC 1’s edge router table without VMTO:
      Route         Mask   Cost   Next Hop
      10.10.10.0    24     5      Datacenter 2
      10.10.10.0    24     10     Datacenter 3
  39. WITH VMTO: NO INGRESS TROMBONE EFFECT
      Topology: Server 1 (20.20.20.100/24, VLAN 20) in DC 1; Server 2 (10.10.10.100/24, VLAN 10) in DC 2; Server 3 (10.10.10.200/24, VLAN 10) in DC 3; all connected over a private MPLS WAN.
      Task: Server 1 in Datacenter 1 needs to send packets to Server 3 in Datacenter 3.
      Solution: In addition to sending a summary route of 10.10.10.0/24, the datacenter edge routers also send host routes which represent the location of local servers.
      Effect:
      1. Ingress traffic destined for Server 3 is sent directly across the WAN from Datacenter 1 to Datacenter 3. This eliminates the “Ingress Trombone Effect” and creates the most optimal forwarding path for the inter-DC traffic.
      DC 1’s edge router table with VMTO:
      Route          Mask   Cost   Next Hop
      10.10.10.0     24     5      Datacenter 2
      10.10.10.0     24     10     Datacenter 3
      10.10.10.100   32     5      Datacenter 2
      10.10.10.200   32     5      Datacenter 3
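
The difference between the two edge router tables on slides 38 and 39 can be replayed with a small longest-prefix-match lookup. The routes and costs are copied from those slides; the function `best_next_hop` and its tie-break (longest prefix first, then lowest cost) are a simplified assumption about route selection, not Junos behavior.

```python
import ipaddress


def best_next_hop(routes, destination):
    """Return the next hop of the longest matching prefix, lowest cost on ties."""
    dst = ipaddress.ip_address(destination)
    matches = []
    for prefix, cost, next_hop in routes:
        network = ipaddress.ip_network(prefix)
        if dst in network:
            matches.append((network.prefixlen, -cost, next_hop))
    if not matches:
        return None
    # max() prefers the longest prefix, then the smallest cost.
    return max(matches)[2]


WITHOUT_VMTO = [
    ("10.10.10.0/24", 5, "Datacenter 2"),
    ("10.10.10.0/24", 10, "Datacenter 3"),
]
WITH_VMTO = WITHOUT_VMTO + [
    ("10.10.10.100/32", 5, "Datacenter 2"),
    ("10.10.10.200/32", 5, "Datacenter 3"),
]

server3 = "10.10.10.200"
hop_without = best_next_hop(WITHOUT_VMTO, server3)  # summary route only
hop_with = best_next_hop(WITH_VMTO, server3)        # host route wins
```

Without the /32 entries only the two summary routes match, so the lower-cost path to Datacenter 2 is chosen; with them, the host route for Server 3 is the longest match and traffic goes straight to Datacenter 3.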
  40. REFERENCES
      • MetaFabric Solution Brief: http://www.juniper.net/us/en/local/pdf/solutionbriefs/3510495-en.pdf
      • MetaFabric 1.0 Reference Architecture: http://www.juniper.net/us/en/local/pdf/reference-architectures/8030012-en.pdf
      • MetaFabric 1.0 Design and Implementation Guide: http://www.juniper.net/us/en/local/pdf/design-guides/8020020-en.pdf
