Published on

1 Like
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide


  1. 1. Performance Aware SDNBay Area NetworkVirtualization Meetup PhaalInMon Corp.May 2013
  2. 2. Why monitor performance?“If you can’t measure it, you can’t improve it”Lord Kelvin
  3. 3. TimeCapacityDemandStatic provisioning$ Unused capacity$$$ Service failure$$ Unused capacity
  4. 4. $$ SavingsTimeCapacityDemandDynamic provisioning
  5. 5. Feedback controlMeasureControlSystemdesiredoutputmeasuredoutput
  6. 6. Controllability and ObservabilityBasic concept is simple, a stable feedback control system requires:1. ability to influence all important system states (controllable)2. ability to monitor all important system states (observable)
  7. 7. It’s hard to stay on the road if you can’t see theroad, or keep to the speed limit without aspeedometerIt’s hard to stay on the road or maintainspeed if your brakes, engine or steering failControllability and Observability driving exampleObservabilityControllabilityStates location, speed, direction, ...
  8. 8. Effect of delay on stabilityMeasurement delay Planning delayTimeConfiguration delayDisturbance Response delayEffectLoop delayDDoS launched Identify target, attacker Black hole, mark, re-route? Switch CLI commands Route propagation Traffic droppedComponents of loop delaye.g. Slow reaction time causestired / drunk / distracteddriver to weave, very slowreaction time and they leavethe road
  9. 9. What is sFlow?“In God we trust. All others bring data.”Dr. Edwards Deming
  10. 10. Industry standard measurement technology integrated in switches
  11. 11. Open source agents for hosts, hypervisors and applicationsHost sFlow project ( is centerof an ecosystem of related open source projects embeddingsFlow in popular operating systems and applications
  12. 12. Network (maintained by hardware in network devices)- MIB-2 ifTable: ifInOctets, ifInUcastPkts, ifInMulticastPkts, ifInBroadcastPkts, ifInDiscards, ifInErrors, ifUnkownProtos,ifOutOctets, ifOutUcastPkts, ifOutMulticastPkts, ifOutBroadcastPkts, ifOutDiscards, ifOutErrorsHost (maintained by operating system kernel)- CPU: load_one, load_five, load_fifteen, proc_run, proc_total, cpu_num, cpu_speed, uptime, cpu_user, cpu_nice,cpu_system, cpu_idle, cpu_wio, cpu_intr, cpu_sintr, interupts, contexts- Memory: mem_total, mem_free, mem_shared, mem_buffers, mem_cached, swap_total, swap_free, page_in, page_out,swap_in, swap_out- Disk IO: disk_total, disk_free, part_max_used, reads, bytes_read, read_time, writes, bytes_written, write_time- Network IO: bytes_in, packets_in, errs_in, drops_in, bytes_out, packet_out, errs_out, drops_outApplication (maintained by application)- HTTP: method_option_count, method_get_count, method_head_count, method_post_count, method_put_count,method_delete_count, method_trace_count, method_connect_count, method_other_count, status_1xx_count,status_2xx_count, status_3xx_count, status_4xx_count, status_5xx_count, status_other_count- Memcache: cmd_set, cmd_touch, cmd_flush, get_hits, get_misses, delete_hits, delete_misses, incr_hits, incr_misses,decr_hists, decr_misses, cas_hits, cas_misses, cas_badval, auth_cmds, auth_errors, threads, con_yields,listen_disabled_num, curr_connections, rejected_connections, total_connections, connection_structures, evictions,reclaimed, curr_items, total_items, bytes_read, bytes_written, bytes, limit_maxbytesStandard counters
  13. 13. Simple- standard structures - densely packed blocks of counters- extensible (tag, length, value)- RFC 1832: XDR encoded (big endian, quad-aligned, binary) - simple to encode/decode- unicast UDP transportMinimal configuration- collector address- polling intervalCloud friendly- flat, two tier architecture: many embedded agents → central “smart” collector- sFlow agents automatically start sending metrics on startup, automatically discovered- eliminates complexity of maintaining polling daemons (and associated configurations)Scaleable push protocol
  14. 14. • Counters tell you there is aproblem, but not why.• Counters summarizeperformance by dropping highcardinality attributes:- IP addresses- URLs- Memcache keys• Need to be able to efficientlydisaggregate counter byattributes in order tounderstand root cause ofperformance problems.• How do you get this datawhen there are millions oftransactions per second?Counters aren’t enoughWhy the spike in traffic?(100Gbit link carrying 14,000,000 packets/second)
  15. 15. • Random sampling is lightweight• Critical path roughly cost ofmaintaining one counter:if(--skip == 0) sample();• Sampling is easy to distributeamong modules, threads,processes without anysynchronization• Minimal resources required tocapture attributes of sampledtransactions• Easily identify top keys,connections, clients, servers,URLs etc.• Unbiased results with knownaccuracyBreak out traffic by client, server and port(graph based on samples from100Gbit link carrying 14,000,000 packets/second)sFlow also exports random samples
  16. 16. Integrated data modelPacket HeaderPacket HeaderSource DestinationTCP/UDP Socket TCP/UDP SocketMAC Address MAC AddressSampled Packet HeadersI/F CountersPower, Temp.NETWORKHOSTCPUMemoryI/OPower, Temp.Adapter MACsAPPLICATIONSampled TransactionsTransaction CountersTCP/UDP SocketIndependent agents sFlow analyzer joins data for integrated view
  17. 17. Virtual ServersApplicationsApache/PHPTomcat/JavaMemcachedVirtual NetworkServersNetworkEmbedded monitoring of allswitches, all servers, allapplications, all the timeConsistent measurementsshared between multiplemanagement toolsComprehensive visibility
  18. 18. Software Defined Networking“You can’t control what you can’t measure”Tom DeMarco
  19. 19. MonitorFeedback control loop with sFlow and OpenFlowlow configuration delaylow measurement delayTogether, sFlow and OpenFlow provide the observability andcontrollability to enable SDN applications targeting low latencycontrol problems like load balancing and DDoS mitigationlow planning delaySDN application
  20. 20. packetsdecode hash sendflow cache flushsampleNetFlow/IPFIXsendpolli/f counterssample• sFlow exports packet samples immediately• sFlow also exports interface counters• NetFlow exports flow data on end of flow, active-timeout or inactive-timeout• NetFlow data generation requires significant resources on switch that canbe better applied to increase size of forwarding table(s)• OpenFlow metering has similar architecture to NetFlow and similarlimitationssFlow and NetFlow/IPFIX in a switch
  21. 21. InMon sFlow-RTactive timeout active timeoutNetFlowOpenvSwitchSolarWinds Real-Time NetFlow Analyzer• sFlow does not use flow cache, so realtime charts more accurately reflect traffic trend• NetFlow spikes caused by flow cache active-timeout for long running connectionsRapid detection of large flowsFlow cache active timeout delays large flow detection,limits value of signal for real-time control applications
  22. 22. Network OSApplicationOpen APIsApplicationApplicationData PlaneControl PlaneConfiguration Forwarding VisibilityNETCONF/OF-ConfigOpen APIsHostssFlow adds actionable visibility to SDN stackActionable = complete + timely
  23. 23. REST APIMetricsFlow DefinitionsThresholdsInMonsFlow-RTREST APIOpenFlowControllerLoad Balancer DDoS ProtectionREST ApplicationsOpen “Southbound” APIsData PlaneControl PlaneHostsOpen “Northbound” APIsSDN ApplicationsSDN feedback control applications
  24. 24. ovs-vsctl set-controller br0 tcp: — –id=@sflow create sflow agent=eth0 target=”” sampling=1000 polling=20 — set bridge br0 sflow=@sflowConnect switches to central control planee.g connect Open vSwitch to OpenFlow controllere.g. connect Open vSwitch to sFlow analyzerMinimal configuration to connect switches tocontrollers, intelligence resides in external software
  25. 25. • DDoS mitigation• Load balancing large flows• Optimizing virtual networks• Packet brokersPerformance aware SDN application examplesEmerging opportunity for SDN applications to leverageembedded instrumentation and control capabilities and deliverscaleable performance management solutionsMany more use cases, particularly if you broaden thescope to the SDDC (software defined data center)
  26. 26. Components of a DDoS flood attack1. Command to attack target sent overcontrol network2. Large number of compromised hostsstart sending traffic to target3.Traffic converges on access link,overwhelming capacity and denyingaccess
  27. 27. thresholdattack startsdetectedcontrol implemented attack eliminated Case 1: DDoS mitigationpackets/secondpackets/secondsustained 6M packets/second attack(30 Gigabits/second)
  28. 28. ECMP/LAG multi-path traffic distribution = hash(packet fields) % linkgroup.sizeselected_link = linkgroup[index]Hash collisions reduce effective cross sectional bandwidth1:1 subscription ratio doesn’t eliminate blocking, collisionprobabilities are high, even with large numbers of paths
  29. 29. Birthday ParadoxWhat is the chance that at least two people in a room will share a birthday?50/50 chance with 23 people, virtual certainty with the 90 people in this room.This is a “paradox” because the probability seems remarkably high consideringthat there are 365 possible birthdays (366 if you include Feb 29) and 23 peoplerepresents just over 6% of the theoretical maximum and 90 people is only 25%. collision probabilities are surprisingly high
  30. 30. number of long lived large flows responsible for bulk of load SDNcontroller todetect andeliminatecollisions byadjustingforwarding pathsUse Case 2: Load balancing large flowsNot just ECMP, also LAG/MLAG,Wireless and WAN etc.
  31. 31. Network virtualization networkof tunnels used tocarry inter-hypervisor trafficacross physicalnetwork, GRE,NVGRE,VxLANetc.Network topology hidden behind APIs, not just Nicira/VMware,but OpenStack Quantum etc.
  32. 32. VMToVM FromFWLBaa bb ccddVirtual networkpacket pathsLack of topologyawareness resultsin randomplacement ofVMsTraffic matrix on physicalnetwork appears randomRandom traffic patternsappear to need a completely flat physicalnetwork topology, i.e. non-blocking betweenall node pairs (fat tree, CLOS)- expensive (cost, power, space)- limited scaleability- limited flexibility- not easily achieved in practice (large flows)
  33. 33. VMToVM FromLargesttenantLargest tenantUse Case 3: Network awareVM placementVM2 VM1VM1 VM2SDN providesnetwork topologyand load informationthat allowsVMs to beoptimally placedResulting sparse, highly structured trafficmatrix efficiently maps into physicalresources, allows SDN controller todeliver predictable performance andworkload isolation of OpenFlow to opticalcircuit switches allows network to berewired for actual demand
  34. 34. Traffic is sparse for each tenantTraffic within each tenant’s virtual network is similarly sparse, e.g.Hadoop above, or scale out web, cache, storage clusters
  35. 35. Use Case 4: Packet brokerONS 2013: DEMon Software Defined Distributed Ethernet Monitoring System, Rich Groves, Microsoft• Offloading basic traffic monitoring to sFlow takes pressure off capture network• Visibility into traffic volumes before triggering capture• Trigger capture based on non OpenFlow 12 tuple fields (e.g. tenant IP, VNI etc)• Trigger on very large match lists (lists of compromised hosts etc.)
  36. 36. Testbed setup
  37. 37. wget -xvzf sflow-rt.tar.gzcd sflow-rtJava 1.6+Python+ Requests library, and install sFlow-RT
  38. 38. PoolDemo data from small test lab10.0.0.30Hyper-VVMs:,,,, -,,, HTTP, Memcached, PHP, JavavSwitches: Open vSwitch, Hyper-V extensible vSwitchOther sFlow sources10.0.0.253
  39. 39. sFlow-RT REST API commands
  40. 40. /metric/;,min:load_one/json?os_name=linux,windows&cpu_num=2scopefunction values type filter/metric/ typevaluedatasourceMetricsSingle metric:Metric query:scope ALL or semicolon delimited list (unordered)values comma delimited list (ordered) with optional prefixmax:, min:, sum:, avg:, var:, sdev:, med:, q1:, q2:, q3:, iqr: or any:filter select metrics based on attribute values
  41. 41. Defining flow metricsKeysValueFilterframes bytes durationavg:bytescount:ipsourceipsource,ipdestination,tcpsourceport,tcpdestinationporttcpdestinationport=80,8080 & destinationgroup=internalmask:ipsource:24Namemetric name for results, i.e. /metric/ALL/name/jsonipsource.1 tunneled addressmask address (e.g. result = of distinct source addressesaverage packet sizeuuidsrc UUID associated with flow sourcehostnamesrc ~ .*vm.* | sourcegroup != external
  42. 42. Create, Read, Update, Delete, List (CRUDL)Create (HTTP PUT/POST)Read (HTTP GET)Update (HTTP PUT)Delete (HTTP DELETE)curl -H "Content-Type:application/json" -X PUT --data "{keys:ipsource,value:bytes}" "http://localhost:8008/flow/src/json"curl "http://localhost:8008/flow/src/json"{"keys": "ipsource","n": 5,"value": "bytes"}curl -H "Content-Type:application/json" -X PUT --data "{keys:macsource,value:frames}" "http://localhost:8008/flow/src/json"curl -X DELETE "http://localhost:8008/flow/src/json"curl --data "name=src&keys=ipsource&value=bytes" -X POST "http://localhost:8008/flow/html"List (HTTP GET)curl "http://localhost:8008/flow/json"
  43. 43. Command examples
  44. 44. Use browser for exploration
  45. 45. Use browser for exploration
  46. 46. Use browser for exploration
  47. 47. import requestseventurl = http://localhost:8008/events/json?maxEvents=10&timeout=60eventID = -1while 1 == 1:r = requests.get(eventurl + "&eventID=" + str(eventID))if r.status_code != 200: breakevents = r.json()if len(events) == 0: continueeventID = events[0]["eventID"]events.reverse()for e in events:print str(e[eventID]) + , + str(e[timestamp]) + , +e[thresholdID] + , + e[metric] + , + str(e[threshold]) + ,+ str(e[value]) + , + e[agent] + , + e[dataSource]Tail events using HTTP “long” pollingextras/
  48. 48. Define flow keysDDoS Protectiondefine address groupsdefine flowsdefine thresholdswhile(running) {receive threshold eventmonitor flowdeploy controlmonitor flowrelease control}OpenFlowControllerREST APIsFlow-RTREST API12346587REST operation flow chart
  49. 49. Large flow detection script (initialization)import requestsimport jsonrt = http://localhost:8008groups = {external:[],internal:[]}flows = {keys:ipsource,ipdestination,value:frames,filter:sourcegroup=external&destinationgroup=internal}threshold = {metric:ddos,value:400}r = requests.put(rt + /group/json,data=json.dumps(groups))r = requests.put(rt + /flow/ddos/json,data=json.dumps(flows))r = requests.put(rt + /threshold/ddos/json,data=json.dumps(threshold))...extras/
  50. 50. Large flow detection script (monitor events)...eventurl = rt + /events/json?maxEvents=10&timeout=60eventID = -1while 1 == 1:r = requests.get(eventurl + "&eventID=" + str(eventID))if r.status_code != 200: breakevents = r.json()if len(events) == 0: continueeventID = events[0]["eventID"]events.reverse()for e in events:thresholdID = e[thresholdID]if "ddos" == thresholdID:r = requests.get(rt + /metric/ + e[agent] + / + e[dataSource] + .+ e[metric] + /json)metrics = r.json()if len(metrics) > 0:evtMetric = metrics[0]evtKeys = evtMetric.get(topKeys,None)if(evtKeys and len(evtKeys) > 0):topKey = evtKeys[0]key = topKey.get(key, None)value = topKey.get(value,None)print e[metric] + "," + e[agent] + , + key + , + str(value)
  51. 51. Next StepsBuild your own test bed:1. sFlow-RT is already installed on your laptop, capable of monitoring thousands ofswitches (remember to turn off demo.pcap and enable UDP port 6343 on your firewall)2. Enable sFlow in your network (OVS, Hyper-V, physical switches, Install Host sFlow agents + application agents:Apache,NGINX,Apache, HAProxy etc. with the broader sFlow community: don’t have to have access to a physical test lab, build a Mininet / Open vSwitch virtual testlab, e.g. out more about sFlow:
  52. 52. Questions?