OAM:
Application-driven
Evolution
Yaakov (J) Stein
Chief Scientist
RAD Data Communications




                          OAM: Application-driven Evolution Slide 1
Why do we pay for services ?

 Generally good (and frequently much better than toll quality)
 voice service is available free of charge (Skype, Fring, Nimbuzz…)
So why does anyone pay for voice services ?
Similarly, one can get free
• (WiFi) Internet access
• Email boxes
• File storage and sharing
• Web hosting
• Software services

So why pay ?


Paying for QoS

The simple answer is that one doesn’t pay for the
service, one pays for Quality of Service guarantees
In our voice model:
[Figure: price versus QoS; labels: BE, toll quality, with mobility]
But what does QoS mean and why are we willing to pay
for it ?
To explain, we need to review some history
Father of the telephone

Everyone knows that the father of the telephone was
Alexander Graham Bell (along with his assistant Mr.
Watson)
But Bell did not invent the telephone network
Bell and Watson sold pairs of phones to customers


The father of the telephone network was Theodore Vail




Theodore Vail
Theodore Who?
 • Cousin of Alfred Vail (Morse’s coworker)
 • Ex-General Superintendent of US Railway Mail Service
 • First general manager of Bell Telephone
 • Father of the PSTN
Why is he so important?
 • Organized PSTN
 • Established principle of reinvestment in R&D
 • Established Bell Telephone’s IPR division
 • Executed merger with Western Union to form AT&T
 • Solved the main technological problems
       Use of copper wire
       Use of twisted pairs
 • Organized telephony as a service (like the postal service!)
  Vailism is the philosophy that public services should be run as
        closed centralized monopolies for the public good
What’s the difference ?

In the Bell-Watson model the customer pays
once, but is responsible for :
• Installation (wires, wiring)
• Operations (power, fault repair, performance
    – distortion and noise)
• Infrastructure maintenance

The Bell company is responsible only for
providing functioning telephones

In the Vail model the customer pays a monthly
fee, but the provider assumes responsibility for
everything, including fault repair and
performance maintenance
  The telephone company owns the telephone
  sets and even the wires in the walls !
Service Level Agreements

• In order to justify recurring payments the provider agrees to a
  minimum level of service in an SLA
• SLAs should capture Quality of user Experience (QoE) but this is
  often hard to quantify
• So SLAs usually actually detail measurable network parameters that
  influence QoE, such as :
        Availability (e.g., the famous five nines)
        Time to repair (e.g., the famous 50 ms)
        Information rate (throughput)
        Information latency (delay)
        Allowable defect densities (noise/distortion)
• Availability (basic connectivity) always influences QoE
• It is hard to predict the effect of the other parameters on QoE even
  when there is only one application (e.g., voice)
• When multiple applications are in use – it may be impossible
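To make the availability figure concrete, the following minimal sketch converts an availability target into the downtime it permits per year. The function name and the 365-day year are assumptions made for illustration; real SLAs define their own measurement windows and exclusions.

```python
# Illustrative only: converts an availability fraction into allowed downtime.
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def allowed_downtime_minutes(availability: float) -> float:
    """Return the maximum yearly downtime (in minutes) for an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, target in [("three nines", 0.999), ("five nines", 0.99999)]:
        print(f"{label}: {allowed_downtime_minutes(target):.2f} minutes/year")
```

Under these assumptions, five nines leaves a little over five minutes of downtime per year.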
Some Applications
• System traffic
   – Routing protocols, DNS, DHCP, time delivery, system update, OAM,
     tunneling and VPN setup
• Business processes
   – Database access, backup and data-center, B2B, ERP
• Communications – interactive
   – Voice, video conferencing, Telepresence, instant messaging, remote
     desktop, application sharing
• Communications – non-interactive
   – Email, broadcast programming, music
   – Video : progressive download, live streaming, interactive
• Information gathering
   – Http(s), Web 2.0, file transfer
• Recreational
   – Gaming, p2p file transfer
• Malicious
   – DoS, malware injection, illicit information retrieval
What do applications need ?

• Some applications only require availability
• Some also require minimum available
  throughput
• Some require delay less than some end-to-end
  (or Real Time) delay bound
• Some require packet loss ratio (PLR) less
  than some percentage
   – And these parameters are not necessarily
     independent
For example, TCP throughput drops with PLR
[Figure: TCP throughput versus PLR, for 1000 B packets and 50 ms RTT]
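The dependence of TCP throughput on PLR can be sketched with the widely cited approximation of Mathis et al., throughput ≈ C · MSS / (RTT · √PLR). The slide does not say which model it plots, so the choice of this formula, the constant C = √(3/2), and treating the 1000 B packet as the segment size are all assumptions made for illustration.

```python
import math


def mathis_throughput_bps(mss_bytes: float, rtt_s: float, plr: float,
                          c: float = math.sqrt(1.5)) -> float:
    """Approximate steady-state TCP throughput in bit/s (Mathis et al. model).

    Assumption: this is a rough model for illustration, not necessarily the
    one behind the original figure.
    """
    if plr <= 0.0 or rtt_s <= 0.0:
        raise ValueError("plr and rtt_s must be positive")
    return c * mss_bytes * 8 / (rtt_s * math.sqrt(plr))


if __name__ == "__main__":
    # Parameters taken from the slide: 1000 B packets, 50 ms RTT.
    for plr in (1e-4, 1e-3, 1e-2):
        mbps = mathis_throughput_bps(1000, 0.050, plr) / 1e6
        print(f"PLR {plr:.0e}: about {mbps:.2f} Mbit/s")
```

In this model a hundredfold increase in PLR cuts throughput by a factor of ten, which is why throughput and PLR cannot be specified independently for TCP-based applications.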



Some rules of thumb

• Mission Critical (and life critical) applications require high
  availability
• If there are any MC applications then system traffic requires high
  availability too
• MC applications do not necessarily require strict throughput but
  always indirectly require
   – A certain minimal average throughput
   – Bounded delay
• If the MC application uses TCP then it requires low PLR
• Real-time applications require sufficient throughput
   – But not necessarily low PLR (audio and video codecs have PLC)
• Interactive applications require low RT delay
   – It may be more scalable for an SP to measure 1-way delays



Monitoring an SLA
• The Service Provider’s justification for payment is the
  maintenance of an SLA
• To ensure SLA compliance, the SP must :
   – Monitor the SLA parameters
   – Take action if a parameter drops below its compliance level
But how does the SP verify/ensure that the SLA is being met ?
• Monitoring is carried out using Operations, Administration,
  Maintenance (OAM)
• The customer too may use OAM to see that the SP is compliant !
Technical note: OAM is a user-plane function but may influence control
and management plane operations, for example:
   – OAM may trigger protection switching, but doesn’t switch
   – OAM may detect provisioned links, but doesn’t provision them
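As a rough sketch of the monitoring step, the code below compares one set of measured values against SLA targets and reports which parameters are out of compliance. The class names, field names, and threshold values are hypothetical; an actual SLA defines its own parameters and measurement intervals.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SlaTargets:
    """Hypothetical SLA thresholds (names and units are illustrative)."""
    min_availability: float      # fraction, e.g. 0.99999
    min_throughput_mbps: float
    max_delay_ms: float
    max_plr: float               # packet loss ratio as a fraction


@dataclass(frozen=True)
class Measurement:
    """Values reported by OAM for one measurement interval."""
    availability: float
    throughput_mbps: float
    delay_ms: float
    plr: float


def violations(targets: SlaTargets, m: Measurement) -> List[str]:
    """Return the names of the SLA parameters that are out of compliance."""
    failed = []
    if m.availability < targets.min_availability:
        failed.append("availability")
    if m.throughput_mbps < targets.min_throughput_mbps:
        failed.append("throughput")
    if m.delay_ms > targets.max_delay_ms:
        failed.append("delay")
    if m.plr > targets.max_plr:
        failed.append("plr")
    return failed


if __name__ == "__main__":
    targets = SlaTargets(0.99999, 100.0, 20.0, 0.001)
    measured = Measurement(0.999995, 120.0, 35.0, 0.0005)
    print(violations(targets, measured))
```

The function only reports violations; in keeping with the technical note above, any corrective action would be taken by the control or management plane.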


Operations, Administration,
Maintenance
Traditionally, one distinguishes between 2 OAM functionalities :
1. Fault Monitoring: Required for maintenance of basic connectivity
   (availability)
   •   OAM runs continuously/periodically at required rate
   •   Detection and reporting of anomalies, defects, and failures
   •   Used to trigger mechanisms in the
              Control plane (e.g. protection switching) and
              Management plane (alarms)
2. Performance Monitoring: Required for maintenance of all other
   QoE attributes
   •   OAM runs :
              Before enabling a service
              On-demand or
              Per schedule
   •   Measurement of performance criteria (delay, PDV, etc.)
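The two functionalities can be illustrated with a minimal sketch. The fault-monitoring check declares loss of continuity when no periodic OAM message has arrived within a multiple of the transmission interval; the default multiplier of 3.5 follows the convention used by Ethernet continuity checks and is an assumption here. The performance-monitoring helper reports mean delay and PDV, with PDV taken as maximum minus minimum delay, which is only one of several definitions in use. All function names are hypothetical.

```python
from typing import Sequence, Tuple


def loss_of_continuity(last_rx_ms: float, now_ms: float,
                       interval_ms: float, multiplier: float = 3.5) -> bool:
    """Fault monitoring: True if no OAM message arrived within the timeout."""
    return (now_ms - last_rx_ms) > multiplier * interval_ms


def delay_stats(delays_ms: Sequence[float]) -> Tuple[float, float]:
    """Performance monitoring: return (mean delay, PDV as max minus min)."""
    if not delays_ms:
        raise ValueError("at least one delay sample is required")
    mean = sum(delays_ms) / len(delays_ms)
    pdv = max(delays_ms) - min(delays_ms)
    return mean, pdv


if __name__ == "__main__":
    # 10 ms interval: 30 ms of silence is tolerated, 40 ms is not.
    print(loss_of_continuity(0.0, 30.0, 10.0))
    print(loss_of_continuity(0.0, 40.0, 10.0))
    print(delay_stats([10.0, 12.0, 11.0]))
```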
Early OAM

• Analog channels and 64 kbps digital channels did not have
  mechanisms to check signal validity and quality, thus:
  – Major faults could go undetected for long periods of time
  – Hard to characterize and localize faults when reported
  – Minor defects might go unnoticed indefinitely
• As PDH networks evolved, more and more OAM was
  added on:
  –   Monitoring for valid signal
  –   Loopbacks
  –   Defect reporting
  –   Alarm indication/inhibition
• The OAM overhead started to explode in size !
• When SONET/SDH was designed, bounded overhead was
  reserved for OAM functions
OAM for Packet Switched
Networks

• OAM is more complex for Packet Switched Networks
• In addition to the previous defects :
  – Loss of signal
  – Bit errors
• We have new defect types:
  – Packets may be lost
  – Packets may be delayed
  – Packets may be delivered to the wrong destination
• The first PSN-like network to acquire OAM was ATM
  (I.610)
  – Although technically ATM is cell-based, not packet-based
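The three packet-level defect types can be sketched as a classification of test packets. The example assumes that each test packet carries a sequence number from 0 to sent_count - 1 and that the receiver records where and how late each one arrived; the `Probe` record and the function name are hypothetical and do not correspond to any specific OAM protocol.

```python
from typing import Dict, Iterable, NamedTuple


class Probe(NamedTuple):
    """A received test packet (hypothetical record for illustration)."""
    dest: str        # where the packet actually arrived
    seq: int         # sequence number assigned by the sender
    delay_ms: float  # measured one-way delay


def classify_defects(received: Iterable[Probe], sent_count: int,
                     expected_dest: str, delay_bound_ms: float) -> Dict[str, int]:
    """Count lost, delayed, and misdelivered test packets."""
    probes = list(received)
    delivered = [p for p in probes if p.dest == expected_dest]
    misdelivered = [p for p in probes if p.dest != expected_dest]
    # A packet is lost from the receiver's point of view if its sequence
    # number never shows up at the expected destination.
    seen = {p.seq for p in delivered}
    return {
        "lost": sent_count - len(seen),
        "delayed": sum(1 for p in delivered if p.delay_ms > delay_bound_ms),
        "misdelivered": len(misdelivered),
    }


if __name__ == "__main__":
    rx = [Probe("B", 0, 5.0), Probe("B", 1, 30.0), Probe("C", 2, 5.0)]
    print(classify_defects(rx, sent_count=4, expected_dest="B", delay_bound_ms=20.0))
```

Note that in this sketch a misdelivered packet also counts as lost at its intended destination.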




What about Ethernet ?

• Carrier Ethernet has replaced ATM as the default layer 2
• Ethernet is by far the most widespread network interface
• Ethernet has some advantages as compared to ATM
  – It has network-wide unique addresses
  – It has a source address in every packet
• But some aspects make Ethernet OAM more difficult
  –   Connection-Less (CL)
  –   Multipoint to multipoint
  –   Overlapping layering – need OAM for operator, SPs, customer
  –   Some specific problematic ETH behaviors (flooding, multicast …)




What’s the problem with CL ?

• OAM makes a lot of sense in Connection Oriented
  environments
  – Connections last a relatively long amount of time
  – There is some SLA at the connection level
• For CL networks, the network path is neither known nor
  pinned, so it doesn’t really make sense to talk about FM
  – What does continuity mean if, when a link goes down, the
    network automatically reroutes around the failure ?
• The Ethernet CL problem is solved by overlaying CO
  functionality :
  – Flows, or
  – EVCs



Ethernet OAM

For many years there was no OAM for Ethernet (LANs don’t need
OAM); now there are two incompatible ones!
• Link layer OAM – 802.3 clause 57 (EFM OAM, 802.3ah)
   – Single link only
   – Slow protocol, limited functionality
   – Some management functions
• Service OAM – Y.1731, 802.1ag (CFM)
   – Any network configuration
   – Multilevel OAM functionality
• In some cases one may need to run both, while in others only service OAM
  makes sense
• Link layer OAM is only for a single link, which is necessarily CO
• Service OAM is most frequently used for infrastructure networks,
  which are also CO
MEPs and MIPs




What about MPLS ?

• The other L2 used today is MPLS
• OAM mechanisms that work well for Ethernet cannot be used
  as-is for MPLS. This is because :
   –   MPLS does not use absolute addresses
   –   MPLS packets do not carry source addresses
   –   When using LDP, MPLS is not pure CO
   –   LSPs are unidirectional entities
• The IETF has defined LSP ping that provides basic OAM
   – Continuity
   – Trace route
• The ITU defined Y.1711, but it has not seen widespread use
• The MPLS community is now working on MPLS-TP which is
  basically MPLS + strong OAM (FM + PM)
   – And functionalities dependent on OAM, such as protection
     switching
What about IP ?

• It makes sense to monitor IP (IPv4/IPv6) performance as
  well
   – IP is the most popular end-to-end protocol
   – IP connectivity can be purchased (although perhaps not
     widely with SLAs)
• But from the OAM point of view, IP is the hardest of all
   – The IP protocol suite does not define anything beneath L3
   – IP is always pure Connection-Less
• In certain cases it may make more sense to jump directly
  to application flows



IP OAM

• For IP, one usually talks about OAM between end-points

• The IETF defines an all-purpose OAM + control protocol :
  – ICMP (Internet Control Message Protocol)
• A protocol for FM :
  – BFD (Bidirectional Forwarding Detection)
• And two sophisticated protocols for PM :
  – OWAMP (One Way Active Measurement Protocol)
  – TWAMP (Two Way Active Measurement Protocol)
OWAMP and TWAMP are the only OAM protocols
with full security features !
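The arithmetic behind a two-way active measurement can be sketched with four timestamps: T1 (sender transmits), T2 (reflector receives), T3 (reflector transmits), and T4 (sender receives). This illustrates the timestamp calculation only, not the TWAMP packet formats or its security modes, and the function names are hypothetical. The one-way figures are meaningful only if the two clocks are synchronized, which is assumed here.

```python
from typing import Tuple


def two_way_delay(t1: float, t2: float, t3: float, t4: float) -> float:
    """Round-trip network delay, excluding the reflector's processing time.

    t1 and t4 are read from the sender's clock, t2 and t3 from the
    reflector's clock, so no clock synchronization is needed.
    """
    return (t4 - t1) - (t3 - t2)


def one_way_delays(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """Forward and reverse delays; valid only with synchronized clocks."""
    return t2 - t1, t4 - t3


if __name__ == "__main__":
    # Timestamps in milliseconds, chosen for illustration.
    print(two_way_delay(0.0, 10.0, 12.0, 25.0))
    print(one_way_delays(0.0, 10.0, 12.0, 25.0))
```

The round-trip result is insensitive to a clock offset between the two ends, whereas a one-way measurement absorbs any offset directly into the reported delay.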




Summary

• It is advantageous to run networks as provided services
• Service provider income depends on SLA compliance
• SLA compliance requires OAM – FM and PM
• OAM protocols now exist for all relevant technologies :
  –   TDM – SDH
  –   Ethernet
  –   MPLS
  –   IP
• Ethernet is leading in OAM functionality, but MPLS-TP is
  rapidly catching up
• IP cannot have FM tools as robust as Ethernet/MPLS, but
  already has more sophisticated PM ones

Thank You
For Your
Attention

                 www.rad.com

