Performance routing Pfr

Wed 21st Nov 4:00pm - 4:40pm

Published in: Technology
  • Transcript:
    So this is the agenda for today. We're going to look at what PfR is, and that includes some of the benefits of PfR and why PfR all of a sudden is interesting. We're going to look at some examples of PfR deployment scenarios. We're also going to look at exactly how PfR works. As you know, PfR consists of multiple different technologies, and the interesting thing is how PfR combines all of these together. Also, we'll be looking at how to demonstrate PfR. This includes self-paced training, and also options if you want to actually show a customer performance routing [INAUDIBLE], or if you want to run a session where a customer can come in to Cisco premises and actually try performance routing out. In terms of deployment scenarios, we do have some case studies as well. So we'll be looking at three different case studies, some recent ones. I've also included some recommendations on scaling and what kind of hardware we can use. And finally, I'll just end with a summary of news sources. As for the photos, please excuse the images of the queues of people; I have an analogy with performance routing, which will become apparent fairly quickly.
    Author's Original Notes:
  • Transcript:
    So what is PfR?
    Author's Original Notes:
  • Transcript:
    First off, it's important to have a look at what the problem might be that PfR is trying to resolve. So today, WAN performance is more important than it ever was before for branches. And I'll justify that later in another slide, exactly why that is. So the question that comes to mind is, if the WAN performance is so important, how can we engineer application performance? Meaning, how can we ensure that applications run reliably and are fast and responsive, so that people who are used to running applications on their desktop, if they're now using applications in a cloud or a data center, still get good performance? Some other questions are what businesses are doing now that they may have redundant links. These may be sitting idle. The other worrying thing is they may have links that have faults-- degraded links, for example-- and these may be carrying the most important traffic. So fairly quickly it becomes obvious that some sort of application intelligence is needed. First, we need some way of figuring out which is the important traffic; that could either be configured by the user, or the network devices could figure it out automatically. Also, we need to find a way of recognizing when there are problems in the network, either connectivity problems, or more subtle problems like certain amounts of packet loss or latency. And then also of course to find a way of directing traffic so that it can use the best link possible. And we can't control everything, but if we can control the most critical, important traffic, then that's something that people will experience straightaway when they're using their critical applications. So, just to start simply, you can imagine, for example, a branch with maybe a couple of links. They could be MPLS as a primary and maybe an internet back-up link. Or they could be two internet links, maybe from two service providers or the same service provider. 
Another analogy could be, let's say you get to an airport and you've got economy class and business class. And if the business class queue is empty, then why shouldn't the economy class customers be able to use the business-class queue? And for that kind of intelligence, you need somebody to actually see that the queue is empty, and to dispatch somebody to the right queue. And this is kind of what performance routing does, but with application traffic. And in this case, the dispatcher, or the intelligence, is known as a Master Controller in PfR terminology.
    Author's Original Notes:
  • Transcript:
    Add improved reliability/redundancy as a benefit
    Greatly reduced WAN costs
    Use the most cost effective routes automatically while maintaining an SLA
    Improved performance - Media and Application aware routing
    Automatically routes voice and video quality over the best performing path – routes can be determined before the call is even made!
    Phased deployment is possible
    Enable learning mode on just one router (e.g. one branch)
    When satisfied, instruct the router to enforce policies
    Happy with the settings? Deploy on all routers
    So if we're interested in the application [INAUDIBLE], I'm going to be referring maybe a few times in the slides to this type of diagram. On the left-hand side you can see there is a branch-- I've put a cloud-enabled branch here-- and on the right-hand side is the headquarters, so that ASR could be the WAN aggregation device, for example. And maybe there are two links from a service provider, or multiple service providers, just shown as clouds in the center of the diagram. And in terms of the important traffic, maybe for this customer, Citrix and Oracle are important. Maybe they have Oracle database hook-ups and they're using Citrix, too. And the aim of performance routing is to be able to direct that traffic so that it goes over the correct path at the right time. So it's like a dynamic way of routing: it dynamically corrects for problems in the network. By doing this, perhaps three main benefits are achieved. Actually more than three, but let's just stick to these main ones for now. The first one is it improves performance. It improves the user experience by improving responsiveness, so, for example, there are no more delays due to retransmissions. Also latency as well, for example. By being application-aware, we can actually spot that there's a particular important bit of traffic, like the Oracle traffic, or Citrix traffic. The second is improved reliability, and by this I mean that we can do things that I really think LAN would not have been able to achieve. For example, we can take action on black holes. There's no manual interaction that's needed, so we can correct problems actually before the user even uses the application. How that is done is using active probes. And we'll talk about that a little bit more later, too. The other thing is it's extremely cost-effective. 
And if a customer is paying for multiple links, then they really can make the best use of multiple links with performance routing by using them simultaneously. And also with new technologies like 3G and 4G, those can be made use of as well. So an example for a customer who may be trying to save costs would be moving from leased lines to DSL or 3G.
    Author's Original Notes:
    The WAN cost saving with PfR could be extremely significant. It can (optionally) use an IP SLA type of capability to determine path statistics, hence the comment that routes can be determined before video calls are even made.
  • Transcript:
    Okay, so straightaway I think it's probably worth just showing what PfR consists of, just to make it clearer how it can work. As you see, it brings together a lot of existing technologies, many existing features. And some of these are unique innovations, like application visibility and control in Cisco IOS. Then as well as these existing bits of functionality, there are some new bits that are added in PfR. And those are primarily shown on the right-hand side, where it says Master Controller. As I've mentioned, the Master Controller is like the brains, the entity that actually gets to decide which traffic gets sent where. All of these features, on the left- and right-hand side, combine and are needed for PfR to work. And all of these features can run on one router or across two routers.
    Author's Original Notes:
  • Transcript:
    Also, just so it's clear what we're talking about, what does this feature run on? It runs on the 1800 series upwards, and it does require the data license. On the ASR 1000, it doesn't need anything apart from the Advanced IP Services or the Advanced Enterprise Services license. But there's actually an opportunity to sell more licenses to customers. Any customer who's deploying this more than likely will need a security license, too, if they're using DMVPN, for example, to make sure that their path is secure. AVC is another good license to offer to the customer. And you can offer this at the headquarters side, the WAN aggregation point, so that not only can they get good application performance, they can also actually monitor growth and consumption of applications that are in use. We don't really talk much about AVC in this presentation today. And here's just a summary of some of the licenses that are needed.
    Author's Original Notes:
  • Transcript:
    So, why now? Why is PfR so interesting now, when actually PfR has been around for quite a while? I think it's probably been around for maybe half a decade, actually. It was called OER, and so many people are familiar with it, especially people who've had to do Cisco labs and training courses. But the OER of the past is actually very different from the PfR that's available today. It's evolved quite a lot. Also, another key point is that we've now got very powerful hardware that gives PfR much more advanced capabilities that OER did not have, like application identification. So these are some of the market transitions that are driving the need for performance routing. It's clear that as more and more applications are running in the cloud or in data centers, then of course people will want the same experience that they're used to with their local applications. We also have BYOD, not just in the main campus, but also for remote branches. Again, people are becoming aware that if there are more and more applications where maybe they just use HTTP, maybe you can just use a browser to access these applications, then why should they not be able to do that on their handheld devices? Of course, VDI-- probably don't need to go through all this in much detail-- I think you're probably already aware-- but especially now that software capabilities have evolved for desktop virtualization, of course that's a great opportunity to save costs. So if a customer is doing that, then they can actually save costs further by deploying PfR, too. Video is interesting because-- of course everyone's familiar with it-- but many customers today may not have any enterprise video application. But nevertheless they're still extremely interested in this for the future. So, for example, some of the customers I've been talking to about PfR, even if they do not have a video application today, they definitely want something that's going to work with video, too. 
And now that we have medianet technology, of course, then PfR complements it very well, ensuring that there's good voice and video. And actually speaking of voice, for PfR to be effective, it does not need to be used in an enterprise where there are many voice calls going on. Even if there's just one call from a branch at any point in time, then even that's sufficient. Because at the end of the day, if a customer is experiencing poor voice quality even on a single voice call, then that's a good business case for performance routing. And that can happen, depending on how much throughput there is at certain times of the day, especially with DSL.
    Author's Original Notes:
  • Transcript:
    So, why now? Some other examples of why now. It's because of the options that are now available for connectivity. So of course DSL's been around for a long time, but now it's becoming more important because it's a lot more reliable than it was, say, five years ago. We took a look at a study from 2007, and the availability was actually not very good for DSL back then. A US study found that there was 98% uptime for a single DSL line. So therefore you needed two DSL connections for three-nines availability. Or actually three DSL connections if you wanted five-nines. But those figures are extremely old, and so today two DSL lines should offer much higher availability than in the past. And it's actually really easy to set up, say, a single-leg or a regional PfR trial to establish what the availability is for a particular region. Also for 3G of course, there's more than enough throughput for many branches. But with 4G, because of the much-simplified radio network, there are actually very few devices between the handset and the internet service provider, unlike 3G. And that gives you much lower latency. In terms of how much cost can be saved, there are people currently working on a Return on Investment calculation. That will be used for providing customers with some figures. But, for example, moving from leased lines to DSL, it could be a residential-grade DSL-- that's what we're seeing some customers using-- and it may be a tenth of the cost. But the interesting thing is, people are not necessarily moving to DSL just because they can use PfR. They actually may have already moved to using DSL connections, and they may be already extremely unsatisfied with it. And so there's a real strong business need for solving this problem.
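The availability figures quoted above can be checked with a short calculation. This is a minimal sketch that assumes the DSL lines fail independently of each other (an assumption the talk does not state); the function names are hypothetical.

```python
import math


def combined_availability(per_link: float, links: int) -> float:
    """Availability of `links` parallel links, assuming independent failures.

    The site is down only when every link is down at the same time.
    """
    return 1.0 - (1.0 - per_link) ** links


def nines(availability: float) -> int:
    """Count the leading nines of an availability figure, e.g. 0.9996 -> 3."""
    return math.floor(-math.log10(1.0 - availability) + 1e-9)


for n in (1, 2, 3):
    a = combined_availability(0.98, n)
    print(f"{n} DSL link(s): {a:.4%} availability, {nines(a)} nines")
```

Under that independence assumption, two 98% lines give about 99.96% (three nines) and three lines give about 99.9992% (five nines), which matches the numbers in the talk. Lines that share a provider or an exchange would fail together more often, so real figures may be lower.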
    Author's Original Notes:
  • Transcript:
    In terms of the typical customer requirements-- these are real customer requirements which we took from someone, and it actually doesn't even mention PfR anywhere, because these were their requirements even before they'd heard of performance routing. So the customer may be spending a lot of money moving to their next-generation network, but they want to make sure that what they're doing will be cost-effective, and will provide them with savings on their WAN, too. Also of course, like I mentioned before, they want to be able to get the best utilization from DSL lines or 3G, 4G or fiber to the home. Also, in terms of the applications that they're trying to make sure work well, there may only be one or two applications or there may be a lot more than that. It really just depends on the particular business. Customers want, like I mentioned, the ability to work with voice and video. And I think it's actually connecting with customers a lot that, because our devices can provide security, medianet, PfR, all of these features, these features are generally useful now. And they think they do meet their needs. Of course something that works with 3G, since it's access-agnostic, it's also ISP-agnostic, too, and that can be used just to keep their WAN provider on their toes, for example.
    Author's Original Notes:
  • Transcript:
    Cost minimization: takes into account billing model for sustained bandwidth utilization
    MATT: Shabal, a couple of questions rolled in on the Q&A. Is PfR supported on the higher-end 800 now?
    SHABAL: It's unclear. If it will be supported, it will be on the 892. I think it's officially not supported, but the product manager is working on that. So, unfortunately I don't have an answer for that just yet, but it's something we're trying to chase up because we've actually got a customer who wants to deploy PfR with a large number of 892 devices. It probably won't be deployed on any other 800 series, other than the 892.
    MATT: OK. And then, second question, from Richard, he seems to be confused on the G2 requirements. PfR runs on the G1 too. Can you confirm?
    SHABAL: Well, OER's been around for a very long time. But if you want the performance and the recent functionality, there's been a lot of work that's gone in to ensure the reliability of PfR. All of these features are in the G2.
    MATT: OK. Thank you.
    SHABAL: So this slide is actually just a summary of the benefits of PfR. And I think I've covered most of them already. But this slide might be useful if you wanted to quickly show a customer what the benefits are.
    Author's Original Notes:
    Per-app policies: such as jitter and MOS for voice and video or TCP delay and packet loss for business applications
  • Transcript:
    So, if we're going to talk about PfR, there's some terminology that gets used a lot. And you see here again, on the left-hand side, the branch with the CPE device. That router is marked with a couple of dark squares. They're called exit links, and these are basically the exits that are under the control of performance routing. And so the end result is this is exactly what PfR does, basically. It directs traffic to take a path through one of the exit links. Actually, in the documentation they're called external interfaces. I tend to refer to them as exit links because that makes it a lot clearer.
    Author's Original Notes:
  • Transcript:
    I hinted earlier that the Master Controller was the brains of performance routing. Internally, in the software, the PfR functionality is split into MC, for "Master Controller," and BR, for "Border Router." So they're logical functions. But they can run inside the same device, or they can run across two or more devices. So some examples of [INAUDIBLE] these don't actually show redundancy. At the branch, where you've probably only got one router, you can run the Master Controller and Border Router both together on the same device. And that's fine because, in terms of the volume of traffic that's running, the MC and BR will run at good performance at the branch in a single device. If you're running this at the other end, at the headquarters, then you probably want to have a separate Master Controller controlling maybe one or more Border Routers. This is because there's some overhead on the Master Controller and the Border Routers. And so it's actually good to split the functionality out. I've got some more information on scaling later on. Also, it's awfully useful at the enterprise edge, where maybe you're not controlling which direction you want the traffic to go out or through. You can use active probing, which I'll let autorun. But at the enterprise edge, what you could also do is actually control which link the traffic arrives on into the enterprise, and that's done using BGP. I'm sorry, that's a little bit busy.
    Author's Original Notes:
  • Transcript:
    This is just trying to show the actual little bits of functionality that are within the Master Controller and the Border Router, and also where these bits of functionality are. So you can see the Master Controller primarily runs in the route processor, which is the Control Plane on the ASR 1000. For the Border Router, the functionality is split between the RP and the ESP card. The communication between the Master Controller and Border Router uses TCP/IP. And that's why you can split that up between two separate devices if you really want to.
    Author's Original Notes:
  • Transcript:
    We already talked about the first scenario there, which is Master Controller and Border Router both running at the branch. And so traffic is being controlled to go out through one of the exit links, for example, as shown here. But PfR can also run in the other direction, too. In the bottom diagram you can see the red device is the Master Controller controlling two Border Routers, which are the green devices. The exit links are on those two Border Routers at the bottom, and so they can control through which exit link, which could be one or another service provider, the traffic is going. We'll look at that technology later on in detail.
    Author's Original Notes:
  • Transcript:
    So when PfR runs, it actually has five main steps.
    Author's Original Notes:
  • Transcript:
    And we'll go through all of these steps in detail. So although we're going to be going through them sequentially, it's important to know that actually all of these steps can run concurrently. So once PfR is actually running, all of these steps can and do run concurrently. The first step is actually figuring out a way of actually identifying what the traffic of interest is, meaning which is the traffic that is actually important to the branch. You can see here that, for example, that might be done automatically, maybe using NetFlow, or you may actually configure which traffic is of interest. Once that's done, the router will actually monitor that traffic, and so we'll go into how this monitoring can actually occur. The monitoring can be passive or it can be active. Once the router's actually collected up information-- statistics based on the passive or active monitoring-- then it uses those measurements to try and figure out if the link is actually performing to specification or not. Once it's made the determination that a certain amount of traffic needs to be moved to a different link, then it will do path enforcement. There are several methods open to it for doing the path enforcement. Finally, people are probably wondering, if this device is going to be moving traffic across from one link to another, then is there a danger perhaps something could go wrong? And those kind of dangers are mitigated through control groups and timers. Like I mentioned, all of these steps actually run initially sequentially, but then they'll all be running concurrently.
    Author's Original Notes:
  • Transcript:
    PfR uses the concept of “Traffic classes”, and this basically means a destination IP or mask of addresses, or DSCP values or port number or NBAR application name.
    So the first step is looking at the traffic of interest. And there are two ways of doing this. The first method is automatic learning, and that relies on NetFlow. You don't actually have to configure NetFlow. When you configure performance routing, the NetFlow functionality will automatically turn itself on as long as you've configured automatic learning. And then what it will do is automatically identify the top destinations which have got the most throughput, or the ones which are suffering the most delay. And for that it only works for TCP. Automatic learning is very useful where you have an environment where it might be actually quite hard to determine what applications are in use in advance. For example, some businesses may not even know what applications are being run. So here it's very useful that you can actually instruct the router just to look, for example, at maybe the top 100 destinations. By destination, what I mean is it can be something very granular, down to an individual IP address. Or it could be very broad, with a [INAUDIBLE] to define a whole network or a sub-network of addresses. And it's known as a prefix. The diagram shows how it works, so it uses the normal NetFlow cache. It also periodically transforms the data by collecting it into a couple of views known as aggregation caches. These two views, or caches, are basically a format that performance routing can use. So one of them is in a format which is prefix-based, meaning it's in exactly the format which performance routing has to use. The other cache actually contains some IP addresses, which are useful for active probes. Another method of configuring up the traffic of interest is just to manually do this. 
So what you can do is, if you already know which are the servers in the headquarters which have got the important traffic, then you can actually just directly go ahead and configure within the Master Controller bit of functionality, you can configure all the IP addresses up in prefixes. And like the automatic learning, these don't have to be single IP addresses, these can be used for the [INAUDIBLE] to define a whole group of IP addresses.
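The throughput-based learning described above can be sketched as follows. This is only an illustration of the idea (rolling per-destination byte counts up into prefixes and keeping the top N), not how the NetFlow aggregation caches are actually implemented; the function name, the flow record shape and the /24 default are assumptions.

```python
from collections import Counter
from ipaddress import ip_network


def learn_top_prefixes(flows, prefix_len=24, top_n=100):
    """Aggregate (dst_ip, byte_count) records into prefixes, ranked by bytes.

    `flows` stands in for entries exported from a flow cache (hypothetical shape).
    """
    totals = Counter()
    for dst_ip, byte_count in flows:
        # strict=False masks off the host bits, e.g. 10.1.1.10/24 -> 10.1.1.0/24
        prefix = ip_network(f"{dst_ip}/{prefix_len}", strict=False)
        totals[prefix] += byte_count
    return [prefix for prefix, _ in totals.most_common(top_n)]


flows = [
    ("10.1.1.10", 5000),
    ("10.1.1.20", 7000),
    ("10.2.5.9", 9000),
    ("192.0.2.1", 100),
]
print(learn_top_prefixes(flows, top_n=2))
```

Manual configuration corresponds to skipping the ranking step and supplying the list of prefixes directly.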
    Author's Original Notes:
    Automatic learning can be top throughput or top latency (delay) based. It can be qualified further, e.g. only to learn destinations with a particular mask of IP addresses, or a particular application type e.g. only citrix traffic.
  • Transcript:
    Packet delay, TCP session set-up delay, unsuccessful sessions, packet loss, jitter, bandwidth (bit/sec) consumption, MOS, etc.
    When will PfR monitor?
    Once you define that, the next step is to actually monitor the traffic. So the things that you might want to monitor can include, for example, latency, which you can figure out by looking at the TCP handshake, and packet loss, by looking at the TCP sequence numbers. There are also active probes available, and there are many different types of probes, like UDP, TCP, ICMP, and also probes for media.
    Author's Original Notes:
    What can PfR monitor?
  • Transcript:
    The important thing to know, when will PfR start monitoring? Because you may have multiple links, and so you're probably wondering does PfR only monitor the active links, or can it monitor the other links, too. So there's a few different modes of operation. One mode is passive mode, like I mentioned, which relies on NetFlow, and passive mode is useful for checking out packet delay, packet loss, reachability, meaning can any traffic get to that destination, and also throughput. So I think if you configure either manual mode or automatic mode whenever PfR is being configured, whenever that traffic is going through the router, if it matches a configuration or if it's in automatic mode, and there's a certain volume of it, then it will start monitoring that passively. That's if you've configured up passive monitoring. This passive mode is actually useful for enterprises only, normally if you're using it for any other use case, then you probably want to use active probing, or some hybrid method. So active mode, this relies on the [INAUDIBLE] probe, but again, you do not actually need to configure up [INAUDIBLE] probe, as long as you've configured up performance routing and you've put it to "mode active", then it will automatically do that for you. You can actually run into some different things with it. You can actually look at jitter. You can also directly specify and monitor mean opinion score for voice, too. Another method is "both mode." This uses active and passive mode together. And on the right-hand side you can see when these modes will run. So for the current exit, in active mode, there's always probing going on. For the other exit if you've got, for example, two exit links, and the traffic's currently running through one of the links and everything's fine, then the active probe will only be running through that link. 
For the other link, there will be no probes until you have the policy, [INAUDIBLE] So while that works, you may actually want something extremely responsive. And for that there's also a "fast mode." The fast mode is really great, if you want extremely quick path determination because it will send probes to every exit all the time.
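The probing behaviour of the monitoring modes can be summarised in a small sketch. The rule that the other exits are only probed once the traffic goes out of policy is an assumption that fills the inaudible part of the recording, and the function and argument names are hypothetical.

```python
def exits_to_probe(mode, current_exit, all_exits, out_of_policy=False):
    """Return the exit links that receive active probes in a given mode."""
    if mode == "passive":
        # Passive mode relies on NetFlow measurements only; no probes are sent.
        return []
    if mode == "fast":
        # Fast mode probes every exit all the time.
        return list(all_exits)
    if mode in ("active", "both"):
        # Assumed: only the current exit is probed until the traffic class
        # goes out of policy, then the alternatives are probed as well.
        return list(all_exits) if out_of_policy else [current_exit]
    raise ValueError(f"unknown monitor mode: {mode}")


exits = ["exit-1", "exit-2"]
print(exits_to_probe("active", "exit-1", exits))
print(exits_to_probe("fast", "exit-1", exits))
```

The trade-off is probe overhead against reaction time: fast mode already has measurements for every exit when a problem appears, at the cost of probing links that carry no traffic of interest.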
    Author's Original Notes:
  • Transcript:
    So the next step is inspecting the measurements, and now the Master Controller will actually have a look at the measurements that got taken, to try and figure out if a threshold has been exceeded or not. So you may have set up different thresholds. Maybe for Citrix applications, you're willing to tolerate a certain amount of packet loss; maybe for voice, you're willing to tolerate a certain amount of jitter. And as you can imagine, it relies on statistics. It relies on being able to take an average and compare that with the threshold. When you actually specify this in a policy, you can specify it using two methods. One's a relative method and the other one's called a "threshold method." You can see on the diagram here, in grey, the actual measurements taken. And then there are two levels of averaging that: a short-term average and a long-term average. Depending on the actual requirement-- and today we don't have any examples; normally it's actually done on the customer premises. If we've got a fair amount of data, if the customer actually tells us what applications they are running, then we can get some advice from engineering to try and figure out what the policy should actually look like. But in the future we'll have some recommendations, so if we know a customer's running Citrix or we know that customer's running a certain type of voice or video call, then we'll have a recommendation of what types of measurements to take. But today it's highly configurable, and it just needs to be done manually via configuration. For situations where a fixed threshold is sufficient, that's been marked with a dashed blue line. Then basically what the router does is just compare the short-term average with the threshold, and so whenever that threshold is exceeded by that short-term average, the router will take action, meaning it will try and look for a better exit link. So that's occurring at time B, for example, there. 
However, there might be situations where you need a short-term trend to be identified. And in that case, what you can do, you can actually look at the red dots, which are shown here at times A, C, and D. And those are three occasions when the short-term average was sufficiently higher than the long-term average, so you've triggered the router to take action. Notice that when the actual [INAUDIBLE] values are low, it takes a lot less of a spike to trigger the router to take action, because when you configure this, the configuration's in terms of a percentage and not an absolute value. So although there's three points marked on that diagram on the right-hand side where the router was triggered to take an action, if you look anywhere on the left-hand side of the graph, you can see the spikes where the brown line, which is the short-term average, went above the black line. Although there were spikes, as a percentage they're actually a lot lower, so the router did not take any action at those points to run its path selection algorithm. And that path selection algorithm, when it runs, basically the router will then determine what's the best exit for the traffic to actually take. Once measurements that are needed are defined, then it's actually possible to configure priorities as well on the measurement "Time". So on the left-hand side the measured value of [INAUDIBLE], for example, it could be packet loss or delay. You can actually set a priority as well on what's actually important for the particular application. So like I mentioned, we're going to have to evaluate further which settings are initially recommended for which application.
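The two policy methods can be illustrated with a short sketch: the threshold method compares the short-term average with a fixed value, and the relative method compares it with the long-term average as a percentage. The window sizes, sample values and function names are assumptions for illustration only.

```python
def mean(values):
    return sum(values) / len(values)


def out_of_policy_threshold(samples, threshold, short_window=5):
    """Threshold method: short-term average exceeds a fixed value."""
    return mean(samples[-short_window:]) > threshold


def out_of_policy_relative(samples, percent, short_window=5, long_window=60):
    """Relative method: short-term average exceeds the long-term average
    by more than `percent` percent."""
    short_avg = mean(samples[-short_window:])
    long_avg = mean(samples[-long_window:])
    if long_avg == 0:
        return short_avg > 0
    return (short_avg - long_avg) / long_avg * 100 > percent


# Delay samples in ms: a low baseline with a small spike, a high one with a larger spike.
low = [20] * 55 + [40] * 5
high = [200] * 55 + [250] * 5

print(out_of_policy_relative(low, 50), out_of_policy_threshold(low, 100))
print(out_of_policy_relative(high, 50), out_of_policy_threshold(high, 100))
```

With these numbers the 20 ms spike on the low baseline trips the 50% relative policy but not the 100 ms threshold, while the 50 ms spike on the high baseline does the opposite. That is the effect described above: because the relative setting is a percentage, a smaller absolute spike is enough when the baseline is low.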
    Author's Original Notes:
  • Transcript:
    Once a path has been determined by PfR, so PfR has looked at the measurements and decided that it does need to take action, it's got a few options open to it on how to direct traffic to take a different route. But first let's have a look at what's inside the ASR 1000 so we can try and figure out how this can occur. So on the diagram you see at the top it says "Control Plane," meaning the route processor, and at the bottom is the data plane, the ESP card. So most functionality runs on the ESP card [INAUDIBLE] high performance. For all the different routing protocols-- for example the BGP announcement or the IGRP update-- the actual processes for the routing protocols are running up in the Control Plane. And the routing tables are being maintained there. So for performance routing to work, one option open to PfR is to actually inject a route into the routing table. And these routes get distilled down into the [INAUDIBLE], using some algorithm, for best use of TCAM. And this is a good advantage for the ASR 1000, which has got a lot of TCAM. And it runs really fast. Within one clock cycle it can make a decision. So this goes to show how PfR can actually run at very high performance on here, as well. The important thing is that a parent route needs to exist for PfR to actually be able to take control of routes for a certain application. And what I mean by that is perhaps an exact route already exists. In that case, if you're using BGP, then it will change the local preference, or if you have a static route, then it will just modify that static route to use a different exit path. By a higher route, what I mean is perhaps there is a /16 prefix for which a route already exists. Then it will do a prefix split internally, and that won't be advertised out to another BGP AS. That will just be used within its own AS. So for example, a /24 could be what is controlled. 
Where you actually need even more granularity than that, meaning where control needs to be more granular than an IP prefix, then PBR could be used, so it could use a route map, where, for example, there's a particular application that needs to be controlled. But in general, PfR will use the best method open to it. So it'll use PBR as a last resort.
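The choice between the enforcement methods just described can be sketched in Python. This is only an illustration of the decision order in the talk (exact route, then prefix split under a parent route, with PBR when per-application control is needed). The function name, the routing-table representation and the returned labels are all hypothetical, not anything PfR exposes.

```python
import ipaddress


def choose_enforcement(target_prefix, routing_table, per_application=False):
    """Pick an enforcement method for target_prefix.

    routing_table maps prefix strings to the route source, "bgp" or "static"
    (a simplified, made-up representation for this sketch).
    Returns None when no parent route exists, so nothing can be controlled.
    """
    target = ipaddress.ip_network(target_prefix)
    routes = {ipaddress.ip_network(p): src for p, src in routing_table.items()}

    # A parent route (exact match or a covering, less specific prefix) must exist.
    covering = [
        net for net in routes
        if net.version == target.version and target.subnet_of(net)
    ]
    if not covering:
        return None

    # Finer than a prefix (e.g. a single application): policy-based routing.
    if per_application:
        return "pbr"

    # Exact route already exists: adjust it in place.
    if target in routes:
        if routes[target] == "bgp":
            return "bgp-local-preference"
        return "static-route-next-hop"

    # Only a less specific route exists: inject a more specific prefix
    # that stays inside the local AS.
    return "prefix-split"


# Hypothetical routing table used only for illustration.
example_table = {"10.1.0.0/16": "bgp", "192.0.2.0/24": "static"}
```

For example, with this table a request to control `10.1.5.0/24` falls under the `/16` parent and yields a prefix split, while `172.16.0.0/24` has no parent route and cannot be controlled.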
    Author's Original Notes:
    PfR can operate in an ‘observe’ mode too, where it does everything except the enforcement. Useful to see what PfR ‘would’ do.
  • Transcript:
    For the control loop, there's three things that we're trying to do. We're trying to prevent flapping. We're trying to make sure that PfR is still responsive regardless of that, and also we need to make sure that there's time for the network to settle. By preventing flapping, what I mean is that when the ASR or ISR running PfR makes a change, we don't want it to then make another change really quickly after that. But we still want it to be responsive, so we need to be able to control how quickly the router can make a subsequent change. And this is all configurable by means of a hold-down timer. Allowing the network to settle is important because there could be certain scenarios where, when PfR is running, maybe there's an issue which is occurring on all exits. So maybe there's no optimum exit. And maybe some time is needed for some elements of the network to actually settle down. And so if no possible exit can be found, then there's a timer within PfR called a back-off timer, and that kicks in. Actually the easiest way to see what's going on is through the use of two diagrams, which I'm showing now.
    Author's Original Notes:
  • Transcript:
    One is a PfR flowchart, which is actually highlighting everything which I've talked about earlier, meaning once you've actually configured or learnt the traffic of interest, then the ASR, at the top you can see on the flowchart, will start monitoring those prefixes. And after it's taken the measurements, if it decides that the traffic is out of policy, then it will try and find a different link. The actual timers are not shown here, but it just says at the bottom "damping", and that's the timer to ensure stability. A much better way of actually representing all of this is by looking at this state machine.
    Author's Original Notes:
  • Transcript:
    And this is actually very useful as a reference, because if you're trying to debug PfR, then a good way of doing this is to enable Syslog and then just go and have a look in the Syslog to see the output messages. And you'll see things like "in policy," "holddown," these messages actually showing up. The [INAUDIBLE] log is the interim state, and it doesn't normally sit in there very long. That's just a state while it's taking all the measurements and trying to make a decision on which way to send the traffic. So if you can imagine, when you've actually initially configured up the router for PfR, it starts from the default state. And there'll be a little bit of a delay to make sure that the configuration has all settled down. Then it'll move into the interim state briefly, then you'll see the holddown state. So if you've just configured performance routing and then you look at Syslog, the first thing you'll normally see is it waiting in a holddown state. After the holddown timer has expired, then hopefully it should move to an in-policy state, meaning that the traffic is meeting the configured policy, so the thresholds which you set for packet loss or jitter are all being met for the particular traffic which has been identified. And that's the normal state, so the left-hand state, the in-policy state, is the ideal state for traffic. Another important thing is, like I mentioned, whenever an exit is selected, it sits in that holddown state, but there may be some major problem, and the destination may be totally unreachable. So in that case there's no point sitting in a holddown state. We should actually immediately change to a different exit. So in that case, it won't wait for the holddown timer to expire. It will immediately move to try and make another link selection, and then hopefully move to the in-policy state, as shown here. 
Of course if it's out-of-policy, then, again, it will move through the interim state while it makes another link selection, and then back down to the holddown state. On the right-hand side, you can see the out-of-policy state. So that occurs when there's absolutely no feasible exit, like I mentioned, so all of the exit links are failing. And in that case, it sits there until the back-off timer has expired. And then it tries to make another link selection.
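The state transitions walked through above can be written down as a small lookup table. The Python sketch below is only a reading aid based on the talk: the state names follow the slide, but the event names are invented for the example and the real PfR state machine may have more states and transitions than are captured here.

```python
from enum import Enum


class State(Enum):
    DEFAULT = "default"        # short delay while configuration settles
    INTERIM = "interim"        # measurements taken, exit being selected
    HOLDDOWN = "holddown"      # wait state to prevent flapping
    IN_POLICY = "in-policy"    # traffic meets the configured policy
    OOP = "out-of-policy"      # no exit meets the configured policy


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.DEFAULT, "traffic_identified"): State.INTERIM,
    (State.INTERIM, "exit_selected"): State.HOLDDOWN,
    (State.INTERIM, "no_suitable_exit"): State.OOP,
    (State.HOLDDOWN, "holddown_expired"): State.IN_POLICY,
    # Unreachable destinations skip the rest of the holddown wait.
    (State.HOLDDOWN, "unreachable"): State.INTERIM,
    (State.IN_POLICY, "out_of_policy"): State.INTERIM,
    (State.IN_POLICY, "unreachable"): State.INTERIM,
    (State.OOP, "backoff_expired"): State.INTERIM,
}


def next_state(state, event):
    """Return the next state, or stay put if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


def run(events, start=State.DEFAULT):
    """Apply a sequence of events and return the final state."""
    state = start
    for event in events:
        state = next_state(state, event)
    return state
```

Feeding it `["traffic_identified", "exit_selected", "holddown_expired"]` follows the normal start-up path described in the talk and ends in the in-policy state. When debugging, the Syslog messages can be read against a table like this.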
    Author's Original Notes:
  • Transcript:
    In terms of scaling, some figures here just for reference.
    Author's Original Notes:
  • Transcript:
    But basically, the important thing to be aware of is some ideal platforms for the headquarters [INAUDIBLE] before with RPG card. For the IP SLA responder, with an ASR 1001 or 1002-X you should be more than fine. And of course 1800 series upwards for the branch [INAUDIBLE].
    Author's Original Notes:
  • Transcript:
    And if you are going to have significant levels of DPI inspection, bear in mind that PfR does not always need to use DPI. You may be able to identify PfR priority traffic through other methods. But if you do need to use DPI, then it might be worth picking a router model one step up. In terms of network management, Prime Infrastructure, of course, you can already use today for configuration of PfR.
    Author's Original Notes:
  • Transcript:
    You can just create a template. For monitoring, even Prime Infrastructure 2.0 doesn't address it really well, so we're going to be talking to an MTG on how we can fix that, the kinds of improvement that we want. But ActionPacked has already demonstrated monitoring for PfR. And there'll be further scale improvements as well with CENT.
    Author's Original Notes:
  • Transcript:
    So to summarize, like I mentioned, the WAN performance is more critical to the enterprise today.
    Author's Original Notes:
  • Transcript:
    PfR directly addresses that. It's access-agnostic, ISP-agnostic, and you can combine it with a lot of other features, and I think that is resonating with customers today. And PfR helps combat other vendors' routers, because they don't have such an equivalent solution today.
    Author's Original Notes:
  • Performance routing Pfr

    1. 1. • What is PfR Benefits of PfR Why is PfR interesting now? Example PfR deployment scenarios • How it Works • How to demonstrate PfR • Example Deployments (case studies) • Scaling, Recommended Hardware • Summary • Resources © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 1
    2. 2. © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 2
    3. 3. • WAN performance is more critical to the enterprise than ever before • How to engineer performance for applications? Redundant links may be idle; degraded links may be carrying critical traffic! • Application intelligence is needed in the network: recognise important traffic; recognise problems (or lack of) in the network; send the important traffic over the best link for that type of traffic © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 3
    4. 4. • Dynamically influence routing – before users even detect faults • Maintains user experience even in changing network conditions • Improves Performance; Improves Reliability; WAN Cost Reduction • High availability for DC and Cloud apps; increased uptime; improved user experience; makes best use of multiple links; active probes for fast response; Media- and Application-Aware Routing; DSL/3G/4G; no manual interaction; takes action on black-holes • Diagram labels: MPLS VPN, ISR, ASR, ISP 1, ISP 2, Internet, Cloud Enabled Branch © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 4
    5. 5. • PfR reuses many existing technologies: Routing Protocols, Policy Based Routing (PBR), NetFlow, IP SLA, AVC (NBAR) • ... and introduces an algorithm and comms link: Control Loop, Timers, State Machines • Diagram labels: Border Router (BR), Master Controller (MC) © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 5
    6. 6. • ISR G2 (1800 router upward) – Requires DATA license • ASR 1000 – Requires AdvIp/AdvEnt • Additional recommended licenses: SEC (required for encryption over the Internet), AVC (ideal for additional visibility/control functions)
       Platform | License or Image | Description
       ISR G2 (1800 upwards) | Data (e.g. SL-19-DATA-K9) | Needed for PfR
       ISR G2 (1800 upwards) | Security (e.g. SL-19-SEC-K9) | Needed for DMVPN
       ASR 1000 | Advanced Enterprise K9 or Advanced IP K9 | Needed for PfR and DMVPN
       ASR 1000 | FLASR1-IPSEC | License for DMVPN
       ASR 1000 | FLASR1-AVC | Recommended to use AVC at the HQ
       © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 6
    7. 7. • Market transitions leading to apps in Clouds and DCs: Cloud, Workplace Flexibility, Lean Branch, Rapid Scalability, BYOD, IPv6, Auth/Encrypt, Cloud Apps, VDI, Software Capabilities, Save Costs, Unified Fabric, Video, Smartphone Adoption, Business Video, Immersive Video © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 7
    8. 8. • WAN connectivity options (e.g. DSL, 3G, 4G) DSL is more reliable than it was 5 years ago 3G offers high throughput 4G offers low latency • Drive to remain cost-effective and maintain performance Opportunity to reduce costs greatly; 75-90% savings in WAN costs per branch is possible with PfR © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 8
    9. 9. • Maintain cost-effectiveness/sustain savings moving to PfR/NGN • Get best utilisation from 2 DSL lines • Best user experience for business-critical apps: Protect business-critical apps Ensure the app works, and is responsive • Maintain app performance even if a DSL line is suffering from contention or anything else causing packet loss or delay • Ability to handle voice and video, on the same solution, at zero additional cost • Have a solution that will work with DMVPN, GET VPN and other features • Something that also works with 3G, i.e. access-agnostic © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 9
    10. 10. Requirement: Improve user experience Solution: PfR to maintain application performance Benefits: •Per-application policies can be set using the parameters that matter for the scenario •Dynamic best path determination before the user even uses the application •Voice and video performance is dynamically maintained throughout the call •Selects best path in both directions (Branch and HQ) Requirement: Make best use of multiple links Solution: PfR to provide load balancing Benefits: •The entire bandwidth of multiple links can be used •Applications will move link to meet performance needs •Cost minimization; load balancing takes into account ISP billing model Requirement: Increased branch uptime Solution: PfR to control all WAN links Benefits: •Most cost-effective way of increasing uptime •No manual interaction needed •Takes action on black-holes which traditional routing will not detect © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 10
    11. 11. • PfR controlled exits – known as Exit Links (aka External Interfaces) • PfR is transport agnostic, and ISP agnostic © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 11
    12. 12. BR • Some example topologies (redundancy not shown) MC • Master Controller and Border Router can be co-located BR Enterprise Edge Exit Links BR MC MC BR BR Branch © 2010 Cisco and/or its affiliates. All rights reserved. WAN Aggregation Cisco Confidential 12
    13. 13. • TCP/IP communication between MC and BR • Example message flows: Red: Setting/querying statistics via NetFlow Green: Programming in a new path Master Controller Policy Decision Point Active Probe Controller Config Top Talker Controller Database Passive Data Controller Reporting RP © 2010 Cisco and/or its affiliates. All rights reserved. Border Router Policy Enforcement Point Active Probe Export PBR API PfR Client Top Talker Export SAA API NBAR Client NetFlow API NetFlow Export RP NetFlow Client ESP Cisco Confidential 13
    14. 14. • Branch to HQ direction • HQ to Branch direction © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 14
    15. 15. © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 15
    16. 16. 1. Identify traffic of interest 2. Monitor the traffic 3. Compare with policy 4. Apply path enforcement 5. Control loop © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 16
    17. 17. 1. Identify traffic of interest 2. Monitor the traffic 3. Compare with policy 4. Apply Path Enforcement 5. Control loop Automatic Learning Highest throughput destinations Most delay-suffering (TCP) Manual Learning IP addresses of important destinations Configured in prefix lists © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 17
    18. 18. 1. Identify traffic of interest 2. Monitor the traffic 3. Compare with policy 4. Apply Path Enforcement 5. Control loop • Several methods possible: Passive, Active and some hybrids Some Examples Latency: TCP handshake Packet loss: TCP sequence numbers UDP, TCP, ICMP probes RTP probes © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 18
    19. 19. 1. Identify traffic of interest 2. Monitor the traffic 3. Compare with policy 4. Apply Path Enforcement 5. Control loop • Passive mode (delay, loss, reachability, throughput): as soon as configured (manual mode); as soon as traffic identified (automatic mode); useful for Enterprise Edge only • Active mode (delay, loss, reachability, jitter, MOS): current exits: always; other exits: only when current exit is OOP; allows for best path determination even without traffic • ‘Both’ mode: current exits: always; other exits: only when current exit is OOP; provides additional data points • ‘Fast’ mode: all exits: always; ultra-quick best path determination © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 19
    20. 20. 1. Identify traffic of interest 2. Monitor the traffic 3. Compare with policy 4. Apply Path Enforcement 5. Control loop • ‘Relative’ and ‘Threshold’ methods of specification are possible © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 20
    21. 21. 1. Identify traffic of interest 2. Monitor the traffic 3. Compare with policy 4. Apply Path Enforcement 5. Control loop A parent route needs to exist! Inside the ASR 1000 Exact route already exists Change local preference, or modify next-hop Higher route exists Prefix-split injected (not sent outside AS) More granularity needed PBR used © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 21
    22. 22. 1. Identify traffic of interest 2. Monitor the traffic 3. Compare with policy 4. Apply Path Enforcement 5. Control loop Prevent ‘flapping’ Responsiveness Allow network to ‘settle’ Hold-down timer – delay between exit changes Back-off timer – delay if no suitable exit can be found © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 22
    23. 23. • Monitoring can be passive, active (probes) or some hybrids • What is measured passively? Throughput, TCP latency, TCP packet loss, TCP ‘reachability’ (i.e. were there SYNs with no ACK?) • What active probes are available? ICMP echo (ping) to see if the destination is alive, UDP and TCP probe, RTP probes (for jitter, latency, etc). • How are paths enforced? – PfR will choose the best method. It can influence routing tables, or use dynamic route-maps, or static routing. There is a control loop to make sure changes are effective. • The damping is used to ensure stability • Flowchart labels (with Yes/No branches): Monitor Prefixes and Exits; Out-of-Policy Decision; Optimal Exit Link Selection; Better EL; Change a Prefix Exit Link; Apply failed; Damping © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 23
    24. 24. State machine diagram. States: Default (short delay to allow configuration to settle); InPolicy (traffic meets configured policy); Interim (interim state while link selection is made); OOP (out-of-policy; no routes meet the configured policy); Holddown (wait state to prevent flapping and gather rapid measurements). Transition labels: Traffic initially identified or configured; Unreachable; OOP; Periodic selection configured; Successful exit selection; No suitable exit; New exit selection; Backoff time has expired; Holddown time has expired © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 24
    25. 25. © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 25
    26. 26. XE 3.8 • Number of traffic classes: 20k TCs. Total TC = TC per branch x number of branches. In practice, this will easily allow 300-500 branches per cluster • Number of branches: 300 (with a high number of TCs). Realistically slightly higher should be possible with a reasonable number of TCs, but needs testing beyond 300 • IP SLA responder sizing: use Performance dashboard: http://wwwin-tools.cisco.com/CCIT/GPEOBI/saw.dll?PortalPages&PortalPath=/shared/Meteoric%20Dashboard/_portal/Meteoric%20%28ASR1k%20Performance%29 Realistically, ASR 1001 should be sufficient for most deployments • DPI (NBAR): there is a performance hit, but realistically not all traffic needs this to identify the important traffic. ASR 1002-X has good DPI capability – 5Gbit/sec of inspected traffic © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 26
    27. 27. • HQ MC, BR: ASR 1002-X or ASR 1004 • IP SLA Responder: ASR 1001 or ASR 1002-X • Branch routers: 1800 series upwards. Significant levels of NBAR? Pick a router model one step up © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 27
    28. 28. • Network Management: Prime Infrastructure is good for configuration of PfR (create a template). Monitoring: Prime Infra 2.0 doesn’t really address this well. ActionPacked has already demonstrated monitoring for PfR. Plixer is another vendor with PfR monitoring capability • Further scale improvements: CENT (Connected ENTerprise) will address this towards the end of next year © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 28
    29. 29. © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 29
    30. 30. • PfR supports modern requirements – WAN performance is more critical to the enterprise today • PfR: Improves user experience Makes best use of multiple links and is cost effective Greatly increases application availability and reliability • PfR is access-agnostic, ISP-agnostic • PfR can be combined with DMVPN, HQoS and other Cisco solutions (e.g. MediaNet) • PfR helps combat against other vendor routers; they don’t have a good equivalent solution today © 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 30
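The sizing rule on slide 26 (total TCs = TCs per branch x number of branches, against a 20k TC figure) can be checked with some simple arithmetic. The Python sketch below is only a back-of-the-envelope aid: the function names and the per-branch TC counts of 40 and 66 are assumptions chosen to reproduce the 300-500 branch range quoted on the slide, not measured values.

```python
def total_traffic_classes(tcs_per_branch, branches):
    """Total TCs in the cluster: TCs per branch multiplied by branch count."""
    return tcs_per_branch * branches


def max_branches(tc_limit, tcs_per_branch):
    """Largest whole number of branches that fits under the TC limit."""
    if tcs_per_branch <= 0:
        raise ValueError("tcs_per_branch must be positive")
    return tc_limit // tcs_per_branch


TC_LIMIT = 20_000  # the 20k TC figure from slide 26

# Hypothetical per-branch TC counts, for illustration only.
branches_at_40_tcs = max_branches(TC_LIMIT, 40)
branches_at_66_tcs = max_branches(TC_LIMIT, 66)
```

Under these assumptions, 40 TCs per branch fits 500 branches and 66 TCs per branch fits 303, which is consistent with the 300-500 range on the slide. The slide itself notes that more than 300 branches still needs testing.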
