OPENFLOW AWARE NETWORK PROCESSOR
Undergraduate graduation project report submitted in partial fulfillment
of the requirements for the
Degree of Bachelor of Science of Engineering
in
Department of Electronic & Telecommunication Engineering
University of Moratuwa
Supervisor: Dr. Ajith Pasqual

Group Members:
W.A.T.M. Dananjaya
S. Iddamalgoda
K.G.P.H. Kariyawasam
W.M.V.B. Wijekoon
April 2016
Approval of the Department of Electronic & Telecommunication
Engineering
……………………………………………….
Head, Department of Electronic &
Telecommunication Engineering
This is to certify that we have read this project and that in our opinion it is fully
adequate, in scope and quality, as an Undergraduate Graduation Project.
Supervisor: Dr. Ajith Pasqual
Signature: …………………………………
Date: ……………………………………….
Declaration
This declaration is made on April 22nd, 2016.
Declaration by Project Group
We declare that the dissertation entitled OpenFlow Aware Network Processor and the
work presented in it are our own. We confirm that:
• this work was done wholly or mainly while in candidature for a B.Sc. Engineering
degree at this university;
• where any part of this dissertation has previously been submitted for a degree
or any other qualification at this university or any other institute, this has been
clearly stated;
• where we have consulted the published work of others, this is always clearly
attributed;
• where we have quoted from the work of others, the source is always given;
with the exception of such quotations, this dissertation is entirely our own work;
• we have acknowledged all main sources of help.
………………………..
Date
………………………………
W.A.T.M Dananjaya (110089M)
………………………………
S. Iddamalgoda (110222R)
………………………………
K.G.P.H Kariyawasam (110285K)
………………………………
W.M.V.B Wijekoon (110630P)
Declaration by Supervisor
I have supervised and accepted this dissertation for the submission of the degree.
………………………………………..
………………………………
Dr. Ajith Pasqual
Date
Abstract
OPENFLOW AWARE NETWORK PROCESSOR
Group Members: W.A.T.M. Dananjaya, S. Iddamalgoda, K.G.P.H. Kariyawasam,
W.M.V.B. Wijekoon
Supervisor: Dr. Ajith Pasqual
Today the world is becoming more and more connected. The Internet and mobile
communications are already part of our lives; cloud computing and networked smart
devices soon will be. More and more networking applications with heavy bandwidth
requirements are being introduced every week, and the number of users of these
applications is growing at an even faster rate. This rapid growth means that today's
networks have to be easily scalable, easily configurable and fast enough to handle all
the traffic. The current networking paradigm has obvious flaws when measured
against these requirements, so the time has come for a paradigm shift in networking.
Software Defined Networking (SDN) should, in theory, provide the scalability and
control that new networking applications need, at a better speed than what we have
today. In this approach the networking control plane is not distributed among the
nodes, as it is today, but centralized at a single location, which provides the capability
to configure or troubleshoot the complete network in one go. Packet forwarding is
done on a rule-matching basis, with the rules supplied by the central controller.
Currently there are a couple of networking protocols designed to facilitate
communication between the controller and the nodes; the frontrunner is the OpenFlow
protocol, developed by the Open Networking Foundation.
The main objective of this project is to design a processor for a switch operating in an
OpenFlow environment. The processor should be able to carry out communications
with the SDN controller and to configure the data plane of the switch according to the
controller's instructions. How the processor handles these tasks differs with the
architecture of the data plane. After reviewing some available designs, we concluded
that it would be better to design the data plane ourselves, both to get a better idea of
the processor's tasks and to support high-speed requirements. The first phase of the
project was therefore to design a data plane architecture able to support very
high-speed network traffic; in the second phase, the processor was designed according
to the resulting requirements.
The project was first designed as a C-language model, later converted into a Verilog
design and implemented on a Xilinx Virtex-7 board. For demonstration purposes, a
pseudo-SDN controller was implemented in Java. All demonstrations were carried out
in a Linux environment, and network traffic generators and capture software were
used for troubleshooting.
Dedication
This thesis is dedicated to Dr. Ajith Pasqual and both the academic and non-academic
staff who guided us throughout the project.
Acknowledgments
We would like to express our deepest gratitude to our supervisor, Dr. Ajith Pasqual,
for his full support, expert guidance, understanding and encouragement throughout
our project. Without his incredible patience and timely wisdom and counsel, our
project would have been a frustrating and overwhelming pursuit. In addition, we
express our gratitude to Mr. Upul Ekanayake for his encouragement and enormous
support. We would also like to thank Mr. Aditha Rajakaruna and Mr. Manupa
Karunarathne for helping us; their thoughtful questions and comments were valued
greatly. Finally, we thank the staff members and our fellow undergraduate students of
the Department of Electronic and Telecommunication Engineering at the University
of Moratuwa.
Table of Contents
Approval
Declaration
Abstract
Dedication
Acknowledgments
Table of Contents
List of Figures
List of Tables
1 INTRODUCTION
1.1 Background
1.2 Problem Statement
1.3 Proposed System
1.4 Objectives
1.5 Deliverables
2 LITERATURE REVIEW
2.1 Software Defined Networking
2.1.1 SDN & NFV
2.1.2 How SDN Works?
2.2 What are OpenFlow Network Components?
2.3 What are the Key Features of the SDN Architecture?
2.4 What is OpenFlow Hybrid Architecture?
2.5 What is NFV?
2.6 SDN Data Forwarding Planes
2.7 Networking and Computer Security
2.8 Network Processors & Current Trends
3 SYSTEM ARCHITECTURE
3.1 Overview
3.2 SDN Data Plane Architecture
3.2.1 Flow Classification Engine
3.2.2 Formatter
3.2.3 Flow Processing Unit
3.2.4 Flow Matching Unit & OpenFlow Pipeline
3.2.5 Memory: Action/Rule Memory
3.2.6 Execution Engine
3.2.7 Table Miss Handlers
3.2.8 Alternative Architectural Approaches
3.3 Customized Network Processor
3.3.1 Instruction Set Architecture
3.3.2 Complete Instruction Set Architecture
3.3.3 Micro-architecture
4 METHODOLOGY
4.1 Design & Implementation
4.2 Phase 1: High-Level Synthesis
4.3 Phase 2: RTL Design & Verification
4.4 Phase 3: PCI Express Subsystem & Ethernet Subsystem
4.5 Phase 4: System Integration and Testing
4.6 Phase 5: Development of the Application Layer
4.6.1 Performance Analyzer App
4.6.2 Simple SDN Controller App
5 RESULTS
5.1 Demo Setup
5.2 Processor Results
5.3 Data Plane Results
5.4 Total Resource Utilization
CONCLUSION
BIBLIOGRAPHY
ANNEX A
ANNEX B
ANNEX C
List of Figures
2.1 Current Networking Approach
2.2 Legacy Networking Overview
2.3 Software Defined Networking Overview
2.4 SDN and OpenFlow
2.5 OpenFlow Data Plane Architecture
2.6 Packet-based Forwarding
2.7 Flow-based Forwarding
3.1 Overview of the Proposed Architecture
3.2 Architectural Overview of the Data Plane
3.3 Flow Identifier in OpenFlow
3.4 Overview of the Flow Processing Unit
3.5 OpenFlow Flow Entry Format
3.6 Flow Processing Unit with Pipelined Flow Tables
3.7 How a Packet Travels through the Data Plane
3.8 Alternative Approach I
3.9 Alternative Approach II
3.10 Instruction Formats
3.11 Finite State Machine of the Processor
4.1 Overview of the Hardware Implementation
4.2 Test Procedure
4.3 Ethernet Switch Fabric
4.4 Test Architecture with Riffa
4.5 Switch Fabric Modified for Processor Communication
4.6 Application Layer Architecture
4.7 SDN Controller User Interface to Add Flow Entries
4.8 SDN Controller User Interface for Observing the Active Flow Entries
4.9 A Specified Flow Entry
5.1 Demo Setup Readings 1
5.2 Demo Setup Readings 2
List of Tables
3.1 Custom Instructions
3.2 Complete Instruction Set
5.1 Clock Cycles per Instruction Type
5.2 Processor Resource Usage
5.3 Timing Results of Processor
5.4 Data Plane Resource Usage
5.5 Timing Results of Data Plane
5.6 Total Resource Utilization
Chapter 1
INTRODUCTION
1.1 Background
SDN (Software Defined Networking) is becoming the next breakthrough in
networking and data forwarding technology, promising highly controlled, monitored
and supervised networks built on efficient, centralized SDN controller functions. The
SDN three-tier architecture has already proven its performance through the
deployment of various network applications on distributed data forwarding planes
under SDN control, and this revolution will produce SDN-based clouds, core
networks and carrier networks in the near future. The most important aspect of this
transition is the guarantee and trust built on SDN networks while the traditional
legacy approach coexists with them and boosts their performance. The transition
therefore has to happen smoothly alongside conventional networks, and
SDN-compatible devices should be capable of handling it. From the SDN perspective,
the network functions and protocol stack running on a single network node are quite
different from the legacy approach: as control functions are pushed into a more
centralized, global plane and the network node itself acts as a data forwarding plane,
the way we define network nodes changes. As a result, more and more protocols run
on the SDN controller, while the network node acts as a forwarding plane following
the commands given by the controller. This may replace the whole architecture of
running each and every protocol on every single network node; control functions and
protocols shrink at the network node, while more applications and
application-oriented functions run on it. As SDN emerges, an SDN switch can act as a
multi-layer forwarding plane and even as a firewall. An SDN switch should therefore
be capable of handling more and more application-oriented tasks commanded by the
SDN controller, tasks signalled to the controller by the top tier of SDN: the
application layer.
Alongside these SDN advancements, current efforts focus on integrating advanced
network security solutions and applications, such as Network Function Virtualization
(NFV), Deep Packet Inspection (DPI) and Intrusion Detection & Prevention Systems
(IDS & IPS), into the SDN switch architecture, in place of the traditional control
protocols exhausted by distributed control. We thus need to leverage the scalability,
flexibility, programmability and security of the SDN architecture by supporting
hybrid solutions until the transition ends.
Even though SDN still rests implicitly on the traditional layered architecture, we need
to exploit that fact when it comes to application support. There are many network
processors on the market today, and SDN data planes are appearing ever more
frequently. Yet there remains a big problem in catering specifically to SDN traffic so
as to exploit its performance while moving towards the pure, greenfield SDN concept.
Compatibility, adaptability and targeted solutions will accelerate SDN; an SDN-aware
network processor that can be extended to a pure SDN environment, and that is
compatible with a more customized SDN data forwarding plane, will be needed in the
future.
Current network processors offer many customized functions and instructions for
network applications, along with different implementations using pipelined and
parallel architectures. In our research we use OpenFlow as the southbound protocol,
so our processor is primarily an OpenFlow aware network processor that can be
extended to a hybrid OpenFlow architecture and is compatible with a separate data
forwarding plane hardware module. In an SDN environment the network processor
acts as a more application-oriented platform providing various SDN services over the
SDN data plane, since most control functions are embedded in the SDN controller.
With the internal intelligence of the processor, a dumb SDN data plane can act as a
separate module, rather than the processor being embedded inside the data plane. The
advantage is that more control protocols are pushed into the central SDN control
plane, while some SDN functionality and protocol support survives within the
processor inside the SDN switch. The remaining bottlenecks of SDN networks are
scalability and system throughput, as well as network security. A compatible,
SDN-aware network processor can move the exhausting general-purpose and
legacy-oriented functions out of the processor, handing that responsibility to the SDN
control plane, while adding SDN-oriented functionality to scale the network up with a
customized architecture. Our research therefore focuses on how SDN-aware
customization can deliver this scalability and high throughput. Some SDN data planes
have already arrived on the market, so an SDN-aware network processor that can
support such data forwarding with some intelligence is of paramount importance to
the growth of SDN.
1.2 Problem Statement
Networking infrastructure has grown rapidly, from traditional networking devices and
single Bluetooth devices to massive data centers and cloud computing facilities, and
the Internet of Things has become a paramount topic in the networking world. As the
Internet becomes more accessible and more heavily used, the number of technology
nodes connected to the inter-network grows rapidly. Network heterogeneity and
vendor-specific protocols leave little flexibility for network growth and research, so
modern networks need to be much more flexible and scalable. Along with this
growth, the technology should offer a proper way of implementing network
programmability to support dynamic network environments. Today's networks are
not programmable, and because of that, increasing scalability and flexibility demands
massive amounts of funding and man-hours. There should be a better way of
implementing network programmability, one that enables more research and
innovation in networking without disturbing the heterogeneous network base and the
various protocols that already run on it. To achieve programmability and flexibility,
the network must be able to see the overall big picture as well as exercise control over
data forwarding. Centralization of the control plane, supporting distributed data
forwarding planes, is therefore required.
Growing, complex network facilities, including huge data centers and massive clouds
as well as fast-growing ISP and telecommunication core networks, are clamoring for
faster forwarding, ACL applications and QoS adjustment. They also require more
traffic shaping, traffic engineering and control over data forwarding, to reduce traffic
congestion and achieve faster forwarding.
1.3 Proposed system
Our proposal is to design a flow-aware programmable processor with an extensible
ISA and a fast OpenFlow flow forwarding unit that supports line-rate flow forwarding
as well as traditional/legacy forwarding and control, in support of the innovative
OpenFlow hybrid architecture. We expect to accelerate flow processing with
internal/external memories and advanced algorithms, providing stateful flow-based
forwarding/pinning, nested flow-forwarding actions for millions of flows and
dynamic flow-based load balancing, while still supporting legacy network processing.
The system is designed around the OpenFlow 1.4 specification and standard network
functions.
1.4 Objectives
The main objective is an OpenFlow aware network processor engine: a network
processor with the relevant hardware acceleration to support high-speed OpenFlow
flow processing, protocol handling and forwarding, as well as legacy networking
functions. This entails developing an instruction set architecture (ISA) and an
advanced micro-architecture supporting the OpenFlow hybrid structure (the
OpenFlow pipeline alongside traditional processing), developing the supporting
hardware acceleration and processing units specific to OpenFlow, and extending the
design to legacy networking infrastructure to enable a hybrid architecture.
1.5 Deliverables
I) An advanced network RISC (Reduced Instruction Set Computer) processor with
SDN support, handling a 1 Gbps line rate.
II) A custom Instruction Set Architecture (ISA) and an advanced, pipelined
micro-architecture.
III) An SDN data forwarding plane architecture and design, with native OpenFlow
support and a switch fabric for multilayer switching functions.
IV) A header classification engine with an SDN flow entry formatting unit.
V) A flow processing, matching and forwarding unit built on TCAM technology,
handling a 1 Gbps line rate.
VI) A streamlined SDN programming interface that communicates with the network
processor.
VII) Implementation of the processor and the SDN data forwarding plane in an HDL
(Hardware Description Language), prototyped on an FPGA.
Chapter 2
LITERATURE REVIEW
2.1 Software Defined Networking
Fig 2.1: Current Networking Approach
2.1.1 Software Defined Networking (SDN) & Network Function Virtualization
(NFV)
What is the problem with today's network infrastructure? The network nodes that
make up the Internet lack programmability, scalability, flexibility, reliability,
resiliency and security. This stems from the current network architecture, in which
each and every router processes information and makes decisions on its own, based
on protocol information. A single network node carries a heavy load of heterogeneity
and incompatibility, consisting of millions of lines of source code that are static and
hardcoded into every single node. The functionality of devices is carved into specific
roles, and they use stateless, packet-by-packet processing techniques. We can
therefore conclude that current networks are vertically integrated, complex, closed,
static and proprietary, and not suitable for dynamic and experimental ideas; at times
this costs millions. Managerial tasks, such as routing management, mobility
management, access control and forwarding, suffer from the same problems.
Why Software Defined Networking (SDN) & Not Legacy Networking?
Software Defined Networking (SDN) is the next breakthrough technology emerging
in the networking world. Prevailing legacy networking shows bottlenecks and
drawbacks under the rapid growth of large-scale networks, with their growing
complexity and lack of scalability. Legacy networks also waste redundant processing
power, because control functionality is distributed across the network nodes. A
network is always distributed; the modern way of viewing a network node is as a
device containing a control plane and a data forwarding plane, separated inside it,
unlike the earlier approach in which the two planes were mixed within a single
device. For example, to run network applications and protocols such as routing
protocols, the routing calculations and routing table lookups must be processed by
every router in the network, which is a huge waste of processing power. Another
bottleneck of the legacy approach is that network nodes make decisions based on a
narrow view of the network rather than a global one, so the network tends to become
unstable and congested as it expands and grows more complicated. The lack of a
global, big-picture view, combined with individual decision-making by network
functions, often ends in bad consequences.
Another vital factor in the legacy approach is the programmability of the network,
which is increasingly demanded by virtualization and the flexibility of modern
networks. The prevailing infrastructure lacks programmability, yet programmability
is going to be the most essential part of next-generation networking and
telecommunication infrastructure; indeed, the main ideas behind SDN are precisely
the programmability and scalability of networks. Consider the current scenario: what
happens if a developer or engineer wants to deploy and test their own protocol or
application across the existing network infrastructure? For example, to launch my
own routing protocol and analyze the network's performance, I would have to change
the network infrastructure, or change all the operating systems and network hardware
to support my protocol. Programming the network in the existing scenario is therefore
extremely hard. We can observe, however, that the network behaves as flows rather
than as individual packets, and researchers realized that all the processing applied to
packets depends on the different tuples and fields of the packet headers. The current
network stack has already stabilized into a well-organized, layered architecture, so
there is no need to change that architecture or the functionality of its individual
layers. What we can do is identify novel methods of implementing and processing the
functionality of the network stack with better programmability, adapting to the
dynamic network environment.
On another front, cluster computing and cloud computing are fast-emerging
technologies. The concept of network function virtualization is therefore becoming
paramount, and will result in highly virtualized, abstract, network-application-oriented
hardware. For a network to be virtualized and more application-oriented, it should be
highly programmable and able to act dynamically, with intelligence. As mentioned
earlier, the crucial requirement for transitioning to highly virtualized networks is a
global view, a big picture of the network. Large, widespread networks such as clouds
and data centers are already moving towards virtualization techniques to optimize
network resource utilization and to process traffic faster, with higher performance.
Most discussion of information processing and transfer today concerns information
and network security, where threats are becoming a serious worldwide problem. New
areas of distributed network security are therefore emerging to satisfy the clamoring
appetites of the technology community. One such concept spreads the firewall
policies of a mother firewall across the whole network, providing better security by
avoiding single points of failure and creating a distributed checkpoint architecture.
Network engineers and businesses need this better security while still getting
high-performance networking.
Another problem persisting in networks is that functionality is split into Layer 2
forwarding, Layer 4 forwarding, access control, quality of service assignment, traffic
engineering and shaping, and firewall-policy packet dropping, spread over several
layers. What if we had an inexpensive device that could perform all these tasks as
necessary, according to the requirements set by the network administration? This can
only be achieved by mapping packets to specific flows and handling network traffic
as flows rather than processing it packet by packet, which is also vital for network
virtualization.
The ultimate objective of next-generation networking technologies is therefore to
achieve scalability, resiliency, security, reliability, availability and, most importantly,
programmability of the network, from a global point of view.
In SDN:
I) Control and data planes are decoupled and abstracted from each other. The
controller and the data plane share information through the OpenFlow protocol.
II) Intelligence is logically centralized, giving a global view of the network and its
changing demands. The controller thus makes the decisions, while the data plane
keeps just enough intelligence to forward packets on a flow basis.
III) The underlying network infrastructure is abstracted from applications, which
makes it possible to develop different applications and can be extended to Network
Function and Resource Virtualization (NFV).
IV) A programmable data plane brings automation and flexibility to networks.
V) Innovation is faster, with dynamic networking.
VI) In a hybrid architecture, OpenFlow runs alongside legacy network processing, but
independently.
2.1.2 How does Software Defined Networking (SDN) work?
SDN is emerging as the solution to the prevailing bottlenecks of networking
technology discussed above. Software Defined Networking (SDN) improves
high-speed networking performance by pushing the control planes of all the
distributed data forwarding planes into a centralized plane/layer, where decisions
about the network can be taken by considering not just a single network node but a
global view of the whole network. This is more effective and saves a great deal of
processing power in the network devices, power which can then be used for the data
forwarding task.
Fig 2.2: Legacy Networking Overview
The advantage is that the data forwarding plane now has more processing power
available than in the legacy scenario, and that power can be used for data forwarding
functionality, including flow processing instead of packet-by-packet processing.
The SDN concept separates the data forwarding plane from the control plane and
pushes the control plane towards a central controller. In effect, a single network
operating system runs at the controller, unlike the legacy approach in which each and
every network node runs its own operating system. According to the Open
Networking Foundation definition of SDN [Open Networking Foundation]:
"In SDN architecture, the control and data plane are decoupled, network intelligence,
control and state are logically centralized, and the underlying network infrastructure
is abstracted from application."
-Open Networking Foundation-
Since the control plane is centralized, all the information, topology and protocol state
now live in a single place, with a big picture of the network. The forwarding task can
therefore be made faster and more optimized, because additional resources and
processing power are available.
Fig 2.3: Software Defined Networking Overview
Unlike the legacy approach, each network node/router no longer needs to perform
control and information processing tasks. Moreover, an SDN switch can perform
multilayer forwarding up to the higher layers, including the application layer: it acts
as a multilayer switch as well as a firewall, operating on flows rather than individual
packets. It resembles a firewall that can execute arbitrary rules rather than only
dropping packets, and a multilayer switch that can also perform firewalling,
quality-of-service assignment and traffic engineering functions at the data plane layer.
SDN clearly operates on a three-tier architecture, which is not equivalent to the
seven-layer OSI architecture used to organize and operate network protocols. SDN
does not discard the traditional OSI layering, which remains the basis for the
protocols operating in the network. What SDN changes is how the network identifies
a protocol and how that protocol is processed in the data forwarding plane. For a
protocol to be handled in the data forwarding plane, it must define certain SDN match
fields/tuples, extracted by the SDN data plane, that are relevant to processing that
protocol. The SDN data plane contains the means to identify the specific flow a
packet belongs to and to extract the SDN matching tuples from the packet header. For
example, to perform L2 forwarding only, the designer needs just the destination MAC
address and the source MAC address, out of the large set of SDN matching
tuples/fields, to forward a packet to the relevant port, as the sketch below illustrates.
The SDN control plane (the SDN controller) is responsible for building the network
topology and hierarchy and for pushing policies and rules into the SDN data plane as
it queries for them.
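Such an L2-only rule can be pictured as a match on just the two MAC fields, with
every other tuple wildcarded. The following C fragment is a minimal sketch of that
idea; the structure and function names are ours, not part of any OpenFlow API, and
the tuple set is abbreviated.

#include <stdint.h>
#include <string.h>

/* Hypothetical reduced view of a flow rule: the mask says which tuples
   to compare; all-zero mask bytes are wildcards. */
typedef struct {
    uint8_t  mac_dst[6], mac_src[6];
    uint32_t ipv4_dst;
    uint16_t tcp_dst;
} tuples_t;

typedef struct { tuples_t value, mask; uint32_t out_port; } rule_t;

/* Build an L2 forwarding rule: care only about the MAC addresses. */
rule_t make_l2_rule(const uint8_t dst[6], const uint8_t src[6], uint32_t port)
{
    rule_t r;
    memset(&r, 0, sizeof r);            /* everything wildcarded...      */
    memset(r.mask.mac_dst, 0xFF, 6);    /* ...except the destination MAC */
    memset(r.mask.mac_src, 0xFF, 6);    /* ...and the source MAC         */
    memcpy(r.value.mac_dst, dst, 6);
    memcpy(r.value.mac_src, src, 6);
    r.out_port = port;
    return r;
}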
Apart from packet forwarding and dropping, the SDN data plane has another
paramount function, one not vital in the legacy approach: communicating with the
SDN controller, or control plane, when unknown network flows arrive. The SDN
switch then queries the controller with the packet header information and asks for
new policies and rules to execute against those flows. This introduces a point where
the whole network needs extra security, reliability and guarantees: the communication
between the SDN controller and the SDN data forwarding plane.
This is where southbound protocols such as OpenFlow play a major role, providing a
standardized way to handle communication between the SDN switch and the SDN
controller. The main functionality of the southbound protocol is to identify the
resources, capabilities and hardware of the SDN data forwarding plane and send that
information to the SDN controller. At run time it should also send unknown flows to
the controller, with the information needed to build up the network structure and to
fetch the policies related to those specific flows.
In the SDN three-tier architecture there is one more interface to care about, one that
inherently makes things more advanced and easy: the communication between the
SDN application layer and the SDN controller, more often known as the northbound
protocols. The application layer runs on top of the SDN controller, which holds the
global view of the network; this lets the network be highly abstracted and virtualized,
and it is where NFV (Network Function Virtualization) also plays a major role.
2.2 What are OpenFlow Network Components?
There are two major components in an OpenFlow network:
I) SDN Controller
II) SDN Data Forwarding Plane
a. Flow Tables
i. Flow Entries (TCAMs)
b. Processing
i. Pipeline Processing/OpenFlow Pipeline Processing
ii. Packet Classification
iii. Packet Matching
iv. Instructions & Action Set
All these components and their functionalities are well defined in the OpenFlow
architecture. OpenFlow is a stable and rapidly growing protocol that makes SDN
more comprehensive and easier to use.
Fig 2.4: SDN and OpenFlow
2.3 What are the Key features of the SDN architecture?
1. Architecture for a centralized control plane / network operating system with an
application-oriented and service-oriented architecture
2. Fast flow forwarding architecture with dynamic quality of service adaptation
3. Pooling to achieve scalability
4. Highly programmable data forwarding plane
5. Appropriate abstraction to foster simplification
6. Decoupled topology, traffic and inter-layer dependencies
7. Dynamic multi-layer networking
OpenFlow also provides a better programmable interface to the SDN data forwarding
plane, through the flow table programming interface. Programming the data plane can
actually happen in two ways: the SDN controller can program it as dictated by the
management/application layer, or users can program it locally for local or custom
requirements. According to the SDN architecture, global programming driven by the
application/top-layer requirements is the best way of managing next-generation
networks.
2.4 What is OpenFlow Hybrid Architecture?
Today’s networks are massively deployed with the legacy network
infrastructure which is more familiar to network engineers. Therefore converting this
infrastructure into more distinguish SDN architecture cannot be taken place rapidly
within as short period of time. Instead networking world need the smooth transition of
the technologies. We can deploy SDN in the prevailing network infrastructure, but
less efficient and cannot exploit the maximum advantages of the SDN & NFV
concepts. On the other hand the existing hardware cannot be fully replaced by the
SDN supported hardware due to expenses and time. Therefore the concept of Hybrid
SDN architecture is vital which support both legacy networking as well as Software
Defined Networking on the same hardware within the transition time. Therefore the
SDN hybrid architecture can work and coexist with both SDN hardware as well as
legacy networking infrastructure. This approach save both time and cost of the
organization.
2.5 What is Network Function Virtualization (NFV)?
In networking, network virtualization is the process of combining hardware and
software network resources and network functionality into a single, software-based
administrative entity: a virtual network.
There are two main categories of network virtualization:
1. External Network Virtualization
2. Internal Network Virtualization
External Network Virtualization
This combines many networks, or parts of networks, into a single virtual unit, much
as a cloud combines several data centers' resources. Since there is now a single
network operating system, a hypervisor can run on top of the virtualized network to
create and monitor VMs.
The interesting part of network function virtualization is that it can reduce
packet-travel latencies. Functions such as routing, switching, NAT, firewalling, IDS
and IPS can easily be virtualized on network hardware to provide dynamic services to
network flows. NFV is going to be the next revolutionary technology alongside SDN,
because the SDN architecture enables many potential network functions in both
computer and telecommunication networks. This is achievable thanks to the
forwarding abstraction of the SDN data forwarding plane and the decoupled,
centralized control plane.
2.6 SDN data forwarding planes
Our literature review focused mainly on two areas: SDN data planes and network
processors. We set out to develop an SDN ecosystem in which the SDN data plane is
dedicated, hardwired logic, while most processing and management falls to a
customized, SDN-aware network processor. We therefore reviewed mainly the
companies realizing SDN products, such as Broadcom, Corsa, ZNYX, NEC, Cisco
and Brocade, and their SDN research.
Fig 2.5: OpenFlow Data Plane Architecture
2.7 Networking and computer security
Fast, high-performing network processors are becoming paramount for high-speed
internetworks. We therefore had to review current network processing architectures,
since we needed to develop our own customized network processor to accelerate
SDN and OpenFlow processing. Our aim was to develop a lightweight, customized,
OpenFlow aware network processor to facilitate protocol support and OpenFlow
operations such as secure communication with the SDN controller, OpenFlow buffer
management and programming of the SDN data plane.
Network Processor Design and Advancement
First we went through the legacy networking stack, which consists of traditional L2
and L3 network functions. Our main focus was on prevailing network architectures,
such as the advanced CEF (Cisco Express Forwarding) implemented by well-known
network vendors alongside standard network protocols. Traditional network designs
based on L2/L3 forwarding, QoS (Quality of Service) and ACLs over the OSI or
TCP/IP model are already implemented in almost all networking nodes. Our first step
was to identify the kinds of network node placed across a network; we identified four
major types of network device:
I) End Devices
II) Edge Routers
III) Backbone Switches
IV) Backbone Routers
Backbone switches are based on L2 forwarding and are mostly deployed by LAN
technologies. Depending on topology and network type, the LAN technology varies:
backbone switches can be implemented with Ethernet, Frame Relay, ATM and so on,
and in modern approaches technologies such as MPLS can also be used within the
backbone. Whatever the LAN technology, these switches have to avoid redundant
paths and maintain good point-to-point, rather than end-to-end, connections. They
usually operate in the bottom two layers: data link and physical.
Backbone routers play a critical role in route handling, routing and intermediate
traffic handling, based on information gathered through link-state protocols such as
OSPF and distance-vector protocols such as RIP and EIGRP. Core networks consist
of complex backbone routing networks that direct and route traffic.
When it comes to edge routers, conventional architectures draw no sharp distinction
between backbone and edge routers. An edge router, however, should have more
capability for modern, advanced technologies such as MPLS, route redistribution,
route aggregation and routing information exchange, and it should be able to handle
and process large-scale traffic rather than merely forward it. Security appliances and
quality of service can also differ from one type to another. Whatever technology the
backbone comprises, there should be a proper way of operating at the higher layers.
Modern techniques such as layer-3 switching extend this architecture by
differentiating one service level from another and providing hardware-level
acceleration for fast packet forwarding. Most of today's technologies are built around
packet-based forwarding.
Fig 2.6: Packet-based Forwarding
Multi-layer switching changes the way the usual L2 forwarding happens. Cisco's CEF
architecture, for example, supports forwarding up to layer 3 in hardware. SDN can
therefore be seen as an advanced extension of layer-3 forwarding, but with many
more features.
Apart from forwarding, the most important current trend in networking is information
security, and much of the network and information security field focuses on topics
such as distributed firewalling. Providing better confidentiality, authentication,
authorization and access control, with proper reliability and availability, is an
essential task for future networks. We therefore spent a short period studying how to
implement security features on a next-generation SDN switch.
2.8 Network processors & current trends
Fig 2.7: Flow-based Forwarding
Since we were working on optimizing and speeding up networking in both scenarios,
while initially concentrating on the greenfield SDN concept, our main target was to
develop an SDN/OpenFlow aware network processor. Our review therefore
concentrated on processors, network processors and the customization of processors
for specific tasks. We reviewed current network processors and their capabilities and
performance from vendors such as Cisco, NEC and Ericsson, looked into current
SDN startups, their research and their emerging products, and closely examined the
network processors and general-purpose processors used in SDN switches.
We saw that most SDN switches currently on the market use general-purpose
processors such as the Intel Xeon. Both of the top SDN companies, Corsa and ZNYX,
adopt Intel's Xeon, a general-purpose part, as their network processor. Corsa's
architecture pairs it with their own SDN data forwarding plane and flow tables, while
ZNYX deploys the Broadcom OF-DPA OpenFlow data plane with the Intel Xeon in
its SDN switch. A network processor's task is to manage local configuration and
switch setup and, in the legacy scenario, to support the existing network protocols and
control plane functionality. In SDN (Software Defined Networking), however, most
control functionality is pushed towards the centralized controller, so the processing
and protocol-handling load on the processor becomes lightweight. An SDN-aware
processor must instead take on some additional tasks, discussed later, of establishing
and maintaining the communication between the SDN controller and the SDN switch.
The processor should also manage the programming of the SDN data plane against
both local and central requirements. Managing the flow tables, keeping track of the
flows and their information, and managing the OpenFlow buffers are the most vital
processing tasks that now fall to the processor.
We also analyzed the current network processors on the market. Among them, the
Cisco Toaster and the EZchip network processor stand out as two of the
best-performing designs. As for modern trends, Ericsson has researched an OpenFlow
processor that hosts all the OpenFlow operations, classification and flow tables on the
processor itself. Although network processors have evolved through several
generations, our review shows they are still straining for higher speed and
performance. We identified four basic levels of processing, each consisting of many
functions:
I) Interface Level:
• Framing, Integrity Check, Bridging, Load Balancing
II) Protocol Level:
• Routing, Redundancy Avoidance
III) Packet Level:
• QoS, Firewalling, Packet-Level Load Balancing
IV) Flow Level:
• Active Processing
A network processor basically consists of five main subsystems, apart from the
low-level processing units:
I) I/O System
II) Memory System
III) Classification Tables
IV) Context Mappers
V) Centralized Control Unit
A network processor must concentrate on basic processing plus some advanced
processing techniques:
I) Packet Multiplexing and De-multiplexing (Encapsulation/De-capsulation)
II) Packet Processing
III) Packet Forwarding
IV) Packet Blocking
Packet processing functionality focuses mainly on:
I) Protocol Conversions
II) QoS (Quality of Service) and Security ACLs (Access Control Lists)
III) Payload Conversion
IV) Custom Processing
Large-scale network infrastructure, however, still lags behind the necessary
requirements of modern network architectures and designs:
I) Flexibility
II) Scalability
III) Programmability
IV) Modularity
V) Extendibility
To provide the above, modern trends and architectures are implementing the
following new and advanced features in processors:
I) More customized and modular processing of data streams
II) More flexible, scalable and programmable solutions
III) IPv6, multicast, QoS handling and multi-layer forwarding
IV) The store-process-forward paradigm
V) Data path processing controlled by software rather than hardwired logic
VI) Multi-processing engines
VII) Reduced communication delays, drops and jitter
VIII) Higher clock rates
IX) Standard RISC-like cores with specific hardware accelerators (ASICs)
X) Modular network processor approaches
XI) Programmability and scalability towards the evolution of networks
XII) Massively parallel and pipelined architectures
The major functionality we aim to achieve with an SDN/OpenFlow aware processor,
beyond legacy network processors, can be identified as follows; these are essentials in
an OpenFlow aware network processor:
I) Packet processing, flow classification and line-rate support of the data
forwarding plane
II) Flow matching and flow management
III) OpenFlow secure channel establishment
IV) OpenFlow protocol support, with:
a. OpenFlow buffer management
b. SDN data plane programming control
c. SDN data plane information access
V) SDN/OpenFlow application and services support
VI) Packet forwarding under the flow-forwarding abstraction
Thanks to the SDN architecture, we can shed the exhaustive, resource-intensive tasks
dedicated to legacy networking. From the literature reviewed, we identified the
following operations, a regular part of conventional network processors, that can be
neglected in the greenfield SDN concept:
I) Legacy network control protocol support
II) Lookup tables and pattern matching
III) Higher-layer forwarding
IV) Access control and queue management
V) Traffic shaping and control
VI) More application-oriented processing
VII) Forwarding
Chapter 3
SYSTEM ARCHITECTURE
3.1 Overview
Our architecture consists of a network processor customized for SDN and OpenFlow,
and two dedicated logic units to handle flow classification and flow matching. The
main objective of the architecture is to operate reliably at high speed, which is why
we introduced the dedicated units. The processor also supports some custom
instructions that enable it to execute OpenFlow-specific tasks faster. Special care was
taken to limit the use of costly hardware resources (like TCAMs) wherever the impact
on speed of doing so is negligible.
When it comes to SDN data plane and traffic forwarding, primary tasks are
identifying flows, finding the matching flow entry and applying the specified actions.
Apart from that, the data plane should also have the capability to communicate with
the centralized controller. As today’s networks are dynamic entities, handling these
tasks should be done in a flexible manner.
In order to obtain that flexibility, and in keeping with the SDN principle of
programmability, we introduce a customized network processor. It has two dedicated
units to handle the two most important tasks of traffic forwarding: flow identification
and matching. With these two units and the custom instructions of the processor, our
architecture is able to handle very fast line rates reliably, as shown by the
experimental results (presented in Chapter 5). A simple overview of the architecture
is given in the diagram below.
Fig 3.1: Overview of the Proposed Architecture
For flow identification, we introduce a flow classification unit, a dedicated logic
block. This classifies the incoming flows according to their type (ether-ip-tcp, ether-
ip-icmp, ether-vlan, etc.). With the flows classified correctly, the information needed
to construct the flow identifier (IP addresses, MAC addresses, etc.) can be easily
extracted from packet headers.
The other dedicated logic unit (the 'flow match unit') matches incoming traffic flows
against the stored flow entries, using the flow identifier constructed earlier. It consists
of three types of memory block: a very fast, low-capacity cache memory, a TCAM
block and a RAM block. The cache memory stores the most recently matched entries,
since packets belonging to a specific flow can often be expected to arrive one after the
other. The TCAM block stores the flow identification part of the flow entries; the
identifier constructed from an incoming traffic flow is matched against these. A match
outputs a pointer to the memory location in the RAM block where the actions of that
flow entry are stored. We do not store actions alongside the identifiers for the simple
reason that TCAMs are high-cost memory blocks; this configuration limits TCAM
usage. The lookup path is sketched below.
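This is a minimal, software-only C sketch of that path: the one-entry cache, the table
depth and all the names are illustrative assumptions, and the sequential TCAM scan
below happens in parallel in the real hardware.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 356-bit match field, padded to 12 x 32 bits, and actions. */
typedef struct { uint32_t w[12]; } flow_key_t;
typedef struct { char actions[32]; } action_set_t;

/* TCAM entry: value + care mask (ternary match), pointing into action RAM. */
typedef struct { flow_key_t value, mask; int ram_index; bool valid; } tcam_entry_t;

#define TCAM_SIZE 64
static tcam_entry_t tcam[TCAM_SIZE];
static action_set_t action_ram[TCAM_SIZE];

/* One-entry 'cache' of the most recently matched flow. */
static flow_key_t   cached_key;
static action_set_t cached_actions;
static bool         cache_valid = false;

/* Ternary compare: bits where mask==1 must match; the rest are wildcards. */
static bool tcam_hit(const tcam_entry_t *e, const flow_key_t *k)
{
    for (int i = 0; i < 12; i++)
        if ((k->w[i] ^ e->value.w[i]) & e->mask.w[i])
            return false;
    return true;
}

/* Lookup: cache first, then TCAM -> RAM; a miss goes to the controller. */
bool match_flow(const flow_key_t *key, action_set_t *out)
{
    if (cache_valid && memcmp(key, &cached_key, sizeof *key) == 0) {
        *out = cached_actions;                /* recently seen flow */
        return true;
    }
    for (int i = 0; i < TCAM_SIZE; i++) {     /* parallel in hardware */
        if (tcam[i].valid && tcam_hit(&tcam[i], key)) {
            *out = action_ram[tcam[i].ram_index];
            cached_key     = *key;            /* warm the cache */
            cached_actions = *out;
            cache_valid    = true;
            return true;
        }
    }
    return false;                             /* table miss */
}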
The purpose of the input and output buffers is to store incoming and outgoing
packets. When a matching flow entry for a specific packet (belonging to a particular
flow) is found, the actions are retrieved from the RAM block in the flow matching
unit and applied to that packet by the processor before it is forwarded to the output
buffer.
Interacting with the two units and communicating with the centralized controller are
tasks of the processor. Flow entries sent by the controller in 'flow_mod' type
OpenFlow messages have to be written to the memory blocks in the flow match unit;
this is commonly referred to as 'programming the data plane'. We introduce
programming interfaces and some custom instructions to carry out data plane
programming and controller communication efficiently, as sketched below.
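Installing a controller-supplied entry might look like this. It is a schematic sketch
only: the message layout is heavily abbreviated (a real OpenFlow 1.x flow_mod
carries far more state), the static arrays stand in for the memory-mapped
programming interface of the flow match unit, and the write ordering is our own
design note, not a requirement of the specification.

#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 64

/* Abbreviated flow_mod view; a real OpenFlow flow_mod carries more fields. */
typedef struct {
    uint8_t match[45];      /* 356-bit match field */
    uint8_t mask[45];       /* wildcard mask       */
    uint8_t actions[32];    /* encoded action list */
} flow_mod_t;

/* Stand-ins for the programming interface of the flow match unit. */
static uint8_t tcam_value_mem[TABLE_SIZE][45];
static uint8_t tcam_mask_mem[TABLE_SIZE][45];
static uint8_t action_ram[TABLE_SIZE][32];
static int next_free = 0;

/* Processor-side handler: install one flow entry. Actions are written
   first so that a TCAM hit never points at stale action RAM. */
void handle_flow_mod(const flow_mod_t *msg)
{
    int idx = next_free++ % TABLE_SIZE;     /* naive slot allocation */
    memcpy(action_ram[idx],     msg->actions, sizeof msg->actions);
    memcpy(tcam_value_mem[idx], msg->match,   sizeof msg->match);
    memcpy(tcam_mask_mem[idx],  msg->mask,    sizeof msg->mask);
}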
3.2 SDN data plane architecture
The data plane is the hardware sub-unit where the actual packet forwarding takes
place. As previously stated, rather than having completely dumb data planes as
network nodes, in our architecture the data planes carry some intelligence, supplied by
the customized processor.
In keeping with the OpenFlow protocol, packet forwarding is done in a flow-based
manner. The rules that should be applied to each flow are stored in the data plane, and
each packet is matched against these rules to find the appropriate one or ones. The
actions specified in the matching rule(s) are then applied to the packet. All of this
takes place in the 'dumb' data plane; the processor handles the programming of the
data plane and any necessary communication with the central (SDN) controller.
The data plane in our architecture consists of four primary parts:
1. Pre-processing Unit
2. Classification Engine & SDN Match Field Formatter
3. Flow Processing Unit/Execution Engine
a. OpenFlow Pipeline
b. Execution Engine
4. Programming Interface
[Figure: block diagram of the data plane, showing the ingress and egress buffers,
classification engine, formatter, pre-composer, flow cache, OpenFlow pipeline,
execution engine, action memory, table miss handler, SDN programming interface
and internal packet buffers]
Fig 3.2: Architectural Overview of the Data Plane
Besides these principal modules, packet buffers store the incoming packets, and
additional communication channels with the processor may be added to allow faster
programming of the data plane. Incoming packets are first received by the
pre-processing unit, where the necessary information is extracted from the packet
header and formatted into a 'match-field' set, as specified in the OpenFlow protocol.
These match-field sets and the packets are then stored in separate buffers.
3.2.1 Flow classification engine
One of the main differences between traditional networking and software defined
networking is that an SDN switch can perform flow processing rather than packet
processing. In an SDN switch, the flow matching unit matches the flow of a packet
against the flow entries in the tables and applies the specified rules. For the flow
matching unit to identify the flow of a packet, there has to be a unit that classifies
packets according to their flow; that unit is called the classification engine. The
output of the classification engine is therefore directly the input to the flow matching
unit.
In the classification engine, the flow of a packet is identified from fields in its header,
such as the destination MAC address, the source MAC address and the destination IP
address. OpenFlow specifies a particular format for the fields that identify a flow;
these fields cover OSI layers 1 up to 4. Two packets are said to belong to the same
flow if and only if all the specified fields are identical. The main role of the
classification engine is therefore to extract those fields from the packet header,
arrange them according to the specified format, known as the match field, and pass
the result to the flow matching unit. The flow identification fields specified in
OpenFlow are shown below.
Fig 3.3: Flow Identifier in OpenFlow
When extracting the above-mentioned fields from the packet header, we need to know
the position of each field, which varies with the protocol in use (IPv4, MPLS, VLAN,
etc.). Therefore, when formulating the match field for a packet, we must first
determine the packet's protocol, so that the fields are extracted from the correct
positions in the header. Protocol identification is carried out mainly by checking the
EtherType in the packet header. The EtherTypes for some of the protocols are as
follows:
0x0800 – IPv4
0x86DD – IPv6
0x8100 – VLAN tagged frame
0x8847 – MPLS unicast
0x8848 – MPLS multicast
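As a simple illustration, the dispatch on EtherType can be sketched in C. It mirrors
the decision the hardware makes; the enum and function names are ours, and real
frames may carry stacked VLAN/MPLS tags that this sketch ignores.

#include <stdint.h>

/* Illustrative flow-type labels; the real engine distinguishes more types. */
typedef enum { FLOW_IPV4, FLOW_IPV6, FLOW_VLAN, FLOW_MPLS, FLOW_UNKNOWN } flow_type_t;

/* Classify a frame by the 16-bit EtherType found at byte offset 12 of an
   untagged Ethernet header. */
flow_type_t classify_ethertype(const uint8_t *frame)
{
    uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    switch (ethertype) {
    case 0x0800: return FLOW_IPV4;
    case 0x86DD: return FLOW_IPV6;
    case 0x8100: return FLOW_VLAN;  /* real logic re-reads EtherType after the tag */
    case 0x8847:                    /* MPLS unicast */
    case 0x8848: return FLOW_MPLS;  /* MPLS multicast */
    default:     return FLOW_UNKNOWN;
    }
}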
Once the protocol type of the packet has been identified, we can extract the relevant
fields from the packet header to form the match field. The match field consists of 356
bits, made up of the following fields, with these bit widths, in the given order:
Ingress Port 32
Metadata 64
MAC Source 48
MAC Destination 48
Eth Type 16
VLAN ID 12
VLAN Priority 3
MPLS Label 20
MPLS Traffic Class 3
IPv4 Source 32
IPv4 Destination 32
IPv4 Protocol 8
IPv4 ToS 6
TCP Source 16
TCP Destination 16
----
Total 356
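For reference, the layout can be written down as a packed C struct. The field names
follow the table above; this is a software view only (the declared widths sum to 356
bits, but a C compiler will pad the struct), so the hardware register remains the
authoritative layout.

#include <stdint.h>

/* Software view of the 356-bit OpenFlow match field (widths per the table). */
typedef struct {
    uint32_t ingress_port;        /* 32 bits */
    uint64_t metadata;            /* 64 bits */
    uint64_t mac_src       : 48;
    uint64_t mac_dst       : 48;
    uint16_t eth_type;            /* 16 bits */
    uint16_t vlan_id       : 12;
    uint8_t  vlan_priority : 3;
    uint32_t mpls_label    : 20;
    uint8_t  mpls_tc       : 3;
    uint32_t ipv4_src;            /* 32 bits */
    uint32_t ipv4_dst;            /* 32 bits */
    uint8_t  ipv4_protocol;       /*  8 bits */
    uint8_t  ipv4_tos      : 6;
    uint16_t tcp_src;             /* 16 bits */
    uint16_t tcp_dst;             /* 16 bits */
} match_field_t;                  /* 356 information bits: the 45-byte register */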
All of the processing mentioned above happens in a single clock cycle, in dedicated
hardware logic. At the end, the extracted fields are written to a register of 45 bytes.
The completed match field is then enqueued to the match field buffer (if the buffer is
not full), a memory unit that acts as a queue, and the packet is enqueued to the packet
buffer, which also acts as a queue.
When the flow processor requests a match field, the oldest match field in the match
field buffer, together with the corresponding packet in the packet buffer, is returned
to it. The match field is then removed from the match field buffer, and the
corresponding packet is removed from the packet buffer.
As SDN introduces flow-based traffic handling to replace the traditional
packet-by-packet approach, identifying traffic flows is a primary concern. It should be
carried out in as little time as possible, as it affects throughput directly. To identify an
incoming flow, certain information must be extracted from the header of the packet
and arranged into the format specified in the OpenFlow protocol (illustrated above).
How this information is extracted changes from flow type to flow type. As an
example, consider extracting the TCP source port from a packet header: the exact
location of that field differs between an IP-TCP flow and a VLAN-IP-TCP flow. So
to construct the identifier for an incoming traffic flow, identifying the flow type is an
essential first step.
In our architecture, we introduce a dedicated logic unit to handle flow type
identification. For testing purposes we were only concerned with a limited number of
flow types, and the results were encouraging; the design can be expanded easily to
handle less frequent traffic types as well.
Introducing a dedicated logic unit for identifying flow type does limit flexibility to a
certain extent (for example, if a new L3 protocol is introduced, the whole unit has to
be redesigned). But this unit is capable of classifying traffic belonging to all the major
flow types, and at a very fast rate. The processor discussed earlier is equipped with
the instructions necessary to handle flow classification on its own, although at a
comparatively lower rate. The flexibility aspect of SDN and OpenFlow is therefore
not violated, as classification can be carried out either by the fast dedicated logic unit
or by the more flexible processor.
After the flow type is identified, extracting the necessary information from the packet
headers is a task handled by the processor. The processor thus constructs the flow
identifier and passes it to the matching unit, which in turn returns the set of actions to
be applied to that flow.
The position of a particular field in the packet header varies between protocol types
(IPv4, MPLS, VLAN, etc.). Therefore, when formulating the match field for a packet,
we must first know the packet's protocol, in order to extract the fields from the
correct positions in the header. This unit is responsible for that packet classification,
done mainly by checking the EtherType in the packet header.
And OpenFlow as an application by specific port 6633. Therefore we can
classify packet with any combination of above protocols. And also IP protocols
29
explained below is taken into account for classification. This classification engine is
implemented with the parallel architecture and consume single clock cycle for the
classification. We developed this architecture to support streamline network
processing without delays and packet drops to enhance the performances and quality
of service.
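A minimal software sketch of this classification step is given below, in the spirit of our C software model. The EtherType and protocol constants are the standard IEEE/IANA values (0x8100 VLAN, 0x8847 MPLS, 0x0800 IPv4, IP protocol 6 for TCP); the flow-type names are ours, and only the destination-port check is shown for OpenFlow. The hardware engine performs these checks in parallel in one clock cycle; the software model walks them sequentially.

#include <stdint.h>

enum flow_type { FT_UNKNOWN, FT_IP_TCP, FT_VLAN_IP_TCP, FT_MPLS, FT_OPENFLOW };

/* Classify a frame by walking its headers. 'pkt' points to the start of the
 * Ethernet header. */
static enum flow_type classify(const uint8_t *pkt)
{
    uint16_t eth_type = (pkt[12] << 8) | pkt[13];
    unsigned ip_off = 14;

    if (eth_type == 0x8100) {                 /* VLAN tag present */
        eth_type = (pkt[16] << 8) | pkt[17];  /* inner EtherType  */
        ip_off = 18;
    }
    if (eth_type == 0x8847)                   /* MPLS unicast     */
        return FT_MPLS;
    if (eth_type != 0x0800)                   /* not IPv4         */
        return FT_UNKNOWN;

    if (pkt[ip_off + 9] == 6) {               /* IPv4 proto = TCP */
        unsigned ihl = (pkt[ip_off] & 0x0F) * 4;
        uint16_t dport = (pkt[ip_off + ihl + 2] << 8) | pkt[ip_off + ihl + 3];
        if (dport == 6633)                    /* OpenFlow control traffic */
            return FT_OPENFLOW;
        return (ip_off == 18) ? FT_VLAN_IP_TCP : FT_IP_TCP;
    }
    return FT_UNKNOWN;
}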
3.2.2 Formatter
Once we know the type of the packet, we can extract the relevant fields from the
packet header to form the match field. The match field consists of 356 bits. These
fields are extracted to a register of size 45 bytes. The completed match field is then
en-queued to the match field buffer (if the buffer is not full), a memory unit which
acts as a queue, and the packet is en-queued to the packet buffer, which also acts as a
queue.
Dequeuing proceeds exactly as described above: when the flow processor requests a
match field, the oldest match field and its packet are returned and removed from their
respective buffers.
3.2.3 Flow Processing Unit
The flow processing unit is where flow matching and action application take place.
As we have identified, these functions are the bottleneck when trying to reach higher
speeds with SDN (specifically OpenFlow) switches. The architecture is therefore
designed to achieve the maximum possible speed in these tasks while keeping the
design viable as a commercial product.
Flow processing unit can further be divided into the following sections.
I) OpenFlow Pipeline & Flow Matching Unit
II) Action/Rules Memory
III) Execution Engine/Action Processor
IV) Handlers
Fig 3.4: Flow Processing Unit
3.2.4 Flow Matching Unit & OpenFlow Pipeline
In SDN, as the control plane is decoupled from the data plane and centralized,
traffic forwarding is reduced to looking up a rule table and executing the actions
specified in the matching rule. Also, SDN and OpenFlow introduce flow-based
forwarding, replacing the traditional packet-by-packet forwarding. So, a network node
operating in an SDN environment that uses OpenFlow should have tables of rules
(‘flow-entry tables’ or ‘flow tables’) which specify what to do with each flow of
traffic. Finding the flow entry which matches with the received traffic flow is called
‘flow matching’. Efficiency of traffic forwarding depends very much on how
efficiently this matching task is executed. In our architecture, we introduce a separate
dedicated unit to handle flow matching.
Before we go into details of how the unit operates, it is essential to have some
understanding of the format of an OpenFlow rule, or a ‘flow entry’. An overview of a
flow entry is given in the diagram below.
Fig 3.5: OpenFlow Flow Entry Format
An OpenFlow flow entry consists of three main parts: rule, action and stats. The rule,
or ‘match fields’, gives the identifier for distinguishing the flow for which the rule is
intended. It is against these that the identifiers constructed from incoming traffic are
matched. The action part of the flow entry specifies what to do with the flows
matching the entry. Mostly, actions specify the port out of which the flow should be
forwarded, or whether it is to be dropped. More recent OpenFlow versions introduce
further actions, such as modifying certain fields of the packet header. The final part of
the entry, stats, contains counters to keep track of the number of packets/bytes that
matched the entry. For analyzing traffic patterns and network performance, the
controller might ask the nodes for these counter values.
The entries are sent by the centralized controller to the nodes in the network, in
OpenFlow ‘flow_mod’ type messages. These are stored in the nodes, typically in a
tabular format. For one particular flow, the controller may first send a flow entry with
a simple forward action and then send more actions (NAT, firewalling actions, etc.)
for that flow as the network changes. The controller may also wish to remove certain
actions, or the flow entry altogether. Writing these entries and modifying the flow
tables are collectively called ‘programming the data plane’, another essential task
related to the flow matching unit.
Following diagram highlights the main components of the unit we use in our
architecture.
Fig 3.6: Flow Processing Unit with Pipelined Flow Tables
To store the flow entries, the unit uses two memory blocks: a TCAM block and a
RAM block. The TCAM block holds the match fields and stats parts of flow entries,
while the RAM block stores the actions.
stores the actions. Ternary content addressable memories (TCAMs) are known to be
very fast memory blocks, capable of finding a match in one clock cycle. The downside
to this is the high cost and power consumption. RAM, on the other hand, is quite
cheap and, as it is not content addressable, one has to search through all the stored
data to find a match, which increases latency to a level that is not tolerable in
networking. By using TCAMs only to store the match fields and stats of a flow entry,
we reduce TCAM usage almost by half when compared with typical TCAM
implementation of OpenFlow flow tables. As you only need the match fields to match
with the identifier of an incoming flow, we do not lose speed.
With our architecture, flow matching is done in the following manner. The identifier
constructed from the incoming traffic is used to match against the match fields stored
in the TCAM block. When a match is found, counters are incremented and a pointer to
the memory location in the RAM block where the actions are stored is outputted. Then
those actions can be retrieved from the RAM block. Usually, the number of flow
entries would be too many to store in a single TCAM. So the flow tables (which
contain only match fields and stats in this architecture) are arranged in a pipelined
fashion inside the TCAM block (now with several TCAMs). It should be noted that
there exists a one-to-one mapping between the match fields stored in TCAMs and
actions stored in the RAM.
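The software model we built in Phase 1 emulates this arrangement sequentially; the condensed sketch below (entry counts and names are ours) shows the lookup path: a ternary match over the stored match fields, a stats update, and a pointer into the action RAM.

#include <stdint.h>
#include <string.h>

#define N_ENTRIES 1024
#define KEY_BYTES 45   /* one 356-bit match field, rounded up */

/* One emulated TCAM row: value plus ternary mask (1 bits must match).
 * Stats are kept beside the match field, as in the hardware. */
struct tcam_row {
    uint8_t  value[KEY_BYTES];
    uint8_t  mask[KEY_BYTES];
    uint32_t pkt_count, byte_count;
    int      action_ptr;             /* index into the action RAM block */
    int      valid;
};

static struct tcam_row tcam[N_ENTRIES];

/* Sequentially emulate the one-cycle TCAM search: return the RAM pointer
 * of the first matching entry, or -1 on a table miss. */
int tcam_lookup(const uint8_t key[KEY_BYTES], uint32_t pkt_bytes)
{
    for (int i = 0; i < N_ENTRIES; i++) {
        if (!tcam[i].valid) continue;
        int hit = 1;
        for (int b = 0; b < KEY_BYTES && hit; b++)
            hit = ((key[b] ^ tcam[i].value[b]) & tcam[i].mask[b]) == 0;
        if (hit) {
            tcam[i].pkt_count++;             /* update stats on match  */
            tcam[i].byte_count += pkt_bytes;
            return tcam[i].action_ptr;       /* pointer into RAM block */
        }
    }
    return -1;  /* miss: packet goes to the handler/processor */
}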
OpenFlow defines a number of possible actions in its latest version, 1.4. In our
architecture, memory is allocated for each flow entry to store all these possible
actions. As we store all the actions in the RAM block, this does not have much impact
on cost. A ‘flag’ is used to indicate which actions have actually been specified. We
refer to this as the ‘action flag’ and it has a width (in bits) equal to the number of
possible actions. If a particular action has been specified, the bit allocated to it in the
flag would be high. Use of the action flag allows faster execution of actions.
Often, the actions specific to a single flow are not all sent at once by the controller. It
would repeatedly send actions to be added to the flow entry as the network changes. In
typical OpenFlow flow table implementations, these actions are added as new flow
entries to the flow tables. This has a negative impact on both speed and cost. Because
the actions for a flow are stored in several places, a flow must now be matched against
all the tables in the pipeline, adding a latency equal to the clock period multiplied by
the number of flow tables. Also, the extra flow entries are stored in TCAMs,
increasing TCAM usage and, in turn, the cost.
In our architecture, we store each flow entry only once and pre-allocate enough
memory to store all the OpenFlow specific actions in the RAM block. That way, when
the controller sends a new action for an existing flow entry, we just add that action to
the action set of the specific entry and modify the flag to reflect the new action. No
modifications are done in the TCAM block. As memory is pre-allocated, adding the
action only means writing some data to the RAM block. This way, each flow entry is
stored only once and a flow can exit the pipeline as soon as it finds a matching entry,
improving speed.
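Under the same illustrative model, adding or removing an action then reduces to a RAM write plus one bit in the action flag, with the TCAM untouched; the action IDs and entry layout below are assumptions.

#include <stdint.h>

#define N_ACTIONS 16   /* one slot per possible action type (illustrative) */

/* Pre-allocated action set for one flow entry in the RAM block. */
struct action_entry {
    uint32_t params[N_ACTIONS]; /* a slot for every possible action's data */
    uint16_t flag;              /* bit i high => action i is specified     */
};

/* Adding or updating an action never touches the TCAM block: write the
 * parameters into the pre-allocated slot and raise the flag bit. */
void add_action(struct action_entry *e, unsigned action_id, uint32_t param)
{
    e->params[action_id] = param;
    e->flag |= (uint16_t)(1u << action_id);
}

/* Removing an action just clears its flag bit. */
void remove_action(struct action_entry *e, unsigned action_id)
{
    e->flag &= (uint16_t)~(1u << action_id);
}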
Often it can be expected that the packets belonging to a specific flow would arrive
at the switch one after the other. This observation presents an opportunity to further
increase the speed of forwarding. We introduce a cache memory block in our flow
matching unit, which holds the most recently matched entries. Flow identifiers are
first checked against those in the cache and proceed through the flow tables only
when a match is not found in the cache. If a match is found, the action set (and the
action flag) has to be retrieved from the RAM block, as in the usual case. This small
cache memory would also be a content-addressable memory, so the cache can be
checked in one clock cycle.
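Continuing the sketch above, the cache-first lookup order can be modeled as follows; the cache size and the FIFO replacement policy are our assumptions, and stats handling on a cache hit is left out for brevity.

#include <string.h>

#define CACHE_SIZE 8

struct cache_row {
    uint8_t key[KEY_BYTES];   /* exact identifier of a recent match */
    int     action_ptr;
    int     valid;
};

static struct cache_row cache[CACHE_SIZE];
static int cache_next;        /* simple FIFO replacement */

int lookup(const uint8_t key[KEY_BYTES], uint32_t pkt_bytes)
{
    /* 1. Check the small content-addressable cache first. */
    for (int i = 0; i < CACHE_SIZE; i++)
        if (cache[i].valid && memcmp(cache[i].key, key, KEY_BYTES) == 0)
            return cache[i].action_ptr;    /* actions still come from RAM */

    /* 2. Fall back to the pipelined flow tables (TCAM block). */
    int ptr = tcam_lookup(key, pkt_bytes);
    if (ptr >= 0) {                        /* remember the latest match */
        memcpy(cache[cache_next].key, key, KEY_BYTES);
        cache[cache_next].action_ptr = ptr;
        cache[cache_next].valid = 1;
        cache_next = (cache_next + 1) % CACHE_SIZE;
    }
    return ptr;
}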
The identifier for an incoming flow would be passed to the flow matching unit by
the processor and the unit would then find the matching entry and pass the actions to
the processor which would execute them. Due to the use of the action flag, execution
of actions can be carried out reliably at high speed. If a matching entry is not found for
a flow identifier, flow matching unit would indicate this to the processor by passing,
instead of the action set, the identifier itself. Processor would then handle
communication with the controller in order to resolve the issue. All this interaction
between the processor and the flow matching unit is carried out through the
communication interface.
As for programming the data plane, a separate programming interface is introduced.
Processor is equipped with special instructions to handle this task in a fast and
efficient manner. The instructions and procedure would be described in detail when
we describe the processor.
Flow processing unit would take one each from the two buffers introduced above
and would try to match the match-field set with a rule/rules stored in its memory. If
any match/matches are found, relevant actions would be applied to the packet. In the
case where there is no matching rule in its memory, the flow processing unit would
forward the packet to the processor, through the relevant communication channel.
As we describe in the coming sections, our architecture is streamlined for faster
flow processing, the bottleneck we have identified in SDN switches.
Programming interface handles the programming of the flow-processing
unit’s memory. As specified by the central controller, rules may be added, modified
or deleted. These actions are taken care of by the programming interface which can
additionally act as a communication channel between the data plane and the
processor (control plane).
Care was taken to design each of the units as modules of a system; that is
they can be developed separately and then combined with minimum hassle. This
was done partly due to the evolving nature of the OpenFlow protocol. As an
example, if the format of the match-field set is changed in the next version of
OpenFlow, only the preprocessor module will have to be redesigned. We feel that this
approach is also suited to scaling up the switch; if there are a large number of ports,
several preprocessor units can be used. The flow processing unit is designed such that
new memory blocks can be inserted into its memory. They would work in a
pipelined architecture, ensuring that the design can scale up with minimal effect on
speed.
3.2.5 Memory: Action/Rule Memory
Memory stores the rules sent by the central controller. As mentioned earlier,
programming the memory is done by the programming interface, a unit separate
from the flow processing module.
OpenFlow rules consist of a set of match-fields, composed of fields in the TCP/IP
header, a mask that specifies which bits should be matched and a set of actions.
Actions in our implementation are currently limited to drop, forward and set-field
types. Other action types specified in OpenFlow 1.4 can be easily integrated into the
design without modifying the architecture.
In this architecture, match-fields and masks are stored in one block of memory and
the actions in another. For applying actions, an action flag is used which does the
same sort of job that a normal mask does.
To store match-fields, we are using TCAMs. This ensures that matching a flow
through all the stored rules only takes a few clock cycles. As specified in the
OpenFlow protocol, the same mask can be used with several sets of match-fields.
TCAMs are essentially high-cost memory blocks and it would be a huge waste if we
store the same mask several times. Because of this, one particular mask is stored only
once in the memory, and the match-field sets (flow entries) that use that mask are
grouped together in the TCAM. While this ensures that memory is not wasted through
redundant storage, the downside is that updating the memory becomes a costly
operation, time-wise. The algorithm we currently use is of order O(n) in time
complexity, which, while not ideal, is a good trade-off against the high memory cost
incurred otherwise.
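In the software model, an insert into this mask-grouped arrangement can be sketched as below, continuing the tcam[] model from earlier. Keeping each mask group contiguous is our illustrative reading of the grouping (the real design stores each mask only once; the software model keeps a per-row copy for simplicity), and the shift of later entries is what makes the update O(n).

/* Insert a new entry at the end of its mask group ('group_end' is the index
 * just past that group, 'used' the number of valid rows). */
int tcam_insert(int group_end, int used,
                const uint8_t value[KEY_BYTES],
                const uint8_t mask[KEY_BYTES],
                int action_ptr)
{
    if (used >= N_ENTRIES) return -1;      /* table full */

    /* Shift everything after the group down by one slot: O(n). */
    for (int i = used; i > group_end; i--)
        tcam[i] = tcam[i - 1];

    memcpy(tcam[group_end].value, value, KEY_BYTES);
    memcpy(tcam[group_end].mask,  mask,  KEY_BYTES);
    tcam[group_end].action_ptr = action_ptr;
    tcam[group_end].pkt_count = tcam[group_end].byte_count = 0;
    tcam[group_end].valid = 1;
    return group_end;
}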
Actions relevant to a particular flow entry are stored in a single place. For storing the
actions, and the action flags, RAM blocks are used. A match in the TCAM block
(matching flow entry) outputs the memory location of the relevant action set. As RAM
blocks are quite cheap, memory for all the possible actions is allocated up front to
each flow entry, regardless of the fact that every action will not be required. This
approach to storing actions, while costly memory-wise, speeds up the flow-matching
unit as all relevant actions can be obtained in one go. Another effect is that it speeds
up deleting a flow but slows down modifying one, as a search has to take place before
modifying.
To further speed up flow matching, a flow cache is used. Every flow is first matched
against the entries in the flow cache, which are the most recently matched flow entries
in the main memory block. As packets belonging to a single flow most often arrive at
a network node in quick succession, we feel the use of a flow cache would speed up
flow matching.
3.2.6 Execution Engine
Here all the relevant actions are applied to the packet. As we are concerned only
with drop, forward and set-field types of actions, the action processor is actually a
hardwired unit. This approach is extremely useful as applying actions then takes the
minimum possible time, given that the hardware is designed accordingly.
In set-field type actions, the value in a specified field of a packet’s TCP/IP header
is replaced by a pre-specified value. Therefore, no calculation is done at the action
processor; only the replacement of the value. For this, the action flag supplies the
guidelines.
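As a concrete illustration (reusing the action_entry layout sketched earlier), a set-field action in this hardwired scheme is a plain overwrite at a known header offset, guided by the action flag. The offsets assume an untagged IPv4/TCP packet with no IP options, and the values are assumed to arrive already in network byte order; the action IDs are ours.

#include <stdint.h>
#include <string.h>

/* Illustrative action IDs (bit positions in the action flag). */
enum { ACT_FWD = 0, ACT_DROP = 1, ACT_SET_IPV4_DST = 2, ACT_SET_TCP_DST = 3 };

/* Apply set-field actions to a packet: pure overwrites, no computation,
 * mirroring what the hardwired execution engine does. */
void execute_actions(uint8_t *pkt, const struct action_entry *e)
{
    if (e->flag & (1u << ACT_SET_IPV4_DST)) {
        uint32_t v = e->params[ACT_SET_IPV4_DST];
        memcpy(pkt + 14 + 16, &v, 4);        /* IPv4 destination address */
    }
    if (e->flag & (1u << ACT_SET_TCP_DST)) {
        uint16_t v = (uint16_t)e->params[ACT_SET_TCP_DST];
        memcpy(pkt + 14 + 20 + 2, &v, 2);    /* TCP destination port */
    }
    /* forward/drop decisions would be signalled to the switch fabric here */
}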
Fig 3.7: How a packet travels through the Data Plane
3.2.7 Table Missed Handlers
Handler units are there to facilitate communication with the main processor. The most
common case is when a packet does not match any of the flow entries. In such a case,
the packet is forwarded to the handler unit, which then sends it to the processor,
freeing up the other parts of the flow processing unit to carry out their usual tasks.
3.2.8 Alternative Architectural approaches considered for the Flow Processing
Unit
We considered a couple of different approaches when deciding on the architecture of
our flow processing unit. As this is the unit that most impacts the throughput of the
switch, we had to take both reliability and speed into account when selecting an
approach. The advantages and disadvantages of the two approaches we considered are
given below.
I) Approach 1
Fig 3.8: Alternative Approach I
In this approach, the flow identifier part and the actions part of a flow entry are stored
at different locations. For storing the identifier, a high speed memory block would be
used (like a TCAM) and the actions would be stored in a normal RAM block.
Identifiers constructed from incoming traffic flows would be matched against the flow
identifiers stored in the TCAM block, which would output a memory location in the
RAM block from which to obtain the relevant action set. Then these actions, together
with the packet, would be forwarded to a hardwired unit for action execution.
Advantages
a. Removing a flow entirely is easy since all the actions corresponding to one
flow entry are stored at a single place (on the Action Set RAM block).
b. As action processing is hardwired, actions can be applied to each flow very
quickly, increasing speed (This is possible as the action set corresponding to
a flow consists of only the new values that certain fields of the flow would
be changed into, and applying actions means replacing the old values with
these new ones.).
c. Whole unit for flow matching and carrying out the required actions is
hardwired, increasing the speed of flow processing and thus enabling
handling a relatively large number of flows in less time.
Disadvantages
a. Adding a new flow entry requires some processing, as the values that entry’s
actions will result in have to be calculated.
b. As this calculation is done through a processing unit, adding a new flow entry
and updating an existing one are slower.
c. The average case memory requirement is large (equal to the worst case
requirement), as all fields to which an action may be applied are stored in the
action set, even if no action is currently being applied to some of these fields.
Several flow matching units can be pipelined if a large number of flows are required
to be handled. Even so, there will be only one flow entry, and one corresponding
action set and flag, for any one flow. Even if all possible actions are to be applied to a
particular flow, and the SDN controller sends those updates one by one, the values
resulting from these actions are first calculated and the action set corresponding to
that flow is updated; a new flow entry or action set is not added.
II) Approach 2
Fig 3.9: Alternative Approach II
In this approach, flow identifier part of a flow entry would be stored every time the
controller sends a new action. Flow tables here are also implemented with high speed
TCAM blocks.
Advantages
a. Easy to add a new flow entry. A minimal amount of processing is needed as
the entry is just added to a flow table with enough space.
b. If we are handling flows with just a couple of actions each, the memory
requirement outperforms approach 1.
Disadvantages
a. Takes more time to delete a flow entirely, as entries corresponding to that
flow can be anywhere.
b. Worst case memory requirement is higher than that of approach 1, as for
each updated action, details about the flow are also stored.
c. If there are a large number of flow entries (not flows), and we have to use
‘n’ pipelined flow tables, the same number of action set blocks is needed to
store the actions of the ‘n’ flows that are currently being processed in the
tables.
d. We need links between each action set block and flow table. When ‘n’
flows/packets are processed in the pipelined flow table block, the worst
case is when each flow/packet has an entry in each of the flow tables.
Therefore, to keep the actions of all flows/packets distinct, we need ‘n’
blocks for storing action sets. Also, there should be a method to know in
which of these blocks we should save an action for a particular flow/packet.
Considering the advantages and disadvantages of both architectures, we decided on
the first approach, as we feel it would be better in terms of speed and also less costly
to implement.
3.3 Customized network processor
3.3.1 Instruction set architecture
One of the main objectives of SDN is to add flexibility to networks to cope with the
dynamic traffic patterns of today. This is achieved by making the network
programmable; centralizing the control plane of networking as a software entity. In
keeping with this principle of flexibility, a major component in our SDN architecture
is a customized network processor. The majority of SDN-related activities are the
processor’s responsibility, including handling the OpenFlow protocol. The processor
we have introduced is a customized RISC processor and thus has many attributes
typical of a RISC processor. Customizations were done in order to make the processor
more ‘aware’ of SDN and OpenFlow.
Instruction Set Architecture (ISA) can be considered as an integral part in defining
the performance of a processor. In the processor we introduce, we have deviated from
the typical RISC ISA to accelerate OpenFlow and SDN related tasks like writing
entries to the flow tables, handling packets, etc.
We use 32-bit long instructions with 6 bits reserved for the opcode. Instructions use
(mostly) registers as operands and thus, the ISA can be categorized as following the
load/store architecture. Several different instruction formats are used and these are
illustrated in the diagram below.
Fig 3.10: Instruction Formats
Our ISA can be divided into three broad categories:
I. Memory Access and Control Instructions
II. ALU Instructions
III. OpenFlow Instructions
Instructions to access memory and to control the execution flow, as found in a typical
load/store architecture, fall into the first category. Most of these instructions use two
register operands and a 16-bit constant.
As an example, LOAD instruction is explained below.
LOAD R1 R2 12364
Here the opcode (LOAD) takes up the first 6 bits, the next 10 bits are for the two
registers, and the final 16 bits are for the constant. When executed, this loads the
memory content at the address specified by the constant plus the offset in R1 into R2.
As an example of a control instruction, let’s look at BREQ.
BREQ R1 R2 12456
An instruction length of 32 bits is used, the same as in the LOAD instruction
explained earlier. When executed, it checks whether the contents of R1 and R2 are
equal and, if so, changes the control flow to the instruction at the memory location
specified by the constant.
Category of ALU operations can be further divided into two sub-categories, ALU
operations and ALU operations with constants. Such a division was decided on as the
processor should support a high-speed data plane and there are several OpenFlow
related tasks that require arithmetic operations with constants. One possible alternative
we considered was storing those special constant values in a separate part of memory
and performing a LOAD and a normal ALU instruction. But when you take speed into
consideration, this approach was found to be slower, by a couple of clock cycles.
When we consider normal ALU operations, instructions generally have 3 register
operands (a couple of exceptions, like NOT and INC, do exist). As previously stated,
registers are indicated using 5 bits. These add up to 15 bits and the opcode is 6 bits;
the remaining 11 bits are ignored.
As an example, let’s consider ADD.
ADD R1 R2 R3
When executed, this would add the values in R2 and R3, and would store the result
in R1.
The other type of ALU operations, ones with constants, has a slightly different
format. These also have 3 operands, but only two are register operands. Last 16 bits in
the instruction are reserved for the constant. As an example, let’s consider ADDC.
ADDC R1 R2 10098
Here, value in R2 is increased by the specified constant (10098 here) and the result
is stored in R1.
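To make these formats concrete, the sketch below encodes the three example instructions, assuming the fields are packed from the most significant bit down (opcode, R1, R2, constant). The exact bit placement inside the word is our assumption, and we follow the prose description of BREQ rather than the 6+20+6 layout listed for it in Table 3.2.

#include <stdint.h>

/* Encode an instruction of the 6+5+5+16 format: 6-bit opcode, two 5-bit
 * registers, 16-bit constant. MSB-first packing is assumed. */
static uint32_t encode_ri(uint8_t opcode, uint8_t r1, uint8_t r2, uint16_t c)
{
    return ((uint32_t)(opcode & 0x3F) << 26) |
           ((uint32_t)(r1 & 0x1F) << 21) |
           ((uint32_t)(r2 & 0x1F) << 16) |
           c;
}

int main(void)
{
    /* Opcodes taken from Table 3.2. */
    uint32_t load = encode_ri(0x11 /* 010001 */, 1, 2, 12364); /* LOAD R1 R2 12364 */
    uint32_t breq = encode_ri(0x19 /* 011001 */, 1, 2, 12456); /* BREQ R1 R2 12456 */
    uint32_t addc = encode_ri(0x30 /* 110000 */, 1, 2, 10098); /* ADDC R1 R2 10098 */
    (void)load; (void)breq; (void)addc;
    return 0;
}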
As explained earlier, the primary focus of the processor is to facilitate OpenFlow and
handle traffic reliably. These tasks should also run in as little time as possible, to
facilitate higher throughput. Therefore, a special category of custom instructions,
‘OpenFlow Instructions’, was designed to handle them.
Some of these instructions deal with handling flow matching and processing in the
data plane, and some with programming the data plane as instructed by the SDN
controller. These instructions, together with their purposes, are listed below.
Instruction   Operands    Description
DEQUE                     Dequeue packet from packet-in queue
ENQUE         R1          Enqueue last processed packet into queue given by R1
DPLRD         R1 R2 A     Read from flow tables
DPLWR         R1 R2 A     Write to flow tables
PDROP                     Drop last processed packet
TABLE         R1          Choose flow table given by R1
LKPT          R1 N        Classification for N entries (result into R1)
CRC           R1 R2 R3    Carry out Cyclic-Redundancy-Check (CRC)
CHKSM         R1 R2 R3    Introduce Checksum field

Table 3.1: Custom Instructions
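As a software reference point for the CHKSM instruction, the sketch below computes the standard Internet ones'-complement checksum used by IPv4/TCP headers; that CHKSM implements exactly this algorithm over the given byte range is our assumption.

#include <stdint.h>
#include <stddef.h>

/* Ones'-complement sum over 'len' bytes starting at 'data', as used by
 * IPv4/TCP header checksums. */
uint16_t checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len)                       /* odd trailing byte */
        sum += (uint32_t)data[0] << 8;
    while (sum >> 16)              /* fold the carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}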
3.3.2 Complete instruction set architecture
Instruction  Opcode   Operands      Format        Description

OPENFLOW INSTRUCTIONS
DEQUE        100001                 6+20+6        Dequeue packet from queue
ENQUE        100010   R1            6+5+5+16      Enqueue packet into queue R1
PDROP        100011                 6+20+6        Drop packet
DPLRD        100100   R1,R2,A       6+5+5+16      OpenFlow buffer read
DPLWR        100111   R1,R2,A       6+5+5+16      Data plane packet write

MEMORY ACCESS & CONTROL
LOAD         010001   R1,R2,A       6+5+5+16      Load data from memory
STORE        010010   R1,R2,A       6+5+5+16      Store data into the memory
LDILW        010011   R1,C          6+5+21        Load immediate value into lower half of the word
LDIHG        010100   R1,C          6+5+21        Load immediate value into upper half of word
COMP         010111   R1,R2         6+5+5+5+5+6   Compare and set ALU status bit
BR           011000   A             6+20+6        Unconditional branch
BREQ         011001   R1,R2         6+20+6        Conditional branch
BRL          011010   R1,R2         6+20+6        Conditional branch
BREQL        011011   R1,R2         6+20+6        Conditional branch
NOP          010000                 6             No operation
JMP          011100   A             6             Jump to instruction (absolute)

ARITHMETIC & LOGIC INSTRUCTIONS
ADD          000000   R1,R2,R3      6+5+5+5+5+6   Add
SUB          000001   R1,R2,R3      6+5+5+5+5+6   Subtract
MULT         000010   R1,R2,R3      6+5+5+5+5+6   Multiply
INC          000011   R1            6+5+5+5+5+6   Increment
DEC          000100   R1            6+5+5+5+5+6   Decrement
FWD          000101   R1,R2         6+5+5+5+5+6   Forward the R1 value to R2
SHL          000110   R1,R2         6+5+5+5+5+6   Shift left
SHR          000111   R1,R2         6+5+5+5+5+6   Shift right
AND          001000   R1,R2,R3      6+5+5+5+5+6   Logic AND
OR           001001   R1,R2,R3      6+5+5+5+5+6   Logic OR
XOR          001010   R1,R2,R3      6+5+5+5+5+6   Logic XOR
NOT          001011   R1,R3         6+5+5+5+5+6   Negation
RDBT         001110   R1,R2,R3      6+5+5+5+5+6   Read C1 into R2 from R1 starting from C2
CMP          001111   R1,R2,R3      6+5+5+5+5+6   Compare R1 & R2 and store it in R3
ADDC         110000   R1,R2,C       6+5+5+16      Add constant
SUBC         110001   R1,R2,C       6+5+5+16      Subtract constant
MULTC        110010   R1,R2,C       6+5+5+16      Multiply by constant
INCC         110011   R1,C          6+5+5+16      Increment by constant
DECC         110100   R1,C          6+5+5+16      Decrement by constant
FWDC         110101   R1,R2         6+5+5+5+5+6   Forward the R1 value to R2
SHLC         110110   R1,R2,C       6+5+5+16      Shift R1 left by immediate C value into R2
SHRC         110111   R1,R2,C       6+5+5+16      Shift R1 right by immediate C value into R2
ANDC         111000   R1,R2,C       6+5+5+5+5+6   Logic AND by immediate C value
ORC          111001   R1,R2,C       6+5+5+5+5+6   Logic OR by immediate C value
XORC         111010   R1,R2,C       6+5+5+5+5+6   Logic XOR by immediate C value
RDBTC        111110   R1,R2,C1,C2   6+5+5+5+5+6   Read C1 into R2 from R1 starting from C2
CMPN         111111   R1,R2,C1,C2   6+5+5+5+5+6   Compare C1 bits in R1 & R2 starting from C2
CMPC         111100   R1,R2,C       6+5+5+5+5+6   Compare R1 and constant C
TABLE        101010   R1            6+5+5+16      Choose table pointed by R1
LKPT         101011   R1,N          6+5+5+16      Classification for N entries (result into R1)
CRC          101100   R1,R2,R3      6+5+5+5+5+6   CRC starting at R1 for R2 bytes (result in R3)
CHKSM        101101   R1,R2,R3      6+5+5+5+5+6   Checksum starting at R1 for R2 bytes (result in R3)

Table 3.2: Complete Instruction Set
3.4 Micro-architecture
The micro-architecture of the processor is a hardwired, pipelined architecture that
supports several pipeline stages, including instruction fetch, decode and a few
execution cycles. The data paths of the different instruction formats consume
different numbers of clock cycles, which we used as a performance evaluation
parameter during initial development. We have also implemented a finite state
machine (FSM) for the control unit; the combination of the control unit with the
multi-cycle data path supports our custom ISA. The micro-architecture itself is based
on the hardwired RISC approach and supports a RISC instruction set. In addition to
some network functions and some generic processor functions, we needed to support
our custom instructions to accelerate packet processing and SDN data plane access.
Therefore, we accommodated the SDN data plane program access control, SDN data
plane information access control and OpenFlow buffer control in the processor. In
our micro-architecture, the FSM control takes care of the controlling aspects of these
actions, while the data path itself has support for accessing those data plane
functions. Our processing stages can thus be marked as follows.
I) Instruction Fetch
II) Instruction decode and data fetch
III) ALU operation & Net-R (format IV) operations
IV) Memory access or Data Plane access or R-format instruction completion
    a. OpenFlow Data Plane Access
        i. OpenFlow Buffer Access
            1. READ Buffer ID
        ii. Data Plane Program Access
            1. WRITE Program ID
        iii. Data Plane Info Access
            1. READ Data Plane Information to upper layers
V) Memory access completion or data plane access completion
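A rough C model of this control FSM is sketched below (the state names and the classification flags are ours); the transitions reproduce the per-instruction cycle counts given at the end of this section: 3 states for branches and jumps, 4 for ALU operations and stores, 5 for loads.

enum state { FETCH, DECODE, EXEC, ACCESS, COMPLETE };

/* One transition per clock cycle. 'is_read' covers LOAD/DPLRD and
 * 'is_write' covers STORE/DPLWR-style data plane accesses. */
enum state next_state(enum state s, int is_branch, int is_read, int is_write)
{
    switch (s) {
    case FETCH:    return DECODE;
    case DECODE:   return EXEC;               /* also reads registers   */
    case EXEC:
        if (is_branch) return FETCH;          /* branch/jump: 3 states  */
        if (is_read || is_write) return ACCESS;
        return COMPLETE;                      /* ALU write-back: 4      */
    case ACCESS:   return is_read ? COMPLETE  /* load/DPLRD: 5 states   */
                                  : FETCH;    /* store/DPLWR: 4 states  */
    case COMPLETE: return FETCH;              /* then fetch the next op */
    }
    return FETCH;
}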
Apart from the ISA, the microarchitecture is also central to a processor’s performance.
Our processor’s microarchitecture can be broadly described as a hardwired, pipelined
architecture. It supports several pipeline stages: fetch, decode and a few execution
cycles (the exact number depends on the specific instruction). The microarchitecture
is based on that of a typical, hardwired RISC processor. A few customizations have
been introduced to accelerate packet processing and the handling of OpenFlow-related
tasks. An overview of the processor microarchitecture is given below. It should be
noted that we have introduced an ‘OpenFlow register’ (OFR) and OpenFlow read
(OFRD) and write (OFWR) control signals in order to accelerate OpenFlow-related
tasks. Instruction and data memories are kept separate from the flow tables (flow
matching unit memory). The processor uses a multi-cycle data path controlled by the
finite state machine inside the control unit.
Fig 3.11: Finite State Machine of the Processor
The microarchitecture is capable of completing the execution of all instructions in five
or fewer clock cycles. The number of clock cycles required for each instruction is
listed below.
 LOAD, DPLRD – 5
 STORE, DPLWR – 4
 ALU Instructions – 4
 All other instructions – 3
Chapter 4
DESIGN AND IMPLEMENTATION (METHODOLOGY)
4.1 Design & implementation
In the last chapter we discussed the architecture of the system and identified the main
components of the overall system. The next step was to build the system in stages.
The development cycle of the system consisted of several design stages, which are
elaborated in this chapter. The block diagram of the implemented system is shown
below.
Fig 4.1: Overview of the Hardware Implementation
The final system, shown in the figure above, consists not only of the main cores that
we developed, but also of other supporting cores and designs we developed to extend
our project into an SDN switch with a virtualized SDN/NFV application layer, where
network engineers can build their own applications in a virtualized environment.
After the initial architecture, our design and implementation work on this system can
be categorized as follows.
Phase 1: High Level Synthesis Model with C/C++
Phase 2: RTL design and Verification of SDN Data plane & Processor
Phase 3: RTL design and Verification of the PCI Subsystem and Ethernet Switch
Fabric
Phase 4: System Integration and Test
Phase 5: Development of SDN/NFV Application Layer with SDN App Store
4.2 Phase 1: high level synthesis or c/c++ model
In this phase our main target was to build the software model of our system, since we
had extended the project to an SDN data plane with some dedicated logic to support
the OpenFlow aware network processor. We developed the architecture as a software
model and tested the design on it, mostly developing the software modules to mirror
our hardware. We developed the following vital components of the system in C.
I) Program to get network traffic/packets as pcap file with WinPcap on
Windows.
II) Classification engine to identify flow types.
III) Flow Formatter to extract SDN match fields.
IV) Flow Matching engine with software TCAMs to match flows.
V) Execution Engine and Table Missed Packet Handler to apply SDN policy
rules.
VI) Processing functions emulating OpenFlow processor.
We also used the Mininet simulation platform with the Floodlight SDN controller to
realize the SDN/OpenFlow work flow. We developed the I/O system of the software
model to map onto files, which are used to read and write inputs and outputs and to
analyze them. The TCAMs are organized into text files which emulate the original
TCAMs’ behavior, but in a sequential manner; therefore updating and searching times
are higher than in the actual scenario.
(Fig 4.2 block diagram: pcap file -> classification engine -> flow matching engine ->
action rules -> execution engine, with the processor and handler attached.)
We also developed the ISA for the processor at this stage, as a C/C++ model. This
stage was very helpful for optimizing the ISA and changing the initial ISA according
to requirements. We developed a program to directly convert an assembly-level
program file to run on the processor, and used it for small calculations. Initially, the
processor was developed as a programming model, with functions that do the same
things as the ISA instructions. What we wanted to do at this point was to verify the
data plane functionalities, along with the flow classification, matching and execution
engines, together with the processing tasks.
Fig 4.2: Test Procedure
4.3 Phase 2: RTL design & verification of data plane & processor
Then we entered the RTL (Register Transfer Level) design stage of the system. Our
objective was to implement the system on a Xilinx FPGA hardware platform;
therefore we tried to make some of our cores compatible with the Xilinx hardware.
But we developed our core functional modules to be more general, so they can work
on any hardware and even as separate IP cores.
We used the Verilog Hardware Description Language (HDL) to develop the
system. Initially we developed the core modules that were essential parts of the SDN
data plane subsystem and verified each and every module. For the verification process
we used Verilog test benches, with SystemVerilog support at some points.
I) Flow Classification Engine
II) Flow Matching Engine with OpenFlow Pipeline/Flow Tables
III) Flow Policies/Rule Memory
IV) Flow Execution Engine
V) Table Missed Packet Handler
VI) OpenFlow Meters
VII) SDN Programming interface
VIII) Internal FIFO Buffers to buffer Packets.
After completing all the relevant modules of the SDN data plane, we integrated
them into a single data plane module and tested it. These modules were organized into
a pipelined architecture inside the data plane: for an n-stage OpenFlow pipeline, there
are n+4 pipeline stages inside this SDN data plane, which makes the data plane faster.
We also developed the SDN programming interface to expose the data plane for fast
and reliable programming from the processor itself, and a communication interface
from the data buffers so that the processor can access buffered packets. Together with
the SDN programming interface, we have the programming interface to the processor
and other relevant hardware to program the data plane and push policies. This
interface is not only for programming purposes, but also for configuration
management and information access from the data plane to the outside. Thus we
created the programming interface and packet interface to the outside, which can be
connected to a processor to push and pull packets as well as to access and push
programming and configuration information to the data plane.
The technologies used for the project consist of Xilinx tools as well as Xilinx
hardware. We developed the system with the Xilinx Vivado and Xilinx ISE design
tools, essentially using their design flow, including synthesis, place and route, bit
stream generation and hardware programming. Xilinx ILA (Integrated Logic
Analyzer) and tools such as ChipScope were used at the hardware debug stage.
Then the RTL design stage of the processor took place. Having mostly finalized the
ISA (Instruction Set Architecture), we had to design the micro-architecture for the
processor. As discussed in the architecture chapter, we used a RISC architecture for
the OpenFlow aware network processor. The processor consists of several
sub-modules.
I) Program Counter
II) IR (Instruction Register)
III) MDR (Memory Data Register) and ODR (OpenFlow Data Register)
IV) Register File
V) ALU (Arithmetic & Logic Unit)
VI) Control Unit (Finite State Control)
VII) Multiplexers
VIII) Memory and OpenFlow Data Plane Access interface
The processor micro-architecture was also developed as a pipelined architecture, and
we reduced the hardware logic. The importance of a multi-stage pipelined architecture
is that it can shorten the clock period and thus raise the clock frequency.
The processor was also developed in Verilog HDL with the Xilinx tools. Processor
verification was run with the help of a compilation program we developed to convert
assembly code into binaries.
The micro-architecture and its processing stages are as described in Section 3.4: a
hardwired, pipelined RISC design with an FSM-based control unit and a multi-cycle
data path supporting our custom ISA and SDN data plane access.
Some pipeline stages are sometimes overlapped to obtain higher throughput. At this
initial stage we observed the following clock-cycle counts for the different instruction
formats.
 Load: 5 states/clock cycles
 Store: 4 states
 OpenFlow Data Plane Access:
o OpenFlow Buffer Read: 5 clock cycles
o OpenFlow SDN Data Plane Program Write: 4 clock cycles
o OpenFlow Data Plane Info Access: 5 clock cycles
 R-format ALU instructions: 4 states
 NET-R-format instructions: 4 states
 Branch: 3 states
 Jump: 3 states
Thus different instructions take different numbers of clock cycles to execute, with a
minimum of three.
4.4 Phase 3: PCI express subsystem & ethernet subsystem
As we aimed to deliver an initial version of an SDN switch, the SDN data plane and
processor by themselves are not enough. We needed additional hardware: an Ethernet
subsystem including the switch fabric, and a PCI Express v2 subsystem to
communicate and send information up to the application layer. Therefore we had to
build two subsystems based on three main Xilinx IP cores.
IP Cores
I) 1G/2.5G Ethernet PCS/PMA Core v15.0
II) Tri-Mode Ethernet MAC Core v9.0
III) 7 Series Integrated Block for PCI Express 3.1 (PCIeGen2x8If128)
Fig 4.3: Ethernet Switch Fabric
Apart from these three major IP cores, we used FIFO IP cores for the buffers. For the
Ethernet switch fabric we deployed the PCS/PMA core and the Tri-Mode Ethernet
MAC core. The Xilinx Virtex 7 VC707 FPGA board has one Ethernet port and can be
extended with an Ethernet FMC card. The Virtex 7 board carries a Marvell M88E1111
PHY chip which uses SGMII (Serial Gigabit Media Independent Interface) as its MII
(Media Independent Interface). The Tri-Mode Ethernet MAC core does not support
SGMII directly; it can be used in GMII or RGMII modes. Therefore we needed an
SGMII conversion layer, and we deployed a sub-PCS layer, the Xilinx 1G/2.5G
Ethernet PCS/PMA core, to perform it. On top of the PHY and PCS/PMA sublayer we
deployed the Tri-Mode Ethernet MAC core, and upon the MAC core we deployed our
additional cores related to the switching fabric, with packet buffers and control
interfaces. Then we connected the SDN data forwarding plane to the incoming packets
from the Ethernet switch fabric.
We developed a couple of finite state machines for the input arbiter and output
arbiter logic. The data buffers were designed using the Xilinx FIFO generator in the
IP catalog.
We used the Riffa wrapping modules with the Xilinx 7 Series integrated PCI
Express core (PCIeGen2x8If128) to send data to the upper layers. The PCIe interface
supports up to 12 channels; in our case we used 3 of them, one to monitor input data
traffic to our core system, another to monitor output traffic from our core system, and
the last one for programming and configuring the data plane according to application
layer requirements. For Riffa we use their drivers on a Linux machine to get the PCIe
interface traffic into our machine. Apart from the cores, all the state machines that
send and receive data are deployed inside the Riffa PCIe wrapping modules. Standard
communication interfacing methods, such as AXI, are used throughout.
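On the host side, reading and writing these channels through the RIFFA driver follows the RIFFA 2.x C API pattern sketched below. The channel numbers and buffer size are ours, the programming message is hypothetical, and the exact function signatures should be checked against the RIFFA release in use; lengths in this API are counted in 32-bit words.

#include <stdio.h>
#include <stdlib.h>
#include "riffa.h"   /* RIFFA 2.x user-space API */

#define CHNL_MON_IN   0   /* monitor traffic entering the core (ours)   */
#define CHNL_MON_OUT  1   /* monitor traffic leaving the core (ours)    */
#define CHNL_PROGRAM  2   /* program/configure the data plane (ours)    */
#define BUF_WORDS     2048

int main(void)
{
    fpga_t *fpga = fpga_open(0);          /* first RIFFA-enabled FPGA */
    if (!fpga) { fprintf(stderr, "no FPGA found\n"); return 1; }

    unsigned int *buf = malloc(BUF_WORDS * sizeof(*buf));

    /* Block until the FPGA pushes monitored packets up channel 0. */
    int words = fpga_recv(fpga, CHNL_MON_IN, buf, BUF_WORDS, 25000 /* ms */);
    printf("received %d words of monitored traffic\n", words);

    /* Push a (hypothetical) programming message down channel 2; this just
     * demonstrates the send path. */
    if (words > 0)
        fpga_send(fpga, CHNL_PROGRAM, buf, words, 0, 1, 25000);

    free(buf);
    fpga_close(fpga);
    return 0;
}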
Fig 4.4: Test Architecture with Riffa
More information on the IP cores we used can be found in the Appendices.
 
一比一定(购)卡尔顿大学毕业证(CU毕业证)成绩单学位证
一比一定(购)卡尔顿大学毕业证(CU毕业证)成绩单学位证一比一定(购)卡尔顿大学毕业证(CU毕业证)成绩单学位证
一比一定(购)卡尔顿大学毕业证(CU毕业证)成绩单学位证
 
一比一原版(WLU毕业证)罗瑞尔大学毕业证成绩单留信学历认证原版一模一样
一比一原版(WLU毕业证)罗瑞尔大学毕业证成绩单留信学历认证原版一模一样一比一原版(WLU毕业证)罗瑞尔大学毕业证成绩单留信学历认证原版一模一样
一比一原版(WLU毕业证)罗瑞尔大学毕业证成绩单留信学历认证原版一模一样
 
Aminabad * High Profile Escorts Service in Lucknow Phone No 9548273370 Elite ...
Aminabad * High Profile Escorts Service in Lucknow Phone No 9548273370 Elite ...Aminabad * High Profile Escorts Service in Lucknow Phone No 9548273370 Elite ...
Aminabad * High Profile Escorts Service in Lucknow Phone No 9548273370 Elite ...
 
Top profile Call Girls In Mau [ 7014168258 ] Call Me For Genuine Models We ar...
Top profile Call Girls In Mau [ 7014168258 ] Call Me For Genuine Models We ar...Top profile Call Girls In Mau [ 7014168258 ] Call Me For Genuine Models We ar...
Top profile Call Girls In Mau [ 7014168258 ] Call Me For Genuine Models We ar...
 
LANDSCAPE ARCHITECTURE PORTFOLIO - MAREK MITACEK
LANDSCAPE ARCHITECTURE PORTFOLIO - MAREK MITACEKLANDSCAPE ARCHITECTURE PORTFOLIO - MAREK MITACEK
LANDSCAPE ARCHITECTURE PORTFOLIO - MAREK MITACEK
 
cholilithiasis, cholecystitis,gall bladdder .pdf
cholilithiasis, cholecystitis,gall bladdder .pdfcholilithiasis, cholecystitis,gall bladdder .pdf
cholilithiasis, cholecystitis,gall bladdder .pdf
 
Top profile Call Girls In eluru [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In eluru [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In eluru [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In eluru [ 7014168258 ] Call Me For Genuine Models We ...
 
Resume all my skills and educations and achievement
Resume all my skills and educations and  achievement Resume all my skills and educations and  achievement
Resume all my skills and educations and achievement
 
TRose UXPA Experience Design Concord .pptx
TRose UXPA Experience Design Concord .pptxTRose UXPA Experience Design Concord .pptx
TRose UXPA Experience Design Concord .pptx
 
一比一定(购)滑铁卢大学毕业证(UW毕业证)成绩单学位证
一比一定(购)滑铁卢大学毕业证(UW毕业证)成绩单学位证一比一定(购)滑铁卢大学毕业证(UW毕业证)成绩单学位证
一比一定(购)滑铁卢大学毕业证(UW毕业证)成绩单学位证
 
How to Build a Simple Shopify Website
How to Build a Simple Shopify WebsiteHow to Build a Simple Shopify Website
How to Build a Simple Shopify Website
 
Essential UI/UX Design Principles: A Comprehensive Guide
Essential UI/UX Design Principles: A Comprehensive GuideEssential UI/UX Design Principles: A Comprehensive Guide
Essential UI/UX Design Principles: A Comprehensive Guide
 
Gamestore case study UI UX by Amgad Ibrahim
Gamestore case study UI UX by Amgad IbrahimGamestore case study UI UX by Amgad Ibrahim
Gamestore case study UI UX by Amgad Ibrahim
 
Dahisar Comfortable Call Girls ,09167354423,Mira Road Model Call Girls
Dahisar Comfortable Call Girls ,09167354423,Mira Road Model Call GirlsDahisar Comfortable Call Girls ,09167354423,Mira Road Model Call Girls
Dahisar Comfortable Call Girls ,09167354423,Mira Road Model Call Girls
 
Top profile Call Girls In fatehgarh [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In fatehgarh [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In fatehgarh [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In fatehgarh [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Sonipat [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In Sonipat [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In Sonipat [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In Sonipat [ 7014168258 ] Call Me For Genuine Models W...
 
Abortion pill for sale in Muscat (+918761049707)) Get Cytotec Cash on deliver...
Abortion pill for sale in Muscat (+918761049707)) Get Cytotec Cash on deliver...Abortion pill for sale in Muscat (+918761049707)) Get Cytotec Cash on deliver...
Abortion pill for sale in Muscat (+918761049707)) Get Cytotec Cash on deliver...
 
怎样办理伯明翰大学学院毕业证(Birmingham毕业证书)成绩单留信认证
怎样办理伯明翰大学学院毕业证(Birmingham毕业证书)成绩单留信认证怎样办理伯明翰大学学院毕业证(Birmingham毕业证书)成绩单留信认证
怎样办理伯明翰大学学院毕业证(Birmingham毕业证书)成绩单留信认证
 

OpenFlow Aware Network Processor

  • 1. OPENFLOW AWARE NETWORK PROCESSOR Undergraduate graduation project report submitted in partial fulfillment of the requirements for the Degree of Bachelor of Science of Engineering in Department of Electronic & Telecommunication Engineering University of Moratuwa Supervisor: Group Members: Dr. Ajith Pasqual W.A.T.M. Dananjaya S. Iddamalgoda K.G.P.H Kariyawasam W.M.V.B. Wijekoon April 2016
  • 2. ii Approval of the Department of Electronic & Telecommunication Engineering ………………………………………………. Head, Department of Electronic & Telecommunication Engineering This is to certify that we have read this project and that in our opinion it is fully adequate, in cope and quality, as an Undergraduate Graduation Project. Supervisor: Dr. Ajith Pasqual Signature: ………………………………… Date: ……………………………………….
  • 3. iii Declaration This declaration is made on the April 22nd , 2016. Declaration by Project Group We declare that the dissertation entitled Project Name and the work presented in it are our own. We confirm that:  this work was done wholly or mainly in candidature for a B.Sc. Engineering degree at this university,  where any part of this dissertation has previously been submitted for a degree or any other qualification at this university or any other institute, this has been clearly stated,  where we have consulted the published work of others, this is always clearly attributed,  where we have quoted from the work of others, the source is always given. With the exception of such quotations, this dissertation is entirely our own work,  We have acknowledged all main sources of help, ……………………….. Date ……………………………… W.A.T.M Dananjaya (110089M) ……………………………… S. Iddamalgoda (110222R) ……………………………… K.G.P.H Kariyawasam (110285K) ……………………………… W.M.V.B Wijekoon (110630P)
  • 4. iv Declaration by Supervisor I have supervised and accepted this dissertation for the submission of the degree. ……………………………………….. ……………………………… Dr. Ajith Pasqual Date
The main objective of this project is to design a processor for a switch operating in an OpenFlow environment. The processor should be able to carry out communications with the SDN controller and to configure the data plane of the switch according to the controller's instructions. How the processor handles these tasks depends on the architecture of the data plane. After reviewing some available designs, we concluded that it would be better to design the data plane ourselves, both to gain a better idea of the processor's tasks and to support the high-speed requirements. The first phase of the project was therefore to design a data plane architecture able to support very high-speed network traffic; in the second phase, the processor was designed according to the resulting requirements.

The project was first designed as a C-language model, later converted into a Verilog design, and implemented on a Xilinx Virtex-7 board. For demonstration purposes, a pseudo-SDN controller was implemented in Java. All demonstrations were carried out in a Linux environment, and network traffic generators and capture software were used for troubleshooting.
Dedication

This thesis is dedicated to Dr. Ajith Pasqual and to both the academic and non-academic staff who guided us throughout the project.
Acknowledgments

We would like to express our deepest gratitude to our supervisor, Dr. Ajith Pasqual, for his full support, expert guidance, understanding and encouragement throughout our project. Without his incredible patience and timely wisdom and counsel, our project would have been a frustrating and overwhelming pursuit. In addition, we express our gratitude to Mr. Upul Ekanayake for the encouragement and enormous support he provided. We would also like to thank Mr. Aditha Rajakaruna and Mr. Manupa Karunarathne for helping us; their thoughtful questions and comments were greatly valued. Finally, we thank the staff members and our fellow undergraduate students of the Department of Electronic and Telecommunication Engineering at the University of Moratuwa.
Table of Contents

1 INTRODUCTION
  1.1 Background
  1.2 Problem Statement
  1.3 Proposed System
  1.4 Objectives
  1.5 Deliverables
2 LITERATURE REVIEW
  2.1 Software Defined Networking
    2.1.1 SDN & NFV
    2.1.2 How SDN Works
  2.2 OpenFlow Network Components
  2.3 Key Features of the SDN Architecture
  2.4 OpenFlow Hybrid Architecture
  2.5 Network Function Virtualization (NFV)
  2.6 SDN Data Forwarding Planes
  2.7 Networking and Computer Security
  2.8 Network Processors & Current Trends
3 SYSTEM ARCHITECTURE
  3.1 Overview
  3.2 SDN Data Plane Architecture
    3.2.1 Flow Classification Engine
    3.2.2 Formatter
    3.2.3 Flow Processing Unit
    3.2.4 Flow Matching Unit & OpenFlow Pipeline
    3.2.5 Memory: Action/Rule Memory
    3.2.6 Execution Engine
    3.2.7 Table Miss Handlers
    3.2.8 Alternative Architectural Approaches
  3.3 Customized Network Processor
    3.3.1 Instruction Set Architecture
    3.3.2 Complete Instruction Set Architecture
    3.3.3 Micro Architecture
4 METHODOLOGY
  4.1 Design & Implementation
  4.2 Phase 1: High-Level Synthesis
  4.3 Phase 2: RTL Design & Verification
  4.4 Phase 3: PCI Express Subsystem & Ethernet Subsystem
  4.5 Phase 4: System Integration and Testing
  4.6 Phase 5: Development of the Application Layer
    4.6.1 Performance Analyzer App
    4.6.2 Simple SDN Controller App
5 RESULTS
  5.1 Demo Setup
  5.2 Processor Results
  5.3 Data Plane Results
  5.4 Total Resource Utilization
CONCLUSION
BIBLIOGRAPHY
ANNEX A
ANNEX B
ANNEX C
List of Figures

2.1 Current Networking Approach
2.2 Legacy Networking Overview
2.3 Software Defined Networking Overview
2.4 SDN and OpenFlow
2.5 OpenFlow Data Plane Architecture
2.6 Packet-based Forwarding
2.7 Flow-based Forwarding
3.1 Overview of the Proposed Architecture
3.2 Architectural Overview of the Data Plane
3.3 Flow Identifier in OpenFlow
3.4 Overview of the Flow Processing Unit
3.5 OpenFlow Flow Entry Format
3.6 Flow Processing Unit with Pipelined Flow Tables
3.7 How a Packet Travels through the Data Plane
3.8 Alternative Approach I
3.9 Alternative Approach II
3.10 Instruction Formats
3.11 Finite State Machine of the Processor
4.1 Overview of the Hardware Implementation
4.2 Test Procedure
4.3 Ethernet Switch Fabric
4.4 Test Architecture with Riffa
4.5 Switch Fabric Modified for Processor Communication
4.6 Application Layer Architecture
4.7 SDN Controller User Interface to Add Flow Entries
4.8 SDN Controller User Interface for Observing the Active Flow Entries
4.9 A Specified Flow Entry
5.1 Demo Setup Readings 1
5.2 Demo Setup Readings 2
List of Tables

3.1 Custom Instructions
3.2 Complete Instruction Set
5.1 Clock Cycles per Instruction Type
5.2 Processor Resource Usage
5.3 Timing Results of the Processor
5.4 Data Plane Resource Usage
5.5 Timing Results of the Data Plane
5.6 Total Resource Utilization
Chapter 1

INTRODUCTION

1.1 Background

SDN (Software Defined Networking) is becoming the next breakthrough in networking and data forwarding technology, promising highly controlled, monitored and supervised networks built around efficient, centralized SDN controller functions. The SDN three-tier architecture has already proven its performance, with various network applications deployed over distributed data forwarding planes under SDN control. This revolution will produce SDN-based clouds, core networks and carrier networks in the near future. An important aspect of this transition is the guarantee and trust that SDN builds into networks while the traditional, legacy approach co-exists with SDN networks and boosts their performance. The transition should therefore happen smoothly alongside conventional networks, and SDN-compatible devices must be capable of handling it.

From the SDN perspective, the network functions and protocol stack running on a single network node look quite different from the legacy approach. Because control functions are pushed into a centralized, global plane while the network node itself acts as a data forwarding plane, the very definition of a network node changes: more and more protocols run on the SDN controller, and the node forwards traffic according to the commands it receives. This changes the whole architecture of running each and every protocol on every single network node. The result is a reduction of control functions and protocols on the node, with more applications and application-oriented functions running there instead. As SDN emerges, an SDN switch can act as a multi-layer forwarding plane and even as a firewall. An SDN switch should therefore be capable of handling increasingly application-oriented tasks commanded by the SDN controller, tasks signalled to the controller by the top SDN tier: the application layer.

Alongside these advancements, attention is turning to integrating advanced network security solutions and applications, such as Network Function Virtualization (NFV), Deep Packet Inspection (DPI), and Intrusion Detection and Prevention Systems (IDS and IPS), into the SDN switch architecture, in place of the traditional
control protocols that exhausted nodes under distributed control. We therefore need to leverage scalability, flexibility, programmability and security in the SDN architecture by supporting hybrid solutions until the transition ends. Even though SDN implicitly rests on the traditional layered architecture, we must exploit that fact when it comes to application support.

There are many network processors on the market today, and SDN data planes are also appearing more and more often. But we still have a significant problem in catering specifically to SDN traffic so as to exploit its performance while moving towards the pure, green-field SDN concept. Compatibility, adaptability and purpose-built solutions are what will accelerate SDN. An SDN-aware network processor, extensible to a pure SDN environment and compatible with a more customized SDN data forwarding plane, will therefore be needed. Today's network processors already offer many customized functions and instructions for network applications, as well as varied implementations built on pipelined and parallel architectures.

In our research we use OpenFlow as the southbound protocol. Our processor is therefore primarily an OpenFlow-aware network processor, extensible to the hybrid OpenFlow architecture and compatible with a separate data forwarding plane hardware module. In an SDN environment, the network processor acts as a more application-oriented platform providing various SDN services over the SDN data plane, while most of the control functions are embedded in the SDN controller. With the internal intelligence of the processor, a dumb SDN data plane can operate as a separate module rather than having the processor embedded inside it. The advantage is that more control protocols are pushed into the central SDN control plane while a certain amount of SDN functionality and protocol support survives in the processor inside the SDN switch.

The bottlenecks of SDN networks remain scalability, system throughput and network security. A compatible, SDN-aware network processor can shed the exhausting general-purpose and legacy-oriented functions, hand that responsibility to the SDN control plane, and concentrate on SDN-oriented network functionality that scales the network up through a customized architecture. Our research therefore focuses on how SDN-aware customizations can deliver this scalability and high throughput. With SDN data planes now arriving on the market, an SDN-aware network processor that can support such data forwarding with some local intelligence is of paramount importance to the growth of SDN.
1.2 Problem Statement

Networking infrastructure has rapidly grown from traditional networking devices and single Bluetooth nodes to massive data centers and cloud computing facilities, and the Internet of Things has become a paramount topic in the networking world. As the Internet becomes ever more accessible and usable, the number of technology nodes connected to the inter-network is growing rapidly. Network heterogeneity and vendor-specific protocols leave little flexibility for network growth and research, so modern networks must be far more flexible and scalable. Alongside this growth, the technology needs a proper way of implementing network programmability to support dynamic network environments.

Today's networks are not programmable; as a result, increasing scalability and flexibility demands massive funding and man-hours. There should be a better way of implementing network programmability, one that enables more research and innovation in networking without disturbing the heterogeneous network base and the various protocols that already run on it. To achieve programmability and flexibility, the network must be capable of understanding the overall big picture of the network as well as exercising control over data forwarding. Centralizing the control plane to support distributed data forwarding planes is therefore required. Growing, complex network facilities, including huge data centers, massive clouds and fast-growing ISP/telecommunication core networks, are clamoring for faster forwarding, ACL applications and QoS adjustments. They also require more traffic shaping, traffic engineering and control over data forwarding, to reduce traffic congestion and achieve faster forwarding.

1.3 Proposed System

Our proposal is to design a flow-aware programmable processor with an extensible ISA and a fast OpenFlow flow forwarding unit that supports line-rate flow forwarding as well as traditional/legacy forwarding and control, in support of the innovative OpenFlow hybrid architecture. We expect to boost flow processing with internal/external memories and advanced algorithms to provide stateful flow-based forwarding/pinning,
nested flow-forwarding actions for millions of flows, and dynamic flow-based load balancing, while still supporting legacy network processing. We design the system based on the OpenFlow 1.4 specification and standard network functions.

1.4 Objectives

The main objective is an OpenFlow-aware network processor engine: the design of a network processor with the hardware acceleration needed to support high-speed OpenFlow flow processing, protocol handling and forwarding, as well as legacy networking functions. This includes developing an instruction set architecture (ISA) and an advanced micro-architecture to support the OpenFlow hybrid structure (the OpenFlow pipeline as well as traditional processing), and developing the supporting hardware acceleration and processing units specific to OpenFlow, extended to legacy networking infrastructure to enable the hybrid architecture.

1.5 Deliverables

I) An advanced network RISC (Reduced Instruction Set Computer) processor with SDN support, handling a 1 Gbps line rate.
II) A custom instruction set architecture (ISA) and an advanced, pipelined micro-architecture.
III) An SDN data forwarding plane architecture and design with native OpenFlow support, and a switch fabric for multilayer switching functions.
IV) A header classification engine with an SDN flow entry formatting unit.
V) A flow processing, matching and forwarding unit built on TCAM technology, handling a 1 Gbps line rate.
VI) A streamlined SDN programming interface that communicates with the network processor.
VII) Implementation of the processor and the SDN data forwarding plane in an HDL (Hardware Description Language), prototyped on an FPGA.
Chapter 2

LITERATURE REVIEW

2.1 Software Defined Networking

Fig 2.1: Current Networking Approach

2.1.1 Software Defined Networking (SDN) & Network Function Virtualization (NFV)

What is the problem with today's network infrastructure? The Internet infrastructure, considered as a collection of network nodes, lacks programmability, scalability, flexibility, reliability, resiliency and security. This is a consequence of the current network architecture, in which each and every router processes information and makes decisions for its own corner of the network based on protocol information. A single network node is heavily burdened by heterogeneity and incompatibility, consisting of millions of lines of source code that are static and hardcoded into the device. Device functionality is categorized into narrow, specific roles, and packets are processed one by one, without any state. We can therefore conclude that current networks are vertically integrated, complex, closed, static and proprietary, and unsuited to dynamic or experimental ideas; at times this costs millions. Managerial tasks, such as routing management, mobility management, access control and forwarding, are likewise locked into each device.
Why Software Defined Networking (SDN) and Not Legacy Networking?

Software Defined Networking (SDN) is the next breakthrough technology emerging in the networking world. Prevailing legacy networking has bottlenecks and drawbacks as large-scale networks grow in complexity and strain against limited scalability. Legacy networks also waste redundant processing power on the control functionality distributed among network nodes. A network is always distributed, and the modern way of describing a network device is as a box containing a control plane and a data forwarding plane, separated from each other, in contrast to the earlier practice of mixing both planes inside a single device. For network applications and protocols such as routing, every router in the network must process routing protocols, perform route calculations and carry out routing table lookups: a huge waste of processing power. Another bottleneck of the legacy approach is that network nodes make decisions based on a narrow view of the overall network rather than a global one. As networks expand and grow more complicated, they therefore become unstable and congested; the lack of a global, big-picture view combined with per-node decision making often ends in bad consequences.

Another vital factor is network programmability, which is becoming essential with virtualization and the flexibility demanded of modern networks. The prevailing infrastructure lacks the programmability that will be the most essential part of next-generation networking and telecommunication infrastructure. Indeed, the main ideas behind the SDN concept are precisely the programmability and scalability of networks. Consider the current scenario: what happens if a developer or engineer wants to deploy and test their own protocol or application across the existing network infrastructure? If I want to launch my own routing protocol and analyse the network's performance, today I would have to change the network infrastructure, or change all the operating systems and network hardware, to support my protocol. Programmability of the network under existing conditions is thus extremely hard. But we can observe that the network behaves as flows rather than as individual packets.
Researchers therefore observed that all the processing of, and changes to, packets depend on the different tuples and fields of the packet headers. The current network stack is stable and well organized into a layered architecture, so there is no need to change that layered architecture or the functionality of its individual layers. What we can do is find novel methods of implementing and processing the network stack's functionality with better programmability, so that it adapts to dynamic network environments. At the same time, cluster computing and cloud computing are emerging technologies, and the concept of network function virtualization is becoming paramount, leading to highly virtualized, abstract, application-oriented network hardware. For a network to be virtualized and more application-oriented, it must be highly programmable and able to act dynamically, with intelligence. As mentioned earlier, the key requirement for transitioning to highly virtualized networks is a global view, the big picture of the network. Large, widespread networks such as clouds and data centers are already moving towards virtualization techniques to optimize network resource utilization and to process traffic faster, with higher performance.

Most discussion of information processing and transfer today also concerns information and network security, which is becoming a huge threat worldwide. New kinds of distributed network security features are emerging to satisfy the clamoring appetite of the technology community. One such concept is to spread the firewall policies of the mother firewall across the whole network, providing better security by avoiding single points of failure and giving the network a distributed-checkpoint architecture. Network engineers and businesses need this better security while still achieving high-performance networking.

Another problem that persists in the network is that functionality is separated into Layer 2 forwarding, Layer 4 forwarding, access control, quality-of-service assignment, traffic engineering and shaping, and packet dropping by firewall policies, spread across several layers. What if we had an inexpensive device that could perform all these tasks as necessary, according to requirements provided by the network administration? This can only be achieved by identifying
packets as members of specific flows and handling network traffic as flows rather than packet by packet. This is also vital for network virtualization. The ultimate objective of next-generation networking technology, then, is to achieve scalability, resiliency, security, reliability, availability and, most importantly, programmability of the network, from a global point of view.

In SDN:

I) The control and data planes are decoupled and abstracted from each other; the controller and the data plane share information through the OpenFlow protocol.
II) Intelligence is logically centralized, giving a global view of the network and of changing demands. The controller is thus the vital decision-making component, while the data plane retains just enough intelligence to forward packets on a flow basis.
III) The underlying network infrastructure is abstracted from applications, which makes it possible to develop different applications and can be extended to Network Function and Resource Virtualization (NFV).
IV) A programmable data plane brings automation and flexibility to networks.
V) Innovation is faster, with dynamic networking.
VI) In a hybrid architecture, OpenFlow runs alongside legacy network processing, but independently.

2.1.2 How Does Software Defined Networking (SDN) Work?

SDN is emerging as the solution to the prevailing bottlenecks of networking technology discussed above. Software Defined Networking raises networking performance by pushing the control planes, separated out of all the distributed data forwarding planes, into a centralized plane/layer whose decisions are based not on a single network node but on a global view of the network. This is more effective and saves a great deal of processing power in the network devices, power that can instead be used for the data forwarding task.
Fig 2.2: Legacy Networking Overview

The advantage is that the data forwarding plane now has more processing power than in the legacy scenario, power that can be devoted to forwarding functionality, including flow processing instead of packet-by-packet processing. The SDN concept separates the data forwarding plane from the control plane and pushes the control plane towards a central controller. It is rather like running a single network operating system at the controller, in contrast to the legacy approach in which each and every network node runs its own operating system. According to the Open Networking Foundation definition of SDN:

"In SDN architecture, the control and data plane are decoupled, network intelligence, control and state are logically centralized, and the underlying network infrastructure is abstracted from application." (Open Networking Foundation)
Since the control plane is centralized, all the information, topology and protocol state now exist in a single place, with a big picture of the network. Forwarding can therefore be made faster and better optimized, because additional resources and processing power are available.

Fig 2.3: Software Defined Networking Overview

Unlike in the legacy approach, each network node or router no longer needs to perform control and information processing tasks. An SDN switch can also perform multilayer forwarding up to the higher layers, including the application layer: it acts as a multilayer switch and even as a firewall, operating on flows rather than individual packets. It resembles a firewall that can execute arbitrary rules rather than only dropping packets, and a multilayer switch that can perform firewalling, quality-of-service assignment and traffic engineering at the data plane layer. Clearly, then, SDN operates on a three-tier architecture, which is not equivalent to the seven-layer OSI architecture used to organize and operate network protocols. SDN does not discard the traditional OSI layering, which remains the basis on which protocols operate in the network.
What SDN changes is the way the network identifies a protocol and how it is processed in the data forwarding plane. For a protocol to be handled in the data plane, it must define certain SDN match fields/tuples, extracted by the data plane, that are relevant to processing that protocol. The SDN data plane therefore contains the means to identify the specific flow a packet belongs to and to extract the SDN matching tuples from the packet header. For example, to perform L2 forwarding only, a designer needs just the destination and source MAC addresses, out of the large set of SDN matching tuples/fields, to forward a packet to the relevant port. The SDN control plane (the SDN controller) is responsible for building the network topology and hierarchy and for pushing policies and rules into the SDN data plane.

Beyond forwarding and dropping packets, the SDN data plane has another paramount duty that is not vital in legacy networking: communicating with the SDN controller or control plane when unknown network flows appear. The SDN switch queries the controller with the packet header information and asks for new policies and rules to execute against that flow. This introduces a point where we need security, reliability and guarantees for the whole network: the channel between the SDN controller and the SDN data forwarding plane. This is where southbound protocols such as OpenFlow play a major role, providing a standardized way for the SDN switch and the SDN controller to communicate. The main functions of the southbound protocol are to discover the resources, capabilities and hardware of the SDN data forwarding plane and send that information to the controller; at run time, it must send unknown flows to the controller, with the information needed to build up the network structure, and fetch the policies related to those specific flows.

One more element of the three-tier architecture deserves attention, since it inherently makes things more advanced and easy: the communication between the SDN application layer and the SDN controller, more often known as the northbound protocols. The application layer runs on top of the SDN controller, which holds the global view of the network; this is what allows the network to be highly abstracted and virtualized, and it is where NFV (Network Function Virtualization) also comes into play.
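To make the two directions of this controller-switch exchange concrete, here is a minimal C sketch of the message shapes involved. The common header follows the general OpenFlow wire format; the packet-in and flow-mod bodies below are abbreviated, and their exact field ordering and trailing match/instruction data are simplified for illustration, so the OpenFlow 1.4 specification remains the normative reference.

```c
#include <stdint.h>

/* Common header carried by every OpenFlow message.                       */
struct ofp_header {
    uint8_t  version;  /* wire version, e.g. 0x05 for OpenFlow 1.4        */
    uint8_t  type;     /* message type, e.g. packet-in or flow-mod        */
    uint16_t length;   /* total message length in bytes, header included  */
    uint32_t xid;      /* transaction id pairing requests with replies    */
};

/* Switch -> controller on a table miss: the unmatched packet (or a prefix
 * of it, when the frame is parked in an OpenFlow buffer) goes upstream.
 * Trailing match and payload data are omitted; the layout is a sketch.  */
struct packet_in_sketch {
    struct ofp_header hdr;
    uint32_t buffer_id;    /* slot in the switch's packet buffer          */
    uint16_t total_len;    /* full length of the original frame           */
    uint8_t  reason;       /* e.g. "no matching flow entry"               */
    uint8_t  table_id;     /* pipeline table where the miss occurred      */
};

/* Controller -> switch reply: a rule to install against that flow.       */
struct flow_mod_sketch {
    struct ofp_header hdr;
    uint64_t cookie;        /* opaque controller-chosen identifier        */
    uint8_t  table_id;      /* which flow table to program                */
    uint8_t  command;       /* add / modify / delete                      */
    uint16_t idle_timeout;  /* expire the entry if the flow goes quiet    */
    uint16_t hard_timeout;  /* expire the entry unconditionally           */
    uint16_t priority;      /* precedence among overlapping matches       */
};
```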
2.2 What Are the OpenFlow Network Components?

There are two major components in an OpenFlow network:

I) The SDN controller.
II) The SDN data forwarding plane, comprising:
   a. Flow tables, holding flow entries (in TCAMs).
   b. Processing, namely:
      i. Pipeline processing (the OpenFlow pipeline)
      ii. Packet classification
      iii. Packet matching
      iv. Instructions and action sets

All of these components and their functionality are well defined in the OpenFlow architecture. OpenFlow is a stable and rapidly growing protocol that makes SDN more comprehensive and easy to use.

Fig 2.4: SDN and OpenFlow
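As an illustration of how these components might be modelled in software (for instance in a C reference model like the one this project later built), the sketch below shows one plausible representation of a flow entry and a flow table. Field names such as action_index are our own, not taken from the specification.

```c
#include <stdint.h>

#define MF_BITS  356                      /* OpenFlow match tuple width     */
#define MF_BYTES ((MF_BITS + 7) / 8)      /* 45 bytes, rounded up           */

/* One flow entry: a masked match plus the metadata OpenFlow attaches to
 * it.  'action_index' points into a separate action memory, mirroring the
 * TCAM + RAM split described in Chapter 3. */
struct flow_entry {
    uint8_t  match[MF_BYTES];             /* expected header-field values   */
    uint8_t  mask[MF_BYTES];              /* 1 = compare this bit,          */
                                          /* 0 = wildcard (TCAM don't-care) */
    uint16_t priority;                    /* higher priority wins overlaps  */
    uint32_t action_index;                /* slot in the action memory      */
    uint16_t idle_timeout;                /* expire if the flow goes quiet  */
    uint16_t hard_timeout;                /* expire unconditionally         */
    uint64_t packet_count;                /* per-entry statistics           */
};

/* A flow table is an ordered pool of such entries; the OpenFlow pipeline
 * chains several tables, each able to send a packet on to the next. */
struct flow_table {
    struct flow_entry *entries;
    unsigned           n_entries;
};
```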
2.3 What Are the Key Features of the SDN Architecture?

1. An architecture for a centralized control plane / network operating system, with an application- and service-oriented design.
2. A fast flow forwarding architecture with dynamic quality-of-service adaptation.
3. Pooling to achieve scalability.
4. A highly programmable data forwarding plane.
5. Appropriate abstraction to foster simplification.
6. Decoupled topology, traffic and inter-layer dependencies.
7. Dynamic multi-layer networking.

OpenFlow also provides a good programmable interface to the SDN data forwarding plane, through the flow table programming interface. Programming the data plane can happen in two ways: the SDN controller can program it according to what the management/application layer dictates, or users can program it locally according to local or custom requirements. Under the SDN architecture, global programming driven by the application layer's requirements is the best way of managing next-generation networks.

2.4 What Is the OpenFlow Hybrid Architecture?

Today's networks are massively invested in legacy infrastructure that network engineers know well. Converting this infrastructure into a distinct SDN architecture cannot happen rapidly, within a short period of time; the networking world needs a smooth transition instead. SDN can be deployed over the prevailing network infrastructure, but it is then less efficient and cannot exploit the maximum advantages of the SDN and NFV concepts. On the other hand, the existing hardware cannot be fully replaced by SDN-capable hardware, for reasons of expense and time. The concept of a hybrid SDN architecture is therefore vital: it supports both legacy networking and Software Defined Networking on the same hardware during the transition. A hybrid SDN architecture can work and coexist with both SDN hardware and legacy infrastructure, saving an organization both time and cost.
2.5 What Is Network Function Virtualization (NFV)?

In networking, network virtualization is the process of combining hardware and software network resources and network functionality into a single software-based administrative entity: a virtual network. There are two main categories of network virtualization:

1. External network virtualization
2. Internal network virtualization

External network virtualization combines many networks, or parts of networks, into a single virtual unit; it essentially pools excess resources, much as a cloud combines several data centers' resources into one cloud. Since we now have a single network operating system, a hypervisor can run on top of the virtualized network to create and monitor VMs. An interesting benefit of network function virtualization is reduced packet-travel latency. Functions such as routing, switching, NAT, firewalling, IDS and IPS can be virtualized easily on network hardware to provide dynamic services to network flows. NFV is going to be the next revolutionary technology alongside SDN, because the SDN architecture unlocks many potential network functions in both computer and telecommunication networks. This is readily achievable thanks to the forwarding abstraction of the SDN data forwarding plane and the decoupled, centralized control plane.

2.6 SDN Data Forwarding Planes

Our literature review focused mainly on two areas: SDN data planes and network processors. This is because we set out to develop an SDN ecosystem in which the SDN data plane is dedicated, hardwired logic, while processing and management fall to a customized, SDN-aware network processor. We mainly reviewed companies realizing SDN products, such as Broadcom, Corsa, ZNYX, NEC, Cisco and Brocade, and their SDN research.
Fig 2.5: OpenFlow Data Plane Architecture

2.7 Networking and Computer Security

Fast, high-performing network processors are becoming paramount for the high-speed inter-network. We therefore had to review current network processing architectures, since we needed to develop our own customized network processor to accelerate SDN and OpenFlow processing. Our aim was a lightweight, customized, OpenFlow-aware network processor to provide protocol support and OpenFlow operations such as secure communication with the SDN controller, OpenFlow buffer management and programming of the SDN data plane.

Network Processor Design and Advancement

We first examined the legacy networking stack, consisting of traditional L2 and L3 network functions. Our main focus was on prevailing network architectures, such as the advanced CEF (Cisco Express Forwarding) implemented by well-known network vendors alongside standard network protocols. Traditional network designs based on L2/L3 forwarding, QoS (Quality of Service) and ACLs over the OSI or TCP/IP model are already implemented in almost every networking node. Our first step was to identify the kinds of network nodes placed throughout a network; we identified four major types of network device.
I) End devices
II) Edge routers
III) Backbone switches
IV) Backbone routers

Backbone switches are based on L2 forwarding, and LAN technologies usually deploy such switches; the choice of LAN technology varies with topology and network type. Backbone switches can be implemented with various technologies, such as Ethernet, Frame Relay or ATM, and in modern designs technologies such as MPLS can also be used within the backbone. Whatever the LAN technology, the switches must avoid redundant paths and maintain good point-to-point, rather than end-to-end, connections. They usually operate in the bottom two layers: data link and physical.

Backbone routers play a critical role in route handling, routing and intermediate traffic handling, based on information gathered through link-state protocols such as OSPF and distance-vector protocols such as RIP and EIGRP. Core networks consist of complex backbone routing networks that direct and route traffic. As for edge routers, conventional architectures draw no strong distinction between backbone and edge routers, but an edge router needs more capability in modern technologies such as MPLS, route redistribution, route aggregation and routing-information exchange. It should also be able to handle and process large-scale traffic, not merely forward it. Security appliances and quality of service can differ from one device type to another. Whatever technologies the backbone comprises, there must be a proper way of operating at the higher layers. Modern techniques such as layer-3 switching exploit this architecture by differentiating service levels and providing hardware-level acceleration for fast packet forwarding. Most of today's technologies are built around packet-based forwarding.
Fig 2.6: Packet-based Forwarding

Multi-layer switching changes the way that usual L2 forwarding happens. Cisco's CEF architecture, for example, supports forwarding up to layer 3 in hardware. SDN can thus be seen as an advanced extension of layer-3 forwarding, with many more features. Apart from network forwarding, the most vital current trend in networking is information security; much of the network and information security field is focused on topics such as distributed firewalling. Providing confidentiality, authentication, authorization and access control, with proper reliability and availability of the network, is an essential task of future networks, so we also spent a short period studying how security features could be implemented on a next-generation SDN switch.
2.8 Network Processors & Current Trends

Fig 2.7: Flow-based Forwarding

Since our work targets optimizing and accelerating networking in both scenarios, initially concentrating on the green-field SDN concept, our main goal was an SDN/OpenFlow-aware network processor. Our review therefore concentrated on processors in general, network processors, and the customization of processors for specific tasks. We reviewed current network processors and their capabilities and performance from vendors such as Cisco, NEC and Ericsson; looked into current SDN startups, their research and their emerging products; and studied the network processors and general-purpose processors used in SDN switches. Most SDN switches currently on the market use general-purpose processors such as the Intel Xeon. The top SDN companies Corsa and ZNYX both adopt Intel's Xeon, a general-purpose part, as their network processor. Corsa pairs it with their own SDN data forwarding plane and flow tables, while ZNYX deploys the Broadcom OF-DPA OpenFlow data plane with the Intel Xeon in their SDN switch.

A network processor's task is to manage the local configuration and switch setup; in the legacy scenario it must also support the existing network protocols and control plane functionality. In SDN (Software Defined Networking), however, most control functionality is pushed to the centralized controller, so the processing and protocol-handling tasks become lightweight at the
processor. An SDN-aware processor must instead handle some additional tasks, discussed later, of establishing and maintaining the communication between the SDN controller and the SDN switch. The processor must also manage the programming of the SDN data plane to satisfy both local and central requirements. Managing the flow tables, keeping track of flows and their information, and managing the OpenFlow buffers are the most vital processing tasks that now fall to the processor.

We also analyzed the current network processors on the market. Among them, the Cisco Toaster and the EZchip network processor stood out as two of the best, with high performance. Looking at modern trends, companies like Ericsson have researched an OpenFlow processor that hosts all the OpenFlow operations, classification and flow tables on the processor itself. Our review showed that although network processors have evolved through several generations, they are still straining for higher speed and performance. We identified four basic levels of processing, each consisting of many more functions:

I) Interface level: framing, integrity checking, bridging, load balancing
II) Protocol level: routing, redundancy avoidance
III) Packet level: QoS, firewalling, packet-level load balancing
IV) Flow level: active processing

Apart from the low-level processing units, a network processor consists of five main subsystems:

I) The I/O system
II) The memory system
III) Classification tables
IV) Context mappers
V) A centralized control unit

A network processor must concentrate mainly on basic processing plus some advanced processing techniques:

I) Packet multiplexing and de-multiplexing (encapsulation and de-capsulation)
II) Packet processing
III) Packet forwarding
IV) Packet blocking

Packet processing functionality is mainly aimed at:

I) Protocol conversion
II) QoS (Quality of Service) and security ACLs (Access Control Lists)
III) Payload conversion
IV) Custom processing

Yet large-scale network infrastructure still lags behind the necessary requirements of modern network architectures and designs:

I) Flexibility
II) Scalability
III) Programmability
IV) Modularity
V) Extensibility

To provide these, modern processor architectures are implementing the following new and advanced features:

I) More customized and modular processing of data streams
II) More flexible, scalable and programmable solutions
III) IPv6, multicast, QoS handling and multi-layer forwarding
IV) The store-process-forward paradigm
V) Data-path processing controlled by software rather than hardwired logic
VI) Multiple processing engines
VII) Reduced communication delays, drops and jitter
VIII) Higher clock rates
IX) Standard RISC-like cores plus task-specific hardware accelerators (ASICs)
X) Modular network processor designs
XI) Programmability and scalability that track the evolution of networks
XII) Massively parallel and pipelined architectures

The major capabilities we aim to achieve with an SDN/OpenFlow-aware processor, over and above legacy network processors, are essential in such a processor and can be identified as follows:

I) Packet processing, flow classification and line-rate support for the data forwarding plane
II) Flow matching and flow management
III) OpenFlow secure channel establishment
IV) OpenFlow protocol support, with:
   a. OpenFlow buffer management
   b. SDN data plane programming control
   c. SDN data plane information access
V) SDN/OpenFlow application and service support
VI) Packet forwarding under the flow-forwarding abstraction

Because of the SDN architecture, we can drop some of the exhausting, time- and resource-intensive tasks dedicated to legacy networking. From our literature review, we identified the following operations, a regular part of conventional network processors, that can be omitted in the green-field SDN concept:

I) Legacy network control protocol support
II) Lookup tables and pattern matching
III) Higher-layer forwarding
IV) Access control and queue management
V) Traffic shaping and control
VI) More application-oriented processing
VII) Forwarding
Chapter 3

SYSTEM ARCHITECTURE

3.1 Overview

Our architecture consists of a network processor customized for SDN and OpenFlow, plus two dedicated logic units that handle flow classification and flow matching. The main objective of the architecture is to operate reliably at high speed, which is why we introduced the dedicated units. The processor also supports custom instructions that let it execute OpenFlow-specific tasks faster. Special care was taken to limit the use of costly hardware resources (such as TCAMs) wherever doing so has a negligible impact on speed.

For an SDN data plane forwarding traffic, the primary tasks are identifying flows, finding the matching flow entry and applying the specified actions. Beyond that, the data plane must also be able to communicate with the centralized controller. As today's networks are dynamic entities, these tasks have to be handled flexibly. To obtain that flexibility, and in keeping with the SDN principle of programmability, we introduce a customized network processor. It is paired with two dedicated units for the two most important tasks of traffic forwarding: flow identification and flow matching. With these two units and the processor's custom instructions, our architecture can reliably handle very fast line rates, as proven by the experimental results (presented in Chapter 5). A simple overview of the architecture is given in the diagram below.
Fig 3.1: Overview of the Proposed Architecture

For flow identification, we introduce a flow classification unit, a dedicated logic block. It classifies incoming flows according to their type (ether-ip-tcp, ether-ip-icmp, ether-vlan, etc.). Once flows are classified correctly, the information needed to construct the flow identifier (IP addresses, MAC addresses, etc.) can easily be extracted from the packet headers.

The other dedicated logic unit, the flow match unit, matches incoming traffic flows against the stored flow entries, using the flow identifier constructed earlier. It consists of three types of memory block: a very fast, low-capacity cache memory, a TCAM block and a RAM block. The cache stores the most recently matched entries, since packets belonging to a given flow can be expected to arrive one after the other. The TCAM block stores the flow identification part of the flow entries; identifiers constructed from incoming traffic are matched against these. A match outputs a pointer to the location in the RAM block where the actions of that flow entry are stored. We do not store actions alongside the identifiers, for the simple reason that TCAMs are high-cost memory blocks; this configuration limits TCAM usage.

The purpose of the input and output buffers is to store incoming and outgoing packets.
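The lookup order implied by the flow match unit's three memory blocks can be summarized in a short C sketch. The helper functions below stand in for the three memory blocks; they are assumptions of this illustration, not a real device API from our design.

```c
#include <stdint.h>
#include <stddef.h>

#define KEY_BYTES 45                  /* 356-bit flow identifier, padded    */

struct flow_actions;                  /* action list stored in RAM (opaque) */

/* Hypothetical hooks for the three memory blocks described above.         */
const struct flow_actions *cache_lookup(const uint8_t key[KEY_BYTES]);
void cache_insert(const uint8_t key[KEY_BYTES], const struct flow_actions *a);
int  tcam_search(const uint8_t key[KEY_BYTES]);  /* hit index, -1 on miss   */
const struct flow_actions *action_ram_read(int index);

/* Lookup order from the text: recent entries first, then the TCAM; only
 * the TCAM hit index touches the action RAM, which is what keeps action
 * data out of the costly TCAM. */
const struct flow_actions *flow_match(const uint8_t key[KEY_BYTES])
{
    const struct flow_actions *a = cache_lookup(key);
    if (a != NULL)                    /* packets of one flow tend to arrive */
        return a;                     /* back to back, so this path is hot  */

    int idx = tcam_search(key);       /* masked compare against identifiers */
    if (idx < 0)
        return NULL;                  /* table miss: escalate to controller */

    a = action_ram_read(idx);         /* fetch the stored action list       */
    cache_insert(key, a);             /* warm the cache for the next packet */
    return a;
}
```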
When a matching flow entry for a packet belonging to a particular flow is found, the actions are retrieved from the RAM block in the flow matching unit, and the processor applies them to the packet before forwarding it to the output buffer.

Interacting with the two units and communicating with the centralized controller are the processor's tasks. Flow entries sent by the controller in 'flow_mod' type OpenFlow messages have to be written to the memory blocks in the flow match unit; this is commonly referred to as programming the data plane. We introduce programming interfaces and some custom instructions to carry out data plane programming and controller communication efficiently.

3.2 SDN Data Plane Architecture

The data plane is the hardware sub-unit where the actual packet forwarding takes place. As previously stated, rather than deploying completely dumb data planes as network nodes, our architecture keeps some intelligence with them, supplied by the customized processor. In keeping with the OpenFlow protocol, packet forwarding is done in a flow-based manner: the rules to apply to each flow are stored in the data plane, and each packet is matched against these rules to find the appropriate one or ones, whose specified actions are then applied to the packet. All of this takes place in the 'dumb' data plane; the processor handles the programming of the data plane and any necessary communication with the central (SDN) controller.

The data plane in our architecture consists of four primary parts (a per-packet walk through these units is sketched after the list):

1. Pre-processing unit
2. Classification engine & SDN match field formatter
3. Flow processing unit / execution engine:
   a. OpenFlow pipeline
   b. Execution engine
4. Programming interface
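The per-packet path through these units might look as follows in C. Every helper below is a placeholder name for the corresponding hardware block, assumed for this sketch rather than taken from our actual interfaces.

```c
#include <stdint.h>
#include <stddef.h>

struct packet;                               /* opaque frame handle (sketch) */
struct flow_actions;

/* Hypothetical hooks standing in for the units listed above.               */
void build_match_field(const struct packet *p, uint8_t key[45]); /* 1 and 2 */
const struct flow_actions *flow_match(const uint8_t key[45]);    /* 3       */
void execute_actions(struct packet *p, const struct flow_actions *a);
void raise_table_miss(struct packet *p);     /* hand the packet to the CPU  */
void enqueue_egress(struct packet *p);

/* One packet's walk through the data plane: pre-process and classify,
 * match, then execute; a miss escalates to the processor, which may ask
 * the controller for a new rule. */
void data_plane_forward(struct packet *p)
{
    uint8_t key[45];

    build_match_field(p, key);               /* classification + formatting */
    const struct flow_actions *a = flow_match(key);
    if (a == NULL) {
        raise_table_miss(p);                  /* table-miss handler path    */
        return;
    }
    execute_actions(p, a);                    /* execution engine           */
    enqueue_egress(p);                        /* to the egress buffer       */
}
```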
Fig 3.2: Detailed Architecture of the Data Plane (block diagram showing the ingress and egress buffers, classification engine, pre-composer/formatter, OpenFlow pipeline, flow cache, action memory, execution engine, table-miss handler, SDN programming interface and internal packet buffers)

Other than these principal modules, packet buffers are present to store the incoming packets. Additional communication channels with the processor may be added to facilitate faster programming of the data plane. Incoming packets are first received by the pre-processing unit, where the necessary information is extracted from the packet header and formatted into a 'match-field' set, as specified in the OpenFlow protocol. These match-field sets and the packets are then stored in separate buffers.

3.2.1 Flow classification engine

One of the main differences between traditional networking and software defined networking is that, in software defined networking, the switch performs flow processing rather than packet processing. In an SDN switch, the flow matching unit matches the flow of a packet against the flow entries in the tables and applies the specified rules. For the flow matching unit to identify the flow of a packet, there has to be a unit that classifies packets according to their flows. The unit that performs this classification of packets according to flows is called the Classification Engine; its output is, essentially, the direct input to the flow matching unit.
In the classification engine, the flow of a packet is identified by looking at fields in its header, such as the destination MAC address, source MAC address, destination IP address, etc. OpenFlow specifies a particular format for the fields that identify a flow; these fields cover OSI layers 1 up to 4. Two packets are said to belong to the same flow if and only if all the specified fields are identical. Therefore, the main role of the classification engine is to extract those relevant fields from the packet header, arrange them according to the specified format (known as the match field), and pass the result to the flow matching unit. The flow identification fields specified in OpenFlow are shown below.

Fig 3.3: Flow Identifier in OpenFlow

When extracting the above-mentioned fields from the packet header, we need to know the position of each field, which varies with the protocol being used (IPv4, MPLS, VLAN, etc.). Therefore, when formulating the match field for a packet, we must first determine the protocol of the packet, in order to extract the fields from the correct positions in the header. The protocol identification process is carried out mainly by checking the EtherType in the packet header. The EtherType values for some common protocols are as follows:

0x0800 – IPv4
0x86DD – IPv6
0x8100 – VLAN-tagged frame
0x8847 – MPLS unicast
0x8848 – MPLS multicast
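The EtherType check at the heart of this step can be sketched in C as below. This mirrors the Phase 1 software model described in Chapter 4 rather than the actual single-cycle hardware logic; the enum and function names are our own illustrative choices.

#include <stdint.h>

enum flow_l2_type { FT_IPV4, FT_IPV6, FT_VLAN, FT_MPLS_UC, FT_MPLS_MC, FT_UNKNOWN };

/* Classify a frame by its EtherType (bytes 12-13, after the two MAC addresses).
 * For 802.1Q frames the real engine also looks past the 4-byte VLAN tag to the
 * inner EtherType; that second-level check is omitted here for brevity. */
enum flow_l2_type classify_ethertype(const uint8_t *frame)
{
    uint16_t etype = (uint16_t)((frame[12] << 8) | frame[13]);
    switch (etype) {
    case 0x0800: return FT_IPV4;
    case 0x86DD: return FT_IPV6;
    case 0x8100: return FT_VLAN;      /* VLAN-tagged frame */
    case 0x8847: return FT_MPLS_UC;   /* MPLS unicast      */
    case 0x8848: return FT_MPLS_MC;   /* MPLS multicast    */
    default:     return FT_UNKNOWN;
    }
}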
Once the protocol type of the packet has been identified, we can extract the relevant fields from the packet header to form the match field. The match field consists of 356 bits, made up of the following fields, in the given order:

Field                 Bits
Ingress Port          32
Metadata              64
MAC Source            48
MAC Destination       48
Eth Type              16
VLAN ID               12
VLAN Priority         3
MPLS Label            20
MPLS Traffic Class    3
IPv4 Source           32
IPv4 Destination      32
IPv4 Protocol         8
IPv4 ToS              6
TCP Source            16
TCP Destination       16
Total                 356

All of the processing mentioned above happens in a single clock cycle in the hardware layer; dedicated hardware logic is implemented to carry out these operations. At the end, the extracted fields are written to a register of 45 bytes. The completed match field is then en-queued to the match field buffer (if the buffer is not full), a memory unit which acts as a queue, and the packet is en-queued to the packet buffer, which also acts as a queue. When a match field is requested by the flow processor, the oldest match field in the match field buffer, together with the corresponding packet in the packet buffer, is returned to the flow processor, and both are then removed from their respective buffers.
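For reference, the 356-bit match field can be modelled in C roughly as follows. The field widths are taken from the table above; the struct name and the use of bit-fields are our own illustration. C bit-field layout is compiler-dependent, so this is a software model, not a wire format; the hardware packs the same fields into the 45-byte register described above.

#include <stdint.h>

/* Software model of the 356-bit OpenFlow match field (widths per the table above). */
struct ofp_match_field {
    uint32_t ingress_port;        /* 32 bits */
    uint64_t metadata;            /* 64 bits */
    uint64_t mac_src      : 48;
    uint64_t mac_dst      : 48;
    uint16_t eth_type;            /* 16 bits */
    uint16_t vlan_id      : 12;
    uint8_t  vlan_prio    : 3;
    uint32_t mpls_label   : 20;
    uint8_t  mpls_tc      : 3;
    uint32_t ipv4_src;            /* 32 bits */
    uint32_t ipv4_dst;            /* 32 bits */
    uint8_t  ipv4_proto;          /*  8 bits */
    uint8_t  ipv4_tos     : 6;
    uint16_t tcp_src;             /* 16 bits */
    uint16_t tcp_dst;             /* 16 bits */
};                                /* 356 bits of content in total */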
As SDN introduces flow-based traffic handling to replace the traditional packet-by-packet approach, identifying traffic flows is a primary concern. This should be carried out in as little time as possible, as it affects throughput directly. To identify an incoming flow, certain information has to be extracted from the header of the packet and arranged into the format specified in the OpenFlow protocol (this format was illustrated above). How this information is extracted varies from flow type to flow type. As an example, consider extracting the TCP source port from a packet header: the exact location of that field is different in an IP-TCP flow and a VLAN-IP-TCP flow. So, for constructing the identifier for an incoming traffic flow, identifying the flow type is an essential first task.

In our architecture, we introduce a dedicated logic unit to handle flow type identification. For testing purposes, we were only concerned with a limited number of flow types, and the results were encouraging. The design can easily be expanded to handle less frequent traffic types as well. Introducing a dedicated logic unit for identifying flow type does limit flexibility to a certain extent (as an example, if a new L3 protocol is introduced, the whole unit has to be redesigned). But this unit is capable of classifying traffic belonging to all the major flow types, and at a very fast rate. The processor discussed earlier is equipped with the necessary instructions to handle flow classification on its own, although at a comparatively lower rate. So the flexibility aspect of SDN and OpenFlow is not violated: classification can be carried out either by the fast dedicated logic unit or by the more flexible processor.

After the flow type is identified, extracting the necessary information from the packet headers is a task handled by the processor, which constructs the flow identifier and passes it to the matching unit; the matching unit in turn returns the set of actions to be applied to that flow. The position of a particular field in the packet header varies between protocols (IPv4, MPLS, VLAN, etc.), so when formulating the match field for a packet we must first know the packet's protocol in order to extract the fields from the correct positions in the header. This unit is responsible for that classification, which is done mainly by checking the EtherType in the packet header; OpenFlow control traffic itself is recognized as an application by its well-known port, 6633. We can therefore classify packets carrying any combination of the above protocols, and the IP protocols explained below are also taken into account for classification.
This classification engine is implemented with a parallel architecture and consumes a single clock cycle per classification. We developed this architecture to support streamlined network processing, without delays or packet drops, to enhance performance and quality of service.

3.2.2 Formatter

Once we know the type of the packet, we can extract the relevant fields from the packet header to form the match field, which consists of 356 bits. These fields are extracted into a register of 45 bytes. The completed match field is then en-queued to the match field buffer (if the buffer is not full), a memory unit which acts as a queue, and the packet is en-queued to the packet buffer, which also acts as a queue. When a match field is requested by the flow processor, the oldest match field in the match field buffer, together with the corresponding packet in the packet buffer, is returned to the flow processor, and both are then removed from their respective buffers.

3.2.3 Flow Processing Unit

The flow processing unit is where the flow matching and action application take place. As we have identified, these functions are the bottleneck when trying to reach higher speeds with SDN (specifically OpenFlow) switches. The architecture is therefore designed to achieve the maximum speed possible in these tasks while keeping the design viable as a commercial product. The flow processing unit can be further divided into the following sections:

I) OpenFlow Pipeline & Flow Matching Unit
II) Action/Rules Memory
III) Execution Engine/Action Processor
IV) Handlers
Fig 3.4: Flow Processing Unit

3.2.4 Flow Matching Unit & OpenFlow Pipeline

In SDN, as the control plane is decoupled from the data plane and centralized, traffic forwarding is reduced to looking up a rule table and executing the actions specified in the matching rule. SDN and OpenFlow also introduce flow-based forwarding, replacing the traditional packet-by-packet forwarding. So, a network node operating in an SDN environment that uses OpenFlow should have tables of rules ('flow-entry tables' or 'flow tables') which specify what to do with each flow of traffic. Finding the flow entry that matches a received traffic flow is called 'flow matching', and the efficiency of traffic forwarding depends very much on how efficiently this matching task is executed. In our architecture, we introduce a separate dedicated unit to handle flow matching. Before going into the details of how the unit operates, it is essential to have some understanding of the format of an OpenFlow rule, or 'flow entry'. An overview of a flow entry is given in the diagram below.
Fig 3.5: OpenFlow Flow Entry Format

An OpenFlow flow entry consists of three main parts: rule, action and stats. The rule, or 'match fields', gives the identifier for distinguishing the flow for which the rule is intended; it is against these that the identifiers constructed from incoming traffic are matched. The action part of the flow entry specifies what to do with the flows matching the entry. Mostly, actions specify the port out of which the flow should be forwarded, or whether it is to be dropped; more recent OpenFlow versions introduce further actions, like modifying certain fields of the packet header. The final part of the entry, stats, contains counters that keep track of the number of packets/bytes that matched the entry. For analyzing traffic patterns and network performance, the controller might ask the nodes for these counter values.

The entries are sent by the centralized controller to the nodes in the network, in OpenFlow 'flow_mod' type messages, and are stored in the nodes, typically in a tabular format. For one particular flow, the controller may first send a flow entry with a simple forward action and later send more actions (NAT, firewalling actions, etc.) for that flow as the network changes. The controller may also wish to remove certain actions, or the flow entry altogether. Writing these entries and modifying the flow tables are collectively called 'programming the data plane', another essential task related to the flow matching unit. The following diagram highlights the main components of the unit we use in our architecture.
Fig 3.6: Flow Processing Unit with Pipelined Flow Tables

To store the flow entries, the unit uses two memory blocks: a TCAM block and a RAM block. The TCAM block holds the match fields and stats parts of flow entries, while the RAM block stores the actions. Ternary content addressable memories (TCAMs) are known to be very fast memory blocks, capable of finding a match in one clock cycle; the downside is their high cost and power consumption. RAM, on the other hand, is quite cheap, but as it is not content addressable, one has to search through all the stored data to find a match, which increases latency to a level that is not tolerable in networking. By using TCAMs only to store the match fields and stats of a flow entry, we reduce TCAM usage by almost half compared with a typical TCAM implementation of OpenFlow flow tables; and since only the match fields are needed to match against the identifier of an incoming flow, we do not lose speed.

With our architecture, flow matching is done in the following manner. The identifier constructed from the incoming traffic is matched against the match fields stored in the TCAM block. When a match is found, the counters are incremented and a pointer to the memory location in the RAM block where the actions are stored is output; the actions can then be retrieved from the RAM block. Usually, the number of flow entries would be too large to store in a single TCAM, so the flow tables (which contain only match fields and stats in this architecture) are arranged in a pipelined fashion inside the TCAM block (now with several TCAMs).
It should be noted that there exists a one-to-one mapping between the match fields stored in the TCAMs and the actions stored in the RAM. OpenFlow defines a number of possible actions in its latest version, 1.4. In our architecture, memory is allocated for each flow entry to store all of these possible actions; as all the actions reside in the RAM block, this does not have much impact on cost. A 'flag' is used to indicate which actions have actually been specified. We refer to this as the 'action flag', and its width (in bits) equals the number of possible actions. If a particular action has been specified, the bit allocated to it in the flag is high. Use of the action flag allows faster execution of actions.

Often, all the actions specific to a single flow are not sent at once by the controller; it repeatedly sends actions to be added to the flow entry as the network changes. In typical OpenFlow flow table implementations, these actions are added as new flow entries to the flow tables. This has a negative impact on both speed and cost: because the actions for a flow are stored in several places, a flow must now be matched against all the tables in the pipeline, adding a latency equal to the clock period multiplied by the number of flow tables; and since flow entries are stored in TCAMs, TCAM usage (and hence cost) increases. In our architecture, we store each flow entry only once and pre-allocate enough memory in the RAM block to store all the OpenFlow-specified actions. That way, when the controller sends a new action for an existing flow entry, we simply add that action to the action set of the specific entry and modify the flag to reflect the new action. No modifications are made in the TCAM block, and as the memory is pre-allocated, adding the action only means writing some data to the RAM block. This way, each flow entry is stored only once, and a flow can exit the pipeline as soon as it finds a matching entry, improving speed.
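A minimal sketch of how the action flag drives execution is given below, again as an illustrative C model. The flag width, bit assignments and helper names are assumptions for the example, and only the drop, forward and set-field actions are shown, matching the action types supported in our implementation.

#include <stdint.h>

/* Hypothetical bit positions in the action flag; one bit per possible action. */
#define ACT_DROP      (1u << 0)
#define ACT_FORWARD   (1u << 1)
#define ACT_SET_FIELD (1u << 2)

struct action_set {
    uint16_t out_port;        /* used when ACT_FORWARD is set   */
    uint16_t field_id;        /* used when ACT_SET_FIELD is set */
    uint32_t field_value;     /* pre-computed replacement value */
};

/* Hypothetical helpers, stubbed so the sketch compiles. */
static void set_header_field(uint8_t *p, uint16_t id, uint32_t v) { (void)p; (void)id; (void)v; }
static void enqueue_to_port(uint8_t *p, uint16_t port)            { (void)p; (void)port; }

/* Apply only the actions whose bits are high in the flag. */
void execute_actions(uint8_t *pkt, uint32_t flag, const struct action_set *as)
{
    if (flag & ACT_DROP)
        return;                       /* drop: packet is simply not forwarded */
    if (flag & ACT_SET_FIELD)
        /* replace a header field with a pre-specified value; no computation
         * happens here, mirroring the hardwired execution engine */
        set_header_field(pkt, as->field_id, as->field_value);
    if (flag & ACT_FORWARD)
        enqueue_to_port(pkt, as->out_port);
}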
It can often be expected that packets belonging to a specific flow arrive at the switch one after the other. This observation presents an opportunity to further increase the speed of forwarding: we introduce a cache memory block in our flow matching unit, which holds the most recently matched entries. Flow identifiers are first checked against those in the cache and proceed through the flow tables only when no match is found in the cache. If a match is found, the action set (and the action flag) still has to be retrieved from the RAM block, as in the usual case. This small cache memory is also a content-addressable memory, so the cache too can be checked in one clock cycle.

The identifier for an incoming flow is passed to the flow matching unit by the processor; the unit then finds the matching entry and passes the actions to the processor, which executes them. Due to the use of the action flag, execution of actions can be carried out reliably at high speed. If no matching entry is found for a flow identifier, the flow matching unit indicates this to the processor by passing back the identifier itself instead of an action set. The processor then handles communication with the controller in order to resolve the issue. All of this interaction between the processor and the flow matching unit is carried out through the communication interface. For programming the data plane, a separate programming interface is introduced, and the processor is equipped with special instructions to handle this task in a fast and efficient manner; the instructions and procedure are described in detail when we describe the processor.

The flow processing unit takes one item each from the two buffers introduced above and tries to match the match-field set with a rule or rules stored in its memory. If any matches are found, the relevant actions are applied to the packet. In the case where there is no matching rule in its memory, the flow processing unit forwards the packet to the processor through the relevant communication channel. As we describe in the coming sections, our architecture is streamlined for faster flow processing, the bottleneck we have identified in SDN switches. The programming interface handles the programming of the flow processing unit's memory: as specified by the central controller, rules may be added, modified or deleted. These operations are taken care of by the programming interface, which can additionally act as a communication channel between the data plane and the processor (control plane).

Care was taken to design each of the units as modules of a system; that is, they can be developed separately and then combined with minimum hassle. This was done partly due to the evolving nature of the OpenFlow protocol. As an example, if the format of the match-field set is changed in the next version of OpenFlow, only the preprocessor module will have to be redesigned. We feel this approach is also suited to scaling up the switch: if there are a large number of ports, several preprocessor units can be used. The flow processing unit is designed such that new memory blocks can be inserted into its memory.
These memory blocks work in a pipelined architecture, ensuring that the design can scale up with minimal effect on speed.

3.2.5 Memory – Action/Rule Memory

The memory stores the rules sent by the central controller. As mentioned earlier, programming the memory is done by the programming interface, a unit separate from the flow processing module. OpenFlow rules consist of a set of match fields composed of fields in the TCP/IP header, a mask that specifies which bits should be matched, and a set of actions. Actions in our implementation are currently limited to the drop, forward and set-field types; other action types specified in OpenFlow 1.4 can easily be integrated into the design without modifying the architecture. In this architecture, match fields and masks are stored in one block of memory and the actions in another. For applying actions, an action flag is used, which does the same sort of job that a normal mask does.

To store match fields, we use TCAMs. This ensures that matching a flow against all the stored rules takes only a few clock cycles. As specified in the OpenFlow protocol, the same mask can be used with several sets of match fields. TCAMs are essentially high-cost memory blocks, and it would be a huge waste to store the same mask several times. Because of this, one particular mask is stored only once in the memory, and the match-field sets (flow entries) that use that mask are grouped together in the TCAM. While this ensures that memory is not wasted through redundant storage, the downside is that updating the memory becomes a costly operation time-wise. The algorithm we currently use is of order O(n) in time complexity, which, while not ideal, is a good trade-off against the high cost incurred otherwise.

The actions relevant to a particular flow entry are stored in a single place. For storing the actions and the action flags, RAM blocks are used. A match in the TCAM block (a matching flow entry) outputs the memory location of the relevant action set. As RAM blocks are quite cheap, the memory needed for all the actions is allocated up front to each flow entry, regardless of the fact that every action will not be required. This approach to storing actions, while costly memory-wise, speeds up the flow matching unit, as all relevant actions can be obtained in one go.
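To make the O(n) update concrete, the sketch below shows one way the mask-grouped storage could be updated in a software model: a linear scan locates the group sharing the new entry's mask (creating a new group if needed) before the entry is written. This is our own illustration of the idea, not the exact algorithm used in hardware; grouping is modelled here with a group tag, whereas the real TCAM keeps group members physically adjacent, which is what makes updates O(n).

#include <string.h>

#define MASK_LEN 45            /* 356-bit mask, padded to 45 bytes */

struct tcam_entry { unsigned char key[MASK_LEN]; int group; };

struct tcam_model {
    unsigned char masks[64][MASK_LEN];  /* each distinct mask stored once */
    struct tcam_entry entries[1024];
    int n_masks, n_entries;
};

/* O(n) scan over the stored masks: find the mask group, else create it. */
int tcam_insert(struct tcam_model *t, const unsigned char *mask,
                const unsigned char *key)
{
    int g;
    if (t->n_masks == 64 || t->n_entries == 1024)
        return -1;                              /* model is full */
    for (g = 0; g < t->n_masks; g++)            /* linear scan for the mask */
        if (memcmp(t->masks[g], mask, MASK_LEN) == 0)
            break;
    if (g == t->n_masks)                        /* new mask: new group */
        memcpy(t->masks[t->n_masks++], mask, MASK_LEN);
    t->entries[t->n_entries].group = g;         /* tag entry with its group */
    memcpy(t->entries[t->n_entries].key, key, MASK_LEN);
    return t->n_entries++;                      /* index doubles as action-RAM pointer */
}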
A further effect of the pre-allocated action storage is that deleting a flow becomes faster, while modifying a flow becomes slower, as a search has to take place before the modification. To further speed up flow matching, a flow cache is used: every flow is first matched against the entries in the flow cache, which are the most recently matched flow entries in the main memory block. As packets belonging to a single flow most often arrive at a network node back-to-back, we feel the use of a flow cache speeds up flow matching.

3.2.6 Execution Engine

Here, all the relevant actions are applied to the packet. As we are concerned only with the drop, forward and set-field action types, the action processor is actually a hardwired unit. This approach is extremely useful, as applying actions then takes the minimum possible time, given that the hardware is designed accordingly. In set-field type actions, the value in a specified field of a packet's TCP/IP header is replaced by a pre-specified value. Therefore, no calculation is done at the action processor; only the replacement of the value takes place. For this, the action flag supplies the guidelines.

Fig 3.7: How a packet travels through the Data Plane
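Since set-field actions rewrite header fields, fields such as the IPv4 header checksum must be recomputed afterwards; the CHKSM custom instruction described in Section 3.3 serves this purpose. Below is a standard C implementation of the Internet (one's-complement) checksum for reference. It illustrates the computation only and is not the processor's hardwired datapath.

#include <stdint.h>
#include <stddef.h>

/* Standard Internet checksum (RFC 1071): one's-complement sum of 16-bit words. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    while (len > 1) {
        sum += (uint32_t)((data[0] << 8) | data[1]);  /* 16-bit big-endian words */
        data += 2;
        len  -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8;                /* pad odd trailing byte */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);           /* fold carries back in */
    return (uint16_t)~sum;                            /* one's complement */
}

Usage: zero the IPv4 header's checksum field, compute inet_checksum over the header (IHL × 4 bytes), and store the result back into that field.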
3.2.7 Table Miss Handlers

Handler units are there to facilitate communication with the main processor. The most common case is a packet that does not match any of the flow entries. In such a case, the packet is forwarded to the handler unit, which then sends it to the processor, freeing the other parts of the flow processing unit to carry on with their usual tasks.

3.2.8 Alternative architectural approaches considered for the Flow Processing Unit

We considered a couple of different approaches when deciding on the architecture of our flow processing unit. As this is the unit that has the greatest impact on the throughput of the switch, we had to take into account both reliability and speed when selecting an approach. The advantages and disadvantages of both approaches we considered are given below.
I) Approach 1

Fig 3.8: Alternative Approach I

In this approach, the flow identifier part and the actions part of a flow entry are stored at different locations. The identifier is stored in a high-speed memory block (like a TCAM) while the actions are stored in a normal RAM block. Identifiers constructed from incoming traffic flows are matched against the flow identifiers stored in the TCAM block, which outputs a memory location in the RAM block from which to obtain the relevant action set. These actions, together with the packet, are then forwarded to a hardwired unit for action execution.

Advantages

a. Removing a flow entirely is easy, since all the actions corresponding to one flow entry are stored in a single place (the Action Set RAM block).
b. As action processing is hardwired, actions can be applied to each flow very quickly, increasing speed. (This is possible because the action set corresponding to a flow consists only of the new values that certain fields of the flow will be changed into; applying actions means replacing the old values with these new ones.)
c. The whole unit for flow matching and carrying out the required actions is hardwired, increasing the speed of flow processing and thus enabling a relatively large number of flows to be handled in less time.

Disadvantages

a. Adding a new flow entry requires some processing, as the values that the entry's actions will result in have to be calculated.
b. As this calculation is done by a processing unit, adding a new flow entry or updating an existing one is slower.
c. The average-case memory requirement is large (equal to the worst-case requirement), as the action set stores all fields to which an action may be applied, even if no action is currently applied to some of those fields.

Several flow matching units can be pipelined if a large number of flows must be handled. Even so, there will be only one flow entry and corresponding action set and flag for any one flow. Even if all possible actions are to be applied to a particular flow, and the SDN controller sends those updates one by one, the values resulting from these actions are calculated first and the action set corresponding to that flow is updated; a new flow entry or action set is not added.
II) Approach 2

Fig 3.9: Alternative Approach II

In this approach, the flow identifier part of a flow entry is stored anew every time the controller sends a new action. The flow tables here are also implemented with high-speed TCAM blocks.

Advantages

a. It is easy to add a new flow entry: minimal processing is needed, as the entry is simply added to a flow table with enough space.
b. If we are handling flows with just a couple of actions each, the memory requirement outperforms that of Approach 1.

Disadvantages

a. It takes more time to delete a flow entirely, as entries corresponding to that flow can be anywhere.
b. The worst-case memory requirement is higher than that of Approach 1, since for each updated action, details about the flow are stored as well.
c. If there is a large number of flow entries (not flows), and we have to use 'n' pipelined flow tables, the same number of action set blocks is needed to store the actions of the 'n' flows currently being processed in the tables.
d. We need links between each action set block and each flow table. When 'n' flows/packets are being processed in the pipelined flow table block, the worst case is when each flow/packet has an entry in each of the flow tables. Therefore, to keep the actions of all flows/packets distinct, we need 'n' blocks for storing action sets, along with a method of knowing to which of these blocks the action for a particular flow/packet should be saved.

Considering the advantages and disadvantages of both architectures, we decided on the first approach, as we feel it is better in terms of speed and also less costly to implement.

3.3 Customized network processor

3.3.1 Instruction set architecture

One of the main objectives of SDN is to add flexibility to networks to cope with today's dynamic traffic patterns. This is achieved by making the network programmable: centralizing the control plane of networking as a software entity. In keeping with this principle of flexibility, a major component of our SDN architecture is a customized network processor. The majority of SDN-related activities are the processor's responsibility, including handling the OpenFlow protocol. The processor we introduce is a customized RISC processor and thus has many attributes typical of a RISC processor; the customizations were made in order to make the processor more 'aware' of SDN and OpenFlow.

The Instruction Set Architecture (ISA) is an integral part of defining the performance of a processor. In the processor we introduce, we have deviated from the typical RISC ISA to accelerate OpenFlow and SDN related tasks like writing entries to the flow tables, handling packets, etc. We use 32-bit instructions with 6 bits reserved for the opcode. Instructions (mostly) use registers as operands, so the ISA can be categorized as following the load/store architecture. Several different instruction formats are used; these are illustrated in the diagram below.
Fig 3.10: Instruction Formats

Our ISA can be divided into three broad categories:

I. Memory Access and Control Instructions
II. ALU Instructions
III. OpenFlow Instructions

Instructions to access memory and to control the execution flow, as found in a typical load/store architecture, belong to the first category. Most of these instructions use two register operands and a 16-bit constant. As an example, the LOAD instruction is explained below.

LOAD R1 R2 12364

Here the opcode (LOAD) takes up the first 6 bits, the next 10 bits encode the two registers, and the final 16 bits hold the constant. When executed, this loads the memory content at the address specified by the constant plus the offset in R1 into R2. As an example of a control instruction, consider BREQ.

BREQ R1 R2 12456

The same 32-bit instruction length is used as in the LOAD instruction explained earlier. When executed, it checks whether the contents of R1 and R2 are equal and, if so, changes the control flow to the instruction at the memory location specified by the constant.
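As a concrete illustration of this format, the following C sketch packs such an instruction into a 32-bit word: opcode in the top 6 bits, then two 5-bit register fields, then the 16-bit constant. The exact bit ordering inside the word is our assumption for the example; the real encoder is part of the assembler described in Chapter 4.

#include <stdint.h>

/* Pack a memory/control-format instruction: 6-bit opcode, two 5-bit
 * registers, 16-bit constant (6 + 5 + 5 + 16 = 32 bits).
 * Field placement (opcode at the MSB end) is an assumption. */
uint32_t encode_mem_format(uint8_t opcode, uint8_t r1, uint8_t r2, uint16_t c)
{
    return ((uint32_t)(opcode & 0x3F) << 26) |
           ((uint32_t)(r1     & 0x1F) << 21) |
           ((uint32_t)(r2     & 0x1F) << 16) |
           (uint32_t)c;
}

/* Example: LOAD R1 R2 12364, using the LOAD opcode 010001 (0x11) from Table 3.2:
 *   uint32_t word = encode_mem_format(0x11, 1, 2, 12364);
 */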
The category of ALU operations can be further divided into two sub-categories: ALU operations and ALU operations with constants. This division was decided on because the processor should support a high-speed data plane, and several OpenFlow-related tasks require arithmetic operations with constants. One possible alternative we considered was storing those special constant values in a separate part of memory and performing a LOAD followed by a normal ALU instruction; but when speed is taken into consideration, this approach was found to be slower by a couple of clock cycles.

Normal ALU operations generally have 3 register operands (a couple of exceptions, like NOT and INC, do exist). As previously stated, registers are indicated using 5 bits. These add up to 15 bits, and the opcode is 6 bits; the remaining 11 bits are ignored. As an example, consider ADD.

ADD R1 R2 R3

When executed, this adds the values in R2 and R3 and stores the result in R1. The other type of ALU operation, with constants, has a slightly different format. These also have 3 operands, but only two are register operands; the last 16 bits of the instruction are reserved for the constant. As an example, consider ADDC.

ADDC R1 R2 10098

Here, the value in R2 is increased by the specified constant (10098 here) and the result is stored in R1.

As explained earlier, the primary focus of the processor is to facilitate OpenFlow and handle traffic reliably, in as little time as possible, to support higher throughput. Therefore, a special category of custom instructions, 'OpenFlow Instructions', was designed to handle these tasks. Some of these instructions deal with flow matching and processing in the data plane, and some with programming the data plane as instructed by the SDN controller. These instructions, together with their purposes, are listed below.
Instruction   Operands    Description
DEQUE                     Dequeue packet from packet-in queue
ENQUE         R1          Enqueue last processed packet into queue given by R1
DPLRD         R1 R2 A     Read from flow tables
DPLWR         R1 R2 A     Write to flow tables
PDROP                     Drop last processed packet
TABLE         R1          Choose flow table given by R1
LKPT          R1 N        Classification lookup for N entries (result into R1)
CRC           R1 R2 R3    Carry out Cyclic Redundancy Check (CRC)
CHKSM         R1 R2 R3    Compute checksum field

Table 3.1: Custom Instructions

3.3.2 Complete instruction set architecture

Instruction   Opcode   Operands      Format (bits)   Description

OPENFLOW INSTRUCTIONS
DEQUE         100001                 6+20+6          Dequeue packet from queue
ENQUE         100010   R1            6+5+5+16        Enqueue packet into queue R1
PDROP         100011                 6+20+6          Drop packet
DPLRD         100100   R1,R2,A       6+5+5+16        OpenFlow buffer read
DPLWR         100111   R1,R2,A       6+5+5+16        Data plane packet write

MEMORY ACCESS & CONTROL
LOAD          010001   R1,R2,A       6+5+5+16        Load data from memory
STORE         010010   R1,R2,A       6+5+5+16        Store data into memory
LDILW         010011   R1,C          6+5+21          Load immediate value into lower half of the word
LDIHG         010100   R1,C          6+5+21          Load immediate value into upper half of the word
COMP          010111   R1,R2         6+5+5+5+5+6     Compare and set ALU status bit
BR            011000   A             6+20+6          Unconditional branch
BREQ          011001   R1,R2         6+20+6          Conditional branch
BRL           011010   R1,R2         6+20+6          Conditional branch
BREQL         011011   R1,R2         6+20+6          Conditional branch
NOP           010000                 6               No operation
JMP           011100   A             6               Jump to instruction (absolute)

ARITHMETIC & LOGIC INSTRUCTIONS
ADD           000000   R1,R2,R3      6+5+5+5+5+6     Add
SUB           000001   R1,R2,R3      6+5+5+5+5+6     Subtract
MULT          000010   R1,R2,R3      6+5+5+5+5+6     Multiply
INC           000011   R1            6+5+5+5+5+6     Increment
DEC           000100   R1            6+5+5+5+5+6     Decrement
FWD           000101   R1,R2         6+5+5+5+5+6     Forward the R1 value to R2
SHL           000110   R1,R2         6+5+5+5+5+6     Shift left
SHR           000111   R1,R2         6+5+5+5+5+6     Shift right
AND           001000   R1,R2,R3      6+5+5+5+5+6     Logic AND
OR            001001   R1,R2,R3      6+5+5+5+5+6     Logic OR
XOR           001010   R1,R2,R3      6+5+5+5+5+6     Logic XOR
NOT           001011   R1,R3         6+5+5+5+5+6     Negation
RDBT          001110   R1,R2,R3      6+5+5+5+5+6     Read C1 bits into R2 from R1 starting from C2
CMP           001111   R1,R2,R3      6+5+5+5+5+6     Compare R1 & R2 and store the result in R3
ADDC          110000   R1,R2,C       6+5+5+16        Add constant
SUBC          110001   R1,R2,C       6+5+5+16        Subtract constant
MULTC         110010   R1,R2,C       6+5+5+16        Multiply by constant
INCC          110011   R1,C          6+5+5+16        Increment
DECC          110100   R1,C          6+5+5+16        Decrement
FWDC          110101   R1,R2         6+5+5+5+5+6     Forward the R1 value to R2
SHLC          110110   R1,R2,C       6+5+5+16        Shift R1 left by immediate value C into R2
SHRC          110111   R1,R2,C       6+5+5+16        Shift R1 right by immediate value C into R2
ANDC          111000   R1,R2,C       6+5+5+5+5+6     Logic AND with immediate value C
ORC           111001   R1,R2,C       6+5+5+5+5+6     Logic OR with immediate value C
XORC          111010   R1,R2,C       6+5+5+5+5+6     Logic XOR with immediate value C
RDBTC         111110   R1,R2,C1,C2   6+5+5+5+5+6     Read C1 bits into R2 from R1 starting from C2
CMPN          111111   R1,R2,C1,C2   6+5+5+5+5+6     Compare C1 bits in R1 & R2 starting from C2
CMPC          111100   R1,R2,C       6+5+5+5+5+6     Compare R1 and constant C and return the result

OPENFLOW TABLE & UTILITY INSTRUCTIONS
TABLE         101010   R1            6+5+5+16        Choose table pointed to by R1
LKPT          101011   R1,N          6+5+5+16        Classification for N entries (result into R1)
CRC           101100   R1,R2,R3      6+5+5+5+5+6     CRC starting at R1 for R2 bytes (result in R3)
CHKSM         101101   R1,R2,R3      6+5+5+5+5+6     Checksum starting at R1 for R2 bytes (result in R3)

Table 3.2: Complete Instruction Set

3.4 Micro-architecture

The micro-architecture of the processor is a hardwired, pipelined architecture supporting several pipeline stages, including instruction fetch, decode and a few execution cycles. The data paths of the different instruction formats consume different numbers of clock cycles, which we used as a performance evaluation parameter during the initial development process. We have also implemented a finite state machine (FSM) for the control unit; the combination of the control unit with the multi-cycle data path supports our custom ISA. The micro-architecture itself is based on the hardwired RISC approach, supporting a RISC instruction set. In addition to some network functions and some generic processor functions, we needed to support our custom instructions to accelerate packet processing and SDN data plane access.
Therefore, we accommodate the SDN data plane program access control, SDN data plane information access control and OpenFlow buffer control inside the processor. In our micro-architecture, the FSM control takes care of the control aspects of these actions, while the data path itself supports access to those data plane functions. Our processing stages can therefore be marked as follows:

I) Instruction fetch
II) Instruction decode and data fetch
III) ALU operation & Net-R (format IV) operations
IV) Memory access, data plane access, or R-format instruction completion
   a. OpenFlow data plane access
      i. OpenFlow buffer access – READ buffer ID
      ii. Data plane program access – WRITE program ID
      iii. Data plane info access – READ data plane information to upper layers
V) Memory access completion or data plane access completion

Apart from the ISA, the microarchitecture is also central to a processor's performance. Our processor's microarchitecture can be broadly described as a hardwired, pipelined architecture. It supports several pipeline stages: fetch, decode and a few execution cycles (depending on the specific instruction). The microarchitecture is based on that of a typical, hardwired RISC processor, with a few customizations introduced to accelerate packet processing and the handling of OpenFlow-related tasks. An overview of the processor microarchitecture is given below. It should be noted that we have introduced an 'OpenFlow register' (OFR) and OpenFlow read (OFRD) and write (OFWR) control signals in order to accelerate OpenFlow-related tasks. The instruction and data memories are kept separate from the flow tables (the flow matching unit memory). In our processor architecture, a multi-cycle data path is controlled by the finite state machine inside the control unit.
Fig 3.11: Finite State Machine of the Processor

The microarchitecture is capable of completing the execution of any instruction in five or fewer clock cycles. The number of clock cycles required for each instruction class is listed below:

- LOAD, DPLRD – 5
- STORE, DPLWR – 4
- ALU instructions – 4
- All other instructions – 3
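The multi-cycle control described above can be modelled as a small finite state machine. The C sketch below is an illustrative software rendering of Fig 3.11 under our own naming; the real control unit is hardwired Verilog, and the transitions simply reproduce the cycle counts listed above.

/* Illustrative FSM model of the multi-cycle control unit (names assumed). */
enum cpu_state { S_FETCH, S_DECODE, S_EXECUTE, S_MEM_OR_DP, S_COMPLETE };

enum instr_class { IC_LOAD_DPLRD, IC_STORE_DPLWR, IC_ALU, IC_OTHER };

enum cpu_state next_state(enum cpu_state s, enum instr_class ic)
{
    switch (s) {
    case S_FETCH:   return S_DECODE;
    case S_DECODE:  return S_EXECUTE;
    case S_EXECUTE:
        if (ic == IC_OTHER)          /* branches, jumps: done in 3 cycles     */
            return S_FETCH;
        return S_MEM_OR_DP;          /* memory/data-plane access or ALU       */
    case S_MEM_OR_DP:                /* write-back                            */
        if (ic == IC_LOAD_DPLRD)     /* loads and data-plane reads need a 5th */
            return S_COMPLETE;       /* cycle to complete the access          */
        return S_FETCH;              /* stores, DPLWR and ALU finish in 4     */
    case S_COMPLETE:
    default:        return S_FETCH;
    }
}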
Chapter 4

DESIGN AND IMPLEMENTATION (METHODOLOGY)

4.1 Design & implementation

The architecture of the system was discussed in the previous chapter, where the main components of the overall system were identified. The next step was to build the system in stages; the development cycle consisted of several design stages, which are elaborated in this chapter. The block diagram of the implemented system is shown below.

Fig 4.1: Overview of the Hardware Implementation

The final system, shown in the figure above, consists not only of the main cores we developed, but also of other supporting cores and designs we created to extend our project into an SDN switch with a virtualized SDN/NFV application layer, on which network engineers can build their own applications in a virtualized environment. After the initial architecture stage, our design and implementation work on this system can be categorized into the following phases.
Phase 1: High-Level Synthesis Model with C/C++
Phase 2: RTL Design and Verification of SDN Data Plane & Processor
Phase 3: RTL Design and Verification of the PCI Subsystem and Ethernet Switch Fabric
Phase 4: System Integration and Test
Phase 5: Development of SDN/NFV Application Layer with SDN App Store

4.2 Phase 1: high level synthesis or C/C++ model

In this phase, our main target was to build the software model of our system, since we had extended the project to an SDN data plane with dedicated logic supporting the OpenFlow-aware network processor. We developed the architecture as a software model and tested the design on it, generally developing the software modules to mirror our hardware. We developed the following vital components of the system in C:

I) A program to capture network traffic/packets into pcap files with WinPcap on Windows.
II) A classification engine to identify flow types.
III) A flow formatter to extract SDN match fields.
IV) A flow matching engine with software TCAMs to match flows.
V) An execution engine and table-miss packet handler to apply SDN policy rules.
VI) Processing functions emulating the OpenFlow processor.

We also used the Mininet simulation platform with the Floodlight SDN controller to realize the SDN/OpenFlow workflow. The I/O system of the software model was mapped onto files, which were used to read and write inputs and outputs and to analyze them. The TCAMs were organized into text files that emulated the behavior of the original TCAMs, but in a sequential manner; updating and searching times are therefore higher than in the actual scenario.
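As an indication of how the software model consumed traffic, the sketch below reads packets from a pcap capture using the standard libpcap offline API (WinPcap exposes the same calls on Windows). The classify_and_match function is a hypothetical stand-in for the model's classification entry point.

#include <pcap.h>
#include <stdio.h>

/* Hypothetical stand-in for the C model's classification + matching path. */
void classify_and_match(const unsigned char *pkt, unsigned int len)
{
    (void)pkt;
    printf("packet of %u bytes\n", len);
}

int main(int argc, char **argv)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    struct pcap_pkthdr *hdr;
    const unsigned char *data;
    pcap_t *cap;

    if (argc < 2) { fprintf(stderr, "usage: %s capture.pcap\n", argv[0]); return 1; }

    cap = pcap_open_offline(argv[1], errbuf);   /* open the capture file */
    if (!cap) { fprintf(stderr, "pcap: %s\n", errbuf); return 1; }

    /* feed each captured packet through the modelled data plane */
    while (pcap_next_ex(cap, &hdr, &data) == 1)
        classify_and_match(data, hdr->caplen);

    pcap_close(cap);
    return 0;
}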
Fig 4.2: Test Procedure (pcap file → classification engine → flow matching engine → action rules → execution engine, with the processor and table-miss handler alongside)

We also developed the ISA architecture for the processor at this stage, using the C/C++ model. This stage was very helpful for optimizing the ISA and changing the initial ISA according to requirements. Initially, we developed a program to directly convert assembly-level program files to run on the processor and carried out small calculations; the processor was first developed as a programming model, with functions doing the same things the ISA instructions do. What we initially wanted was to verify the data plane functionality, along with the flow classification, matching and execution engines and the processing tasks.

4.3 Phase 2: RTL design & verification of data plane & processor

We then entered the RTL (Register Transfer Level) design stage of the system. Our objective was to implement the system on a Xilinx FPGA hardware platform, so we made some of our cores compatible with Xilinx hardware; however, we developed our core functional modules to be more generalized, able to work on any hardware and even as separate IP cores. We used the Verilog Hardware Description Language (HDL) to develop the system. We first developed the core modules that are essential parts of the SDN data plane subsystem and verified each and every module; for the verification process, we used Verilog test benches, with SystemVerilog support at some points.

I) Flow Classification Engine
II) Flow Matching Engine with OpenFlow Pipeline/Flow Tables
III) Flow Policies/Rule Memory
IV) Flow Execution Engine
V) Table-Miss Packet Handler
VI) OpenFlow Meters
VII) SDN Programming Interface
VIII) Internal FIFO Buffers to buffer packets

After all the relevant modules of the SDN data plane were complete, we integrated them into a single data plane module and tested it. These modules were organized into a pipelined architecture inside the data plane: if we consider an n-stage OpenFlow pipeline, then there are n+4 pipeline stages inside this SDN data plane, which makes the data plane faster. We also developed the SDN programming interface to expose the data plane for fast, reliable programming from the processor itself, along with a communication interface from the data buffers so that buffered packets can be accessed from the processor. This interface is not only for programming purposes, but also for configuration management and for accessing information from the data plane from outside. Thus we created a programming interface and a packet interface to the outside, which can be connected to the processor to push and pull packets as well as to exchange programming and configuration information with the data plane.

The technologies used for the project consist of Xilinx tools as well as Xilinx hardware. We developed the system with the Xilinx Vivado and Xilinx ISE design tools, using their design flow, including synthesis, place and route, bit stream generation and hardware programming. Xilinx ILA (Integrated Logic Analyzer) and tools such as ChipScope were used at the hardware debug stage.

Then the RTL design stage of the processor took place. Having mostly finalized the ISA (Instruction Set Architecture), we had to design the micro-architecture for the processor. As discussed in the architecture chapter, we used a RISC architecture for the OpenFlow-aware network processor. The processor consists of several sub-modules:

I) Program Counter
II) IR (Instruction Register)
III) MDR (Memory Data Register) and ODR (OpenFlow Data Register)
IV) Register File
V) ALU (Arithmetic & Logic Unit)
VI) Control Unit (Finite State Control)
VII) Multiplexers
VIII) Memory and OpenFlow Data Plane Access Interface

The processor micro-architecture was also developed as a pipelined architecture with reduced hardware logic; the importance of the multi-stage pipeline is that it shortens the work done per clock cycle and thus allows a faster clock frequency. The processor was likewise developed in Verilog HDL with the Xilinx tools, and its verification ran with the help of a compilation program we developed to convert assembly code into binaries.

The micro-architecture itself, its processing stages and its FSM-based multi-cycle control were described in Section 3.4; the same hardwired RISC design was implemented here, with support for the custom instructions that accelerate packet processing and SDN data plane access. Some pipeline stages are sometimes overlapped to obtain higher throughput. At this initial stage, we observed the following clock-cycle counts for the different instruction formats:

- Load: 5 states/clock cycles
- Store: 4 states
- OpenFlow data plane access:
  a. OpenFlow buffer read: 5 clock cycles
  b. OpenFlow SDN data plane program write: 4 clock cycles
  c. OpenFlow data plane info access: 5 clock cycles
- R-format ALU instructions: 4 states
- NET-R-format instructions: 4 states
- Branch: 3 states
- Jump: 3 states

Thus different instructions consume different numbers of clock cycles during execution; with the overlapping of pipeline stages, throughput can approach one instruction per clock cycle.

4.4 Phase 3: PCI express subsystem & ethernet subsystem

As we aimed to deliver an initial version of an SDN switch, the SDN data plane and the processor by themselves are not enough. We needed additional hardware for the Ethernet subsystem, including the switch fabric, and a PCI Express v2 subsystem to communicate and send information to the application layer. We therefore had to build two subsystems based on three main Xilinx IP cores.
IP Cores

I) 1G/2.5G Ethernet PCS/PMA Core v15.0
II) Tri-Mode Ethernet MAC Core v9.0
III) 7 Series Integrated Block for PCI Express 3.1 (PCIeGen2x8If128)

Fig 4.3: Ethernet Switch Fabric

Apart from these three major IP cores, we used FIFO IP cores to provide the buffers. For the Ethernet switch fabric, we deployed the PCS/PMA core and the Tri-Mode Ethernet MAC core. The Xilinx Virtex-7 VC707 FPGA board has one Ethernet port and can be extended with an Ethernet FMC card. The board carries a Marvell M88E1111 PHY chip whose MII (Media Independent Interface) is SGMII (Serial Gigabit Media Independent Interface). The Tri-Mode Ethernet MAC core does not support SGMII directly; it can be used in GMII or RGMII modes. We therefore needed an SGMII converter, and deployed a sub-PCS layer, the Xilinx 1G/2.5G Ethernet PCS/PMA core, to convert SGMII to GMII. On top of the PHY and the PCS/PMA sublayer we deployed the Tri-Mode Ethernet MAC core, and upon the MAC core we deployed our additional cores for the switching fabric, with packet buffers and control interfaces. We then connected the SDN data forwarding plane to the incoming packets from the Ethernet switch fabric.
We developed a couple of finite state machines for the input arbiter and output arbiter logic. The data buffers were designed using the Xilinx FIFO Generator in the IP catalog. We used the Riffa wrapper modules with the Xilinx 7 Series integrated PCI Express core (PCIeGen2x8If128) to send data to the upper layers. The PCIe interface supports up to 12 channels; in our case we used 3 of them: one to monitor input data traffic to our core system, another to monitor output traffic from our core system, and the last for programming and configuring the data plane according to the application layer's requirements. On the host side, we use the Riffa drivers on a Linux machine to bring the PCIe interface traffic into our machine. Apart from the cores, all the state machines to send and receive data are deployed inside the Riffa PCIe wrapper modules. Standard communication interfacing methods, such as AXI streams, are used throughout.

Fig 4.4: Test Architecture with Riffa

More information on the IP cores we used can be found in the Appendices.
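On the host side, channel access through Riffa's C API looks roughly like the sketch below. We recall fpga_open, fpga_recv and fpga_close and their rough signatures from Riffa 2.x's riffa.h, but treat the exact parameter order, the word-based length units and the timeout value as assumptions; the channel numbers match the assignment described above.

#include <stdio.h>
#include "riffa.h"   /* Riffa 2.x host library header */

#define CHNL_MON_IN   0   /* monitors traffic entering the core system */
#define CHNL_MON_OUT  1   /* monitors traffic leaving the core system  */
#define CHNL_PROGRAM  2   /* data plane programming and configuration  */

int main(void)
{
    unsigned int buf[2048];
    int words;

    fpga_t *fpga = fpga_open(0);            /* first Riffa-enabled FPGA */
    if (!fpga) { fprintf(stderr, "no FPGA found\n"); return 1; }

    /* pull one batch of monitored ingress traffic (lengths are in words;
     * the 2000 ms timeout is an assumption for the example) */
    words = fpga_recv(fpga, CHNL_MON_IN, buf, 2048, 2000);
    printf("received %d words from the ingress monitor channel\n", words);

    fpga_close(fpga);
    return 0;
}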