International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976 – 6464 (Print), ISSN 0976 – 6472 (Online), Volume 4, Issue 1, January-February (2013), pp. 79-84
© IAEME: www.iaeme.com/ijecet.asp
Journal Impact Factor (2012): 3.5930 (Calculated by GISI), www.jifactor.com

FPGA IMPLEMENTATION OF SCALABLE QUEUE MANAGER

Ms. Sharada Kesarkar (1), Prof. Prabha Kasliwal (2)
Department of Electronics, MAE, Alandi (D), University of Pune, India
(1) sharada.kesarkar@gmail.com, (2) prabha.kasliwal@gmail.com

ABSTRACT

The memory system is very often the main issue when designing a network system. This is mainly due to the constantly changing nature of network traffic and the demand to provide quality of service (QoS) in networks. Per-flow queuing is an effective way to guarantee QoS, but a brute-force implementation consumes a huge amount of memory because it assigns a dedicated physical queue to each in-progress flow, so the queue manager (QM) must maintain a very large number of queues. Memory scalability therefore becomes a critical issue, and system performance may degrade as the number of flows increases. To achieve per-flow queuing performance with less memory, scalable QM solutions with fixed architectures and efficient queue management are vital. In this paper, we present an FPGA implementation of a scalable QM architecture that manages memory resources efficiently using the dynamic queue sharing (DQS) technique. DQS isolates each incoming active flow by dynamically allocating ongoing active flows onto a limited number of physical queues, instead of assigning a dedicated queue to each in-progress flow.
In practice, the number of active flows is always low, which reduces the required number of physical queues from millions to hundreds and hence reduces the required memory resources. DQS for both per-flow and per-class systems is designed, implemented and simulated in the Xilinx ISE simulator using the Xilinx Spartan-3 XC3S4000 device. The proposed advanced algorithm enables the architecture to work in per-flow and per-class modes, which dramatically reduces queue exhaustion.

Keywords: Field programmable gate arrays (FPGAs), Queue manager (QM), DQS, Active flow, Per-flow queuing
I. INTRODUCTION

The constantly changing nature of network traffic and the demand to provide quality of service (QoS) in networks require packet-processing functionality in networking devices that supports varying QoS levels [1]. This functionality is mainly responsible for holding arriving packets during periods of congestion in order to smooth out bursts of Internet traffic. Related studies of packet buffering systems fall into three categories: the enqueue mechanism, the dequeue mechanism and the queue organization mechanism [2]. Queue organization is the basis of the queuing system: it decides how queues are implemented in a buffer and how flows are assigned to physical queues. If the number of physical queues is larger than the number of active flows (flows that have packets in a queue), an isolated queue can be assigned to each flow to maintain the required QoS; otherwise several different flows must share the same queue, the QoS guarantee of each flow is violated, and queue exhaustion increases.

Per-flow designs assign a separate physical queue to each in-progress flow. One problem with this approach is that the number of flows carried over links in current networks keeps increasing [3]. Whenever there are more flows than physical queues, several flows compete for the same resources and their QoS guarantees are violated, so the QM must maintain a large number of queues. As the number of flows grows, maintaining and managing that many queues becomes difficult and requires a huge amount of memory, making memory scalability a critical issue.
Thus, efficient use of memory resources is necessary to achieve a scalable queue management system.

To overcome these limitations, we propose an FPGA implementation of a scalable QM based on the proposed advanced algorithm. We monitor the number of active flows continuously. In practice, most active flows have short time scales [2], [4], [5], so the number of active flows is always low compared with the total number of flows. Based on this observation, the idea is to assign physical queues only to active flows rather than to all in-progress flows in the network, while keeping the per-flow queuing feature. We use a total of 10 fixed queues to store packets from different incoming active flows. If a queue is full, a queue-full indication is raised so that another free queue is used. Whenever the corresponding output port is available, packets are sent to that port and the queue is freed. Hence the number of required physical queues can be reduced from millions to hundreds. The proposed advanced algorithm enables the architecture to work in either per-flow or per-class mode, depending on the current traffic condition (i.e. whether traffic is increasing or smooth), to dramatically reduce queue exhaustion and the number of lost packets. The key features are:
- Dynamically assign a separate physical queue only to each active flow, maintaining the per-flow queuing feature.
- Use memory resources efficiently by reducing the required number of physical queues in per-class mode.
- Support more input flows on the same number of queues.

Queue exhaustion is observed in traditional per-flow QMs when there are more flows than available physical queues [7]. A straightforward solution is to make the number of physical queues equal to the number of input flows, but this requires more memory, and such a large number of physical queues is difficult to maintain.
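The claim that the number of physical queues can shrink from "millions to hundreds" rests on the difference between the total number of flows and the number of simultaneously active flows. The Python sketch below only illustrates that difference and is not the authors' VHDL design: the trace format of (time, flow id, +1 for a buffered packet, -1 for a departed packet) and the function name `peak_active_flows` are hypothetical. It assumes, for illustration, that each active flow needs one physical queue, so the peak number of active flows approximates the queue pool DQS would need.

```python
from collections import defaultdict


def peak_active_flows(events):
    """Return (total_flows, peak_active_flows) for a packet trace.

    events: iterable of (time, flow_id, delta) tuples, where delta is +1 when
    a packet of flow_id is buffered and -1 when one of its packets departs.
    Events with equal timestamps are processed in (flow_id, delta) order.
    A flow counts as active while it has at least one packet buffered.
    """
    buffered = defaultdict(int)  # packets currently buffered per flow
    seen = set()                 # every flow that appeared in the trace
    active = 0                   # flows with buffered packets right now
    peak = 0
    for _, flow, delta in sorted(events):
        seen.add(flow)
        before = buffered[flow]
        after = before + delta
        if after < 0:
            raise ValueError(f"flow {flow!r} dequeued more packets than it enqueued")
        buffered[flow] = after
        if before == 0 and after > 0:
            active += 1          # flow became active: a queue would be allotted
        elif before > 0 and after == 0:
            active -= 1          # flow drained: its queue could be freed
        peak = max(peak, active)
    return len(seen), peak


# Hypothetical trace: 1000 short flows arriving one per time unit, each
# buffering a single packet that departs two time units later.
trace = []
for f in range(1000):
    trace.append((f, f, +1))
    trace.append((f + 2, f, -1))
print(peak_active_flows(trace))  # (1000, 2)
```

With many short-lived flows the peak stays small while the total grows, which is the property DQS exploits; real traffic traces would of course give different numbers.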
By allocating dedicated queues to simultaneous active flows (an active flow is defined as a flow that has packets buffered in the device) instead of to all in-progress flows, the QM can reduce the required number of physical queues from millions to hundreds [10]. In practice, however, traffic in current networks varies continuously, which can affect the performance of the network system. Unfavourable traffic can contend for the available queue resources, leading to queue exhaustion (i.e. all physical queues are occupied) and major degradation of system performance. Dynamic queue sharing (DQS) shares a small number of physical queues among the active flows in a system, and this technique solves queue exhaustion to some extent. However, no existing QM solution implements active-flow sharing for queue management in a scalable fashion.

II. THE QUEUE MANAGER

The block diagram of the proposed queue manager is shown in Fig. 1. In this architecture, packets arrive at the input port and depart from the output port. The system is controlled by the queue controller (QC), which is responsible for writing data packets to, and reading them from, the queues; the queues themselves are implemented in memory devices (the queue bank). The main function of the QM is to store incoming packets in individual queues in the data memory, read the stored packets, and forward them to the data packet flow whenever the output ports are free.

Fig. 1: QM Architecture

The QM consists of the following main blocks: queue controller (QC), queue bank, segmentation, active flow indication, and data packet flow. A data packet is received from the input port; if the input flow is an active flow, this is indicated by the active flow indication block, and the packet is sent to the segmentation module.
The segmentation module performs the de-framing operation and separates the data bits from the packet. The queue controller takes the data and searches for a free queue to write it into. If a free physical queue is available, the QC writes the data into that queue and marks it busy. If the queue becomes full, a queue-full indication is raised and the QC searches for a new free queue to write the data into. This is called the queue write process. The queue read process sends the data to the data packet flow block, which frames the packet and sends it to the output port if that port is available, and then frees the queue. As long as there are enough free queues, they are used for new incoming active flows. The QC monitors the number of free queues: if it is less than the cut-off level, the QC selects per-class mode; otherwise the architecture works in the default per-flow mode.

The architecture can therefore work in two modes, per-flow QM and per-class QM. Whenever traffic increases and the number of free queues falls below the cut-off level, the QC switches the architecture to per-class mode. This reduces queue exhaustion in the per-flow QM by reducing the number of incoming active flows, which is done by organising the input active flows into classes of flows. Conversely, the QC switches back to per-flow mode whenever traffic is smooth and the number of free queues is greater than the cut-off level. By selecting the appropriate mode (per-flow QM or per-class QM), the system can protect itself from queue exhaustion. This is implemented by the advanced algorithms below.

III. ALGORITHMS

Arrival
    Search for the active flow.
    If found:
        Check whether its queue is full.
        If not full, write the packet into the queue.
        Else, allot a new free queue, write the packet into it and mark it busy.

Departure
    If a read signal arrives, send a packet from the corresponding queue to the output port.
    If the queue is empty, mark it free; otherwise do nothing.

Main algorithm
    Check whether the number of busy queues has reached the cut-off level.
    If not, work in per-flow mode.
    Else, work in per-class mode.

IV. FPGA IMPLEMENTATION

The described architecture was implemented in the VHDL hardware description language using Xilinx ISE 9.2i. We implemented ten fixed-size queues for a minimum of twelve input flows. The QM stores packets and forwards them to the output port. The whole architecture is controlled by the QC, which takes each input packet and writes it to a free queue; if that queue is full, it allots another free queue for the packets. Whenever the output port is free, the QC sends packets from the corresponding queue to that port. Meanwhile, the QC continuously monitors the number of free queues: if it is less than the cut-off level, the QC selects per-class mode and runs the same architecture in that mode; otherwise the architecture runs in per-flow mode.

We ran the architecture in per-flow and per-class modes on the 3s400fg456-5 device and used the simulation results to determine the required number of physical queues, logic elements and memory utilization. The results are given in the following table (percentages in parentheses are device utilization).

Parameter                        | DQS algorithm | Advanced algorithm | Decrease w.r.t. DQS | Decrease rate
Memory utilization (size in Mb)  | 242.2         | 308.7              | -66.5               | 27.06%
Number of slices                 | 5257          | 2544               | 2713                | 51.60%
Number of slice flip-flops       | 5904 (82%)    | 1020 (14%)         | 4884                | 82.72%
Number of 4-input LUTs           | 5441 (75%)    | 4972 (69%)         | 469                 | 8.61%

V. CONCLUSION

The main objective of implementing the algorithm on an FPGA is to determine the memory utilization, the minimum number of physical queues and the device utilization. In this study, the algorithm for the proposed design is described in the VHDL hardware description language and the logic is tested in the Xilinx ISE simulator. The simulated design is optimized for memory resources and device utilization using the Xilinx Spartan-3 XC3S4000 device. The DQS algorithm operates at 126 MHz, while the advanced algorithm works at 137 MHz. The results show that, without the scalable QM mechanism, per-flow queuing suffers from very high queue exhaustion and requires more memory resources than per-class mode. With the advanced algorithms, the packet drop rate is reduced dramatically, and device utilization and memory requirements are lower in the per-class QM.
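The Arrival, Departure and Main algorithms of Section III can be modelled in software to make the control flow concrete. The Python sketch below is a hedged approximation, not the authors' VHDL: the class name `QueueController`, the per-queue `capacity`, the way `cut_off` is compared with the free-queue count, and the default class mapping `class_of` (flow id modulo 2) are all assumptions made for illustration. The ten-queue default follows Section IV.

```python
from collections import deque


class QueueController:
    """Hypothetical software model of the queue controller (QC).

    A fixed pool of physical queues is shared among active flows (DQS).
    In per-flow mode each active flow gets its own queue(s); in per-class
    mode flows are grouped by class_of(flow), so fewer queues are needed.
    """

    def __init__(self, num_queues=10, capacity=4, cut_off=3, class_of=None):
        self.queues = [deque() for _ in range(num_queues)]
        self.owner_of = [None] * num_queues  # key owning each queue, None = free
        self.queues_of = {}                  # key -> indices of its busy queues
        self.capacity = capacity             # packets per queue (assumed value)
        self.cut_off = cut_off               # free-queue threshold (assumed value)
        self.class_of = class_of or (lambda flow: flow % 2)  # assumed mapping
        self.mode = "per-flow"
        self.dropped = 0

    def free_count(self):
        return self.owner_of.count(None)

    def _select_mode(self):
        # Main algorithm: use per-class mode while free queues are below the
        # cut-off, resume per-flow mode once they rise above it again.
        free = self.free_count()
        if free < self.cut_off:
            self.mode = "per-class"
        elif free > self.cut_off:
            self.mode = "per-flow"

    def _key(self, flow):
        if self.mode == "per-flow":
            return ("flow", flow)
        return ("class", self.class_of(flow))

    def arrive(self, flow, packet):
        """Arrival: write to the flow's (or class's) current queue, allotting
        a new free queue if it has none or its queue is full.
        Returns the queue index used, or None if the packet is dropped."""
        self._select_mode()
        key = self._key(flow)
        owned = self.queues_of.get(key, [])
        if owned and len(self.queues[owned[-1]]) < self.capacity:
            idx = owned[-1]
        else:
            if self.free_count() == 0:
                self.dropped += 1           # queue exhaustion: no free queue
                return None
            idx = self.owner_of.index(None)
            self.owner_of[idx] = key        # mark the queue busy
            self.queues_of.setdefault(key, []).append(idx)
        self.queues[idx].append(packet)
        return idx

    def depart(self, idx):
        """Departure: on a read signal, send one packet from queue idx to the
        output port and free the queue once it is empty."""
        if not self.queues[idx]:
            return None
        packet = self.queues[idx].popleft()
        if not self.queues[idx]:
            key = self.owner_of[idx]
            self.owner_of[idx] = None
            self.queues_of[key].remove(idx)
            if not self.queues_of[key]:
                del self.queues_of[key]
        self._select_mode()
        return packet


# Twelve flows, one packet each, sharing ten queues.
dqs = QueueController(num_queues=10, capacity=4, cut_off=3)
for flow in range(12):
    dqs.arrive(flow, f"pkt-{flow}")
print(dqs.mode, dqs.dropped)  # per-class 0

# cut_off=0 never triggers per-class mode, i.e. plain per-flow queuing.
per_flow_only = QueueController(num_queues=10, capacity=4, cut_off=0)
for flow in range(12):
    per_flow_only.arrive(flow, f"pkt-{flow}")
print(per_flow_only.mode, per_flow_only.dropped)  # per-flow 2
```

In this toy run the per-class fallback absorbs the two extra flows that plain per-flow queuing has to drop. The paper does not say how packets already queued under one mode are handled after a switch; this sketch simply leaves them in their existing queues.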
VI. REFERENCES

[1] Qi Zhang, Roger Woods and Alan Marshall, "An On-Demand Queue Management Architecture for a Programmable Traffic Manager," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pp. 1849-1862, October 2012.
[2] C. Hu, Y. Tang, X. Chen and B. Liu, "Dynamic Queuing Sharing Mechanism for Per-flow QoS Control," IET Communications, vol. 4, no. 4, pp. 472-483, Mar. 2010.
[3] Y. Xiao et al., "Internet Protocol Television (IPTV): The Killer Application for the Next-Generation Internet," IEEE Communications Magazine, pp. 126-134, Nov. 2007.
[4] C. Fraleigh, S. Moon, B. Lyles, C. Cotton, M. Khan, D. Moll, R. Rockell, T. Seely and C. Diot, "Packet-level traffic measurements from the Sprint IP backbone," IEEE Network, vol. 17, no. 6, pp. 6-16, Nov.-Dec. 2003.
[5] A. Kortebi, L. Muscariello, S. Oueslati and J. Roberts, "Evaluating the Number of Active Flows in a Scheduler Realizing Fair Statistical Bandwidth Sharing," ACM SIGMETRICS '05, Banff, Canada, Jun. 2005.
[6] A. Nikologiannis, I. Papaefstathiou, G. Kornaros and C. Kachris, "An FPGA-based Queue Management System for High Speed Networking Devices," Microprocessors and Microsystems (Elsevier), special issue on FPGAs, vol. 28, no. 5-6, pp. 223-236, Aug. 2004.
[7] H. Jonathan Chao and Bin Liu, High Performance Switches and Routers, John Wiley & Sons, 2007, ISBN 9780470053676.
[8] M. Alisafaee, S. M. Fakhraie and M. Tehranipoor, "Architecture of an Embedded Queue Management Engine for High-Speed Network Devices," IEEE 48th Midwest Symposium on Circuits and Systems, vol. 2, pp. 1907-1910, August 2005.
[9] Jindou Fan, Chengchen Hu and Bin Liu, "Experiences with Active Per-flow Queuing for Traffic Manager in High Performance Routers," Proc. IEEE ICC 2010, Cape Town, South Africa, May 23-27, 2010.
[10] Chengchen Hu, Yi Tang, Xuefei Chen and Bin Liu, "Per-flow Queueing by Dynamic Queue Sharing," 26th IEEE International Conference on Computer Communications, pp. 1613-1621, May 2007.
[11] Qi Zhang, Roger Woods and Alan Marshall, "A Scalable and Programmable Modular Queue Manager Architecture," ECIT Institute, Queen's University Belfast, Queen's Road, Queen's Island, Belfast, BT3 9DT, N. Ireland.
[12] I. Hadzic and J. M. Smith, "Balancing performance and flexibility with hardware support for network architectures," ACM Trans. Comput. Syst., vol. 21, no. 4, pp. 375-411, Nov. 2003.
[13] Sriadibhatla Sridevi, Ravindra Dhuli and P. L. H. Varaprasad, "FPGA Implementation of Low Complexity Linear Periodically Time Varying Filter," International Journal of Electronics and Communication Engineering & Technology (IJECET), vol. 3, no. 1, pp. 130-138, 2012, IAEME.
[14] Bhavana L. Mahajan, Sampada Pimpale and Kshitija S. Patil, "FPGA Implemented AHB Protocol," International Journal of Electronics and Communication Engineering & Technology (IJECET), vol. 3, no. 3, pp. 162-169, 2012, IAEME.