Improving software system load balancing using messaging.

                    Marc Karasek
            System Lead Technical Engineer
                      iVivity Inc.
The paradigm for devices using the PCI bus in a co-processor model has been one of extending
the core functionality of the system. In a typical application the core system has a central CPU
with a companion chip that provides some level of connectivity and functionality to the outside
world. An example of this would be an Intel x86 with a Northbridge/Southbridge chipset. In this
system, the companion chips to the CPU provide connectivity to PCI, AGP, USB, IDE and some
peripherals, such as serial and audio. To add more functionality, a PCI adapter (commonly called
an HBA) is added to the system. The CPU communicates with this added peripheral over the PCI
bus, sending commands and receiving data. This model isolates the system (CPU + companion
chips) from the added peripheral (HBA). The software in this model likewise encapsulates what
executes on the system versus what executes on the added peripheral: there is a clear separation
between the tasks that run on the peripheral and those that run on the system. This makes it
difficult to move functionality from the system to the peripheral or the reverse; the assignment of
tasks to one side or the other is fixed. From a system perspective, this is a very rigid design with
little or no room for optimization. You could not, for example, take a processing block running
on the peripheral and move it over to the system. The reason is that these two entities, peripheral
and core system, are viewed as separate entities rather than as parts of one overall system.


Figure 1: Mailbox Control Path
[Diagram: within the core system, a driver and memory; within the peripheral, a mailbox and a
DMA engine. Command and control passes between the driver and the mailbox, while data moves
between system memory and the DMA engine.]


Mailboxes have been the method of communication between a host and a co-processor since the
first ISA card was plugged into a PC, and they are still used today, in one form or another, to
communicate with PCI adapters. Most mailboxes are accessed as internal registers on the adapter
and are mapped to the host through one of the PCI BARs (Base Address Registers). When a
mailbox register is written, an interrupt is raised based on the direction of the write
(Host->Adapter or Adapter->Host) and the mailbox is processed. This procedure is not an
efficient use of the PCI bus and does not lend itself to transmitting large amounts of
information; a separate DMA engine is typically used to move data between the peripheral and the
system. This implementation generally requires that the system send one command at a time and
wait for a response from the peripheral. There may be a queue in the driver so the system can
submit more commands, but they are all issued in sequential order. This can create a bottleneck:
the next command must wait for the current one to complete. It also does not lend itself to a
logical separation between tasks on the system and the driver for the peripheral. All tasks must
go through the same command pipeline in the driver, with no way of logically separating one
task's commands from another's. The driver for the peripheral must therefore have knowledge of
the upper-layer tasks, resulting in monolithic code that encompasses both the low-level interface
to the peripheral and the upper-layer system tasks.
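
To make the mailbox flow concrete, the sketch below shows what a classic doorbell-style exchange
might look like in C. The register offsets, the MBOX_* names, and the busy-wait completion are
hypothetical illustrations of the pattern described above, not any particular adapter's
interface; the bar pointer is assumed to come from mapping the adapter's mailbox BAR into the
host's address space.

    #include <stdint.h>

    /* Hypothetical mailbox register layout within a memory-mapped PCI BAR.
     * Offsets and names are illustrative; real adapters define their own. */
    #define MBOX_CMD      0x00  /* host writes a command word here                 */
    #define MBOX_STATUS   0x04  /* adapter posts completion status here            */
    #define MBOX_DOORBELL 0x08  /* writing here raises an interrupt on the adapter */

    static void reg_write32(volatile uint8_t *bar, uint32_t off, uint32_t val)
    {
        *(volatile uint32_t *)(bar + off) = val;
    }

    static uint32_t reg_read32(volatile uint8_t *bar, uint32_t off)
    {
        return *(volatile uint32_t *)(bar + off);
    }

    /* One command at a time: write the command, ring the doorbell, then
     * wait until the adapter answers. This serialization is exactly the
     * bottleneck described above; bulk data still needs the DMA engine.
     * 'bar' is a pointer to the adapter's mailbox BAR, e.g. obtained by
     * mmap()ing the PCI resource. */
    uint32_t mbox_send_command(volatile uint8_t *bar, uint32_t cmd)
    {
        reg_write32(bar, MBOX_CMD, cmd);
        reg_write32(bar, MBOX_DOORBELL, 1);
        while (reg_read32(bar, MBOX_STATUS) == 0)
            ;                       /* busy-wait for completion */
        return reg_read32(bar, MBOX_STATUS);
    }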

In order to change this paradigm, a better method of modeling the system needs to be created.
This model should not limit where a specific software function runs, allowing the developer to
determine the optimum software load balance in the system. One such model would be to extend a
messaging mechanism over PCI from the peripheral to the system, allowing the system to appear as
an extension of the peripheral. Using this model, software blocks in the peripheral could be
moved to run on the system with minimal effort. This also allows for a more integrated system
approach to software: the peripheral no longer exists as an appendage to the system but as an
integral part of it.


Figure 2: Messaging System Model
[Diagram: the system and the peripheral are joined by a messaging channel, with data moving
between system memory and the peripheral's DMA engine.]
One way to implement this approach is the Message Queue Bus (MQB) of the iDiSX 2000 Storage
Network Processor (SNP) from iVivity Inc., a new approach to communication across the PCI bus.
The MQB is an 8-byte messaging bus architecture for passing information between processing
blocks within the iDiSX 2000 SNP. This messaging bus is extended outside the device over its
PCI-X interface, allowing any processing block within the device to send a message to an
external host over the PCI-X bus. With this model the peripheral can now view the system as just
another of its processing blocks. It requires a thin driver on the system to handle the MQB
overhead, which exposes a minimal API to the system: register a callback, send a message, and
deregister a callback. Tasks on the system use this API to register a callback with the driver
on one of four message queues and to send messages to any processing block (PB) within the SNP.
These processing blocks can be software tasks running on any of the processors within the SNP
(MIPS, ARC) or one of the device's hardware acceleration engines. This allows a logical
separation of tasks on the system. The low-level driver is abstracted from the upper-layer
tasks; it merely passes messages between the system and the peripheral, with no knowledge of
what the messages contain. Each of the four hardware queues can send and receive messages
asynchronously from/into the iDiSX 2000 SNP, so there is no longer a single sequential pipeline
for command and control. The 8-byte message size means more information can be carried in each
message: status information, pointers to data structures in system memory, and so on can all be
passed in the same message.
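
A minimal sketch of what such a thin-driver API might look like, in C. All of the mqb_* names,
the message type, and the stub bodies below are hypothetical illustrations of the
register/send/deregister interface described above, not iVivity's actual API; the stubs stand in
for the real driver so the sketch is self-contained.

    #include <stdint.h>
    #include <stddef.h>

    #define MQB_NUM_QUEUES 4   /* four hardware message queues */
    #define MQB_MSG_BYTES  8   /* fixed 8-byte message size    */

    /* One 8-byte message: room for a status word plus a pointer to a
     * larger data structure in system memory. */
    typedef struct { uint8_t bytes[MQB_MSG_BYTES]; } mqb_msg_t;

    /* Callback a system task registers to receive messages on a queue. */
    typedef void (*mqb_callback_t)(const mqb_msg_t *msg, void *ctx);

    static mqb_callback_t queue_cb[MQB_NUM_QUEUES];
    static void          *queue_ctx[MQB_NUM_QUEUES];

    int mqb_register_callback(unsigned queue, mqb_callback_t cb, void *ctx)
    {
        if (queue >= MQB_NUM_QUEUES || queue_cb[queue] != NULL)
            return -1;          /* bad queue number, or queue already claimed */
        queue_cb[queue]  = cb;
        queue_ctx[queue] = ctx;
        return 0;
    }

    int mqb_deregister_callback(unsigned queue)
    {
        if (queue >= MQB_NUM_QUEUES)
            return -1;
        queue_cb[queue] = NULL;
        return 0;
    }

    /* In the real driver this would post the 8 bytes into the hardware
     * queue feeding the addressed processing block over PCI-X; here it
     * is only a placeholder. */
    int mqb_send(unsigned dest_pb, const mqb_msg_t *msg)
    {
        (void)dest_pb; (void)msg;
        return 0;
    }

A task would register a callback on a free queue, send messages addressed to a processing block,
and deregister when done. Because each queue operates asynchronously, several tasks can have
commands in flight at once without sharing a single command pipeline.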
Figure 3: iDiSX 2000 Message Queue Bus
[Diagram: on the system, Tasks 1 through 4 sit above the MQB driver, which exchanges messages
with the iDiSX 2000; data moves between system memory and the device's DMA engine.]


Using this model we can now move processing blocks easily between the peripheral and the
system, allowing the developer to better load balance the overall system processing. From a
processing-flow viewpoint, added features can be inserted into the data flow with minimal
effort. This has a positive impact on the ability to add new functionality to a given system,
especially in terms of off-loading processing from the core system to a peripheral. From a
system level, the core system and the peripheral look like parts of the same system, rather than
one being an extension of the other. It also impacts how embedded systems are designed.
Currently the same model used on the desktop is also used in the embedded space: the core system
and the peripheral are designed as two blocks that communicate over PCI, each with its own
defined set of tasks. The ability to view the whole design, core system and peripheral, as one
overall system, along with the capability to run tasks anywhere, will lead to better system
performance and utilization. Developers can make trade-offs between how much front-end
processing is required and how much processing is done in the CPU system.

By abstracting the hardware from the application, the iDiSX 2000 SNP can be used in a myriad of
configurations. If the system has a cache, disc arrays, memory, etc. on the backend, an iSCSI
front-end can be added to the storage solution with minimal change to the current software
stack. To this end, iVivity Inc. can provide a sample application that interfaces through the
Linux /dev devices to a backend disc array. This sample uses the standard Linux SVM to handle
the backend storage while providing an iSCSI front-end.
