Improving software system load balancing using messaging

Marc Karasek, System Lead Technical Engineer, iVivity Inc.
The paradigm for devices using the PCI bus in a co-processor model has been one of extending the core functionality of the system. In a typical application the core system has a central CPU with a companion chip that provides some level of connectivity and functionality to the outside world. An example of this would be an Intel x86 with a Northbridge/Southbridge chipset. In this system, the companion chips to the CPU provide connectivity to PCI, AGP, USB, IDE and some peripherals, such as serial and audio. To add more functionality, a PCI adapter would be added to the system. The CPU would communicate over the PCI bus with the added peripheral (commonly called an HBA, or host bus adapter), sending commands and receiving data over that bus.

This model isolates the system (CPU + companion chips) from the added peripheral (HBA). The software in this model is partitioned the same way: there is a clear separation between the tasks that run on the peripheral and the tasks that run on the system, and that assignment is fixed. Functionality cannot easily be moved from the system to the added peripheral, or the reverse. From a system perspective, this is a very rigid arrangement with little or no room for optimization. You could not, for example, take a processing block running on the peripheral and move it over to the system. The reason is that the two entities, peripheral and core system, are viewed as separate entities and not as parts of one overall system.

[Figure 1: Mailbox Control Path. Diagram labels: Core System, Driver, Memory, Command and Control, Data, Mailbox, DMA, Peripheral.]

The method of communication used in the host-to-co-processor model has been mailboxes, ever since the first ISA card was plugged into a PC. It is still used today, in one form or another, to communicate with PCI adapters.
Most mailboxes are accessed as internal registers of the adapter, and are mapped to the host through one of the PCI BARs (Base Address Registers). When such a register is written, an interrupt is raised according to the direction of the write (Host->Adapter or Adapter->Host) and the mailbox is processed. This procedure is not an efficient use of the PCI bus and does not lend itself to transmitting large amounts of information. A separate DMA engine is typically used to move data from the peripheral to the system. This implementation generally requires that the system send in one command at a time and wait for a response from the peripheral. There may be a queue in the driver so the system can send more commands, but they are all issued in sequential order. This can lead to a bottleneck, since each command must complete before the next one is sent. It also does not lend itself to a logical separation of tasks on the system from the driver for the peripheral. All tasks must go through the same command pipeline in the driver, with no way of logically separating one task’s command from another task’s command. This means that the driver for the peripheral must have
knowledge of the upper layer tasks. The result is monolithic code that encompasses both the low-level interface to the peripheral and the upper layer system tasks.

In order to change this paradigm, a better method of modeling the system needs to be created. This model should not limit where a specific software function is run, allowing the developer to determine the optimum software load balance in the system. One such model would be to extend a messaging mechanism over PCI from the peripheral to the system, so that the system appears as an extension of the peripheral. Using this model, software blocks in the peripheral could be moved to run on the system with minimal effort. This also allows for a more integrated system approach to software. The peripheral no longer exists as an appendage to the system but as an integral part of it.

[Figure 2: Messaging System Model. Diagram labels: System, Memory, Messaging, DMA, Peripheral.]

One method to implement this approach is to use the Message Queue Bus (MQB) of the iDiSX 2000 Storage Network Processor (SNP) from iVivity Inc. as a new approach to communication across the PCI bus. The MQB is an 8-byte messaging bus architecture for passing information between processing blocks within the iDiSX 2000 SNP. This messaging bus is extended outside the device over its PCI-X interface, which allows any processing block within the device to send a message to an external host over the PCI-X bus. With this model the peripheral can now view the system as another processing block within the peripheral. It requires a thin driver on the system to handle the MQB overhead; this driver exposes to the system a minimal API to register a callback, send a message and deregister a callback. Tasks on the system use this API to register a callback with the driver on one of 4 possible message queues and to send messages to any processing block (PB) within the SNP.
These processing blocks can be software tasks running on any of the processors within the SNP (MIPS, ARC) or one of the device’s hardware acceleration engines. This allows a logical separation of tasks on the system. The low-level driver is abstracted from the upper layer tasks; it merely passes messages between the system and the peripheral, having no knowledge of what the messages contain. Each of the four hardware queues can send and receive messages asynchronously from/into the iDiSX 2000 SNP, so there is no longer a single sequential pipeline for command and control. The ability to pass 8-byte messages means more information can be transferred in each message. Status information, pointers to data structures in system memory, etc. can now all be passed in the same message.
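The three-call driver API described above (register a callback, send a message, deregister a callback) can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the iVivity driver: every name here (mqb_driver_t, mqb_register, mqb_send, mqb_deregister, mqb_dispatch, mqb_pack) and the field layout inside the 8-byte message are hypothetical. Only the four queues, the 8-byte message size and the callback model come from the text. The PCI-X path is replaced by a stand-in so the sketch runs on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MQB_NUM_QUEUES 4  /* the text describes four message queues */
#define MQB_MSG_BYTES  8  /* and 8-byte messages */

/* One message. The driver treats the contents as opaque. */
typedef struct { uint8_t bytes[MQB_MSG_BYTES]; } mqb_msg_t;

typedef void (*mqb_callback_t)(const mqb_msg_t *msg, void *ctx);

typedef struct {
    mqb_callback_t cb[MQB_NUM_QUEUES];  /* one callback per queue */
    void *ctx[MQB_NUM_QUEUES];
    mqb_msg_t last_sent;                /* stand-in for the outbound hardware queue */
    unsigned sent_count;
} mqb_driver_t;

void mqb_init(mqb_driver_t *d)
{
    assert(d != NULL);
    for (unsigned q = 0; q < MQB_NUM_QUEUES; q++) {
        d->cb[q] = NULL;
        d->ctx[q] = NULL;
    }
    d->sent_count = 0;
}

/* Assumed layout: byte 0 destination processing block, byte 1 opcode,
 * bytes 2-3 status, bytes 4-7 a 32-bit address (little-endian). */
mqb_msg_t mqb_pack(uint8_t dest_pb, uint8_t opcode, uint16_t status, uint32_t addr)
{
    mqb_msg_t m;
    m.bytes[0] = dest_pb;
    m.bytes[1] = opcode;
    m.bytes[2] = (uint8_t)(status & 0xFF);
    m.bytes[3] = (uint8_t)(status >> 8);
    for (int i = 0; i < 4; i++)
        m.bytes[4 + i] = (uint8_t)(addr >> (8 * i));
    return m;
}

uint32_t mqb_addr(const mqb_msg_t *m)
{
    uint32_t a = 0;
    for (int i = 0; i < 4; i++)
        a |= (uint32_t)m->bytes[4 + i] << (8 * i);
    return a;
}

/* Claim queue q for a task. Fails if q is invalid or already claimed. */
int mqb_register(mqb_driver_t *d, unsigned q, mqb_callback_t cb, void *ctx)
{
    if (q >= MQB_NUM_QUEUES || cb == NULL || d->cb[q] != NULL)
        return -1;
    d->cb[q] = cb;
    d->ctx[q] = ctx;
    return 0;
}

int mqb_deregister(mqb_driver_t *d, unsigned q)
{
    if (q >= MQB_NUM_QUEUES || d->cb[q] == NULL)
        return -1;
    d->cb[q] = NULL;
    d->ctx[q] = NULL;
    return 0;
}

/* A real driver would write the message to the device over PCI-X;
 * this sketch only records it. */
int mqb_send(mqb_driver_t *d, const mqb_msg_t *msg)
{
    d->last_sent = *msg;
    d->sent_count++;
    return 0;
}

/* Stand-in for the inbound interrupt path: hand a message that arrived
 * on queue q to whichever task registered for it, without inspecting it. */
int mqb_dispatch(mqb_driver_t *d, unsigned q, const mqb_msg_t *msg)
{
    if (q >= MQB_NUM_QUEUES || d->cb[q] == NULL)
        return -1;
    d->cb[q](msg, d->ctx[q]);
    return 0;
}

/* Example system task: counts messages and remembers the last address. */
typedef struct { unsigned received; uint32_t last_addr; } task_state_t;

void task_on_message(const mqb_msg_t *msg, void *ctx)
{
    task_state_t *t = (task_state_t *)ctx;
    t->received++;
    t->last_addr = mqb_addr(msg);
}
```

Note that mqb_dispatch never looks inside the message. Only the task callback interprets the bytes, which is the separation between low-level driver and upper layer tasks that the text describes.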
[Figure 3: iDiSX 2000 Message Queue Bus. Diagram labels: System, Task 1, Task 2, Task 3, Task 4, Memory, MQB Driver, DMA, iDiSX 2000.]

Using this model we can now move processing blocks easily between the peripheral and the system, allowing the developer to better load balance the overall system processing. From a processing flow viewpoint, added features can be inserted into the data flow process with minimal effort. This will have a positive impact on the ability to add new functionality to a given system, especially in terms of off-loading processing from the core system to a peripheral. At the system level, both the core system and the peripheral look like parts of the same system, rather than one being an extension of the other.

It also impacts how embedded systems are designed. Currently the same model used on the desktop is also used in the embedded space. The core system and the peripheral are designed from a system perspective as two blocks that communicate over PCI, each having its defined set of tasks. The ability to view the whole design, core system and peripheral, as one overall system, along with the capability to run tasks anywhere, will lead to better system performance and utilization. Developers can make tradeoffs between how much front-end processing is required and how much processing is done in the CPU system.

By abstracting the hardware from the application, the iDiSX 2000 SNP can be used in a myriad of configurations. If the system has a cache system, disc arrays, memory, etc. on the backend, an iSCSI front-end can be added to a storage solution with a minimal change to the current software stack. To this end iVivity Inc. can provide a sample application that interfaces with the Linux /dev devices to a backend disc array. This sample uses the standard Linux SVM to handle the backend storage, while providing an iSCSI front-end.
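To make the load-balancing argument concrete, the following C sketch models processing blocks that are addressed by ID, with the place where each block runs held in a table. It is an illustration only, under assumed names: pb_table_t, pb_install, pb_move, pb_post and pb_checksum_step are hypothetical and not part of any iVivity API. Both locations are simulated in one process, so the counters merely record where a block would have run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PB_MAX 8  /* arbitrary table size for the sketch */

typedef enum { PB_ON_PERIPHERAL, PB_ON_SYSTEM } pb_location_t;
typedef uint32_t (*pb_handler_t)(uint32_t input);

typedef struct {
    pb_handler_t handler;    /* NULL means no block installed at this ID */
    pb_location_t location;  /* where the block currently runs */
} pb_entry_t;

typedef struct {
    pb_entry_t entry[PB_MAX];
    unsigned runs_on_system;
    unsigned runs_on_peripheral;
} pb_table_t;

void pb_table_init(pb_table_t *t)
{
    assert(t != NULL);
    for (unsigned i = 0; i < PB_MAX; i++) {
        t->entry[i].handler = NULL;
        t->entry[i].location = PB_ON_PERIPHERAL;
    }
    t->runs_on_system = 0;
    t->runs_on_peripheral = 0;
}

int pb_install(pb_table_t *t, unsigned id, pb_handler_t h, pb_location_t where)
{
    if (id >= PB_MAX || h == NULL)
        return -1;
    t->entry[id].handler = h;
    t->entry[id].location = where;
    return 0;
}

/* Relocating a block changes one table entry; callers are unaffected. */
int pb_move(pb_table_t *t, unsigned id, pb_location_t where)
{
    if (id >= PB_MAX || t->entry[id].handler == NULL)
        return -1;
    t->entry[id].location = where;
    return 0;
}

/* Post work to a block by ID. The caller never names a location. */
int pb_post(pb_table_t *t, unsigned id, uint32_t input, uint32_t *result)
{
    if (id >= PB_MAX || t->entry[id].handler == NULL)
        return -1;
    if (t->entry[id].location == PB_ON_SYSTEM)
        t->runs_on_system++;
    else
        t->runs_on_peripheral++;
    *result = t->entry[id].handler(input);
    return 0;
}

/* Example processing block: an arbitrary transformation of its input. */
uint32_t pb_checksum_step(uint32_t input)
{
    return input * 31u + 7u;
}
```

A caller that posts to block 3 gets the same result before and after pb_move(&table, 3, PB_ON_SYSTEM). Only the accounting of where the work ran changes, which is the property that lets a developer rebalance load without touching the tasks that use the block.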