SlideShare a Scribd company logo
T he PCI Interface, which was originally created
for the personal computer, has now been univer-
sally adopted by system designers for use in datacom,
telecom, PCs, servers, and many other systems.
Currently, the PCI Interface is used mainly as an
expansion bus to add PCI cards that have wide ranging
applications. For example, a router, which was designed
by a datacom company, could have eight expansion
slots that allow a user to connect eight Ethernet cards.
In this application, the designer would have integrated
two PCI-to-PCI bridges enabling any Ethernet PCI
card to transfer data to any other Ethernet device.
The new 3-Port PCI-to-PCI Bridge from Pericom, a one-chip
solution, enables designers to connect all eight Ethernet cards
to one PCI bridge. This breakthrough architecture, which has
two secondary PCI buses integrated onto the same chip,
enables two devices sitting on two different secondary PCI
buses to communicate to each other directly without involving
the host CPU or main memory.
The Triple Port PCI-to-PCI Bridge has a number of benefits:
Board real estate saving
Cost reduction
Secondary-to-Secondary performance improvement
Ease of design for redundant applications
Isolating and saving the bandwidth on the primary bus
Less loading on the primary device
Saving real estate space on the board
Because the triple-port bridge has dual secondary buses, it can
replace two single-port bridges, thus alleviating the problem of
crowded boards. This feature is extremely important for telephony
or datacom systems. In these applications, the designer must de-
sign for as many Ethernet cards as possible, especially for a tele-
phony application. In a CompactPCI system design, you can in-
sert up to 16 PCI peripheral cards on a single bridge using Pericom’s
PI7C7300 Triple-Port PCI Bridge. When using a dual-port com-
peting application, the limit is eight cards. The Single-Channel
PCI-to-PCI Bridge is the same size as the Triple-Port PCI-to-PCI
Bridge, but the Triple-Port PCI-to-PCI Bridge provides an addi-
tional secondary PCI channel.
Cost reduction
Because the Triple-Port PCI-to-PCI architecture replaces two
Single-Port PCI-to-PCI bridges, using Pericom’s device can result
in reducing costs by one half.
Theoretical analysis: Secondary-to-Secondary
performance improvement
Since the Triple-Port PCI-to-PCI Bridge has dual secondary
channel communication between the two secondary devices,
it takes much less time when transferring data from one bridge
to another via main memory or another device to link the two
secondary buses created by two separate PCI-to-PCI bridges.
To further analyze performance, consider a 400-byte (100
double word) transfer between two devices, each on a sec-
ondary PCI bus:
Device 1 on the secondary bus 1 in Slot 1
Device 2 on the secondary bus 2 in Slot 1
Case #1: Using two PCI-to-PCI bridges
Here data must go through main memory, traveling as
follows:
Expansion PCI device transfers data to the PCI bridge
PCI bridge initiates a transfer to the Primary PCI device
North Bridge picks up the data and writes it to main
memory
CPU performs an I/O write to indicate that the data
transfer is complete
DMA in the North Bridge reads data from main memory
and writes it to the primary PCI bus
PCI bridge transfers data to the second PCI-to-PCI
Bridge and to the destination PCI device
Assuming that the PCI devices can run burst PCI cycles of 4
transfers each, this is what is involved:
PCI device to PCI bridge. Each burst is 3-1-1-1.
Each burst is 6 clocks long. 100/4 = 25
Add two turnaround cycles for each transfer.
Total time = 8 x 25 = 200 cycles
PCI Bridge-to-PCI Bridge. To transfer data to main
memory 8-1-1-1 = 11 PCI clocks
Add two turnaround clocks. Each transfer takes 13
PCI clocks. Total time = 13 x 25= 325 cycles
6 PCI clocks for the I/O cycle
Main memory to Primary PCI Bridge. Same as above:
13 x 25 = 325 cycles
Primary PCI bus to destination.
Same as above. Total time is 8 x 25 = 200 cycles
Total time to transfer 400 bytes is T: T = 200 + 325 + 6 +
325 + 200 = 1056 PCI clocks
12345678901234567890123456789012123456789012345678901234567890121234567890123456789012345678901212345678901234567890123456789012
12345678901234567890123456789012123456789012345678901234567890121234567890123456789012345678901212345678901234567890123456789012
Three Port PCI Bridge PerformanceAdvantages
Mohamad Tisani
Application Note 40
For comparison, in Table 1 we used different chipsets – 430
BX, 440GX, and the 840 Camino from Intel. Table 1 shows
actual time in clock cycles to transfer 100 D_words from one
I/O card to the next on two different secondary buses, S1
and S2 (Figure 2 shows this information graphically).
Case #2: Using Pericom’s 3-Port PCI Bridge
(data only goes through the bridge)
Pericom’s PCI-to-PCI bridge, which has 128 bytes FIFO for
each bus, keeps accepting data for each PCI clock as long as
there is room in the FIFO. To receive the first few D_words
from the first secondary bus and post them into the FIFO it
takes a maximum of 5 clocks. After the first data has been
received, the bridge will request the second secondary bus
and initiate the transfer. If both PCI devices are capable of
sustaining burst, then the bridge will be able to do 5-1-1-1-
1…1 transfer. That means it takes 5 clocks to the first data
transfer but following data will take only one PCI clock. The
bridge will be able to burst as long as the 4 Kbit line bound-
ary transfer is not crossed.
Total time in this example is T2 = 5 + 99 = 104.
Comparison
In terms of performance, we obviously have tremendous
gains; T2 is about 10% of T1. Using Pericom’s 3-Port PCI
Bridge device, PI7C7300, the gain is: [(1056 –104) / 1056] x
100 = 90%.
Actual analysis, Secondary-to-Secondary
performance improvement
To get real data on performance, we used a PCI Exerciser/
Analyzer and the Logic Analyzer to send traffic (100
D_words) through the bridge to the main memory and back.
This analysis shows how long it will take to transfer data
from one I/O card to another I/O card when going through
the main memory.
The other example was to send the data across Pericom’s
3-Port PCI-to-PCI Bridge. We captured the waveform and
recorded how long it took to transfer the data.
Performance data will depend on what motherboard and
chipset designers are using. In the following example, we
send 100 D_words from the S1 secondary bus. We wrote 100
D_words to the main memory and then we read the data
back, 100 memory read from S2 secondary PCI bus to main
memory. Let us call this Path 2 (see Figure 1).
We can achieve the same purpose if we wrote 100 D_words
from S1 to S2 on the PI6C7300. Let us call this Path 1 (see
Figure 1).
PATH 1 PATH 2
Path 1 :S1 to S2 Path 2 : Memory Path2 : Memory Write & Path2 : Memory Write & Performance
Memory write Write & Memory Read Memory Read Line Memory Read Multiple Gain
430 BX 103 1161 1160 821 87 % - 91 %
440GX 103 1169 1169 827 88% - 91%
820Camino 103 1152 933 562 82% - 91%
Figure 2
Notice that in the Table 1, the performance increase is be-
tween 82% to 91%. Also notice that the performance does
not change when going from S1 to S2 on the 3-Port PCI
Bridge. It is constant with the chipset that is on the
motherboard. Also notice that there is a difference as to
what command on the PCI bus is used. The other parameter
to consider is the Cache Line Size programmed into the
chipset and the PCI-to-PCI Bridge. Normally, if you are do-
ing long bursts, then using the largest Cache Line Size and
the commands Memory Read Multiple, and Memory Write
and Invalidate, will lead to the best performance (especially
true of this experiment).
Table 1
1200
1000
800
600
400
200
S1 to S2 MW & MR
NumberofClocks
100 D_Words Transfer Time of Different Platforms
Type of Transfer
MW & MRL MW & MRM
0
430 BX 440 XX 820
Figure 3 is a logic analyzer snapshot illustrating S1 to S2, 400
Byte transfer performed in a single sustained burst, taking 103
clocks to transfer 100 double words.
Figure 4 is a logic analyzer snapshot illustrating 100 D_word
memory write cycles followed by 100 D_word memory Read
cycles. Initiator is on S1, the secondary bus, behind the
3-Port Bridge. This is performed on the 840 Camino chipset.
Isolation & Saving Bandwidth
on the Primary Bus
Reviewing the numbers mentioned in Table 1, notice that by
moving data from one secondary to the other, you are actu-
ally isolating traffic on the secondary bus alone. Therefore,
you are making the Primary PCI bus and the memory bus idle
during this time resulting in better I/O performance on the
primary PCI bus by up to 80 or 90%. Meanwhile, the memory
bus is idle and ready to be used by the CPU or other Primary
PCI devices resulting in greater system bandwidth and per-
formance (see Figure 6).
Conclusion
Pericom’s 66 MHz Triple-Port PCI-to-PCI Bridge is a hot-
swap friendly CompactPCI device that enhances overall sys-
tem performance. The PI7C7300 permits the attachment of
up to 15 Compact PCI cards onto the same board, thus en-
abling a system designer to save board space and compo-
nent cost.
Ease of Design for Redundant Applications
The architecture of the 3-Port PCI Bridge makes it ideal for
redundant applications (see Figure 5). In such an applica-
tion, the system resources are duplicated so that if one re-
source goes down, the second, or slave resource, will come
up and run the system. This feature is very important for
banking, defense, and other critical systems.
Mohamad Tisani is the Director
of Application Engineering at Pericom
Semiconductor Corporation. His
experience includes System Design
and Architecture. He has an MSEE
from Santa Clara University, and a
BSEE from U.C. San Diego. In his
spare time he enjoys Racquetball,
Tennis, and Swimming.
Figure 3
Figure 4
Figure 5
Figure 6

More Related Content

What's hot

PCIe and PCIe driver in WEC7 (Windows Embedded compact 7)
PCIe and PCIe driver in WEC7 (Windows Embedded compact 7)PCIe and PCIe driver in WEC7 (Windows Embedded compact 7)
PCIe and PCIe driver in WEC7 (Windows Embedded compact 7)
gnkeshava
 
03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final
03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final
03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_FinalGopi Krishnamurthy
 
Lb35189919904
Lb35189919904Lb35189919904
Lb35189919904
IJERA Editor
 
CC-4005, Performance analysis of 3D Finite Difference computational stencils ...
CC-4005, Performance analysis of 3D Finite Difference computational stencils ...CC-4005, Performance analysis of 3D Finite Difference computational stencils ...
CC-4005, Performance analysis of 3D Finite Difference computational stencils ...
AMD Developer Central
 
FPGA based QDR-II+ SRAM Controller
FPGA based QDR-II+ SRAM ControllerFPGA based QDR-II+ SRAM Controller
FPGA based QDR-II+ SRAM Controller
iosrjce
 
Designing a Novel High Performance Four-to-Two Compressor Cell Based on CNTFE...
Designing a Novel High Performance Four-to-Two Compressor Cell Based on CNTFE...Designing a Novel High Performance Four-to-Two Compressor Cell Based on CNTFE...
Designing a Novel High Performance Four-to-Two Compressor Cell Based on CNTFE...
IJECEIAES
 
ADS Lab 5 Report
ADS Lab 5 ReportADS Lab 5 Report
ADS Lab 5 ReportRiddhi Shah
 
Evaluation of High Speed and Low Memory Parallel Prefix Adders
Evaluation of High Speed and Low Memory Parallel Prefix AddersEvaluation of High Speed and Low Memory Parallel Prefix Adders
Evaluation of High Speed and Low Memory Parallel Prefix Adders
IOSR Journals
 
Bu34437441
Bu34437441Bu34437441
Bu34437441
IJERA Editor
 
An119
An119An119
Isa Dma & Bus Masters
Isa Dma & Bus MastersIsa Dma & Bus Masters
Isa Dma & Bus Masters
nishantshah2381
 
Pcie basic
Pcie basicPcie basic
Pcie basic
Saifuddin Kaijar
 
Mukherjee2015
Mukherjee2015Mukherjee2015
Mukherjee2015
Bannoth Madhusudhan
 
Ijarcet vol-2-issue-7-2357-2362
Ijarcet vol-2-issue-7-2357-2362Ijarcet vol-2-issue-7-2357-2362
Ijarcet vol-2-issue-7-2357-2362Editor IJARCET
 
Verilog hdl design examples
Verilog hdl design examplesVerilog hdl design examples
Verilog hdl design examples
dennis gookyi
 
Final Project Report
Final Project ReportFinal Project Report
Final Project ReportRiddhi Shah
 
Melp codec optimization using DSP kit
Melp codec optimization using DSP kitMelp codec optimization using DSP kit
Melp codec optimization using DSP kit
sohaibaslam207
 
DMA
DMADMA

What's hot (20)

PCIe and PCIe driver in WEC7 (Windows Embedded compact 7)
PCIe and PCIe driver in WEC7 (Windows Embedded compact 7)PCIe and PCIe driver in WEC7 (Windows Embedded compact 7)
PCIe and PCIe driver in WEC7 (Windows Embedded compact 7)
 
03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final
03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final
03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final
 
Lb35189919904
Lb35189919904Lb35189919904
Lb35189919904
 
CC-4005, Performance analysis of 3D Finite Difference computational stencils ...
CC-4005, Performance analysis of 3D Finite Difference computational stencils ...CC-4005, Performance analysis of 3D Finite Difference computational stencils ...
CC-4005, Performance analysis of 3D Finite Difference computational stencils ...
 
FPGA based QDR-II+ SRAM Controller
FPGA based QDR-II+ SRAM ControllerFPGA based QDR-II+ SRAM Controller
FPGA based QDR-II+ SRAM Controller
 
Designing a Novel High Performance Four-to-Two Compressor Cell Based on CNTFE...
Designing a Novel High Performance Four-to-Two Compressor Cell Based on CNTFE...Designing a Novel High Performance Four-to-Two Compressor Cell Based on CNTFE...
Designing a Novel High Performance Four-to-Two Compressor Cell Based on CNTFE...
 
PCI & ISA bus
PCI & ISA busPCI & ISA bus
PCI & ISA bus
 
ADS Lab 5 Report
ADS Lab 5 ReportADS Lab 5 Report
ADS Lab 5 Report
 
Evaluation of High Speed and Low Memory Parallel Prefix Adders
Evaluation of High Speed and Low Memory Parallel Prefix AddersEvaluation of High Speed and Low Memory Parallel Prefix Adders
Evaluation of High Speed and Low Memory Parallel Prefix Adders
 
Bu34437441
Bu34437441Bu34437441
Bu34437441
 
An119
An119An119
An119
 
Isa Dma & Bus Masters
Isa Dma & Bus MastersIsa Dma & Bus Masters
Isa Dma & Bus Masters
 
Pcie basic
Pcie basicPcie basic
Pcie basic
 
Mukherjee2015
Mukherjee2015Mukherjee2015
Mukherjee2015
 
Gdi cell
Gdi cellGdi cell
Gdi cell
 
Ijarcet vol-2-issue-7-2357-2362
Ijarcet vol-2-issue-7-2357-2362Ijarcet vol-2-issue-7-2357-2362
Ijarcet vol-2-issue-7-2357-2362
 
Verilog hdl design examples
Verilog hdl design examplesVerilog hdl design examples
Verilog hdl design examples
 
Final Project Report
Final Project ReportFinal Project Report
Final Project Report
 
Melp codec optimization using DSP kit
Melp codec optimization using DSP kitMelp codec optimization using DSP kit
Melp codec optimization using DSP kit
 
DMA
DMADMA
DMA
 

Similar to PCI Bridge Performance

PCIX Gigabit Ethernet Card Design
PCIX Gigabit Ethernet Card DesignPCIX Gigabit Ethernet Card Design
PCIX Gigabit Ethernet Card DesignMohamad Tisani
 
PCIe Gen 3.0 Presentation @ 4th FPGA Camp
PCIe Gen 3.0 Presentation @ 4th FPGA CampPCIe Gen 3.0 Presentation @ 4th FPGA Camp
PCIe Gen 3.0 Presentation @ 4th FPGA Camp
FPGA Central
 
Bus Interfacing with Intel Microprocessors Based Systems
Bus Interfacing with Intel Microprocessors Based SystemsBus Interfacing with Intel Microprocessors Based Systems
Bus Interfacing with Intel Microprocessors Based Systems
Murtadha Alsabbagh
 
Difference between PCI PCI-X PCIe
Difference between PCI PCI-X PCIeDifference between PCI PCI-X PCIe
Difference between PCI PCI-X PCIe
SUNODH GARLAPATI
 
Communication Performance Over A Gigabit Ethernet Network
Communication Performance Over A Gigabit Ethernet NetworkCommunication Performance Over A Gigabit Ethernet Network
Communication Performance Over A Gigabit Ethernet Network
IJERA Editor
 
Introduction to VIP with PCI Express Technology
Introduction to VIP with PCI Express TechnologyIntroduction to VIP with PCI Express Technology
Introduction to VIP with PCI Express Technology
ijsrd.com
 
Hyper Transport Technology
Hyper Transport TechnologyHyper Transport Technology
Hyper Transport Technology
nayakslideshare
 
Bus interface 8086
Bus interface 8086Bus interface 8086
Bus interface 8086
University of Gujrat, Pakistan
 
Transport layer
Transport layer   Transport layer
Transport layer
AnusuaBasu
 
LECTURE-Transport-Layer_lec.ppt
LECTURE-Transport-Layer_lec.pptLECTURE-Transport-Layer_lec.ppt
LECTURE-Transport-Layer_lec.ppt
MonirHossain707319
 
Skerlj Dbi Ecc Isqed08 1 B2
Skerlj Dbi Ecc Isqed08 1 B2Skerlj Dbi Ecc Isqed08 1 B2
Skerlj Dbi Ecc Isqed08 1 B2skerlj
 
OPC UA TSN - A new Solution for Industrial Communication | White Paper
OPC UA TSN - A new Solution for Industrial Communication | White PaperOPC UA TSN - A new Solution for Industrial Communication | White Paper
OPC UA TSN - A new Solution for Industrial Communication | White Paper
B&R Industrial Automation
 
XIO2200A PCI Express to 1394a Chip
XIO2200A PCI Express to 1394a ChipXIO2200A PCI Express to 1394a Chip
XIO2200A PCI Express to 1394a Chip
Premier Farnell
 
POWER EFFICIENT CARRY PROPAGATE ADDER
POWER EFFICIENT CARRY PROPAGATE ADDERPOWER EFFICIENT CARRY PROPAGATE ADDER
POWER EFFICIENT CARRY PROPAGATE ADDER
VLSICS Design
 
Design and implementation of DADCT
Design and implementation of DADCTDesign and implementation of DADCT
Design and implementation of DADCT
Satish Kumar
 
EC8791 consumer electronics-platform level performance analysis
EC8791 consumer electronics-platform level performance analysisEC8791 consumer electronics-platform level performance analysis
EC8791 consumer electronics-platform level performance analysis
RajalakshmiSermadurai
 
IRJET - High Speed Inexact Speculative Adder using Carry Look Ahead Adder...
IRJET -  	  High Speed Inexact Speculative Adder using Carry Look Ahead Adder...IRJET -  	  High Speed Inexact Speculative Adder using Carry Look Ahead Adder...
IRJET - High Speed Inexact Speculative Adder using Carry Look Ahead Adder...
IRJET Journal
 
HPC Infrastructure To Solve The CFD Grand Challenge
HPC Infrastructure To Solve The CFD Grand ChallengeHPC Infrastructure To Solve The CFD Grand Challenge
HPC Infrastructure To Solve The CFD Grand Challenge
Anand Haridass
 
profibus theory solution
profibus theory solutionprofibus theory solution
profibus theory solution
Md. Mashiur Rahman
 
Bus
BusBus

Similar to PCI Bridge Performance (20)

PCIX Gigabit Ethernet Card Design
PCIX Gigabit Ethernet Card DesignPCIX Gigabit Ethernet Card Design
PCIX Gigabit Ethernet Card Design
 
PCIe Gen 3.0 Presentation @ 4th FPGA Camp
PCIe Gen 3.0 Presentation @ 4th FPGA CampPCIe Gen 3.0 Presentation @ 4th FPGA Camp
PCIe Gen 3.0 Presentation @ 4th FPGA Camp
 
Bus Interfacing with Intel Microprocessors Based Systems
Bus Interfacing with Intel Microprocessors Based SystemsBus Interfacing with Intel Microprocessors Based Systems
Bus Interfacing with Intel Microprocessors Based Systems
 
Difference between PCI PCI-X PCIe
Difference between PCI PCI-X PCIeDifference between PCI PCI-X PCIe
Difference between PCI PCI-X PCIe
 
Communication Performance Over A Gigabit Ethernet Network
Communication Performance Over A Gigabit Ethernet NetworkCommunication Performance Over A Gigabit Ethernet Network
Communication Performance Over A Gigabit Ethernet Network
 
Introduction to VIP with PCI Express Technology
Introduction to VIP with PCI Express TechnologyIntroduction to VIP with PCI Express Technology
Introduction to VIP with PCI Express Technology
 
Hyper Transport Technology
Hyper Transport TechnologyHyper Transport Technology
Hyper Transport Technology
 
Bus interface 8086
Bus interface 8086Bus interface 8086
Bus interface 8086
 
Transport layer
Transport layer   Transport layer
Transport layer
 
LECTURE-Transport-Layer_lec.ppt
LECTURE-Transport-Layer_lec.pptLECTURE-Transport-Layer_lec.ppt
LECTURE-Transport-Layer_lec.ppt
 
Skerlj Dbi Ecc Isqed08 1 B2
Skerlj Dbi Ecc Isqed08 1 B2Skerlj Dbi Ecc Isqed08 1 B2
Skerlj Dbi Ecc Isqed08 1 B2
 
OPC UA TSN - A new Solution for Industrial Communication | White Paper
OPC UA TSN - A new Solution for Industrial Communication | White PaperOPC UA TSN - A new Solution for Industrial Communication | White Paper
OPC UA TSN - A new Solution for Industrial Communication | White Paper
 
XIO2200A PCI Express to 1394a Chip
XIO2200A PCI Express to 1394a ChipXIO2200A PCI Express to 1394a Chip
XIO2200A PCI Express to 1394a Chip
 
POWER EFFICIENT CARRY PROPAGATE ADDER
POWER EFFICIENT CARRY PROPAGATE ADDERPOWER EFFICIENT CARRY PROPAGATE ADDER
POWER EFFICIENT CARRY PROPAGATE ADDER
 
Design and implementation of DADCT
Design and implementation of DADCTDesign and implementation of DADCT
Design and implementation of DADCT
 
EC8791 consumer electronics-platform level performance analysis
EC8791 consumer electronics-platform level performance analysisEC8791 consumer electronics-platform level performance analysis
EC8791 consumer electronics-platform level performance analysis
 
IRJET - High Speed Inexact Speculative Adder using Carry Look Ahead Adder...
IRJET -  	  High Speed Inexact Speculative Adder using Carry Look Ahead Adder...IRJET -  	  High Speed Inexact Speculative Adder using Carry Look Ahead Adder...
IRJET - High Speed Inexact Speculative Adder using Carry Look Ahead Adder...
 
HPC Infrastructure To Solve The CFD Grand Challenge
HPC Infrastructure To Solve The CFD Grand ChallengeHPC Infrastructure To Solve The CFD Grand Challenge
HPC Infrastructure To Solve The CFD Grand Challenge
 
profibus theory solution
profibus theory solutionprofibus theory solution
profibus theory solution
 
Bus
BusBus
Bus
 

PCI Bridge Performance

  • 1. T he PCI Interface, which was originally created for the personal computer, has now been univer- sally adopted by system designers for use in datacom, telecom, PCs, servers, and many other systems. Currently, the PCI Interface is used mainly as an expansion bus to add PCI cards that have wide ranging applications. For example, a router, which was designed by a datacom company, could have eight expansion slots that allow a user to connect eight Ethernet cards. In this application, the designer would have integrated two PCI-to-PCI bridges enabling any Ethernet PCI card to transfer data to any other Ethernet device. The new 3-Port PCI-to-PCI Bridge from Pericom, a one-chip solution, enables designers to connect all eight Ethernet cards to one PCI bridge. This breakthrough architecture, which has two secondary PCI buses integrated onto the same chip, enables two devices sitting on two different secondary PCI buses to communicate to each other directly without involving the host CPU or main memory. The Triple Port PCI-to-PCI Bridge has a number of benefits: Board real estate saving Cost reduction Secondary-to-Secondary performance improvement Ease of design for redundant applications Isolating and saving the bandwidth on the primary bus Less loading on the primary device Saving real estate space on the board Because the triple-port bridge has dual secondary buses, it can replace two single-port bridges, thus alleviating the problem of crowded boards. This feature is extremely important for telephony or datacom systems. In these applications, the designer must de- sign for as many Ethernet cards as possible, especially for a tele- phony application. In a CompactPCI system design, you can in- sert up to 16 PCI peripheral cards on a single bridge using Pericom’s PI7C7300 Triple-Port PCI Bridge. When using a dual-port com- peting application, the limit is eight cards. The Single-Channel PCI-to-PCI Bridge is the same size as the Triple-Port PCI-to-PCI Bridge, but the Triple-Port PCI-to-PCI Bridge provides an addi- tional secondary PCI channel. Cost reduction Because the Triple-Port PCI-to-PCI architecture replaces two Single-Port PCI-to-PCI bridges, using Pericom’s device can result in reducing costs by one half. Theoretical analysis: Secondary-to-Secondary performance improvement Since the Triple-Port PCI-to-PCI Bridge has dual secondary channel communication between the two secondary devices, it takes much less time when transferring data from one bridge to another via main memory or another device to link the two secondary buses created by two separate PCI-to-PCI bridges. To further analyze performance, consider a 400-byte (100 double word) transfer between two devices, each on a sec- ondary PCI bus: Device 1 on the secondary bus 1 in Slot 1 Device 2 on the secondary bus 2 in Slot 1 Case #1: Using two PCI-to-PCI bridges Here data must go through main memory, traveling as follows: Expansion PCI device transfers data to the PCI bridge PCI bridge initiates a transfer to the Primary PCI device North Bridge picks up the data and writes it to main memory CPU performs an I/O write to indicate that the data transfer is complete DMA in the North Bridge reads data from main memory and writes it to the primary PCI bus PCI bridge transfers data to the second PCI-to-PCI Bridge and to the destination PCI device Assuming that the PCI devices can run burst PCI cycles of 4 transfers each, this is what is involved: PCI device to PCI bridge. Each burst is 3-1-1-1. Each burst is 6 clocks long. 100/4 = 25 Add two turnaround cycles for each transfer. Total time = 8 x 25 = 200 cycles PCI Bridge-to-PCI Bridge. To transfer data to main memory 8-1-1-1 = 11 PCI clocks Add two turnaround clocks. Each transfer takes 13 PCI clocks. Total time = 13 x 25= 325 cycles 6 PCI clocks for the I/O cycle Main memory to Primary PCI Bridge. Same as above: 13 x 25 = 325 cycles Primary PCI bus to destination. Same as above. Total time is 8 x 25 = 200 cycles Total time to transfer 400 bytes is T: T = 200 + 325 + 6 + 325 + 200 = 1056 PCI clocks 12345678901234567890123456789012123456789012345678901234567890121234567890123456789012345678901212345678901234567890123456789012 12345678901234567890123456789012123456789012345678901234567890121234567890123456789012345678901212345678901234567890123456789012 Three Port PCI Bridge PerformanceAdvantages Mohamad Tisani Application Note 40
  • 2. For comparison, in Table 1 we used different chipsets – 430 BX, 440GX, and the 840 Camino from Intel. Table 1 shows actual time in clock cycles to transfer 100 D_words from one I/O card to the next on two different secondary buses, S1 and S2 (Figure 2 shows this information graphically). Case #2: Using Pericom’s 3-Port PCI Bridge (data only goes through the bridge) Pericom’s PCI-to-PCI bridge, which has 128 bytes FIFO for each bus, keeps accepting data for each PCI clock as long as there is room in the FIFO. To receive the first few D_words from the first secondary bus and post them into the FIFO it takes a maximum of 5 clocks. After the first data has been received, the bridge will request the second secondary bus and initiate the transfer. If both PCI devices are capable of sustaining burst, then the bridge will be able to do 5-1-1-1- 1…1 transfer. That means it takes 5 clocks to the first data transfer but following data will take only one PCI clock. The bridge will be able to burst as long as the 4 Kbit line bound- ary transfer is not crossed. Total time in this example is T2 = 5 + 99 = 104. Comparison In terms of performance, we obviously have tremendous gains; T2 is about 10% of T1. Using Pericom’s 3-Port PCI Bridge device, PI7C7300, the gain is: [(1056 –104) / 1056] x 100 = 90%. Actual analysis, Secondary-to-Secondary performance improvement To get real data on performance, we used a PCI Exerciser/ Analyzer and the Logic Analyzer to send traffic (100 D_words) through the bridge to the main memory and back. This analysis shows how long it will take to transfer data from one I/O card to another I/O card when going through the main memory. The other example was to send the data across Pericom’s 3-Port PCI-to-PCI Bridge. We captured the waveform and recorded how long it took to transfer the data. Performance data will depend on what motherboard and chipset designers are using. In the following example, we send 100 D_words from the S1 secondary bus. We wrote 100 D_words to the main memory and then we read the data back, 100 memory read from S2 secondary PCI bus to main memory. Let us call this Path 2 (see Figure 1). We can achieve the same purpose if we wrote 100 D_words from S1 to S2 on the PI6C7300. Let us call this Path 1 (see Figure 1). PATH 1 PATH 2 Path 1 :S1 to S2 Path 2 : Memory Path2 : Memory Write & Path2 : Memory Write & Performance Memory write Write & Memory Read Memory Read Line Memory Read Multiple Gain 430 BX 103 1161 1160 821 87 % - 91 % 440GX 103 1169 1169 827 88% - 91% 820Camino 103 1152 933 562 82% - 91% Figure 2 Notice that in the Table 1, the performance increase is be- tween 82% to 91%. Also notice that the performance does not change when going from S1 to S2 on the 3-Port PCI Bridge. It is constant with the chipset that is on the motherboard. Also notice that there is a difference as to what command on the PCI bus is used. The other parameter to consider is the Cache Line Size programmed into the chipset and the PCI-to-PCI Bridge. Normally, if you are do- ing long bursts, then using the largest Cache Line Size and the commands Memory Read Multiple, and Memory Write and Invalidate, will lead to the best performance (especially true of this experiment). Table 1 1200 1000 800 600 400 200 S1 to S2 MW & MR NumberofClocks 100 D_Words Transfer Time of Different Platforms Type of Transfer MW & MRL MW & MRM 0 430 BX 440 XX 820
  • 3. Figure 3 is a logic analyzer snapshot illustrating S1 to S2, 400 Byte transfer performed in a single sustained burst, taking 103 clocks to transfer 100 double words. Figure 4 is a logic analyzer snapshot illustrating 100 D_word memory write cycles followed by 100 D_word memory Read cycles. Initiator is on S1, the secondary bus, behind the 3-Port Bridge. This is performed on the 840 Camino chipset. Isolation & Saving Bandwidth on the Primary Bus Reviewing the numbers mentioned in Table 1, notice that by moving data from one secondary to the other, you are actu- ally isolating traffic on the secondary bus alone. Therefore, you are making the Primary PCI bus and the memory bus idle during this time resulting in better I/O performance on the primary PCI bus by up to 80 or 90%. Meanwhile, the memory bus is idle and ready to be used by the CPU or other Primary PCI devices resulting in greater system bandwidth and per- formance (see Figure 6). Conclusion Pericom’s 66 MHz Triple-Port PCI-to-PCI Bridge is a hot- swap friendly CompactPCI device that enhances overall sys- tem performance. The PI7C7300 permits the attachment of up to 15 Compact PCI cards onto the same board, thus en- abling a system designer to save board space and compo- nent cost. Ease of Design for Redundant Applications The architecture of the 3-Port PCI Bridge makes it ideal for redundant applications (see Figure 5). In such an applica- tion, the system resources are duplicated so that if one re- source goes down, the second, or slave resource, will come up and run the system. This feature is very important for banking, defense, and other critical systems. Mohamad Tisani is the Director of Application Engineering at Pericom Semiconductor Corporation. His experience includes System Design and Architecture. He has an MSEE from Santa Clara University, and a BSEE from U.C. San Diego. In his spare time he enjoys Racquetball, Tennis, and Swimming. Figure 3 Figure 4 Figure 5 Figure 6