Technical Note
For Vivante GC Cores

Latency and Outstanding Request Analysis

Version 1.0
January 2012
Technical Note for Vivante GC Cores

Revision History

Version   Date               Author       Description
1.0       January 16, 2012   Benson Tao   Initial release








1 General System Performance Analysis
Q: Can you please provide additional details on load balancing and recommendations on optimizing CPU and GPU
interactions for peak system performance?

In a graphics subsystem, an application such as a 3D game or a rich GUI accesses the graphics hardware through API calls to the operating system. When the application requests an image to be rendered onscreen, the API calls the OS, which in turn invokes the GPU driver to communicate with the GPU hardware and draw the image. From the CPU's perspective, the CPU accumulates and sets up graphics commands that are dispatched to the GPU for processing and display rendering. As graphics performance and screen/object detail increase with advances in technology, the CPU-GPU communication path must stay optimized to ensure external/internal bandwidth availability, low-latency internal communication, cache coherence, system/processor/GPU optimizations, and correct access priorities for the different system blocks and the GPU/CPU (for example, starving the display controller of access to the GPU screen data will cause display flickering and a negative user experience).

In general, since each SoC design has specific requirements, there needs to be a balance across all resources: communications (AXI/AHB, OCP, NoC, proprietary, etc.), memory interfaces, OS/system-specific optimizations, chip floor planning, graphics compression technologies, and effective use of the memory hierarchy (registers, caches, system memories, efficient banking, coherency, single/dual-port RAMs, access speeds, etc.). Since each design is customized, we provide test vectors and performance traces for the graphics subsystem that can be included in a customer's full-chip test simulation to evaluate overall system loading. We also work directly with the customer, based on application type/usage/requirements and on analysis of CPU and system resources, to recommend the best graphics core (and any optimizations/derivative cores) for their power, size, performance, and feature targets.




2 Latency Analysis
SoC architectures consist of multiple processing engines (CPU, 3D, video, etc.), memory units (DDR, caches, etc.), and I/O blocks (network, USB, HDD, Flash, wireless, RF, etc.). Together, these functional blocks define the product, target market, performance, features, and key differentiators that are the selling points of the chip and the end device. Each processing engine (CPU, video, 3D graphics) has a different (data) traffic profile, which affects its bandwidth, latency, and performance requirements. A well-designed SoC interconnect capable of handling multiple processing units in parallel with minimal or no performance degradation is ideal in today's multi-tasking environment (e.g., video + composition, or video + 3D graphics + composition). In addition to an efficient interconnect (AXI, ACE-Lite, NoC, etc.), a well-architected memory controller subsystem is required to match the bandwidth, latency, and data-transfer requirements of each processing unit. A designer does not want a high-speed interconnect coupled with a low-speed memory controller (MC) and memory bus (speed, width), since the MC will become the bottleneck.
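The bottleneck point above can be sketched with a toy model (the function name and the bandwidth figures are illustrative assumptions, not Vivante data): the sustained bandwidth an engine sees is capped by the slowest link in the path.

```python
def effective_bandwidth_gbps(interconnect_gbps, mc_gbps, dram_gbps):
    """Sustained bandwidth is capped by the slowest link in the path.
    A deliberately simplified model that ignores protocol and
    refresh overheads."""
    return min(interconnect_gbps, mc_gbps, dram_gbps)

# A fast interconnect cannot compensate for a slow memory controller:
print(effective_bandwidth_gbps(12.8, 3.2, 6.4))  # 3.2: the MC is the bottleneck
```

In a real design the same reasoning is applied with measured or simulated figures for each link, but the conclusion is identical: upgrading the fabric alone does not help once the MC is the limiting element.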

As system complexity increases, overall performance is determined by how well the engine speed, interconnect design, and memory system work together to provide sustained data bandwidth to each engine while also meeting the real-time goals of latency-sensitive traffic. To design an optimized system, every part of the data transfer from initiator to destination must be analyzed in totality. The following paragraphs focus on latency considerations along with the number of outstanding requests.

Latency in a GPU design takes into account interconnect (bus) latency and data latency. Bus latency is the total round-trip latency from the GPU (initiator) through the interconnect to external DDR memory and back to the GPU, as depicted below:








The total bus latency is the sum of the individual latencies incurred as a request or data travels from the GPU. In the figure above, the total latency is the sum of the latencies of stages one through six, which includes the latency through the interconnect fabric, memory controller, and DDR memories. The latency of each component is system dependent, and each must be kept minimal (in GPU clock cycles) for optimal performance.
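The six-stage summation can be sketched as follows. The stage names and the per-stage cycle counts are hypothetical placeholders for illustration, not measured Vivante figures:

```python
# Hypothetical per-stage latencies in GPU clock cycles for the six stages of
# the round trip: outbound through the fabric and memory controller, the DRAM
# access itself, and the return path. Numbers are illustrative only.
stage_latency_cycles = {
    "1_gpu_to_fabric": 10,
    "2_fabric_to_mc":  15,
    "3_mc_queue":      20,
    "4_dram_access":   60,
    "5_mc_to_fabric":  15,
    "6_fabric_to_gpu": 10,
}

total_bus_latency = sum(stage_latency_cycles.values())
print(total_bus_latency)  # 130 cycles round trip
```

In practice each entry would come from RTL simulation or measurement of the actual fabric, memory controller, and DDR configuration.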

The second part is data latency, which depends on the priorities assigned to each functional unit. This delay includes the number of GPU cycles required to receive data through arbitration. For example, the display controller may be assigned the highest priority since it must refresh the screen on a hard deadline. If the display controller is not assigned a high priority and all devices request access to the interconnect or to data in external DDR memory, the display controller may be starved of data, causing a refresh glitch or an incorrect display update. Such a glitch has a strong negative user impact because it is easily visible. Engine priorities need to be balanced and analyzed to ensure sufficiently low latency and sufficient bandwidth during peak data access.
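The arbitration scheme described above can be sketched as a minimal fixed-priority arbiter (the master names and ordering are hypothetical; real SoC arbiters typically combine priority with round-robin or bandwidth regulation to avoid starving low-priority masters):

```python
def arbitrate(requesting, priority_order):
    """Grant the bus to the highest-priority master that is requesting.
    A minimal fixed-priority model for illustration only."""
    for master in priority_order:
        if master in requesting:
            return master
    return None  # no master is requesting this cycle

# The display controller outranks everything so screen refresh is never starved.
priority = ["display", "video", "gpu", "cpu"]
print(arbitrate({"cpu", "gpu", "display"}, priority))  # display
print(arbitrate({"cpu", "gpu"}, priority))             # gpu
```

This also makes the starvation failure mode concrete: if "display" were placed last in `priority`, its requests would wait behind every other active master during peak traffic.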




To overcome bus and data latency, the number of outstanding requests must be defined correctly so the GPU is not starved while waiting for data to return from system memory. It is chosen based on the total latency, with the goal of keeping the GPU latency under roughly 200 cycles. The general formula for the number of outstanding requests is:

                                  OR = L / N

OR = Number of Outstanding Requests
L = Latency in Bus Cycles = Lbus (Bus Latency) + Ldata (Data Latency)
N = Burst Length in Bus Cycles, Dependent on Bus Width and Bus Size (Bytes)

The number of outstanding requests also determines the amount of FIFO storage needed in the GPU, since each FIFO entry corresponds to one outstanding transaction.
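Putting the formula and the FIFO observation together, a minimal sketch using the note's symbols follows. The rounding up to a whole request and the example cycle counts are illustrative assumptions:

```python
import math

def outstanding_requests(l_bus_cycles, l_data_cycles, burst_len_cycles):
    """OR = L / N, rounded up: enough requests in flight to cover the
    round-trip latency so the GPU is never data starved. Each outstanding
    request occupies one FIFO entry, so OR also sizes the return FIFO."""
    total_latency = l_bus_cycles + l_data_cycles  # L = Lbus + Ldata
    return math.ceil(total_latency / burst_len_cycles)

# Example: 130 cycles of bus latency plus 30 cycles of arbitration (data)
# delay, serviced in 8-cycle bursts, needs 20 requests (and 20 FIFO
# entries) in flight.
print(outstanding_requests(130, 30, 8))  # 20
```

Note the trade-off this exposes: a longer round-trip latency or a shorter burst length both raise OR, and therefore the FIFO area the GPU must dedicate to in-flight transactions.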

In the Vivante GC core design, latency is also hidden through other mechanisms, including multi-threading, parallel execution, prefetching, efficient use of caches, and memory optimizations such as burst building, request merging, compression, and smart banking. All of these need to be considered when designing the GPU subsystem.


