Tegra 4 outperforms snapdragon










TEGRA 4 OUTPERFORMS SNAPDRAGON
Nvidia Emphasizes Performance but Faces Power Challenge
By Kevin Krewell (March 11, 2013)
© The Linley Group • Microprocessor Report March 2013

Nvidia released benchmark results, performance characteristics, and architecture details of its Tegra 4 application processor at the 2013 Mobile World Congress (MWC). The 1.9GHz Cortex-A15-based device is on track to ship in 2Q13. The company also announced its first smartphone customer, ZTE, which will build a phone using Tegra 4 along with Nvidia’s i500 LTE cellular modem. The unnamed ZTE handset is set to launch in mid-2013 in China. (ZTE currently ships the Mimosa X, the only smartphone that combines Tegra 3 with the i450 HSPA modem.)

When Nvidia originally announced Tegra 4, it provided little performance data, but it did demonstrate working silicon (see MPR 1/18/13, “Tegra 4 Shows First Quad A15”). At MWC, the company continued its product revelation by disclosing new details of the architecture and its capabilities (as well as limitations). Judging from this preliminary information, Tegra 4 rates as the highest-performing mobile ARM processor, if power constraints don’t throttle the cores. This performance takes aim at Qualcomm’s newest Snapdragon processors; Nvidia offered a wide range of benchmark results that clearly showed Tegra 4 leading both the APQ8064 and (judging from our estimates) the forthcoming Snapdragon 800.

Despite Tegra 4’s more powerful Cortex-A15 CPUs and much improved graphics unit, Nvidia claims the chip will increase battery life under normal user workloads compared with Tegra 3. Power efficiency improves through the use of 28nm HPL transistors, which reduce static leakage, and because the more powerful A15 core can process typical workloads at a lower clock frequency and core voltage (reducing active power). In contrast, Nvidia chose a higher-performance 28nm HPM process for its Tegra 4i processor, which is not as constrained by power (see MPR 3/11/13, “Tegra 4i Expands Market”).

Nvidia says that Tegra 4’s die size is in the mid-80mm² range. Once a real die teardown becomes available, we will be able to confirm the size and layout (to date, Nvidia has only provided artistic renderings of the Tegra 4 die).

Benchmark Battles

The clear target of Nvidia’s performance comparisons is Qualcomm’s Snapdragon processors, which have picked up numerous design wins since their introduction. The first Krait-based Snapdragon chip had a six-month jump on Cortex-A15-based processors, and Qualcomm is already sampling new processors with higher performance (see MPR 1/18/13, “Qualcomm Krait 400 Hits 2.3GHz”). Despite that company’s early lead, Nvidia is now ready to challenge the Snapdragon CPUs on performance.

Category        Benchmark                                  Result
CPU & System    SPECint2000                                1168
CPU & System    Sunspider 0.91                             506ms
CPU & System    Web Page Load                              28sec
CPU & System    WebGL Aquarium (50 fish)                   60fps
CPU & System    Google Octane                              4582
CPU & System    Kraken 1.1                                 6799ms
CPU & System    Geekbench 1.0                              4285
CPU & System    Antutu 3.1.1                               36127
CPU & System    Quadrant Pro 2.0                           16449
CPU & System    CFBench 1.1                                41227
GPU             GLBench 2.5 HD Egypt (1080p offscreen)     57fps
GPU             GLBench 2.5 HD Classic (720p offscreen)    274fps
GPU             Basemark ES 2 Hoverjet                     59fps

Table 1. Tegra 4 benchmark scores. Nvidia conducted these tests on a Tegra 4-based development system, which employs a 1.9GHz Tegra 4 processor with DDR3-1866 DRAM and no thermal constraints. (Source: Nvidia)
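
The estimates mentioned above for the forthcoming Snapdragon 800 rest on a simple scaling rule that the article describes in its benchmark discussion: take a measured APQ8064 score, scale it by the clock ratio (2.3GHz over 1.5GHz), and apply a 10% gain for the newer Krait core. The following is a minimal sketch of that arithmetic, not Linley Group's actual tooling. The function and constant names are hypothetical, and the rule only makes sense for higher-is-better scores that track CPU speed.

```python
# Sketch of the article's Snapdragon 800 projection rule.
# Clock speeds and the 10% core gain come from the article; all names here
# are illustrative assumptions, not part of any real benchmark tool.

TEGRA4_GHZ = 1.9
APQ8064_GHZ = 1.5
S800_GHZ = 2.3
KRAIT_400_GAIN = 1.10  # the article's assumed uplift for the newer Krait core


def project_s800(apq8064_score: float) -> float:
    """Estimate a Snapdragon 800 score from a higher-is-better APQ8064 score."""
    return apq8064_score * (S800_GHZ / APQ8064_GHZ) * KRAIT_400_GAIN


def tegra4_lead_over_s800(tegra4_vs_apq8064: float) -> float:
    """Convert a Tegra 4 : APQ8064 ratio into a Tegra 4 : projected-S800 ratio."""
    return tegra4_vs_apq8064 / project_s800(1.0)


clock_ratio = TEGRA4_GHZ / APQ8064_GHZ
print(f"Tegra 4 clock advantage over APQ8064: {clock_ratio:.2f}x")
print(f"Projected S800 gain over APQ8064:     {project_s800(1.0):.2f}x")
for ratio in (2.0, 2.5):
    lead = tegra4_lead_over_s800(ratio)
    print(f"Tegra 4 at {ratio:.1f}x APQ8064 -> {lead:.2f}x the projected S800")
```

Under this rule the projection multiplies APQ8064 scores by roughly 1.69, so a 2x to 2.5x Tegra 4 lead over the APQ8064 shrinks to roughly 1.2x to 1.5x against the projected Snapdragon 800. As the article notes, linear clock scaling is probably optimistic for most benchmarks.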
Nvidia published a number of benchmark scores to prove its performance superiority over Krait. On these tests, a 1.9GHz Tegra 4 outperforms a 1.5GHz APQ8064 by 2x to 4x. Even when we project the increased performance of the next-generation Qualcomm Series 800 processor, Nvidia holds a clear performance advantage. All of Nvidia’s testing, however, used a PC-style development platform configured with DDR3-1866 memory.

Nvidia prefers the SPEC integer benchmark, as it offers a more complete system test than most mobile benchmarks, taking into account factors such as memory size and latency and hard-drive speed. Nvidia used a vanilla GCC compiler (version 4.7) with no specific optimizations for Tegra 4, a step in the right direction for fairness. The single SPEC score reported is the geometric mean of the individual normalized applications in the suite. With ARM entering the server market and already dominating the tablet market, using this benchmark makes sense, but we expect server vendors will choose SPECint2006 instead, avoiding the outdated SPECint2000 that Nvidia uses. The company also provided a SPECint2006 score for Tegra 4: 8.08.

Fortunately, Nvidia’s benchmark testing didn’t stop at SPEC. The company published an extensive set of benchmark numbers including CPU and system results as well as graphics results, all shown in Table 1. These tests were performed under best-case thermal conditions without the need to throttle the processors. The results are excellent. For example, the GLBenchmark 2.5 HD Egypt (1080p offscreen) score beats the best reported score for a shipping product (52fps for the 4th generation Apple iPad).

As Figure 1 shows, we compared these scores to third-party testing of Qualcomm’s APQ8064 in an off-the-shelf smartphone, the Google Nexus 4, which uses LPDDR2-1000 (barely half the speed of the DRAM in Nvidia’s test platform) and has strict thermal limitations. On most benchmarks, Tegra 4 outperforms the APQ8064 by 2x to 2.5x, even though it has only a 1.27x higher clock speed (1.9GHz vs. 1.5GHz).

We also estimated the performance of the Snapdragon Series 800, assuming that it will achieve its rated 2.3GHz clock speed and applying a 10% gain to represent the improvements from the Krait 200 CPU to the Krait 400 CPU. This approach is probably optimistic, as most benchmark scores do not improve linearly with CPU speed, although the Series 800 will also get a boost from faster DRAM. Based on these estimates, even Qualcomm’s best processor, which is due to enter production at about the same time as Tegra 4, won’t surpass Nvidia in these tests.

[Figure 1: bar chart on a 0% to 450% scale comparing Tegra 4, Snapdragon 800†, and APQ8064 on Google Octane, Antutu 3.1.1, Quadrant Pro 2.0, CFBench 1.1, Geekbench 1.0, and GLBench 2.5 HD*.]

Figure 1. Tegra 4 benchmark scores relative to Qualcomm’s Snapdragon APQ8064. Scores are based on a 1.5GHz APQ8064 in the Nexus 4 and a 1.9GHz Tegra 4 in its development system. *Egypt 1080p Offscreen. (Source: Nvidia, CPUBoss.com, GLBenchmark.com, Engadget, and Anandtech; except †Linley Group estimates)

The likely reason for this performance advantage is a number of optimizations to Cortex-A15’s microarchitecture and cache subsystem. Compared to Qualcomm’s Krait CPU, Cortex-A15 has a larger reorder buffer to reduce pipeline stalls and improve the instruction issue rate.

Nvidia did not publish scores for Dhrystone and CoreMark. Both are CPU-centric, especially Dhrystone, which is so small it easily fits into an L1 cache. As such, Dhrystone is a poor proxy for system-level performance. These tests are among the few in which Qualcomm outscores Nvidia.

Although the range of system-oriented benchmarks where Nvidia decisively outperformed the Qualcomm S4 Pro is extensive, definitive comparisons of Tegra 4 and the Snapdragon 800 must wait until both ship in real systems in 2Q13.

Given Tegra 4’s performance capabilities, designers must be careful to keep power in check. Nvidia provided only limited power data for Tegra 4, and no data for the power at full throttle, but the initial data indicated that power is a problem for the quad A15 cores. Similar power concerns have so far limited Samsung’s Cortex-A15-based Exynos 5250 to netbook-size devices.

To help ameliorate Tegra 4’s power, Nvidia invested in a fifth power-saving Cortex-A15 core that is limited to 825MHz and optimized for low power. The company says its latest design improves battery life compared with Tegra 3, a claim that is likely true for moderate workloads where the A15 can run near its minimum core voltage, whereas Tegra 3’s A9 cores would have to work harder and use a higher core voltage. Applications such as video encoding and web-page rendering with Flash animation represent typical workloads on mobile devices; these tasks can place moderate stress on the CPU. As it approaches its maximum clock speed, however, the Cortex-A15’s power becomes problematic.

Nvidia Tries to Resurrect SPECint2000

Nvidia published many benchmarks, but it decided to focus on an unorthodox choice: the SPECint2000 benchmark. Nvidia promotes the SPEC benchmark because it can stress memory design, not just the CPU or GPU in isolation. This version, though obsolete, can be run in mere minutes, rather than the hours required for the newer 2006 edition. As such, the company believed the SPECint2000 benchmark would appeal to the testers at enthusiast and gadget web sites.

Nvidia’s use of SPECint2000 is problematic, however, for three main reasons: SPEC designed its CPU benchmarks (including SPECint) for workstation and server workloads, not smartphones; SPECint2000 was retired in 2007 and is inactive; and the benchmark only measures single-core/single-thread performance.

Nvidia addressed the first concern by breaking out the 12 individual components of SPECint2000 to show how each can apply to modern PC and mobile applications (see Table 2). Some associations are obvious, such as 186.crafty, which plays a game of chess. The use of 300.twolf and 175.vpr as routing algorithms for a navigation proxy is original, however, although most smartphone navigation routing is performed in the cloud.

The SPEC organization retired this benchmark in 2007, when it published the successor: SPECint2006. Since then, SPEC hasn’t accepted or published new SPECint2000 scores, making comparison with other contemporary processors difficult. Qualcomm, for example, has not published any SPEC scores for its processors. At 1.9GHz, Tegra 4 achieved a SPECint2000 score of 1,168 (base), slightly lagging that of a 1.8GHz single-core AMD Opteron 144 processor, which earned a score of 1,240 in 2005. The problem remains that the scores reported on the SPEC web site are mostly for server and workstation processors from 2007 and earlier; they include no ARM processors (in 2007, companies were shipping ARM11 processors, which would perform poorly on SPEC workloads).

The SPEC-supported 2006 edition of the benchmark has similar elements to the 2000 edition, but the scores from the 2000 and 2006 versions cannot be compared directly because the individual component benchmarks, testing methodology, reference computer, and the score scaling all changed.

Given this situation, Tegra 4 cannot be compared with other recent processors using SPEC. Nvidia disclosed a SPEC score for Qualcomm’s APQ8064, but no data from Qualcomm or third parties exists to validate this score.

Another problem is that SPECint is composed of single-threaded applications that use only one CPU core. The other three cores will likely enter a sleep state, giving the lone active core exclusive access to the 2MB L2 cache and the dual-channel DRAM controller. Using only one active CPU, the processor can run at its maximum clock speed with less concern about hitting the thermal limit. Thus, the single-core SPEC score is unlikely to scale linearly across all four cores.

Despite those caveats, SPECint is often viewed as one of the few challenging cross-platform and instruction-set-agnostic benchmarks. Running the benchmark requires compilation of C and C++ source code, and it enables comparisons among ARM, MIPS, PowerPC, SPARC, and x86 architectures. We applaud Nvidia for publishing SPEC scores and hope that other ARM-processor vendors follow suit.

GeForce ULP Evolves

As the first quad-core application processor, Tegra 3 won a number of high-profile designs in 2012, including the Google Nexus 7 and Microsoft Surface (RT) tablets. As newer competing processors came to market, Tegra 3 became the favorite subject of competitive benchmarks. The chip’s GPU quickly proved to be underpowered compared with processors such as the Apple A6 and A6X, HiSilicon K3V2, Intel Clover Trail, and Qualcomm Snapdragon S4 Pro, especially as screen resolutions surpassed 1080p. Nvidia compensated with a strong developer program, carefully optimizing games for the Tegra 3 GPU.

With Tegra 4, Nvidia made a significant move to correct the graphics-performance deficiency. In the 40nm process node, Nvidia focused on the CPUs. The shrink to a 28nm process enabled the company to assign many more transistors to graphics, despite the larger A15 CPU cores. It increased the total number of shaders to 72 in Tegra 4 from only 12 in Tegra 3.

Benchmark      Category                              Equivalent Mobile Application / Use Case
300.twolf      Place and route simulator             Navigation / directions
175.vpr        FPGA circuit placement and routing    Navigation / directions
181.mcf        Combinatorial optimization            Navigation / directions
252.eon        Computer visualization                3D object recognition
255.vortex     Object-oriented database              Contacts, mail, search, playlist
197.parser     Word processing                       Keyboard word prediction and language parsing
253.perlbmk    PERL programming language             Webpages with Javascript
254.gap        Group theory, interpreter             Webpages with Javascript
176.gcc        C programming language compiler       Java applications (JIT)
186.crafty     Game playing: chess                   Gaming logic
256.bzip2      Compression                           File operations ZIP, JPEG
164.gzip       Compression                           File operations ZIP, JPEG

Table 2. Individual SPECint2000 components. Even though this benchmark was originally designed for server and workstation processors, this chart shows how the original applications can apply to modern mobile workloads. (Source: Nvidia)
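
The article states that the single SPEC score is the geometric mean of the individual normalized results, one per component listed in Table 2. The sketch below shows how such a composite could be computed. The per-component ratios are hypothetical placeholders chosen only for illustration; Nvidia did not publish them in this article, and the helper name `geometric_mean` is an assumption rather than any SPEC tool's API.

```python
import math


def geometric_mean(ratios):
    """Return the geometric mean of positive per-benchmark ratios."""
    values = list(ratios)
    if not values:
        raise ValueError("need at least one ratio")
    if any(v <= 0 for v in values):
        raise ValueError("ratios must be positive")
    # Averaging the logs avoids overflow from multiplying many large ratios.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical normalized ratios, one per SPECint2000 component in Table 2.
# These are placeholders for illustration, NOT measured Tegra 4 results.
hypothetical_ratios = {
    "164.gzip": 900.0,
    "175.vpr": 1100.0,
    "176.gcc": 1300.0,
    "181.mcf": 700.0,
    "186.crafty": 1400.0,
    "197.parser": 1000.0,
    "252.eon": 1800.0,
    "253.perlbmk": 1200.0,
    "254.gap": 1150.0,
    "255.vortex": 1600.0,
    "256.bzip2": 1050.0,
    "300.twolf": 1250.0,
}

composite = geometric_mean(hypothetical_ratios.values())
arithmetic = sum(hypothetical_ratios.values()) / len(hypothetical_ratios)
print(f"geometric mean:  {composite:.0f}")
print(f"arithmetic mean: {arithmetic:.0f}")
```

A geometric mean never exceeds the arithmetic mean of the same values, so one unusually strong component lifts the composite less than it would lift a simple average.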
In Nvidia terminology, each shader “core” is a programmable element that can execute a multiply-add (on a single pixel); by contrast, for Imagination Technologies, a core is a licensed IP block (which can have multiple shader ALUs).

This sixfold increase in total shaders is split between the 24 vertex shaders and the 48 texture shaders, as Figure 2 shows. The shader units also have a large register file to hide latency. Each of the six vertex units supports 4 multiply-add (MADD) operations, and the four texture units each support 12 MADDs. Whereas the vertex shaders use 32-bit floating point (FP32), the pixel shaders use only FP20. Despite the large increase, each shader only consumes a small amount of silicon real estate.

Nvidia is unique in that it still uses a split-shader architecture; all of its competitors have moved to a unified-shader architecture. Unified shaders can adapt to different graphics programs: some complex 3D scenes benefit from more vertex processing for the high number of triangles, and some scenes require more blending as well as pixel and texture processing. For Nvidia’s older design, these resources are fixed. The company’s PC GPUs have employed unified shaders since the GeForce 8800 GTX in 2006. The Tegra 4 design is more akin to the much earlier Nvidia NV40 (GeForce 6x00) PC design from 2004 and is an evolution of the original Tegra APX 2500 graphics design from 2008.

The improvements in Tegra 4’s GPU include lossless color and Z compression, along with support for 4Kx4K textures. The fragment pipe now has dual channels, offering 1.5x the bandwidth of Tegra 3. Overall, Tegra 4’s system interface to the GPU has three times the bandwidth of its predecessor. The new GPU improves a number of features compared with Tegra 3’s GPU, but in a few key areas, Nvidia avoided making more-radical changes. Lacking unified FP32 shaders, Nvidia cannot support Microsoft’s DirectX 11 Shader Model 3 (SM 9_3); so far, however, Microsoft has required only SM 9_1 in Windows RT. In addition, Nvidia is missing support for ETC2 and EAC texture compression and the 32-bit floating-point GLSL ES shader language for OpenGL ES 3.0.

Without unified FP32 shaders, Tegra can’t support OpenCL on the GPU, or even its own CUDA GPU-compute language. The lack of GPU compute isn’t a big problem now, because no major application currently uses it. The most likely early candidate for GPU compute is computational photography, and Nvidia has its own solution to this problem. At some point, Google will move the Android OS to OpenGL ES 3.0, but until then, the ES 2.0 version is sufficient.

Despite its lack of full SM 9_3 support, Tegra 4 enables shadow maps and HDR images, as well as the DXT texture format. Nvidia works with game and game-engine developers to employ all of Tegra 4’s graphics features, including those that are unsupported (or only partially supported) by standard graphics APIs.

The company argues that its GPU architecture is more die-area and power efficient than a unified-shader design. Even so, we expect next-generation Tegra processors will integrate a low-power version of the Kepler PC-graphics architecture (see MPR 4/30/12, “Nvidia Lowers the Heat on Kepler”), which uses unified shaders.

Most mobile graphics processors also use a tile-based deferred-rendering architecture that operates only on smaller sections of the display image at a time. These sections are held in a local memory with state information, and the graphics processors complete each one before moving to another. Nvidia uses an immediate-mode renderer that also serves in PC graphics; this function is driven by a display list that accesses the entire screen space. Immediate-mode renderers tend to have lower latencies, but they often have higher memory-bandwidth needs. Nvidia compensates for the bandwidth requirements of immediate-mode rendering by using L1 and L2 caches. It uses 24-bit early-Z (triangle depth) calculations to avoid rendering triangles that are invisible in the scene, saving processing time and energy.

[Figure 2: block diagram. Blocks shown: six Vertex units; IDX / Clip / Setup; Raster / Early Z; four Texture units, each with an L1; L2 Cache; Memory Controller with Chan 0 and Chan 1, each labeled 32.]

Figure 2. Tegra 4 GPU architecture. The vertex shaders support 32-bit floating point (FP32), but the texture shaders only support FP20. The L2 cache and dual 32-byte bus improve memory-subsystem performance.

Supporting the bandwidth needs of the enhanced GPU and the more powerful CPUs is a faster, dual-channel memory bus. The DRAM interface supports LPDDR3-1600 and DDR3-1866 today and may eventually reach DDR3-2133 when those parts become available.

Heterogeneous Image Processing

Nvidia has developed a unique system for pipelined computational image processing that combines the CPU, GPU, and a more traditional image-processing (ISP) engine that Nvidia calls Chimera, as Figure 3 shows. Chimera’s image pipeline uses dual buses: a state bus controls the image data, and an image bus moves the pixels. The architecture’s flow-through image processing reduces latency by not storing the entire image in the cache for post-processing.

[Figure 3: flow chart. Labels shown: Camera Sensor, CSI, VI-Mux; GPU and CPU kernel stacks (K0, K1, … Kn) on either side of the HW-ISP (BP, LS, AWB, DM, CCM, EE, γ, YUV); frames F0, F1, … Fn on the Frame / Image Bus; states S0, S1, … Sn on the State Bus.]

Figure 3. Flow chart of Nvidia’s Chimera process. The image-processing algorithm can use the CPU cores and GPU shaders without employing unified shaders or supporting OpenCL.

The pipeline processes the image using the GPU, CPU, or both in the Bayer domain (individual color pixels from the sensor), then the dedicated hardware ISP converts the image to the YUV format (Y contains image details in grayscale; U and V contain chroma information, which is helpful for compression). The CPU, GPU, or both can postprocess the image in the YUV format as well.

A unique imaging feature that Chimera enables is continual high-dynamic-range (HDR) photography. Nvidia has already demonstrated real-time HDR imaging and HDR panoramic imaging using this feature. To make an HDR image, an image processor must capture two pictures with different exposures, match individual pixels by warping the two images, then index and blend the images. The whole process can consume roughly 1Gflops for an eight-megapixel image. The GPU, with its many parallel-computing elements, makes a perfect image processor, as the texture shaders can easily warp and resample images. The A15 CPUs can process the image without the GPU, but at much higher power levels.

To implement real-time HDR, Chimera extracts two images from the sensor with differently timed exposures. It then dedicates GPU pixel/texture-shader elements to process the two images in the Bayer domain, as Figure 4 shows. The hardware ISP converts the blended image to YUV, and the CPU completes the tone mapping in the YUV domain to produce the final result. When all three elements work together, the image can be processed with minimal lag by pipelining, enabling real-time HDR photography.

[Figure 4: block diagram. Labels shown: Frame Bus and State Bus on both sides; Image Buffers; GPU, ISP, CPU; Bayer Domain and YUV Domain; legend: Control Flow, Data Flow.]

Figure 4. Chimera programming model. In an HDR example, the GPU complex processes pixels in the Bayer domain, and the CPU cores perform image postprocessing in YUV space.

Because Chimera has programmable elements, it can perform many image-processing tasks. Nvidia has demonstrated an ability to track an arbitrary object (not just faces) using the camera, maintaining constant focus on that object. It has also demonstrated modulation of multicolor LED flashes to dynamically perform white balancing in flash photography. For now, the company is working on the image-processing software in house, but we expect it will eventually release a Chimera API to enable third-party development.

Unleashing Performance and Taming Power

Tegra 4’s Cortex-A15 CPUs have higher IPC (instructions per clock cycle) performance than Tegra 3’s Cortex-A9 cores, as well as a slightly higher clock speed (1.9GHz versus 1.7GHz). To improve the IPC compared with the original A9, the A15 adds numerous enhancements, but these enhancements challenge designers to keep power under control. The A15 has eight functional units, and the microarchitecture is optimized for throughput to keep them busy. For example, the core has a large 128-micro-op window to find load misses. Also keeping the core fed are 32KB L1 caches, giving the A15 an advantage over Qualcomm’s Krait, which only has 16KB L1 caches.

The A15 stores loop targets, and its 32-entry loop buffer is larger than that of the A9. When instructions can be pulled from the loop buffer, the A15 clock-gates the branch predictor as well as the instruction-fetch and instruction-decode logic blocks, saving power. Although the core goes a long way to improve IPC and keep its pipeline fed, it stops short of a desktop-class out-of-order architecture because the power penalty would be too high.

Nvidia’s circuit designers added value through semicustom arrays and custom gate sizing, and the TSMC 28nm HPL process for the larger A15 cores saves leakage power at the expense of somewhat lower clock speed. For example, the company normalized the performance of Tegra 4 and Tegra 3 and showed the power-efficiency benefits. It measured Tegra 4’s power at a lower clock (approximately 845MHz) and core voltage, achieving a SPECint2000 score of 520 and a SPECint2000/W rating of 780, which translates to power consumption of 670mW with one core running. Tegra 3, by contrast, only earned a 450 SPECint2000/W rating, placing the A9 core’s power at 1.15W. In minimal-power mode, the quad A15 CPUs will draw less than 3W. But we don’t know how the processor would behave under workloads that push all four cores to the maximum clock speed of 1.9GHz.

As we noted earlier, the SPECint benchmark only stresses one core. Judging from the power dissipation at 845MHz, we can assume the processor voltage is cut to 0.8V for low-power operation. At 1.9GHz, assuming the core voltage is the nominal 1.0V for the 28nm HPL process (and ignoring the effect of static power dissipation, as this is a low-leakage process), the lone core consumes about 2.35W (P = C·f·V²). If all four A15 cores were fully engaged at this power level, the processor would draw 9.4W (not including GPU or I/O power). If the cores require overvoltage to hit the maximum clock speed, the power situation would be even worse.

Obviously, a 10W processor can’t serve in a smartphone without extensive clock throttling. Smartphones will likely run Tegra 4 at lower clock speeds, with only a single A15 core allowed to approach the 1.9GHz rate. The quad-A15 cores seem overpowered for a smartphone design and would better serve a tablet with its higher-resolution screens and more-PC-like use models. In the larger tablet form factor, the chip should have more thermal room to perform. Even a typical tablet, however, will not support the full power of Tegra 4, so some throttling will still be required when all four CPUs are running.

Nvidia Tries to Outmuscle Qualcomm

Nvidia is not a company that dodges competition. Larger companies (TI and Freescale) have already removed their SoCs from the mobile market, instead seeking the relative stability of the embedded market. Nvidia is committed to fighting it out in the mobile market and has focused most of its attention on the 500-pound gorilla of mobile processors: Qualcomm. And the Tegra 4 launch was clearly a frontal assault on Qualcomm’s performance.

On the lab bench, Nvidia’s Cortex-A15 CPU core clearly outperforms Qualcomm’s Krait CPU. Real tablets and smartphones, however, will likely use slower memory than the PC-style DDR3 in Nvidia’s test system. More important, few products have the thermal capacity to run all four Tegra 4 CPUs at their maximum speed, even for short periods. For these reasons, Tegra 4 systems will not match Nvidia’s quoted performance. But until we see actual products, we don’t know how big the shortfall will be. Nvidia’s published scores are far enough ahead that some degradation may be acceptable.

Tegra 4 delivers outstanding benchmark results, showing off its considerable capabilities. These results seem to have caught Qualcomm by surprise and expose some deficiencies in the memory subsystem of the Krait-based processors. Qualcomm, however, has designed its CPU for power efficiency, whereas Tegra 4 delivers maximum performance at maximum power. Snapdragon will challenge Tegra 4’s performance in smartphones, but in tablets, Tegra 4 will really shine. ♦

Price and Availability

Tegra 4 will be available in production quantities in 2Q13. Nvidia declined to reveal pricing, but we estimate prices are in the $25 to $30 range. The company has published a series of whitepapers on Tegra 4 at www.nvidia.com/object/tegra-4-processor.html.

To subscribe to Microprocessor Report, access www.linleygroup.com/mpr or phone us at 408-270-3772.
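
The single-core and four-core power figures in the article's power discussion follow from the dynamic-power relation P = C·f·V², in which the capacitance term cancels when scaling from one operating point to another. The sketch below reproduces that arithmetic from the article's own numbers. The 0.8V and 1.0V core voltages are the article's assumptions rather than published Nvidia data, static leakage is ignored as in the article, and the function names are hypothetical.

```python
def watts_from_efficiency(score: float, score_per_watt: float) -> float:
    """Power implied by a benchmark score and a score-per-watt rating."""
    return score / score_per_watt


def scale_dynamic_power(p_ref_w: float, f_ref_ghz: float, v_ref: float,
                        f_new_ghz: float, v_new: float) -> float:
    """Scale dynamic power with P = C*f*V^2; C cancels between the two points."""
    return p_ref_w * (f_new_ghz / f_ref_ghz) * (v_new / v_ref) ** 2


# Low-power operating point reported in the article (one A15 core).
low_power_w = watts_from_efficiency(520, 780)   # SPECint2000 / (SPECint2000 per W)
ARTICLE_LOW_POWER_W = 0.670                      # the article's rounded 670mW

# Voltages below are the article's assumptions, not measured values.
one_core_w = scale_dynamic_power(ARTICLE_LOW_POWER_W,
                                 f_ref_ghz=0.845, v_ref=0.8,
                                 f_new_ghz=1.9, v_new=1.0)
four_core_w = 4 * one_core_w                     # excludes GPU and I/O power

# Cross-check: Tegra 3's implied score from its efficiency and power.
tegra3_score = 450 * 1.15

print(f"A15 at 845MHz / 0.8V: {low_power_w:.3f} W")
print(f"A15 at 1.9GHz / 1.0V: {one_core_w:.2f} W")
print(f"Four A15 cores:       {four_core_w:.1f} W")
print(f"Implied Tegra 3 score: {tegra3_score:.1f}")
```

The implied Tegra 3 score lands close to Tegra 4's 520, which is consistent with the article's statement that the two chips were compared at normalized performance.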