JAN U A RY 2 0 1 3PROCESSORADVANCEMENTSFOR EMBEDDEDOPENCL™AND PARALLELPROCESSINGAMDE M B E D D EDAPUSOLUT IONSG UID E     ...
Introducing                               open source, infinite possibilities  Unleash Your Inner Inventor With  GizmoSpher...
AMD EMBEDDED APU SOLUTIONS GUIDEAPUs and ProcessorAdvancementsfor EmbeddedApplications                                    ...
AMD EMBEDDED APU SOLUTIONS GUIDEAMD EMBEDDED R-SERIES PLATFORMDelivering exceptional performance in apower efficient platf...
AMD EMBEDDED APU SOLUTIONS GUIDE                       micro-ATX                                                          ...
AMD EMBEDDED APU SOLUTIONS GUIDE                        Digital Signage                                                   ...
AMD EMBEDDED APU SOLUTIONS GUIDE                       Digital Gaming                                                     ...
AMD EMBEDDED APU SOLUTIONS GUIDE                                                                                          ...
AMD EMBEDDED APU SOLUTIONS GUIDEand associated with a specific compute                                                    ...
AMD EMBEDDED APU SOLUTIONS GUIDEprovided by OpenCL is a good fit for many        that allow and sometimes require some lev...
AMD EMBEDDED APU SOLUTIONS GUIDEAMD EMBEDDED G-SERIES PLATFORMThe world’s first combination of low-powerCPU and advanced G...
AMD EMBEDDED APU SOLUTIONS GUIDE                        Fanless Industrial                                                ...
AMD EMBEDDED APU SOLUTIONS GUIDE                           Semi-Industrial Mini-                                          ...
AMD EMBEDDED APU SOLUTIONS GUIDE                         PC/104+ module                                                   ...
AMD EMBEDDED APU SOLUTIONS GUIDE                                                                                          ...
AMD EMBEDDED APU SOLUTIONS GUIDEtem that was compact and low power            faster-than-real-time processing, CASS      ...
AMD Embedded Solutions Guide
AMD Embedded Solutions Guide
AMD Embedded Solutions Guide
AMD Embedded Solutions Guide
AMD Embedded Solutions Guide
AMD Embedded Solutions Guide
AMD Embedded Solutions Guide
AMD Embedded Solutions Guide
AMD Embedded Solutions Guide
AMD Embedded Solutions Guide
AMD Embedded Solutions Guide
AMD Embedded Solutions Guide
AMD Embedded Solutions Guide
AMD Embedded Solutions Guide
AMD Embedded Solutions Guide
AMD Embedded Solutions Guide
Upcoming SlideShare
Loading in...5
×

AMD Embedded Solutions Guide

1,816

Published on

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,816
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
44
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

AMD Embedded Solutions Guide

  1. 1. JAN U A RY 2 0 1 3PROCESSORADVANCEMENTSFOR EMBEDDEDOPENCL™AND PARALLELPROCESSINGAMDE M B E D D EDAPUSOLUT IONSG UID E ANDROID GOES BEYOND GOOGLE APUs SOAR IN REAL-TIME IMAGE PROCESSINGE XCL USIVE I N S ID E : NEW DE V E L O P M E N T A N D E V A L U A T I O N B O A R D
  2. 2. Introducing open source, infinite possibilities Unleash Your Inner Inventor With GizmoSphere, An Embedded Development Environment & Community GizmoSphere enables developers to create embedded design solutions plus communicate ideas and shared interests with a global community of embedded innovators. At GizmoSphere.org you can: • Join discussion groups • Buy the Gizmo Explorer Kit • View diagrams and read guides • Access open source software • Even enter contests! www.gizmosphere.org Gizmo Board Features Gizmo Explorer Kit• Multicore computing and mixed core architecture with Your development kit includes: a performance capacity of 52.8 GFLOPS • Gizmo, a compact 4”x4” development board powered• High-speed card edge connector enables PCIe, SATA, by the AMD Embedded G-Series APU USB, Display Port • Explorer, an expansion board for additional I/O• Low-speed card edge connector supports SPI, I2C, functionality including SPI, I2C, GPIO, PWM GPIO, PWM, ADC, DAC • SmartProbe® by Sage, the automated, configurable• Perfect for development with open source coreboot®, plug-in development tool for AMD-based embedded including SeaBIOS and SageBIOS™ designs• Board also supplies JTAG header, VGA video output, • Trial license for the Sage EDK graphical interface (IDE) Audio input/output, Ethernet, USB and SmartProbe® trial time use• Supports both high performance PC-style I/O and • On-board SageBIOS™, a distribution of open source easily accessible embedded I/O coreboot®• AMD Embedded G-Series APU boasts low power • Installation DVD and Quick Start Guide consumption – only 6.4W max TDP • Power Supply with U.S.-standard power cord• Convenient compact size – just 4” x 4” All For The Unbelievably Low Price Of $199!!• Advanced features include 64-bit processing and GizmoSphere partners: hardware virtualization• Super energy efficient at an amazing 8.25 GFLOPS/W AMD Sage Electronic Engineering For information on becoming a GizmoSphere partner visit: www.gizmosphere.org/partners
  3. 3. AMD EMBEDDED APU SOLUTIONS GUIDEAPUs and ProcessorAdvancementsfor EmbeddedApplications Kelly GillilanA Marketing Manager,By Kelly Gillilan, Marketing Manager, AMD Embedded Solutions AMD Embedded Solutions t the beginning of a new tectures able to handle multi-threaded Kelly Gillilan has worked extensively in embedded applications for most of year, it’s interesting to applications efficiently. However, there the past decade. He currently is the look back at the techni- are limits to how many cores a design Product Marketing Manager for the AMD cal advancements that can incorporate before the throughput Embedded Solution division, overseeing have occurred in the pri- performance levels off—that’s because worldwide marketing strategy and activities. He holds a degree in Computeror 12 months. of the increase in “overhead” to manage Engineering and is fluent in Mandarin Take AMD’s Embedded Accelerated such an architecture. GPUs also have Chinese.Processing Units (APUs) for example. undergone a significant transforma-In the past year we launched the high- tion—from simply driving a display toperformance AMD Embedded R-Series driver-based programs to system-based lation. Some of our partners have com-APU platform consisting of quad- and programming models with power effi- bined the AMD R-Series APU—whichdual-core models running at 2.3 GHz (3.2 ciency enhancements. supports four independent displays—GHz boost) with AMD Radeon™ 7000 se- The next phase in processor evolu- with the AMD Radeon™ E6760 embed-ries integrated graphics providing more tion is currently unfolding. AMD’s APUs ded discrete GPU—which supports sixthan 570 GFLOPS of performance in combine multiple x86 CPU cores to han- independent displays—to create systemsjust a 35 W TDP. Shortly after the launch dle serialized data with dozens or even capable of driving a total of 10 indepen-of the AMD R-Series APUs, we released hundreds of compute units in the GPU. dent displays. These types of systems area new addition to our low-power AMD These cores process parallelized data to ideal solutions for applications such asEmbedded G-Series APU line—the 4.5 provide a heterogeneous system architec- casino gaming and digital signage, whereW TDP AMD T-16R APU for power-sen- ture with excellent performance potential multimedia content must be large, bold,sitive applications requiring efficient per- in low-power bands. and eye-catching. Metrics for these sys-formance and high-definition graphics. What does this mean for embedded tems also can be efficiently processed byThese two families of APUs service dif- applications? Over the past year I have using programming languages such asferent market segments, but both provide seen our partners develop hardware so- OpenCL™ that compile specifically forthe unique combination of a powerful, yet lutions in very small form factors (such heterogeneous system designs such aspower-efficient CPU with a discreet-level, as Qseven) that are able to drive two those based on AMD’s APUs.high-performance GPU for a heteroge- full-HD independent displays from a I can’t predict the future, but one thingneous system architecture. fanless, compact enclosure. Systems like is certain: as these technologies continue to So how did we get here? We’ve seen these are ideal for powering industrial evolve and new innovations are introduced,CPUs transition from single-core ar- control and factory automation systems embedded system designers and integratorschitectures, where performance boosts as the industry phases out old arrays of will capitalize on new application and mar-typically were accomplished by increas- buttons, knobs, and switches in favor of ket opportunities that can lead to increaseding clock speed, to multi-core archi- touch-panel controls with 3D manipu- revenue streams. WWW. AMD . C OM/ EMBED D ED / C ATALOG 03
  4. 4. AMD EMBEDDED APU SOLUTIONS GUIDEAMD EMBEDDED R-SERIES PLATFORMDelivering exceptional performance in apower efficient platform.The AMD Embedded R-Series platform delivers high-performance processing coupled witha premium high-definition visual experience in a solution that is still power efficient, enablingunprecedented integrated graphics and multi-display capabilities in embedded applications thatcan be compact and low power.The AMD R-Series APU (Accelerated Processing Unit) is designed to efficiently handle your advancedmultimedia and computational workloads. With average power below 13 watts and discrete-class AMDRadeon™ graphics performance integrated into the AMD R-Series APU, applications that previouslyrequired a discrete graphics card can be developed in smaller form factors with lower power and cost.For more demanding graphics applications, AMD Radeon™ Dual Graphics technology can combinethe processing power of AMD R-Series APUs and AMD Radeon™ Embedded 6000 Series GPUs tomore than double graphics performance compared to using discrete graphics alone. x86 Core Clock Speed DDR3 x86 AMD-V™ Model Base/Boost L2 Cache GPU Speed Cores UVD1 3 Tech. 2 EVP3 Package Max TDPAMD Embedded R-Series APU – FS1r2 PGA R-464L 2.3/3.2 GHz AMD Radeon™ HD 7660G 2MB x 2 4 R-460H 1.9/2.8 GHz AMD Radeon™ HD 7640G FS1r2 DDR3-1600 Yes Yes Yes 35W R-272F 2.7/3.2 GHz AMD Radeon™ HD 7520G (722-PGA) 1 MB 2 R-268D 2.5/3.0 GHz AMD Radeon™ HD 7420GAMD Embedded R-Series APU – FP2 BGA R-460L 2.0/2.8 GHz AMD Radeon™ HD 7620G 25W 2 MB x 2 4 R-452L 1.6/2.4 GHz AMD Radeon™ HD 7600G FP2 19W DDR3-1600 Yes Yes Yes R-260H 2.1/2.6 GHz 2 MB AMD Radeon™ HD 7500G (827-BGA) 2 17W R-252F 1.7/2.3 GHz 1 MB AMD Radeon™ HD 7400G1. Unified Video Decoder (UVD 3) for hardware decode of high definition video. 2. AMD Virtualization™ technology. When used as part of a DAS 1.0 implementation can improve the performance, reliability and security of embedded applications.3. As part of a comprehensive security program, AMD strongly recommends enabling Enhanced Virus Protection (EVP) and using up-to-date third- party anti-virus software.Note: Always refer to the processor/chipset data sheets for technical specifications. Feature information in this document is provided for reference only.04 J ANU ARY 201 3
  5. 5. AMD EMBEDDED APU SOLUTIONS GUIDE micro-ATX Digital Gaming SBC Motherboard DPX-S430 GMB-A75 • AMD Embedded R-Series Platform • AMD Radeon™ HD 7000G Series Graphics • AMD Embedded R-Series Platform integrated • AMD Radeon™ HD 7000G Series Graphics integrated • AMD A75 Controller Hub • AMD A75 Controller Hub • Quad and Dual Core APUs • Built-in SMI 750 GPU to support extension • Comprehensive Gaming features 2 display • High-performance integrated or PCI Express • Extended up to 2 PCIex16 slot graphics • System memory up to 16 GB (DDR3) • Up to 4 independent displays from the chipset • Comprehensive I/O: 2 x RJ45, 10 x USB (2.0/3.0), • Low power consumption 8 x COM, 4 x VGA, 2 x DVI-D, CF card, and audio • Small format • Gaming, Information Appliance, Digital Signage, Point of SaleAdvantech Advantech-InnocorePHONE (949) 789-7178 EMAIL sales@advantech-innocore.com PHONE (949) 789-7178 EMAIL sales@advantech-innocore.comFAX (949) 789-7179 WEB www.advantech.com/embcore FAX (949) 789-7179 WEB www.advantech.com/embcore Mini-ITX Single Board COM Express / Type 6 Computer conga-TFS MANO111 • AMD Embedded R-Series APU • AMD Radeon™ HD 7000G Series Graphics • AMD Embedded R-Series APU integrated • AMD Radeon™ HD 7000G Series Graphics integrated • AMD A70 Controller Hub • AMD A75 Controller Hub • SODIMM, 16GB, DDR3, 2x, 1066/800 • DDR3 Dual channel SO-DIMM 1333/1600 • 7 x PCI Express™ max. up to 16 GB • 4 x SATA • 4 SATA-600 support RAID 0,1,5,10 • 4 x USB 3.0, 4 x USB 2.0 • 4 USB 3.0 supported • High-performance DirectX®11 GPU supports • 3 independent displays OpenCL™ 1.1 and OpenGL 4.2 • DisplayPort 2 supports multi-stream • Gaming, Server, Information Appliance, • Gaming, Communications, Industrial Control- Communications, Industrial Controllers, lers, Medical, Digital Signage, Point of Sale Medical, Digital SignageAxiomtek congatec Inc.PHONE (626) 581-3232 EMAIL info@axiomtek.com PHONE (858) 457-2600 EMAIL sales-us@congatec.comFAX (626) 581-3552 WEB www.axiomtek.com/US FAX (858) 457-2602 WEB www.congatec.us COM Express Mini-ITX Compact R2.0, Type 6 Motherboard CM901-B CM100-C • AMD Embedded R-Series APU • AMD Embedded R-Series APU • AMD Radeon™ HD 7000G Series Graphics integrated • AMD Radeon™ HD 7000G Series Graphics • AMD A70 Controller Hub integrated • 2 DDR3 SODIMM up to 8GB • AMD A70 Controller Hub • VGA, LVDS, DDI (DisplayPort, LVDS, VGA) • 2 DDR3 SODIMM up to 8GB • 1 PCIe x16, 7 PCIe x1 (first 4 PCIe support • 1 HDMI, 2 DVI (1 supports DVI-D signal), 1 LVDS PCIe x4) • 1 PCIe x16, 2 PCIe x1 gold fingers, 1 Mini PCIe • 4 SATA 3.0 • 4x SATA 3.0 • 8 USB 2.0 (first 4 USB ports support up to USB 3.0) • 4x USB 3.0, 6x USB 2.0 • Gaming, Information Appliance, Industrial Con- • Gaming, Industrial Controllers, Medical, Digital trollers, Medical, Digital Signage, Point of Sale Signage, Point of SaleDFI DFIPHONE (916) 568-1234 EMAIL sales@dfitech.com PHONE (916) 568-1234 EMAIL sales@dfitech.comFAX (916) 568-1233 WEB www.dfi.com FAX (916) 568-1233 WEB www.dfi.com WWW. AMD . C OM/ EMBED D ED / C ATALOG 05
  6. 6. AMD EMBEDDED APU SOLUTIONS GUIDE Digital Signage Mini-ITX Player Motherboard SI-38 MI959 • AMD R-Series Quad-Core / Dual-Core APU, • AMD Embedded R-Series APU up to 35W • AMD Radeon™ HD 7000G Series Graphics • Integrated AMD Radeon™ 384/240 Cores integrated DirectX® 11 GPU in Processor • AMD A70 Controller Hub • Winner of Computex 2012 Design Innovation Award • 2 x DDR3-1600 Memory, up-to 16GB Dual • Dual independent 1080p Hybrid DVI-I display outputs Channel • Supports DDR3 memory up to 16GB • 1 x DVI-, DVI-D, Display Port LVDS • iSMART - for EuP/ErP power saving, auto- • 2 x Mini PCI-E(x1), PCI-E(x16) scheduler and power resume • 4 x USB 3.0 + 8 USB 2.0 • Dual Mini PCI-E(x1) slots for WiFi and TV • Server, Communications, Industrial Controllers, tuner options Medical, Networking, Digital Signage, Point of Sale • 2x USB 3.0 and serial port (RS232)IBASE IBASEPHONE +886-2-2655-7588 EMAIL sales@IBASE.com.tw PHONE +886-2-2655-7588 EMAIL sales@IBASE.com.twFAX +886-2-2655-7388 WEB www.IBASE-usa.com FAX +886-2-2655-7388 WEB www.IBASE-usa.com Mini-ITX Mini-ITX Motherboard Motherboard NF82 KTA70M/mITX • AMD Embedded R-Series APU • AMD Embedded R-Series APU • AMD A75 Controller Hub • AMD Radeon™ HD 7000G Series Graphics • 2 x SODIMM Sockets for un-buffered Dual integrated Channel DDR3 1600 SDRAM up to 16 GB • AMD A70 Controller Hub • 6 x Serial ATA3 6Gb/s connectors support • 2x SO-DIMM, up to 8GB each (in total 16 GB) RAID 0, 1 10 functions • 4 independent display outputs • 1 x PCI x 16 slot, 1 x Mini PCI-E • 10x USB 2.0, 4x USB 3.0 • Embedded 4 x USB 3.0 8 x USB 2.0/1.1 • 1x PCIe x8 1x PCIe x4 • Gaming, Digital Signage, Point Of Sale • 2x SO-DIMM, up to 8GB each (in total 16 GB) • Medical, Industrial Automation, Gaming, Digital SignageJETWAY Information Co., Ltd Kontron AmericaPHONE +886 2 89132711 EMAIL louis.chang@jetway.com.tw PHONE (858) 677-0877 EMAIL info@us.kontron.comFAX +886 2 89132722 WEB www jetway.com.tw FAX (858) 677-0898 WEB www.kontron.com Digital Signage Digital Gaming Player System NDiS165 QX-40 • AMD Embedded R-Series APU • AMD Embedded R-Series APU • AMD Radeon™ HD 7000G Series Graphics • AMD Radeon™ HD 6760 Graphics integrated integrated • AMD A75 Controller Hub • AMD A70 Controller Hub • SODIMM, 8GB, DDR3, 2x, 1600/1333/1066 • SODIMM, 16GB, DDR3 non-ECC, 2x, • OpenGL 4.1, DirectX® 11, OpenCL™ 1.1 1333/1066/800 compatible • 3x HDMI, 1920 x 1080 • Advanced PCI Express® gaming logic SRAM • 2x Mini-PCIe, X1 FOR WLAN and TV Tuner / MRAM • 2x SATA, 6.0Gbps, 3.0 compliant • Support for up to 10 independent monitors • 4x TypeA, USB 3.0, Host, 4x Header, USB • 7x DisplayPort, 2560 x 1600 2.0, Host • 4x DVI, 2560 x 1600Nexcom Quixant UK LtdPHONE (510) 656-2248 EMAIL marketing@nexcom.com PHONE +44 (0) 1223 89296 EMAIL sales@quixant.comFAX (510) 656-2158 WEB www.nexcom.com FAX +44 (0) 1223 892401 WEB www.quixant.com06 J ANU ARY 201 3
  7. 7. AMD EMBEDDED APU SOLUTIONS GUIDE Digital Gaming Mini-ITX Single Board System Computer QXi-4000 ITX-AT2X21B • AMD Embedded R-Series APU • AMD Embedded R-Series APU • AMD Radeon™ HD 7000G Series Graphics integrated • AMD Radeon™ HD 7000G Series Graphics integrated • AMD A70 Controller Hub • AMD A70 Controller Hub • Fanless all-in-one PC-based gaming controller • 1 x SO DIMM DDR3 1066/1333MHz, Max for slot machines up to 8GB • Supports up to four independent HD monitors • Supports DirectX11, 1080i, 1080P and H.264. • Advanced PCI Express® gaming logic SRAM/ VC-1, MPEG-G MRAM • 2x SATA 3Gb/s with power • 4x, DisplayPort, 2560 x 1600 • 2x MINI PCI-E slot (1pcs for wifi, 1 pcs for SSD) • 5x TypeA USB 2.0, Host • 8x USB, 4x USB3.0, 4x USB2.0 • 4x SATA, 6.0Gbps, 3.0 compliant, 2 x CFAST • Gaming, Industrial Controllers, Digital Signage, sockets Thin ClientsQuixant UK Ltd SHENZHEN XINZHIXIN ENTERPRISE DEVELOPMENT CO.,LTDPHONE +44 (0) 1223 89296 EMAIL sales@quixant.com PHONE 86-13590127552 EMAIL sale@xzx.net.cnFAX +44 (0) 1223 892401 WEB www.quixant.com WEB www.micputer.com Mini-ITX Motherboard PICO-ITX Fanless EMB-A70M Board • AMD Embedded R-Series APU (dual-core) PICO-HD01 • Supports DDR3 1333MHz memory up to 8GB • AMD Embedded G-Series Platform • 4 x HDMI Supporting Full HD display • AMD A50M Controller Hub • 1 x Audio Line-out, 1 x Mic-in • 204-pin SODIMM DDR3 1066MHz up to • 2 x USB3.0 (Internal) ; 1 x USB2.0 (Rear I/O); 4GB 4 x USB2.0 (Internal) • One Reatek 8111E for 10/1000/1000Base-TX • 2 x RJ45 with LEDs for 10/100/1000Mbps Ethernet • CRT, 18-bit Single Channel LVDS, HDMI • 2 x COM (COM1: RS-232/422/485; COM2: • SATA 3.0Gb/s x 1, mSATA (Half-Size) Slot x 1 RS-232) Co-lay with Mini Card • 1 x miniPCIe (Full Size) with SIM Socket • USB 2.0 x 5, COM x 2, 4-bit Digital I/O • 1 x mini PCIe ( Optional PCIe SATA signal, • Gaming, Communications, Digital Signage, default for mSATA) Point Of SaleAaeon AaeonPHONE (714) 996-1800 EMAIL info@aaeon.com PHONE (714) 996-1800 EMAIL info@aaeon.comFAX (714) 996-1811 WEB www.aaeon.com FAX (714) 996-1811 WEB www.aaeon.com EPIC Board 3.5” SubCompact EPIC-HD07 Board • AMD Embedded G-Series Platform GENE-HD05 • AMD A55E Controller Hub • AMD Embedded G-Series Platform • SODIMM DDR3 1066/1333 MHz, Max 4G • AMD A50M Controller Hub • Up to 24-bit Dual Channel LVDS LCD, CRT, DVI (Optional) • SODIMM DDR3 1066/1333(T56N) up to 4GB • SATA 3.0Gb/s x 1, mSATA Slot x 1 • CRT, 18/24/36/48-bit LVDS, HDMI • USB 2.0 x 8, COM x 6, 16-bit Digital I/O • SATA 3.0Gb/s x 2, CFast x 1, mSATA x 1 (Config. by BIOS) • Expansion: Mini Card x 1, PCI-104 x 1 (Co-lay with LPT) • USB2.0 x 8, COM x 4, Parallel x 1, 8-bit Digital I/O • Gaming, Information Appliance, Industrial • Onboard 4/5-wire Resistive Touch Screen Controllers, Digital Signage, Point of Sale Controller • Gaming, Information Appliance, Digital SignageAaeon AaeonPHONE (714) 996-1800 EMAIL info@aaeon.com PHONE (714) 996-1800 EMAIL info@aaeon.comFAX (714) 996-1811 WEB www.aaeon.com FAX (714) 996-1811 WEB www.aaeon.com WWW. AMD . C OM/ EMBED D ED / C ATALOG 07
  8. 8. AMD EMBEDDED APU SOLUTIONS GUIDE be run on a wide range of systems without having to rewrite everything for each system.OpenCLTM OpenCLTM for Unified, Portable Source CodeProgramming for OpenCL™, the first open and royalty- free programming standard for general- purpose parallel computations on heteroge- neous systems, is quickly growing in popu-Heterogeneous larity as a means for developers to preserve their expensive source code investments and easily target multi-core CPUs and GPUs. OpenCL is maintained by the KhronosComputing Group, a not-for-profit industry consortium that creates open standards for the author- ing and acceleration of parallel computing,Systems: Parallel graphics, dynamic media, computer vision and sensor processing on a wide variety of platforms and devices. Developed in an open standards committee with representa-Processing Made tives from major industry vendors, OpenCL affords users a cross-vendor, non-proprietary solution for accelerating their applicationsFaster and Easier across mainstream processing platforms, and provides the means to tackle major development challenges, such as maximiz- ing parallel compute utilization, efficientlyThan Ever handling data movement and minimizing dependencies across cores. Ultimately, OpenCL enables develop- ers to focus on applications, not just chipP architectures, via a single, portable sourceBy Todd Roberts, Software Manager, Embedded Solutions, AMD code base. When using OpenCL, developers can use a unified tool chain and language to arallel processing isn’t really GPUs, which have always been very paral- target all of the parallel processors currently new. It has been around in lel—counting hundreds of parallel execu- in use. This is done by presenting the devel- one form or another since the tion units on a single die—have now be- oper with an abstract platform model that early days of computing. As come increasingly programmable, to the conceptualizes all of these architectures in a traditional CPUs have be- point that it is now often useful to think similar way, as well as an execution modelcome multi-core parallel processors, with of GPUs as many-core processors instead of supporting data and task parallelism acrossmany cores in a socket, it has become more special purpose accelerators. heterogeneous architectures.important for developers to embrace par- All of this diversity has been reflectedallel processing architectures as a means in a wide array of tools and programming Key Concepts andto realize significant system performance models required for programming these Workflowsimprovements by taking advantage of the architectures. This has created a dilemma OpenCL has a flexible execution modelextra cores. This move towards parallel for developers. In order to write high- that incorporates both task and data paral-processing has been complicated by the performance code they have had to write lelism. Tasks themselves are comprised ofdiversity and heterogeneity of the various their code specifically for a particular archi- data-parallel kernels, which apply a singleparallel architectures that are now avail- tecture and give up the flexibility of being function over a range of data elements inable. A heterogeneous system is made up of able to run on different platforms. In order parallel. Data movements between the hostdifferent processors each with specialized for programs to take advantage of increases and compute devices, as well as OpenCLcapabilities. Over the last several years, in parallel processing power, however, they tasks, are coordinated via command queues.GPUs have been targeted as yet another must be written in a scalable fashion. Devel- An OpenCL command queue is cre-source of computing power in the system. opers need the ability to write code that can ated by the developer through an API call,08 J ANU ARY 201 3
  9. 9. AMD EMBEDDED APU SOLUTIONS GUIDEand associated with a specific compute tion of the computation being performed. Write A Write Bdevice. To execute a kernel, the kernel is OpenCL kernels are executed over an indexpushed onto a particular command queue. space, which can be 1, 2 or 3 dimensional.Enqueueing a kernel can be done asynchro- Write C Kernel A Kernel C In Figure 2, we see an example of a 2 di-nously, so that the host program may en- mensional index space, which has Gx * Gyqueue many different kernels without wait- elements. For every element of the kerneling for any of them to complete. When Kernel B Read A index space, a work-item will be executed.enqueueing a kernel, the developer option- All work items execute the same program,ally specifies a list of events that must occur Kernel D although their execution may differ due tobefore the kernel executes. If a developer branching based on data characteristics orwishes to target multiple OpenCL com- the index assigned to each work-item.pute devices simultaneously, the developer Read B The index space is regularly subdi-would create multiple command queues. vided into work-groups, which are tilings Command queues provide a general FIGURE 1 ask Parallelism within a T of the entire index space. In Figure 2, weway of specifying relationships between Command Queue. see a work-group of size Sx * Sy elements.tasks, ensuring that tasks are executed in an Each work-item in the work-group receivesorder that satisfies the natural dependencies try to synchronize or share data, since the a work-group id, labeled (wx, wy) in thein the computation. The OpenCL runtime is runtime provides no guarantee that all work- figure, as well as a local id, labeled (sx, sy)free to execute tasks in parallel if their depen- items are concurrently executing, and such in the figure. Each work-item also receivesdencies are satisfied, which provides a gen- synchronization easily introduces deadlocks. a global id, which can be derived from itseral-purpose task parallel execution model. Developers are also free to construct work-group and local ids. Events are generated by kernel comple- multiple command queues, either for parallel- Work-items in different work-groupstion, as well as memory read, write, and izing an application across multiple compute may coordinate execution through thecopy commands. This allows the developer devices, or for expressing more parallelism via use of atomic memory transactions, whichto specify a dependence graph between completely independent streams of computa- are an OpenCL extension supported bykernel executions and memory transfers in tion. OpenCL’s ability to use both data and some OpenCL runtimes. For example,a particular command queue or between task parallelism simultaneously is a great ben- work-items may append variable num-command queues themselves, which the efit to parallel application developers, regard- bers of results to a shared queue in globalOpenCL runtime will traverse during ex- less of their intended hardware target. memory. However, it is good practice thatecution. Figure 1 shows a task graph illus- work-items do not, generally, attempt totrating the power of this approach, where Kernels communicate directly, as without carefularrows indicate dependencies between As mentioned, OpenCL kernels pro- design scalability and deadlock they cantasks. For example, Kernel A will not ex- vide data parallelism. The kernel execution become difficult problems. The hierarchyecute until Write A and Write B have fin- model is based on a hierarchical abstrac- of synchronization and communicationished, and Kernel D will not execute untilKernel B and Kernel C have finished. Sx The ability to construct arbitrary taskgraphs is a powerful way of constructing task- work group (Wx , Wy)parallel applications. The OpenCL runtimehas the freedom to execute the task graph in sx = 0 sx = Sx - 1parallel, as long as it respects the dependen- Gx sy = 0 sy = 0cies encoded in the task graph. Task graphs work item work itemare general enough to represent the kinds (WxSx + sx , (WxSx = sx ,of parallelism useful across the spectrum of WySy + sy) WySy + sy)hardware architectures, from CPUs to GPUs. Besides the task parallel constructs pro- Gy Syvided in OpenCL, which allow synchroni- sx = 0 s x = Sx - 1zation and communication between kernels, s y = Sy - 1 sy = Sy - 1OpenCL supports local barrier synchroniza- work item work itemtions within a work-group. This mechanism (WxSx = sx , (WxSx + sx ,allows work-items to coordinate and share WySy + sy) WySy + sy)data in the local memory space using onlyvery lightweight and efficient barriers. Work-items in different work-groups should never FIGURE 2 EQ Executing Kernels - Work-Groups and Work-Items. S WWW. AMD . C OM/ EMBED D ED / C ATALOG 09
  10. 10. AMD EMBEDDED APU SOLUTIONS GUIDEprovided by OpenCL is a good fit for many that allow and sometimes require some level ages, are allocated per context; but changesof today’s parallel architectures, while still of tuning to achieve optimal performance. made by one device are only guaranteed toproviding developers the ability to write ef- be visible by another device at well-definedficient code, even for parallel computations voidtrad_mul(int n, synchronization points. For this, OpenCLwith non-trivial synchronization and com- const float *a, provides events, with the ability to synchro- const float *b,munication patterns. float *c) nize on a given event to enforce the correct The work-items may only commu- { order of execution.nicate and synchronize locally, within a int i; Most OpenCL programs follow thework-group, via a barrier mechanism. This for (i=0; in; i++) same pattern. Given a specific platform, c[i] = a[i] * b[i]; }provides scalability, traditionally the bane select a device or devices to create a con-of parallel programming. Because com- text, allocate memory, create device-spe-munication and synchronization at the FIGURE 3 xample of traditional loop E cific command queues, and perform datafinest granularity is restricted in scope, the (scalar). transfers and computations. Generally, theOpenCL runtime has great freedom in how platform is the gateway to accessing specificwork-items are scheduled and executed. kernel void devices. Given these devices and a corre- dp_mul (global const float *a, sponding context, the application is inde-A Typical OpenCL Kernel global const float *b, pendent of the platform. Given a context, global float *c) As already discussed, the core pro- { the application can:gramming goal of OpenCL is to provide int id = get_global_id (0); • reate one or more command Cprogrammers with a data-parallel execu- queues.tion model. In practical terms this means c[id] = a[id] * b[id]; • reate programs to run on one or Cthat programmers can define a set of in- } // execute over “n” work-items more associated devices.structions that will be executed on a large • reate kernels within those pro- Cnumber of data items at the same time. grams.The most obvious example is to replace FIGURE 4 ata parallel OpenCL. D • A llocate memory buffers or images, loops with functions (kernels) executing at either on the host or on the device(s)each point in a problem domain. For example, OpenCL memory object types - Memory can be copied between the Referring to Figures 3 and 4, let’s and sizes can impact performance. In most host and device.say you wanted to process a 1024 x 1024 cases key parameters can be gathered from • Write data to the device. image (your global problem dimension). the OpenCL runtime to tune the operation • Submit the kernel (with appropriate You would initiate one kernel execution of the application. In addition, each ven- arguments) to the command queueper pixel (1024 x 1024 = 1,048,576 kernel dor may choose to provide extensions that for execution.executions). provide for more options to tune your appli- • Read data back to the host from the Figure 3 shows sample scalar code cation. In most cases these are parameters device.for processing an image. If you were writ- used with the OpenCL API and should noting very simple C code you would write a require extensive rewrite of the algorithms. The relationship between context(s),simple for loop, and in this for loop you device(s), buffer(s), program(s), kernel(s),would go from 1 to N and then perform Building an OpenCL and command queue(s) is best seen byyour computation. Application looking at sample code. An alternate way to do this would be An OpenCL application is built by firstin a data parallel fashion (Figure 4), and in querying the runtime to determine which Summarythis case you’re going to logically read one platforms are present. There can be any OpenCL affords developers an elegant,element in parallel from all of a (*a), multi- number of different OpenCL implementa- non-proprietary programming platform toply it from an element of b in parallel and tions installed on a single system. The de- accelerate parallel processing performancewrite it to your output. You’ll notice that in sired OpenCL platform can be selected by for compute-intensive applications. WithFigure 4 there is no for loop—you get an matching the platform vendor string to the the ability to develop and maintain a sin-id value, read a value from a, multiply by desired vendor name, such as “Advanced Mi- gle source code base that can be applied toa value from b and then write the output. cro Devices, Inc.” The next step is to create a CPUs, GPUs and APUs with equal ease, As stated above, a properly written context. An OpenCL context has associated developers can achieve significant program-OpenCL application will operate correctly with it a number of compute devices (for ming efficiency gains, reduce developmenton a wide range of systems. While this is example, CPU or GPU devices). Within a costs, and speed their time to market.true it should be noted that each system and context, OpenCL guarantees a relaxed con-compute device available to OpenCL may sistency between these devices. This means Advanced Micro Deviceshave different resources and characteristics that memory objects, such as buffers or im- www.amd.com10 J ANU ARY 201 3
  11. 11. AMD EMBEDDED APU SOLUTIONS GUIDEAMD EMBEDDED G-SERIES PLATFORMThe world’s first combination of low-powerCPU and advanced GPU integrated into asingle embedded device.The AMD Embedded G-Series processor is the world’s first integrated circuit to combine a low-power CPU and a discrete-level GPU into a single embedded Accelerated Processing Unit (APU).This unprecedented level of graphics integration builds a new foundation for high-performancemultimedia content delivery in a small form factor and power-efficient platform for a broad range ofembedded designs. Based on a new power-optimized core, the AMD Embedded G-Series platformdelivers levels of performance in a compact BGA package that is ideal for low-power designsin embedded applications such as Digital Signage, x86 Set-Top-Box (xSTB), IP-TV, Thin Client,Information Kiosk, Point-of-Sale, Casino Gaming, Media Servers, and Industrial Control Systems. x86 Core Clock Speed Base/ Model Boost L2 Cache GPU DDR3 Speed x86 Cores UVD1 3 Display Ouptuts Max TDPAMD Embedded G-Series APU – FT1 413-pin T56N 1.65GHz AMD Radeon™ HD 6320 2 18W DDR3-1333 T56E 1.65GHz AMD Radeon™ HD 6250 2 Dual independent Unbuffered 18W display controllers AMD Radeon™ HD 6310 1 2 active outputs 18W T52R 1.5GHz AMD Radeon™ HD 6250 from: 18W DDR3-1066 2 T48E 1.4GHz 1xVGA Unbuffered Yes 2x single link DVI T44R 1.2GHz AMD Radeon™ HD 6250 DDR3-10663 1 9W T40N 1.0GHz3 AMD Radeon™ HD 6290 2 1X single link LVDS 9W Unbuffered 512KB 2x DisplayPort 1.1a DDR3-1066 T48n 1.4GHz AMD Radeon™ HD 6310 2 18W Unbuffered 1x HDMI T40E 1.0GHz AMD Radeon™ HD 6250 DDR3-10663 2 1X DVO 6.4W T40R 1.0GHz AMD Radeon™ HD 6250 Unbuffered 1 5.5W T16R 615mHz AMD Radeon™ HD 6250 LVDDR3-1066 1 4.5W T48L 1.4GHz N/A DDR3-1066 2 18W T30L 1.4GHz N/A Unbuffered 1 18W N/A N/A DDR3-10663 T24L 1.0GHz N/A 1 5W Unbuffered1. nified Video Decoder (UVD 3) for hardware decode of high-definition video. U2. ow voltage (1.35V) DDR3 is assumed for the 9W TDP processors. The use of 1.5V DDR3 will incur a power adder. L3. odels enabled by AMD Turbo CORE technology, up to 10% clock speed increase is planned. For CPU boost, only one processor core of a dual- M core has boost enabled.Note: Always refer to the processor/chipset data sheets for technical specifications. Feature information in this document is provided for reference only. WWW. AMD . C OM/ EMBED D ED / C ATALOG 11
  12. 12. AMD EMBEDDED APU SOLUTIONS GUIDE Fanless Industrial Mini-ITX Computer Motherboard TKS-E21-HD07 EMB-A50M • AMD Embedded G-Series Platform • AMD Embedded G-Series Platform • AMD A55E Controller Hub • AMD A50M Controller Hub • Includes the Aaeon EPIC-HD07 board • DDR3 800/1066 DIMM x 2, Unbuffered • Realtek Gigabit Ethernet x 2, Mini Card x 1, Memory, Max. 8 GB mSATA x 1 • DVI-I, HDMI • Low Profile 1U Height and Weight Less Than • SATA 6.0Gb/s x 5 1.5Kg • USB3.0 x 2, USB2.0 x 10, COM x 4 • Easy Reconfiguration System • PCI Express 2.0 [x4] x 1, Mini PCIe (Half size) • Fanless With Excellent Thermal Solution x 1, 8-bit Digital I/O • Anti-Vibration Up to 1G (HDD, Random) • Gaming, Communications, Digital Signage, Point of SaleAaeon AaeonPHONE (714) 996-1800 EMAIL info@aaeon.com PHONE (714) 996-1800 EMAIL info@aaeon.comFAX (714) 996-1811 WEB www.aaeon.com FAX (714) 996-1811 WEB www.aaeon.com Gaming System Digital Signage ACE-S7400 Player • AMD Embedded G-T56N APU ARK-DS306 • with AMD Radeon™ HD 6320 Graphics • AMD Embedded G-T40N APU • Digital inputs and digital outputs with Micro fit 3.0 connector • with AMD Radeon™ HD 6290 Graphics AMD A50M Controller Hub • 2 x ccTalk • Dual display: HDMI, VGA • 2MB Battery back up SRAM • Built-in Mini PCIe slot • Timer Meter pulse generator, counters • Supports 2 GLAN, HD audio, I/O interface • Intrusion Logger with 2 x COM, 2 x USB • Storage:2 x CF connectors • 1 x 2.5” SATA HDD drive bay, 1 x CFast slot • Supports VESA mounting (Optional)aCrosser Technology Limited AdvantechPHONE (866) 401-9463 EMAIL juni_hung@acrosser.com.tw PHONE (949) 789-7178 EMAIL ECGInfo@advantech.comFAX (714) 903 5629 WEB www.acrosser.com FAX (949) 789-7179 WEB www.advantech.com/embcore MI/O Extension SBC Mini-ITX Single MIO-5270 Board Computer • AMD Embedded G-Series Platform AIMB-223 • AMD A50M Controller Hub • AMD Embedded G-Series Platform • 1 x DDR3 memory support up to 4 GB • One 204-pin SODIMM up to • Multiple display: 48-bit LVDS, HDMI, VGA 2 GB DDR3 1333 MHz SDRAM • 2 GbE support, HD Audio, Rich I/O interface • Supports VGA/LVDS/HDMI with 4 COM, 2 SATA, 6 USB and GPIO • Dual LANs, 6 COM, Mini PCIe, and Cfast • Supports embedded software APIs and Utilities • Supports embedded software APIs and UtilitiesAdvantech AdvantechPHONE (949) 789-7178 EMAIL ECGInfo@advantech.com PHONE (949) 789-7178 EMAIL ECGInfo@advantech.comFAX (949) 789-7179 WEB www.advantech.com/embcore FAX (949) 789-7179 WEB www.advantech.com/embcore12 J ANU ARY 201 3
  13. 13. AMD EMBEDDED APU SOLUTIONS GUIDE Semi-Industrial Mini- ETX3.0 CPU Module ITX Motherboard SOM-4466 SIMB-M22 • AMD Embedded G-Series Platform • AMD A55E Controller Hub • AMD Embedded G-Series Platform • Supports DX11, OGL3.2, and H.264/AVC, • AMD A55E Controller Hub VC-1 HW Acceleration • Fanless design • DDR3-1066 up to 4GB • dual display: HDMI, VGA, 18-bit single • PCI, ISA, SMBus, I2C channel LVDS, eDP (optional) • 10/100 LAN, SATA, mSATA Socket, IDE, USB2.0 • Information Appliance, Communications, Industrial Controllers, Medical, Networking, Digital SignageAdvantech AdvantechPHONE (949) 789-7178 EMAIL ECGInfo@advantech.com PHONE (949) 789-7178 EMAIL ECGInfo@advantech.comFAX (949) 789-7179 WEB www.advantech.com/embcore FAX (949) 789-7179 WEB www.advantech.com/embcore Industrial Computer Mini-ITX System Motherboard DPX-E120 GMB-A55E • AMD Embedded G-Series Platform • AMD Embedded G-Series Platform • AMD A55E Controller Hub • AMD A55E Controller Hub • DIMM, 8GB, DDR3, 2x, 1333/1066/800 • Supports HDMI 1.3a, DirectX 11, OpenGL4.0, • 4x, TypeA, USB 2.0; 2x, Header, USB 2.0 dedicated hardware (UVD3.0) for H.264, VC- • Comprehensive gaming features 1, MPEG 2, DivX decode • Low power and compact design • Dual display functions: HDMI, VGA, 18-bit single channel LVDS • Easy integration for gaming applications • Dual Realtek RTL8111DL Gigabit LAN • Dual monitor support • 1 PCIe x4 Gen.2, 1 Mini PCIe • 5 SATA 3.0 (6Gb/s) • 4 COM (2Powered COM), 8 USB2.0Advantech AdvantechPHONE 886-2-2792-7818 EMAIL ECGInfo@advantech.com PHONE (949) 789-7178 EMAIL ECGInfo@advantech.comFAX 886-2-2792-7337 WEB www.advantech.com/embcore FAX (949) 789-7179 WEB www.advantech.com/embcore PC/104 Single Board Mini-ITX Computer Motherboard PCM-3356 MB-7210 • AMD Embedded G-Series Platform • AMD Embedded G-Series Platform • AMD A55E Controller Hub • SODIMM, 8GB, DDR3, 2x • Ultra low power • Supports Direct X 11 and Open GL 4 • Supports up to 4 GB DDR3 SODIMM or 1 GB • Supports full bitstream decoding of H.264/ DDR3 on-board memory MPEG-4, AVC, VC-1, DivX, Xvid, MPEG2, as • 18-bit LVDS and VGA well as Blu-ray 3D • 3 COM ports, 4 USB 2.0 ports, dual GbE and • Support one PCIe x16 and one PCI slot audio codec • Support 6 x COM, GbE, 1 x DVI and CFast • Support extended temp: -40 ~ 85C, HALT applied • Medical, Digital Signage, Point of Sale • Expansion: PC/104 and miniPCIe • Communications, Industrial Controllers, Point of SaleAdvantech AEWIN Technologies Co., Ltd.PHONE 886-2-2792-7818 #1508 EMAIL austin.lo@advantech.com.tw PHONE +886-2-8692 6677 EMAIL sales@aewin.com.twFAX 886-2-2792-7337 WEB www.advantech.com/embcore FAX +886-2-8692 6655 WEB www.aewin.com.tw WWW. AMD . C OM/ EMBED D ED / C ATALOG 13
  14. 14. AMD EMBEDDED APU SOLUTIONS GUIDE PC/104+ module Networking PM-6101 Appliance • AMD Embedded G-Series Platform SCB-6979 • Supports ulta-low-power AMD G-T16R processors • AMD Embedded G-Series Platform • Dual 10/100/1000Mbps ethernet • One SO-DIMM up to 4GB DDR3 1066MHz SDRAM • Support One SATAIII, Two serial ports and four USB 2.0 ports • Max 6 GbE ports via PCI-e by1 • One DDR3 SO-DIMM socket supports up to • Robust I/O with USB 2.0; 2.5” SATA HDD 4GB memory bay, CF socket, Minicard slot and Console port • Digital Signage • Built with long-life AMD Embedded components • RoHS compliantAEWIN Technologies Co., Ltd. AEWIN Technologies Co., Ltd.PHONE +886-2-8692 6677 EMAIL sales@aewin.com.tw PHONE +886-2-8692 6677 EMAIL sales@aewin.com.twFAX +886-2-8692 6655 WEB www.aewin.com.tw FAX +886-2-8692 6655 WEB www.aewin.com.tw Custom Gaming Gaming System Motherboard SGA-2200 GA-2200 • AMD Embedded G-T56N APU • with AMD Radeon™ HD 6320 Graphics • AMD Embedded G-T56N APU • Also supports AMD G-T44R with AMD • with AMD Radeon™ HD 6320 Graphics Radeon™ HD 6250 • Also supports AMD G-T44R with AMD • 10 x COM, 2nd RTC and NVRAM Radeon™ HD 6250 • 10 x COM, 2nd RTC and NVRAMAEWIN Technologies Co., Ltd. AEWIN Technologies Co., Ltd.PHONE +886-2-8692 6677 EMAIL sales@aewin.com.tw PHONE +886-2-8692 6677 EMAIL sales@aewin.com.twFAX +886-2-8692 6655 WEB www.aewin.com.tw FAX +886-2-8692 6655 WEB www.aewin.com.tw Mini-ITX Industrial Tablet Motherboard WA-10 AMDY-7002 • AMD Embedded G-T56N APU • with AMD Radeon™ HD 6320 Graphics • AMD Embedded G-Series Platform • Memory: SODIMM, 2GB, DDR3 • AMD A55E Controller Hub • Support 1 x SODIMM DDR3 up to 4GB • 1 x PCIe, 1 x PCI and half size Mini-PCIe Socket • CF Socket, CFast Socket and 5 x SATA Connector • Gaming, Industrial Controllers, Medical, Networking, Digital Signage, Point of SaleAmerican Portwell Technology, Inc. Amtek System CompanyPHONE (510) 403-3399 EMAIL info@portwell.com PHONE +886-2-26492212#133 EMAIL ch.chang@amtek.com.twFAX (510) 403-3184 WEB www.portwell.com FAX +886-2-26492363 WEB www.amtek.com14 J ANU ARY 201 3
  15. 15. AMD EMBEDDED APU SOLUTIONS GUIDE No one else provides a similar solution in terms of performance per watt” Butrash-AMD APUs Soar vily said. “One additional advantage of the AMD G-Series APU is that they are sold as embedded solutions, meaning ain Real-Time good fit for defense solutions that require long-term availability and durability in harsh environments.”Image Processing Real-Time Threat Detection “[The defense contractor] needed semiautomatic systems that could helpW aid pilots in making decisions” ex- plained Butrashvily. “The way to do that is to take images from both aerial hen the Company quirements. The resulting solution built by and ground systems to stabilize video for Advanced Su- CASS can serve as a new-generation DSP streams, enabling the detection of im- percomputing Solu- for sensor and computer-vision platforms, mediate threats.” They required a sys- tions (CASS) was leveraging a combination of parallel and approached by an serial processing on a heterogeneous sys-Israeli defense contractor to create a new tem architecture. Image Registrationfield video image registration solution, Image registration is the process ofit was their first venture in working with The challenge transforming a set of sequential im-an AMD Embedded accelerated processor “Lots of industries use graphics ages (video stream acquired from aunit (APU). It won’t be the last. processing units (GPUs) for projects sensor) into a similar coordinate sys- The defense contractor’s executives that include video,” said Mordechai tem, creating a smoother visual flow. Inhad come to CASS with a problem: they “Moti” Butrashvily, CASS chief execu- real-life, physical conditions or normalneeded high-quality, smooth, stable tive officer and chief technology offi- movement affect the images a sensorreal-time computer vision images deliv- gathers and may cause vibrations. cer. CASS has been building solutionsered from ground and aero systems to around AMD GPUs for years and knew Viewing a continuous frame-set fromback-end systems. The defense contrac- that for applications with a high-degree an image sensor generally looks shakytor’s digital signal processing (DSP) and of parallelism—like image processing or unbalanced, as the sensor is oftenfield-programmable gate array (FPGA) —programmable GPUs offer critical mobile or not stabilized. Image regis-solutions were not capable of develop- performance advantages. “But we knew tration fixes this problem by smooth-ing the high-speed, higher-resolution a stand-alone GPU just couldn’t offer ing the output video stream. Applica-images that could more accurately track a solution that would meet the power tions for image registration vary from defense to medical imaging and more.motion—tracking missiles as they are consumption and size constraints of thecarried on a moving vehicle or detect- defense contractor.” Typical registration process stages in-ing a person climbing into a bunker, for Butrashvily and his team looked at clude: identifying movement vectorsexample. a variety of possible solutions, and real- between two relative images, per- CASS was asked to create a compact ized their options were rather limited. forming alignment, and applying fur-system that could process a frame-by-frame Few manufacturers can offer the per- ther correction/enhancement filters to720p video input stream at 120 frames per formance needed without compromis- improve image and stream quality.second. While the defense contractor im- ing on size or power consumption. The In defense, sensor-based componentsposed constraints around maximum size CASS team found their research kept use registration from ground to aerialand maximum power consumption, CASS pointing them to the AMD Embedded systems with different applications.was otherwise unlimited in how it could G-Series APU, which combines the par- Adding to its complexity, defensedesign the solution. allel processing capabilities of a GPU applications require very high per- So, the company got creative. By with the serial processing capabilities formance computations (high resolu-making the right algorithmic adjustments of a CPU in a small footprint and low tions and frame rates) and have limitedand choosing an appropriate architecture, power solution. space for hardware, dictating a small system size. This requires a solutionthe resulting application runs at real-time “We evaluated several solutions, and with good heat dissipation and abilityspeeds where other competitive solutions nothing else compared to the APU for to consistently operate at low power.(DSP and FPGA) failed to meet the re- size, power consumption and capabilities. WWW. AMD . C OM/ EMBED D ED / C ATALOG 15
  16. 16. AMD EMBEDDED APU SOLUTIONS GUIDEtem that was compact and low power faster-than-real-time processing, CASS livered very well for the selected appli-enough to be used in unmanned aerial leveraged parallel processing for the in- cation and environment,” Butrashvilyand ground vehicle (UAV and UGV) tensive dense matrix operations, includ- explained. “This solution provides un-surround-vision systems for continuous ing GEMM (matrix multiplication), matched performance when you takemonitoring of objects and threats any- GEMV (matrix-vector multiplication) into account the power consumptionwhere in the world. and GESV (matrix Inverse), achieving and size requirements.” The AMD G-T56N APU met the up to 130 times the performance of run- The performance achieved was im-power requirements of the system, and ning those basic building blocks with pressive; showing nearly 150 frames percould deliver the high performance nec- the AMD BLAS (basic linear algebra second (FPS) peak at HD resolution ofessary to meet the image registration subprograms) libraries on the processor 1280x720 with 16-bits per pixel, mea-goals. Since the processor had to employ alone. To verify the numeric stability, sured from input to output of correctedfurther image filtering to enhance results, which is especially important in long- images. With the AMD Embedded G-CASS needed to ensure there was enough running, mission-critical operations, Series APU, CASS was able to achieveperformance overhead to run additional the arithmetic results of the APU were the following:algorithms while maintaining real-time compared to the x86 CPU following • eal-time performance Roperation. CASS selected OpenCL™ to IEEE 754 standard. CASS found high • rocessing of 120 frames per second Pimplement the accelerated algorithm correspondence and accuracy, assuring sustainedbuilding blocks. that the system achieves great numeri- • D sensor input resolution of 720p H In the prototype the APU served as cal stability. (1280x720)a digital signal and image processor, and • 0 to 30 times the performance of 2was connected to a sensor. “We tested the performing the entire algorithm on aAPU to see if we could achieve the real- The Results traditional CPUtime performance the sensors require,” Within two months, CASS com-Butrashvily explained. “There was no op- pleted the prototype development, in- The overall algorithm processing flowtion for delays: the signal had to be pro- cluding software optimization. The so- was complex, incorporating additionalcessed at the time it was being received lution was developed to support Linux, filters for image enhancement, thereforewith minimum latency.” Windows and their embedded variants. runtime speedup was summarized by 20 The entire algorithm was imple- The algorithmic processing engine to 30 times.mented in OpenCL, with the APU serv- was also integrated with OpenGL, de- For its next steps, CASS is workinging as the host manager/coordinator and livering a live display of the processed on support for hard real-time operatingframe grabber. With the goal to achieve results. “The AMD G-T56N APU de- systems, hardware commercialization and board design to match sensor dimensional constraints, and support for next-gener- ation APUs for even higher performance AMD Embedded Solutions... and resolutions. Coming Soon to an Event Near You! Moreover, because the job was not Check out the latest AMD Embedded products and our partner solutions at proprietary to the defense company, CASS one of these upcoming events. If you’re interested in scheduling a meeting is researching additional applications of its with an AMD representative at any of these shows, please contact us at new APU-based image registration tech- embedded@amd.com. nology. Being an important core compo- To learn more go to: www.amd.com/embedded nent in many image-processing systems, UPCOMING SEMINAR DATES: registration has relevance for other applica- 1/13-16 NRF (New York, NY) RTECC Roadshow: tions in defense, medical imaging and ma- 1/30 Taiwan Embedded Forum (Taiwan) 1/24 Santa Clara, CA chine vision. 2/5-7 ICE (London, UK) 3/19 Dallas, TX 2/26-28 Digital Signage Expo (Las Vegas, NV) 3/21 Austin, TX 2/26-28 Embedded World (Nuremberg, Germany) 4/16 Washington, DC 4/22-25 Design West (San Jose, CA) 4/18 Hanover, MD 5/8-10 ESEC (Tokyo, Japan) 5/7 Nashua, NH 5/9 Boston, MA 6/18 Denver, CO 6/20 Salt Lake City, UT16 J ANU ARY 201 3

×