SlideShare a Scribd company logo
The future is fusion.

             The Industry-Changing Impact
               of Accelerated Computing.
Computer processing has reached a crossroads where the relationship between hardware and software must change to support
the increased processing needs of modern computing workloads, including virtualized environments and media-rich applications.
Accelerated computing uses specialized hardware to increase the speed of certain processing tasks, offering commercial and
consumer users simultaneous energy efficiency, high performance, and low cost. From a hardware perspective, accelerated
computing provides a general platform that supports the addition of specialized processing units, or accelerators, to a multicore
computer. These accelerators may take the form of off-core chips such as media or network packet accelerators, or they may be
additional cores designed for specific types of processing tasks. From a software perspective, the accelerated computing framework
will simplify writing parallel applications: developers will have high-level tools that support parallel programming, and applications will
not have to specify which hardware should be used for a given processing task.

AMD has been building expertise and capability in general- and specific-purpose computing for the last several years, including how
to effectively couple general- and specific-purpose processors. This activity has built a solid foundation for further research and
development in accelerated computing. We are actively collaborating with large research institutions, independent software vendors,
the open source community, and other organizations to develop an open ecosystem around accelerated computing. Our commitment
to accelerated computing includes designing hardware accelerators, building developer tools, and driving thought leadership by
participating in standards development bodies. We believe that accelerated computing embodies a rapidly approaching paradigm
shift, and we intend to offer commercial and consumer users flexible computing solutions that meet their needs based on the
components available in the growing ecosystem.

This paper explores the accelerated computing framework. It provides examples of accelerated computing in action today, describes
potential uses of accelerators in the near- to mid-term, and makes a clear case that accelerated computing is the natural outcome
of the computer industry’s current integration trend. It further illustrates that, in full bloom, accelerated computing will create many
opportunities to customize computer systems by providing targeted, precise solutions for different kinds of workloads to satisfy the
need for application performance, energy efficiency, and low cost.




                                                                                           AMD White Paper: The Industry-Changing Impact of
                                                                                                                     Accelerated Computing
Contents


  A NeW INFleCTION POINT .................................................................................................................................................................................................................................................................3

  ACCelerATeD COMPuTINg: The COMINg PArADIgM ShIFT ..............................................................................................................................................................................3
     MArkeT TreNDS: DrIvINg The ShIFT TO ACCelerATeD COMPuTINg............................................................................................................................................3

  AMD: AT The FOreFrONT OF ACCelerATeD COMPuTINg.....................................................................................................................................................................................5

  x86 PrOCeSSOr geNerATIONS AND APPlICATION PerFOrMANCe .........................................................................................................................................................5
      geNerATION 1: A FOCuS ON FrequeNCy AND ArChITeCTure ...........................................................................................................................................................5
      geNerATION 2: hOMOgeNeOuS MulTICOreS.................................................................................................................................................................................................. 6
      geNerATION 3: heTerOgeNeOuS MulTICOreS ...............................................................................................................................................................................................7

  APPrOACheS TO ACCelerATeD COMPuTINg .................................................................................................................................................................................................................7

  ACCelerATeD COMPuTINg: lIghTS, CAMerA, ACTION! ......................................................................................................................................................................................... 8
     NOW PlAyINg ON A COMPuTer NeAr yOu ........................................................................................................................................................................................................ 8
     COMINg SOON TO A COMPuTer NeAr yOu ....................................................................................................................................................................................................... 9

  SIDeBAr: The lOOMINg CrISIS .................................................................................................................................................................................................................................................. 9

  The CASe FOr ACCelerATeD COMPuTINg NOW.......................................................................................................................................................................................................10




                                                                                                                                                                                                                                                                                              2
AMD White Paper: The Industry-Changing Impact of Accelerated Computing
A new InfleCtIon PoInt                                                    >    A computer will include general and specialized
                                                                                 processors.
  The computing industry is poised on the edge of an inflection
                                                                            >    A processor will be chosen to handle a given task
  point where the interaction of hardware and software compute
                                                                                 based on its ability to provide optimal performance
  capabilities will change radically. This inflection point is being
                                                                                 in terms of compute power, media processing, and
  driven by the confluence of the following factors:
                                                                                 presentation, as well as energy efficiency and cost.
     >     evolving commercial and consumer workloads
                                                                            >    Software will take advantage of the best hardware
           (including virtualization and increasingly media-rich
                                                                                 for a given processing task without specifying how
           applications) that demand simultaneous energy
                                                                                 or where the processing is actually done.
           efficiency, high performance, and low cost.
                                                                            >    Developers will have high-level tools for writing
     >     Newer applications with parallel algorithms whose
                                                                                 parallel applications.
           processing can be done across multiple cores,
           resulting in better performance.
                                                                         From a hardware perspective, accelerated computing provides a
     >     A scarcity of programmers with the knowledge and              framework that supports the addition of specialized processors,
           skills needed to write parallel applications, and a           or accelerators, to a multicore computer. These accelerators
           shortage of tools to help programmers of all ability          may take the form of off-core chips, such as media or network
           levels develop parallel applications.                         packet accelerators, or they may be additional cores designed
                                                                         for specific types of processing tasks. From a software
  Commercial and consumer demand for increased computing                 perspective, the accelerated computing framework will simplify
  performance has consistently driven technological innovation           writing parallel applications. Developers will have high-level tools
  in the computing industry, creating advances in processor              that support parallel programming, and the applications will not
  power, core microarchitecture, and multicore designs. however,         have to specify which hardware should be used for a given
  the emergence of graphically intense applications, virtualized         processing task. Accelerated computing will provide a software
  environments, and parallel hardware architectures, combined            framework that includes platform abstractions, a specialized
  with a lack of skilled programmers and development tool sets,          runtime environment, application programming interfaces (APIs),
  has provoked intense scrutiny of how hardware and software             and libraries to support this simplified, flexible interaction with
  should interact to provide satisfying computing experiences in         hardware accelerators.
  this new era.
                                                                         Overall, accelerators provide several important benefits over
  Major computing industry players, including vendors,                   general purpose central processing units (CPus) alone:
  standards bodies, and educational institutions, are exploring
                                                                            >    They speed up certain serial tasks as well as
  ways to improve workload-specific processing performance
                                                                                 supporting parallel algorithms.
  for commercial and consumer computing applications while
  maintaining a focus on energy efficiency and cost. And they               >    They tend to have lower latency and higher
  are grappling with the challenge of unlocking the performance                  throughput for processing tasks.
  improvements possible in parallel architectures and algorithms
                                                                            >    They are more energy efficient: within a given
  without provoking a massive rewrite of current applications.
                                                                                 power envelope, special-purpose hardware
  All of this activity has created the current inflection point, and
                                                                                 accelerators will run better and faster than general-
  on the other side of that point is a paradigm shift that may
                                                                                 purpose CPus.
  fundamentally redefine how processing power translates into
  application performance.
                                                                         MArkeT TreNDS: DrIvINg The ShIFT TO ACCelerATeD
                                                                         COMPuTINg
                                                                         Several market trends are driving the shift to accelerated
  ACCelerAted ComPutIng: the ComIng PArAdIgm shIft
                                                                         computing:
  Accelerated computing is a framework that improves how                    >    Demand for excellent application performance on
  hardware and software interact to support emerging application                 energy-efficient, competitively priced computer
  areas and evolving workloads while providing energy efficiency,                systems.
  great performance, and low cost. In the future, with accelerated
                                                                            >    The equalization of hardware and software costs.
  computing:




                                                                                                                                                3
AMD White Paper: The Industry-Changing Impact of Accelerated Computing
>    The move to virtualization in server and data center           uses a loosely coupled connection between the CPu, the
          computing.                                                     operating system, a device driver, and the accelerator. In a
                                                                         virtualized environment, using traditional device drivers that
     >    Diverging system requirements and unique needs.
                                                                         rely on dedicated system resources is not an option. All virtual
                                                                         machines must have access to all accelerators in the system,
  Performance, Energy Efficiency, and Cost
                                                                         which means that the accelerators must be virtualized and
  Commercial and consumer users alike have come to expect
                                                                         look like a peer processor rather than a device. This requires
  high-performance computing experiences with the lowest
                                                                         a closer relationship between the CPus and the accelerators.
  possible cost and highest energy efficiency. These expectations
                                                                         The accelerated computing framework facilitates the move to
  have been fueled by the consumer electronics industry, which
                                                                         tightly coupled silicon in virtual environments and creates greater
  consistently delivers devices that provide these benefits. For
                                                                         flexibility and better energy efficiency in data centers.
  example, an emerging grass-roots metric for laptop performance
  is the number of movies a person can watch on one battery.
                                                                         Diverging System Requirements
  Consumers want to watch multiple movies with a high-quality
                                                                         Computing environments today are diverging in several areas,
  picture on one battery, and they don’t want to spend $2,500 on
                                                                         making integrated solutions unfeasible:
  the laptop to get that kind of quality and performance.
                                                                            >    DIvergINg WOrklOADS: The workloads in
  Commercial users are also driving the demand for low-cost,                     today’s data centers are rapidly splitting into
  energy-efficient systems that can support high-performing                      two categories: basic productivity applications
  virtualized environments as well as applications that deliver                  and functions (like file, print, and storage) and
  media-rich content. As these concepts become significant                       performance-sensitive mission-critical applications
  drivers in the computing industry, using specialized hardware to               (like financial transaction processing and technical
  improve performance while keeping costs and energy use low                     computing). Administrators can increase the
  will become increasingly important.                                            performance of mission-critical applications by
                                                                                 using hardware with specialized accelerators.
  Equalizing Hardware and Software Costs
                                                                            >    POWer AND FOrM FACTOr requIreMeNTS:
   In the past, hardware development costs were much greater
                                                                                 From handhelds to servers, power and form
  than software development. however, software development
                                                                                 factor requirements are diverging rapidly. using
  costs (e.g., salaries, testing, and support) have increased to
                                                                                 appropriate accelerators can increase performance
  the point where costs in both areas are roughly the same.
                                                                                 on many of these platforms. For example, placing
  Consequently, attempts to improve hardware while ignoring
                                                                                 a video accelerator in a laptop would let the user
  software overlook a large component of the total system cost.
                                                                                 watch a high-definition movie without consuming
  Accelerated computing addresses this trend by providing a
                                                                                 all of the CPu and graphics processing unit (gPu)
  framework that supports faster development of applications that
                                                                                 cycles. Also, using special-purpose accelerators to
  can take advantage of hardware accelerators.
                                                                                 process high-bandwidth network communications
                                                                                 (e.g., 10 gbe) in data centers would be more efficient
  The Move to Virtualization
                                                                                 and create better performance than passing that
  early efforts in virtualization focused on server consolidation to
                                                                                 traffic through CPu cores.
  achieve better returns on investment in system management
                                                                            >    DeNSITy: Some newer computing approaches,
  and resource utilization in data centers. virtualization is rapidly
                                                                                 including cloud computing, provide value around
  becoming a mainstay of enterprise computing as its uses
                                                                                 optimized packaging and cooling for applications
  expand into storage, application delivery, and other areas.
                                                                                 and hardware. Accelerated computing supports
  The trend toward virtualization is changing the data-center
                                                                                 these approaches by providing integrated
  computing environment, with storage hosted in one node,
                                                                                 processors that help further optimize the use of
  workloads on another, and communication overhead distributed
                                                                                 computing resources.
  across the network.

                                                                         Accelerated computing provides a general platform that allows
  virtualization is actually a key component of an intelligent
                                                                         off-core chips like network and media accelerators to be added
  accelerated computing infrastructure because it supports
                                                                         to general-purpose multicore systems. The approach reduces
  a tight coupling between CPus and hardware accelerators.
                                                                         the cost of individualized, integrated solutions.
  The traditional approach to offloading work to an accelerator




                                                                                                                                               4
AMD White Paper: The Industry-Changing Impact of Accelerated Computing
Amd: At the forefront of ACCelerAted ComPutIng                         hardware accelerators, building developer tools, and driving
                                                                         thought leadership by participating in standards-development
  All of these market trends are driving the move toward                 bodies. We believe that accelerated computing will be the result
  accelerated computing, and AMD is committed to influencing             of the coming paradigm shift, and we intend to offer commercial
  the direction of this movement. We have always maintained a            and consumer users flexible solutions that meet their computing
  forward-thinking approach to computing, which has created              needs based on the components available in the growing
  a track record of wise choices regarding future direction and          ecosystem.
  development:
                                                                         The rest of this paper provides a closer look at accelerated
     >    evolving from 32-bit to 64-bit processing with
                                                                         computing, including a brief history of innovations in processing
          AMD64.
                                                                         power to improve application performance, different approaches
     >    Developing the Direct Connect Architecture to
                                                                         to accelerated computing, and the value that this framework
          reduce bottlenecks and increase processing
                                                                         provides to commercial and consumer users alike.
          throughput with integrated memory controllers and
          hyperTransport™ technology.
     >    Designing and producing dual and multicore                     x86 ProCessor generAtIons And APPlICAtIon
          AMD Opteron™ processors.                                       PerformAnCe

                                                                         For the last 20 years, the focus of general-purpose processing
  In keeping with that approach, we recognized early on that
                                                                         (the x86 platform) has been on improving application
  silicon and software development costs would be too high
                                                                         performance through technological advancements in processor
  to support individualized processing solutions for emerging
                                                                         frequency, core microarchitecture, and multiple cores. The
  application areas. Consequently, we have invested considerable
                                                                         baseline assumption for defining “performance” and measuring
  time and resources into understanding the factors leading to the
                                                                         increases has been that frequency times the amount of
  current inflection point.
                                                                         work per clock cycle equals application performance. As the
                                                                         computing industry has matured, vendors have taken different
  We have been building expertise and capability in general-
                                                                         approaches to improving application performance using this
  and specific-purpose computing for the last several years.
                                                                         assumption (Figure 1).
  For example, our initiative code-named Torrenza and our
  Stream Computing work produced important research findings
                                                                         geNerATION 1: A FOCuS ON FrequeNCy AND ArChITeCTure
  about how to effectively couple general and specific purpose
                                                                         In the 1980s and 1990s, vendors focused on designing single-
  processors, which built a solid foundation for further research
                                                                         core CPus that would run single-threaded applications. At the
  and development in accelerated computing.
                                                                         time, increasing the frequency of the CPu was the clearest
                                                                         path to enhancing application performance because the higher
  The key to accelerated computing lies in effectively leveraging
                                                                         the frequency, the more work that could be done per second.
  processing power from general-purpose CPus and specific-
                                                                         vendors also worked on improving the core microarchitecture
  purpose processors like gPus, media accelerators, and others.
                                                                         of their CPus as a method to deliver better application
  The computing industry knows that gPus will eventually be
                                                                         performance. These architectural improvements were driven in
  integrated with CPus, and we acted on that future trend by
                                                                         part by the realization that continuing to increase a processor’s
  acquiring ATI Technologies. This move added significant gPu and
                                                                         frequency to enhance performance was not a viable option.
  multimedia expertise to our already extensive experience with
                                                                         Focusing on frequency alone led to increased design complexity
  general-purpose processors, creating a platform from which to
                                                                         and size as well as unacceptable thermal limits and power
  drive innovation and development for future computing needs
                                                                         consumption. This processor generation ended with a dramatic
  around data, voice, photo, video, and face recognition, among
                                                                         increase in power consumption and a significant decrease in the
  others.
                                                                         potential for future frequency increases (this was due to pushing
                                                                         the underlying microarchitecture to its limits and the declining
  Just as importantly, we are actively collaborating with large
                                                                         contributions for frequency in new process technologies).
  research institutions, independent software vendors (ISvs),
                                                                         ultimately, these factors set an impending limit on single-
  the open source community, and other organizations to
                                                                         threaded application performance and opened the door for a
  develop an open ecosystem around accelerated computing.
                                                                         new era of parallel computing.
  Our commitment to accelerated computing includes designing




                                                                                                                                             5
AMD White Paper: The Industry-Changing Impact of Accelerated Computing
2000                                2008
      1985 1990


                     Generation One:                       Generation Two:                   Generation Three:
                      Frequency and                         Homogeneous                       Heterogeneous
                        Architecture                         Multicores                         Multicores


                                                                                         Best hardware for the task =
                                                         Improved throughput
                     Higher frequency =
                                                                                         better performance, energy
                                                         = better performance
                     better performance
                                                                                         efficiency, and lower cost




  FIgure 1. x86 PrOCeSSOr geNerATIONS FOCuSeD ON DIFFereNT MeThODS FOr IMPrOvINg APPlICATION PerFOrMANCe



  geNerATION 2: hOMOgeNeOuS MulTICOreS                                   applications fall into two performance categories:
  In the early 2000s, the industry began to introduce                       >    COMPuTe-PerFOrMANCe applications involve
  homogeneous multicore computer systems—multiple cores                          the collection and manipulation of large volumes
  of the same type built on one processor die. The goal of                       of complex data in a single process (i.e., how fast
  this processor generation was to create better application                     a task can be turned around). examples of these
  performance through improved throughput. The hardware                          applications include oil and gas exploration, drug
  technology that enabled multicore processors has evolved to                    development, and contextual search.
  the point where computer systems can now scale up to a very
                                                                            >    ThrOughPuT-PerFOrMANCe applications
  large number of CPu cores, although this generally results in
                                                                                 process massive amounts of data per second from
  untenable power envelopes and very high solution costs.
                                                                                 transactions that generally have short life spans,
                                                                                 follow predictable processes, and contain standard
  The Achilles’ Heels of General Purpose Multicore Systems
                                                                                 data elements. Throughput performance (i.e.,
  While adding more cores to a system is one option for continuing
                                                                                 how many tasks can be completed within a give
  to improve application performance, it isn’t always the best one.
                                                                                 time span) is vital for e-commerce sites, financial
  When it comes to enhancing application performance, general-
                                                                                 transactions, and mobile phone call monitoring.
  purpose, homogeneous multicore computer systems suffer from
  two Achilles’ heels:
                                                                         While general purpose CPus can be used in high-performance
     >    They cannot keep up with the performance                       computing applications like geophysics, science, and simulation,
          demands of modern and emerging computing                       offloading specific processing tasks to specialized accelerators
          applications while still meeting the criteria of energy        can dramatically improve power, space, and cost efficiency.
          efficiency and low cost.
     >    They are limited by current serial programming                 Current Serial Programming Models: homogeneous multicore
          models that cannot take full advantage of the                  systems can support multi-threaded applications, but they
          parallel hardware architectures inherent in multicore          do not necessarily enhance single-threaded application
          systems.                                                       performance. In fact, vendors have found that general-
                                                                         purpose multicore systems tend to reach a point of diminishing
  Characteristics of Modern and Emerging Computing: even with            performance returns as they scale beyond eight cores.
  multiple cores, general-purpose processing is hard pressed to
  provide the performance, cost savings, and energy efficiency           This discovery is supported by Amdahl’s law, which is used to
  needed for today’s data-driven computing needs. Modern                 find the maximum expected improvement to an overall system




                                                                                                                                            6
AMD White Paper: The Industry-Changing Impact of Accelerated Computing
when only part of the system is improved (Figure 2). In the case       in the extra socket available on the motherboard.
  of application performance, the hardware has been improved
  through the addition of multiple cores with parallel processing        In a similar way, this next processor generation focuses on
  capabilities. however, the applications themselves are limited         offloading certain processing functions to specialized hardware
  by serial processing models and techniques. regardless of how          accelerators to improve overall performance. As an integral
  many cores are available, many sit idle as they wait for data          part of accelerated computing, these processors provide the
  to be processed serially. And even when certain algorithms             hardware foundation for today’s data-driven and media-rich
  are parallelized, the majority of the application’s processing is      computing experiences. Because AMD is taking a holistic
  still serial. The end result is that application performance on        approach to accelerated computing, this processor generation
  homogeneous multicore systems tends to be lower than the               will be designed to help provide better power efficiency and
  anticipated theoretical peak performance.                              lower costs for hardware as well as dramatically improving


                      140
                                                                                              Amdahl’s Law
                      120                                                                                                1
                                                                                              Speed-up =
                      100                                                                                        Sw + ( 1 - Sw )
                                                                                                                           N
                       80
     Speed-up                                                                                  Sw = % Serial Work
                       60                                                                      N = Number of Processors
                       40
                                                                                                 0% Serial
                       20                                                                        10% Serial
                                                                                                 35% Serial
                         0                                                                       100% Serial
                                1     2      4 8 16 32 64 128
                                            Number of CPU Cores
 FIgure 2. AMDAhl’S lAW ShOWS hOW APPlICATION PerFOrMANCe IMPrOveS The AMOuNT OF SerIAl WOrk DIMINISheS AND The NuMBer OF PrOCeSSOrS
 INCreASeS


  geNerATION 3: heTerOgeNeOuS MulTICOreS                                 programmer productivity to help reduce the cost of software
  The next processor generation is focused on creating                   development.
  heterogeneous multicore computer systems: computers
  with multiple processors that have different capabilities. A
  heterogeneous multicore system could contain several general           APProAChes to ACCelerAted ComPutIng
  purpose CPus plus several accelerators designed to enhance
                                                                         As accelerated computing gains momentum in the market, three
  the performance of a particular function (Figure 3).
                                                                         distinct approaches to using accelerators are emerging:
  This processor generation is at the heart of accelerated                  >    CPu-CeNTrIC APPrOACh: This approach seeks
  computing. And interestingly, its emergence is analogous to                    to improve application performance by extending
  the introduction of math coprocessors in the early days of                     the instruction set of the general purpose x86
  personal computing. When the original IBM PC was introduced                    architecture. While a general-purpose processor
  in 1979, it contained an 8088 CPu, which did not include a math                can do practically any type of processing
  coprocessor because most software at the time was not math                     task, using only CPus does not provide the
  intensive. Anyone who needed more advanced math functions                      simultaneous cost, performance, and energy
  bought an 8087 floating point math coprocessor and installed it                efficiency benefits that an accelerator can provide.




                                                                                                                                           7
AMD White Paper: The Industry-Changing Impact of Accelerated Computing
ACCelerAted ComPutIng: lIghts, CAmerA, ACtIon!

               CPU                      CPU                       CPU                    Accelerated computing boils down to finding a balance of
                                                                                         power, performance, and cost. From commercial customers with
                                                                                         enterprise data centers to consumers who use their laptops
               Accelerator                                 GPU                           to watch movies, combining general-purpose processors
                                                                                         with hardware accelerators can help create that balance. The
                                                                                         following sections provide brief descriptions of how accelerated
                                                                                         computing is being implemented now and how it could be used
                                                     Accelerator                         in the future.

                                                                                         NOW PlAyINg ON A COMPuTer NeAr yOu
  FIgure 3. A heTerOgeNeOuS MulTICOre PrOCeSSOr CONTAINS MulTIPle                        Several industries are already implementing solutions based on
  COreS DeSIgNeD FOr SPeCIAlIzeD TASkS
                                                                                         accelerated computing. each industry has specific performance
                                                                                         requirements that demand processing power only available
        >      gPu-CeNTrIC APPrOACh: This approach seeks                                 through hardware accelerators.
               to meet modern computing needs by using only
               gPus for all processing tasks. Although gPus are                          Financial Services
               well-designed for coarse-grained parallel tasks,                          One critical performance driver for the financial services industry
               they do not provide the broad range of capabilities                       is low-latency trading. The amount of available market data that
               that a general-purpose CPu brings to processing                           financial services organizations must process is overwhelming:
               tasks, nor do they easily support fine-grained                            for example, the full data feed from the Options Price reporting
               parallelism.                                                              Authority (OPrA) surpassed one million messages per second
                                                                                         this year. The ability to process this data feed with the lowest
        >      BAlANCe AND OPTIMIzATION APPrOACh: This
                                                                                         possible latency (measured in microseconds) is key to success
               approach focuses on choosing the best hardware
                                                                                         for these organizations.
               to optimally process a given workload. It uses
               CPus to orchestrate system activities, balance
                                                                                         Companies like exegy and ACTIv Financial directly connect
               workloads, and determine which tasks to offload
                                                                                         specialized off-chip accelerators called field programmable gate
               to appropriate accelerators. Accelerators are
                                                                                         arrays (FPgAs) to homogeneous multicore processors using
               added to the system to complement existing CPu
                                                                                         hyperTransport™ technology to create ticker plant appliances
               abilities and improve application performance. This
                                                                                         that process these huge streams of financial market data.
               approach makes it possible for computer systems
                                                                                         exegy’s accelerated appliance has been benchmarked at
               and applications to be optimized for stream,
                                                                                         processing one million messages per second (the full OPrA
               parallel, sequential, or single-threaded processing,
                                                                                         feed) with an average latency of 80 microseconds1.
               depending on need.

                                                                                         Telecommunications
  AMD has chosen to follow the balance and optimization
                                                                                         As the demand for video on Tvs, laptops, cell phones, and
  approach. Because of our extensive experience developing
                                                                                         portable media players increases, telecommunications providers
  CPus, gPus, and accelerators, we believe that the best
                                                                                         need systems that can transcode huge volumes of streaming
  approach to accelerated computing is to cooperatively develop
                                                                                         video data to multiple devices simultaneously. In March 2007,
  the overall ecosystem (software as well as hardware), provide
                                                                                         Sun Microsystems introduced Sun Streaming System, a video
  the processors that we have the most expertise in, and
                                                                                         delivery platform that transcodes video data and delivers
  empower customers to choose the components that create the
                                                                                         personalized streams to end users. This platform relies on
  best performance for the applications they need to run.
                                                                                         multiple integrated network cards and general-purpose CPus to
  We intend to provide a set of accelerated computing solutions
                                                                                         provide high performance and energy efficiency in this video-
  targeted toward different markets, including, server, desktop,
                                                                                         processing solution2.
  and laptops. And our work with the Torrenza initiative has given
  us the ability to easily integrate external accelerators so our
  customers can differentiate and customize their computing
  solutions as needed.
  1
      Waters, “grid,” Bob giffords, 1 April 2008, (http://db.riskwaters.com/public/showPage.html?page=788660)
  2
      ziff Davis enterprise CustomSolutions, “expediting Time to Insight via Accelerated Computing,” 2007



                                                                                                                                                               8
AMD White Paper: The Industry-Changing Impact of Accelerated Computing
Security                                                               10-, 40-, or 100-gB ethernet will require another approach.
  Accelerated computing provides the processing power                    Network packet accelerators can provide an alternative model
  needed for advanced security functions such as deep content            that balances power and performance for the data center.
  inspection, which is used to analyze data in intrusion prevention      Video Acceleration
  and antivirus applications. For example, antivirus engines can         Many consumer computer users use the number of movies they
  be required to inspect all data that enters a computer and then        can watch on a single battery as a criterion for evaluating laptop
  compare their findings against databases with many thousands           performance. The accelerated computing framework provides
  of threats. using specialized hardware accelerators that are           several options for meeting this performance requirement,
  tightly coupled with multicore CPus can increase performance           including using video accelerators to meet the criteria of high
  for deep-packet inspection and other advanced security                 performance, energy efficiency, and low cost.
  activities.
                                                                         Video Conferencing Systems
  COMINg SOON TO A COMPuTer NeAr yOu                                     high-end video conferencing systems available today deliver
  The emergence of media-rich applications, as well as                   an experience so realistic that it looks like the person you are
  increasingly data-driven computing tasks, are creating many            conferencing with is actually sitting on the other side of the table.
  new opportunities to implement the accelerated computing               however, this type of conferencing capability is generally limited
  framework. here is a brief sampling of the broad spectrum of           to enterprise organizations with financial resources that can
  possible ways that accelerated computing could be applied in           support this level of conferencing. The accelerated computing
  the near- to mid-term.                                                 framework provides a pathway toward creating high-quality
                                                                         video conferencing on the desktop at a reasonable cost to end
  Future High Bandwidth Data Centers                                     users.
  As very high bandwidth ethernet becomes a reality, current
  processing methods will not be able to keep up with network            Image Recognition
  communication. For example, a server running the full TCP/             As our society produces increasing quantities of digital images—
  IP stack uses most of its CPu power to process the traffic             both still and video—the need to automatically recognize and
  generated on a 1-gB ethernet network. Because the network              classify images is becoming more urgent. For example, right now
  processing alone consumes so many CPu cycles, implementing             it is difficult to find images with specific content unless someone


    the loomIng CrIsIs                             While some data parallel applications        programming tools and providing
                                                   are being written to address                 educational opportunities so that
    The growing gap between hardware               the increasing role of media-                current developers can more easily
    and software parallel processing               rich applications and virtualized            create parallel programs. PPl is a
    capabilities is creating a looming crisis in   environments, developers have                combined effort between top Stanford
    the computer industry. While hardware          realized that writing these kinds of         researchers in applications, languages,
    technology continues to advance in             applications at a low level is very          systems software, and computer
    the area of data parallel processing, the      challenging—especially in the areas of       architecture and leading companies
    parallel programming tools and skilled         synchronization, communication, and          in computer systems and software.
    developers needed to extract maximum           scheduling. And for the moment, higher-      In addition to developing parallel
    performance from multicore systems             level tools and languages to support         algorithms, development environments,
    are not readily available. According to        data parallel development efforts do         and runtime systems that scale
    Stanford university, by 2010 software          not exist. Another complication is           to 1000s of hardware threads, PPl
    developers will face systems with              that the computer science education          members are also working on making
    heterogeneous multicore processors             system is not prepared to teach parallel     parallel programming a core component
    and application specific accelerators          computing skills.                            of undergraduate computer science
    that run hundreds of hardware threads                                                       education. This educational goal will
    and leverage deep memory hierarchies           given this state of affairs, Stanford        develop a new class of scientists and
    to create over one teraflop of computing       university has created the Pervasive         engineers who can close the current
    power.                                         Parallelism lab (PPl). This project          parallelism gap and avert the looming
                                                   focuses on creating high-level               crisis.




                                                                                                                                                 9
AMD White Paper: The Industry-Changing Impact of Accelerated Computing
has manually tagged the images and added the content to a              possible for them to put different capabilities (general- and
  searchable database. large image sites pay people to do this to        specific-purpose processors, for example) on the same piece of
  provide a better customer experience, but the average person           silicon, which increases processor utilization, enhances energy
  with thousands of digital pictures does not take the time to tag all   efficiency, and assimilates capabilities that used to reside on
  of their photos for easy searching.                                    separate chips.

  The same is true of video: the only way to find your favorite          This integration passes on performance improvements to
  actor in a given movie and scene is to consult a source that           applications that support media-rich experiences for commercial
  has viewed the movie and manually recorded the information.            and consumer users. And in the data center, it supports the trend
  Automating this type of image and video indexing would be              toward virtualization as a means of increasing resource utilization
  a welcome addition to modern computing, and putting the                and reducing management costs. This entire trend focuses on
  capability in a $50 application would make commercial and              extracting more processing power per unit (e.g., performance per
  consumer computer users alike very happy.                              dollar) from all areas of the computing system, which translates
                                                                         into excellent performance for emerging applications that require
  Of course, this type of software is computationally complex and        more processing power.
  involves an incredible amount of processing. Applications that
  can process images and video, look for patterns, and create            Accelerated computing also makes the x86 instruction set more
  metadata that represents what is contained in the images will          affordable and applicable to many different uses by allowing
  require dedicated hardware accelerators to handle these tasks          computer systems to split and balance workloads between
  without bogging down the entire computer system.                       CPus and accelerators. For example, CPus excel at coordinating
                                                                         and processing linear tasks, while gPus are designed to process
  Healthcare and Network Security                                        large amounts of data in parallel. The more media-rich or data-
  The added emphasis on information privacy in the healthcare            intense the workload, the more parallel processing is needed
  industry has prompted new levels of network security. Instead          to provide a great computing experience, so the presence of
  of simply controlling who has access to healthcare information,        dedicated gPus to improve performance is critical. Accelerated
  IT departments in hospitals, insurance companies, and other            computing provides the framework for easily creating or
  healthcare institutions must conduct analyses of the information       modifying applications so their workloads can be split into serial
  that comes into their networks. Common types of security               and parallel tasks and then sent to the processor that can
  analysis include packet filtering, but more advanced techniques        provide the best performance.
  like deep packet analysis, which looks beyond the packet
  header to analyze the payload for potential threats, take many         The trend in modern computing toward visually rich content
  processor cycles. By offloading deep packet and other security         spans all levels of computing because people tend to remember
  analyses to a specialized hardware accelerator, healthcare             visual content better. high-end workstations are by nature
  IT departments could significantly increase application                graphic-intense, and standard client computer systems in
  performance without reducing available system resources and            commercial business are being called on to present more
  still meet energy efficiency and cost criteria.                        media-rich content, including presentations and data mining
                                                                         applications that present data visually. From a consumer
                                                                         computing perspective, we are experiencing a distinct shift from
  the CAse for ACCelerAted ComPutIng now                                 text to multimedia content. And finally, current and emerging
                                                                         operating system user interfaces are becoming more graphically
  Accelerated computing is not some passing fancy that will              rich.
  disappear as quickly as it appeared. quite the opposite:
  accelerated computing is a continuation of the current trend           In full bloom, accelerated computing will create many
  toward integration at the level of silicon, consumer computers,        opportunities to customize computer systems by providing
  and data centers. In terms of silicon integration, advancing           targeted, precise solutions for different kinds of workloads. And
  process technology is letting vendors place increasingly               it will create those opportunities while satisfying the need for
  complex designs on a single piece of silicon. This makes it            application performance, energy efficiency, and low cost.




Advanced Micro Devices                                                                        © 2008 Advanced Micro Devices, Inc. All rights reserved. AMD,
                                                                                              the AMD Arrow logo, and combinations thereof are trademarks of
One AMD Place
                                                                                              Advanced MicroDevices, Inc. Other names are for informational
Sunnyvale, CA 94088-3453
                                                                                              purposes only and may be trademarks of their respective owners.
www.amd.com
                                                                                              46125B

More Related Content

What's hot

The value of virtualized consolidation
The value of virtualized consolidationThe value of virtualized consolidation
The value of virtualized consolidation
IBM India Smarter Computing
 
Build a Smarter Data Center with Juniper Networks Qfabric
Build a Smarter Data Center with Juniper Networks QfabricBuild a Smarter Data Center with Juniper Networks Qfabric
Build a Smarter Data Center with Juniper Networks Qfabric
IBM India Smarter Computing
 
Cloud2009
Cloud2009Cloud2009
Cloud2009
namblasec
 
Bliv klar til cloud med Citrix Netscaler (pdf)
Bliv klar til cloud med Citrix Netscaler (pdf)Bliv klar til cloud med Citrix Netscaler (pdf)
Bliv klar til cloud med Citrix Netscaler (pdf)
Kim Jensen
 
Hp network function virtualization technical white paper NFV
Hp  network function virtualization technical white paper NFVHp  network function virtualization technical white paper NFV
Hp network function virtualization technical white paper NFV
Ignacio García-Carrillo
 
AMD Putting Server Virtualization to Work
AMD Putting Server Virtualization to WorkAMD Putting Server Virtualization to Work
AMD Putting Server Virtualization to Work
James Price
 
Cloud computing and sustainability
Cloud computing and sustainabilityCloud computing and sustainability
Cloud computing and sustainability
Office365UK
 
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET Journal
 
C017531925
C017531925C017531925
C017531925
IOSR Journals
 
Amd fusion apus
Amd fusion apusAmd fusion apus
Amd fusion apus
Maulik Dhameliya
 
vDesk: Introduction to Desktop Virtualization
vDesk: Introduction to Desktop VirtualizationvDesk: Introduction to Desktop Virtualization
vDesk: Introduction to Desktop Virtualization
RingCube Technologies, Inc.
 
65 72
65 7265 72
Ec24817824
Ec24817824Ec24817824
Ec24817824
IJERA Editor
 
Allocation Strategies of Virtual Resources in Cloud-Computing Networks
Allocation Strategies of Virtual Resources in Cloud-Computing NetworksAllocation Strategies of Virtual Resources in Cloud-Computing Networks
Allocation Strategies of Virtual Resources in Cloud-Computing Networks
IJERA Editor
 
Gpu
GpuGpu
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTINGGROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
AIRCC Publishing Corporation
 
The Journey Toward the Software-Defined Data Center
The Journey Toward the Software-Defined Data CenterThe Journey Toward the Software-Defined Data Center
The Journey Toward the Software-Defined Data Center
Cognizant
 
Sla based optimization of power and migration cost in cloud computing
Sla based optimization of power and migration cost in cloud computingSla based optimization of power and migration cost in cloud computing
Sla based optimization of power and migration cost in cloud computing
Nikhil Venugopal
 

What's hot (18)

The value of virtualized consolidation
The value of virtualized consolidationThe value of virtualized consolidation
The value of virtualized consolidation
 
Build a Smarter Data Center with Juniper Networks Qfabric
Build a Smarter Data Center with Juniper Networks QfabricBuild a Smarter Data Center with Juniper Networks Qfabric
Build a Smarter Data Center with Juniper Networks Qfabric
 
Cloud2009
Cloud2009Cloud2009
Cloud2009
 
Bliv klar til cloud med Citrix Netscaler (pdf)
Bliv klar til cloud med Citrix Netscaler (pdf)Bliv klar til cloud med Citrix Netscaler (pdf)
Bliv klar til cloud med Citrix Netscaler (pdf)
 
Hp network function virtualization technical white paper NFV
Hp  network function virtualization technical white paper NFVHp  network function virtualization technical white paper NFV
Hp network function virtualization technical white paper NFV
 
AMD Putting Server Virtualization to Work
AMD Putting Server Virtualization to WorkAMD Putting Server Virtualization to Work
AMD Putting Server Virtualization to Work
 
Cloud computing and sustainability
Cloud computing and sustainabilityCloud computing and sustainability
Cloud computing and sustainability
 
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
 
C017531925
C017531925C017531925
C017531925
 
Amd fusion apus
Amd fusion apusAmd fusion apus
Amd fusion apus
 
vDesk: Introduction to Desktop Virtualization
vDesk: Introduction to Desktop VirtualizationvDesk: Introduction to Desktop Virtualization
vDesk: Introduction to Desktop Virtualization
 
65 72
65 7265 72
65 72
 
Ec24817824
Ec24817824Ec24817824
Ec24817824
 
Allocation Strategies of Virtual Resources in Cloud-Computing Networks
Allocation Strategies of Virtual Resources in Cloud-Computing NetworksAllocation Strategies of Virtual Resources in Cloud-Computing Networks
Allocation Strategies of Virtual Resources in Cloud-Computing Networks
 
Gpu
GpuGpu
Gpu
 
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTINGGROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
GROUP BASED RESOURCE MANAGEMENT AND PRICING MODEL IN CLOUD COMPUTING
 
The Journey Toward the Software-Defined Data Center
The Journey Toward the Software-Defined Data CenterThe Journey Toward the Software-Defined Data Center
The Journey Toward the Software-Defined Data Center
 
Sla based optimization of power and migration cost in cloud computing
Sla based optimization of power and migration cost in cloud computingSla based optimization of power and migration cost in cloud computing
Sla based optimization of power and migration cost in cloud computing
 

Similar to Accelerated Computing

Interventions for scientific and enterprise applications
Interventions for scientific and enterprise applicationsInterventions for scientific and enterprise applications
Interventions for scientific and enterprise applications
eSAT Publishing House
 
Interventions for scientific and enterprise applications based on high perfor...
Interventions for scientific and enterprise applications based on high perfor...Interventions for scientific and enterprise applications based on high perfor...
Interventions for scientific and enterprise applications based on high perfor...
eSAT Journals
 
Ceph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing GuideCeph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing Guide
Karan Singh
 
Clusetrreport
ClusetrreportClusetrreport
Clusetrreport
Sreejith Nair
 
Informatica push down optimization implementation
Informatica push down optimization implementationInformatica push down optimization implementation
Informatica push down optimization implementation
divjeev
 
Unovie Technical Architecture
Unovie Technical ArchitectureUnovie Technical Architecture
Unovie Technical Architecture
Ranga Chakravarthula
 
Computing power technology – an overview.pdf
Computing power technology – an overview.pdfComputing power technology – an overview.pdf
Computing power technology – an overview.pdf
Temok IT Services
 
Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrations
inside-BigData.com
 
Micro Server Design - Open Compute Project
Micro Server Design - Open Compute ProjectMicro Server Design - Open Compute Project
Micro Server Design - Open Compute Project
Hitesh Jani
 
High Performance Computing
High Performance ComputingHigh Performance Computing
High Performance Computing
Nous Infosystems
 
IBM Cloud
IBM CloudIBM Cloud
IBM Cloud
simonarden
 
Cloud Computing Introduction
Cloud Computing IntroductionCloud Computing Introduction
Cloud Computing Introduction
guest90f660
 
Introduction To Cloud Computing
Introduction To Cloud ComputingIntroduction To Cloud Computing
Introduction To Cloud Computing
Liming Liu
 
Network performance - skilled craft to hard science
Network performance - skilled craft to hard scienceNetwork performance - skilled craft to hard science
Network performance - skilled craft to hard science
Martin Geddes
 
Priorities Shift In IC Design
Priorities Shift In IC DesignPriorities Shift In IC Design
Priorities Shift In IC Design
Abacus Technologies
 
5G Edge Computing Whitepaper, FCC Advisory Council
5G Edge Computing Whitepaper, FCC Advisory Council5G Edge Computing Whitepaper, FCC Advisory Council
5G Edge Computing Whitepaper, FCC Advisory Council
DESMOND YUEN
 
GxP Data Integrity for Cloud Apps – Part 1
GxP Data Integrity for Cloud Apps – Part 1GxP Data Integrity for Cloud Apps – Part 1
GxP Data Integrity for Cloud Apps – Part 1
Agaram Technologies
 
How Should I Prepare Your Enterprise For The Increased...
How Should I Prepare Your Enterprise For The Increased...How Should I Prepare Your Enterprise For The Increased...
How Should I Prepare Your Enterprise For The Increased...
Claudia Brown
 
IRJET- Optimization of Completion Time through Efficient Resource Allocation ...
IRJET- Optimization of Completion Time through Efficient Resource Allocation ...IRJET- Optimization of Completion Time through Efficient Resource Allocation ...
IRJET- Optimization of Completion Time through Efficient Resource Allocation ...
IRJET Journal
 
Smarter Computing and Breakthrough IT Economics
Smarter Computing and Breakthrough IT EconomicsSmarter Computing and Breakthrough IT Economics
Smarter Computing and Breakthrough IT Economics
IBM India Smarter Computing
 

Similar to Accelerated Computing (20)

Interventions for scientific and enterprise applications
Interventions for scientific and enterprise applicationsInterventions for scientific and enterprise applications
Interventions for scientific and enterprise applications
 
Interventions for scientific and enterprise applications based on high perfor...
Interventions for scientific and enterprise applications based on high perfor...Interventions for scientific and enterprise applications based on high perfor...
Interventions for scientific and enterprise applications based on high perfor...
 
Ceph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing GuideCeph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing Guide
 
Clusetrreport
ClusetrreportClusetrreport
Clusetrreport
 
Informatica push down optimization implementation
Informatica push down optimization implementationInformatica push down optimization implementation
Informatica push down optimization implementation
 
Unovie Technical Architecture
Unovie Technical ArchitectureUnovie Technical Architecture
Unovie Technical Architecture
 
Computing power technology – an overview.pdf
Computing power technology – an overview.pdfComputing power technology – an overview.pdf
Computing power technology – an overview.pdf
 
Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrations
 
Micro Server Design - Open Compute Project
Micro Server Design - Open Compute ProjectMicro Server Design - Open Compute Project
Micro Server Design - Open Compute Project
 
High Performance Computing
High Performance ComputingHigh Performance Computing
High Performance Computing
 
IBM Cloud
IBM CloudIBM Cloud
IBM Cloud
 
Cloud Computing Introduction
Cloud Computing IntroductionCloud Computing Introduction
Cloud Computing Introduction
 
Introduction To Cloud Computing
Introduction To Cloud ComputingIntroduction To Cloud Computing
Introduction To Cloud Computing
 
Network performance - skilled craft to hard science
Network performance - skilled craft to hard scienceNetwork performance - skilled craft to hard science
Network performance - skilled craft to hard science
 
Priorities Shift In IC Design
Priorities Shift In IC DesignPriorities Shift In IC Design
Priorities Shift In IC Design
 
5G Edge Computing Whitepaper, FCC Advisory Council
5G Edge Computing Whitepaper, FCC Advisory Council5G Edge Computing Whitepaper, FCC Advisory Council
5G Edge Computing Whitepaper, FCC Advisory Council
 
GxP Data Integrity for Cloud Apps – Part 1
GxP Data Integrity for Cloud Apps – Part 1GxP Data Integrity for Cloud Apps – Part 1
GxP Data Integrity for Cloud Apps – Part 1
 
How Should I Prepare Your Enterprise For The Increased...
How Should I Prepare Your Enterprise For The Increased...How Should I Prepare Your Enterprise For The Increased...
How Should I Prepare Your Enterprise For The Increased...
 
IRJET- Optimization of Completion Time through Efficient Resource Allocation ...
IRJET- Optimization of Completion Time through Efficient Resource Allocation ...IRJET- Optimization of Completion Time through Efficient Resource Allocation ...
IRJET- Optimization of Completion Time through Efficient Resource Allocation ...
 
Smarter Computing and Breakthrough IT Economics
Smarter Computing and Breakthrough IT EconomicsSmarter Computing and Breakthrough IT Economics
Smarter Computing and Breakthrough IT Economics
 

More from jworth

University of Texas
University of TexasUniversity of Texas
University of Texas
jworth
 
Gpu Compute
Gpu ComputeGpu Compute
Gpu Compute
jworth
 
AMD's Lonestar Campus
AMD's Lonestar CampusAMD's Lonestar Campus
AMD's Lonestar Campus
jworth
 
AMD Financial Analyst\'s Day
AMD Financial Analyst\'s DayAMD Financial Analyst\'s Day
AMD Financial Analyst\'s Day
jworth
 
Amd V Nested Paging
Amd V Nested PagingAmd V Nested Paging
Amd V Nested Paging
jworth
 
Ces08
Ces08Ces08
Ces08
jworth
 
Virtualization
VirtualizationVirtualization
Virtualization
jworth
 

More from jworth (7)

University of Texas
University of TexasUniversity of Texas
University of Texas
 
Gpu Compute
Gpu ComputeGpu Compute
Gpu Compute
 
AMD's Lonestar Campus
AMD's Lonestar CampusAMD's Lonestar Campus
AMD's Lonestar Campus
 
AMD Financial Analyst\'s Day
AMD Financial Analyst\'s DayAMD Financial Analyst\'s Day
AMD Financial Analyst\'s Day
 
Amd V Nested Paging
Amd V Nested PagingAmd V Nested Paging
Amd V Nested Paging
 
Ces08
Ces08Ces08
Ces08
 
Virtualization
VirtualizationVirtualization
Virtualization
 

Recently uploaded

Salesforce AI & Einstein Copilot Workshop
Salesforce AI & Einstein Copilot WorkshopSalesforce AI & Einstein Copilot Workshop
Salesforce AI & Einstein Copilot Workshop
CEPTES Software Inc
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Torry Harris
 
Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024
aakash malhotra
 
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptxDublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Kunal Gupta
 
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly DetectionAdvanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
Bert Blevins
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
Priyanka Aash
 
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
Priyanka Aash
 
Data Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining DataData Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining Data
Safe Software
 
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSECHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
kumarjarun2010
 
July Patch Tuesday
July Patch TuesdayJuly Patch Tuesday
July Patch Tuesday
Ivanti
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
SynapseIndia
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
Tatiana Al-Chueyr
 
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and OllamaTirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Zilliz
 
Observability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetryObservability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetry
Eric D. Schabell
 
How to build a generative AI solution A step-by-step guide (2).pdf
How to build a generative AI solution A step-by-step guide (2).pdfHow to build a generative AI solution A step-by-step guide (2).pdf
How to build a generative AI solution A step-by-step guide (2).pdf
ChristopherTHyatt
 
The Evolution of Remote Server Management
The Evolution of Remote Server ManagementThe Evolution of Remote Server Management
The Evolution of Remote Server Management
Bert Blevins
 
Applying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAI
Applying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAIApplying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAI
Applying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAI
ssuserd4e0d2
 
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
aslasdfmkhan4750
 
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite SolutionIPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Networks
 
Using LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and MilvusUsing LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and Milvus
Zilliz
 

Recently uploaded (20)

Salesforce AI & Einstein Copilot Workshop
Salesforce AI & Einstein Copilot WorkshopSalesforce AI & Einstein Copilot Workshop
Salesforce AI & Einstein Copilot Workshop
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
 
Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024
 
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptxDublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
 
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly DetectionAdvanced Techniques for Cyber Security Analysis and Anomaly Detection
Advanced Techniques for Cyber Security Analysis and Anomaly Detection
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
 
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
 
Data Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining DataData Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining Data
 
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSECHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
 
July Patch Tuesday
July Patch TuesdayJuly Patch Tuesday
July Patch Tuesday
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
 
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and OllamaTirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
 
Observability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetryObservability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetry
 
How to build a generative AI solution A step-by-step guide (2).pdf
How to build a generative AI solution A step-by-step guide (2).pdfHow to build a generative AI solution A step-by-step guide (2).pdf
How to build a generative AI solution A step-by-step guide (2).pdf
 
The Evolution of Remote Server Management
The Evolution of Remote Server ManagementThe Evolution of Remote Server Management
The Evolution of Remote Server Management
 
Applying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAI
Applying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAIApplying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAI
Applying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAI
 
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
 
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite SolutionIPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite Solution
 
Using LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and MilvusUsing LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and Milvus
 

Accelerated Computing

  • 1. The future is fusion. The Industry-Changing Impact of Accelerated Computing. Computer processing has reached a crossroads where the relationship between hardware and software must change to support the increased processing needs of modern computing workloads, including virtualized environments and media-rich applications. Accelerated computing uses specialized hardware to increase the speed of certain processing tasks, offering commercial and consumer users simultaneous energy efficiency, high performance, and low cost. From a hardware perspective, accelerated computing provides a general platform that supports the addition of specialized processing units, or accelerators, to a multicore computer. These accelerators may take the form of off-core chips such as media or network packet accelerators, or they may be additional cores designed for specific types of processing tasks. From a software perspective, the accelerated computing framework will simplify writing parallel applications: developers will have high-level tools that support parallel programming, and applications will not have to specify which hardware should be used for a given processing task. AMD has been building expertise and capability in general- and specific-purpose computing for the last several years, including how to effectively couple general- and specific-purpose processors. This activity has built a solid foundation for further research and development in accelerated computing. We are actively collaborating with large research institutions, independent software vendors, the open source community, and other organizations to develop an open ecosystem around accelerated computing. Our commitment to accelerated computing includes designing hardware accelerators, building developer tools, and driving thought leadership by participating in standards development bodies. We believe that accelerated computing embodies a rapidly approaching paradigm shift, and we intend to offer commercial and consumer users flexible computing solutions that meet their needs based on the components available in the growing ecosystem. This paper explores the accelerated computing framework. It provides examples of accelerated computing in action today, describes potential uses of accelerators in the near- to mid-term, and makes a clear case that accelerated computing is the natural outcome of the computer industry’s current integration trend. It further illustrates that, in full bloom, accelerated computing will create many opportunities to customize computer systems by providing targeted, precise solutions for different kinds of workloads to satisfy the need for application performance, energy efficiency, and low cost. AMD White Paper: The Industry-Changing Impact of Accelerated Computing
  • 2. Contents A NeW INFleCTION POINT .................................................................................................................................................................................................................................................................3 ACCelerATeD COMPuTINg: The COMINg PArADIgM ShIFT ..............................................................................................................................................................................3 MArkeT TreNDS: DrIvINg The ShIFT TO ACCelerATeD COMPuTINg............................................................................................................................................3 AMD: AT The FOreFrONT OF ACCelerATeD COMPuTINg.....................................................................................................................................................................................5 x86 PrOCeSSOr geNerATIONS AND APPlICATION PerFOrMANCe .........................................................................................................................................................5 geNerATION 1: A FOCuS ON FrequeNCy AND ArChITeCTure ...........................................................................................................................................................5 geNerATION 2: hOMOgeNeOuS MulTICOreS.................................................................................................................................................................................................. 6 geNerATION 3: heTerOgeNeOuS MulTICOreS ...............................................................................................................................................................................................7 APPrOACheS TO ACCelerATeD COMPuTINg .................................................................................................................................................................................................................7 ACCelerATeD COMPuTINg: lIghTS, CAMerA, ACTION! ......................................................................................................................................................................................... 8 NOW PlAyINg ON A COMPuTer NeAr yOu ........................................................................................................................................................................................................ 8 COMINg SOON TO A COMPuTer NeAr yOu ....................................................................................................................................................................................................... 9 SIDeBAr: The lOOMINg CrISIS .................................................................................................................................................................................................................................................. 9 The CASe FOr ACCelerATeD COMPuTINg NOW.......................................................................................................................................................................................................10 2 AMD White Paper: The Industry-Changing Impact of Accelerated Computing
  • 3. A new InfleCtIon PoInt > A computer will include general and specialized processors. The computing industry is poised on the edge of an inflection > A processor will be chosen to handle a given task point where the interaction of hardware and software compute based on its ability to provide optimal performance capabilities will change radically. This inflection point is being in terms of compute power, media processing, and driven by the confluence of the following factors: presentation, as well as energy efficiency and cost. > evolving commercial and consumer workloads > Software will take advantage of the best hardware (including virtualization and increasingly media-rich for a given processing task without specifying how applications) that demand simultaneous energy or where the processing is actually done. efficiency, high performance, and low cost. > Developers will have high-level tools for writing > Newer applications with parallel algorithms whose parallel applications. processing can be done across multiple cores, resulting in better performance. From a hardware perspective, accelerated computing provides a > A scarcity of programmers with the knowledge and framework that supports the addition of specialized processors, skills needed to write parallel applications, and a or accelerators, to a multicore computer. These accelerators shortage of tools to help programmers of all ability may take the form of off-core chips, such as media or network levels develop parallel applications. packet accelerators, or they may be additional cores designed for specific types of processing tasks. From a software Commercial and consumer demand for increased computing perspective, the accelerated computing framework will simplify performance has consistently driven technological innovation writing parallel applications. Developers will have high-level tools in the computing industry, creating advances in processor that support parallel programming, and the applications will not power, core microarchitecture, and multicore designs. however, have to specify which hardware should be used for a given the emergence of graphically intense applications, virtualized processing task. Accelerated computing will provide a software environments, and parallel hardware architectures, combined framework that includes platform abstractions, a specialized with a lack of skilled programmers and development tool sets, runtime environment, application programming interfaces (APIs), has provoked intense scrutiny of how hardware and software and libraries to support this simplified, flexible interaction with should interact to provide satisfying computing experiences in hardware accelerators. this new era. Overall, accelerators provide several important benefits over Major computing industry players, including vendors, general purpose central processing units (CPus) alone: standards bodies, and educational institutions, are exploring > They speed up certain serial tasks as well as ways to improve workload-specific processing performance supporting parallel algorithms. for commercial and consumer computing applications while maintaining a focus on energy efficiency and cost. And they > They tend to have lower latency and higher are grappling with the challenge of unlocking the performance throughput for processing tasks. improvements possible in parallel architectures and algorithms > They are more energy efficient: within a given without provoking a massive rewrite of current applications. power envelope, special-purpose hardware All of this activity has created the current inflection point, and accelerators will run better and faster than general- on the other side of that point is a paradigm shift that may purpose CPus. fundamentally redefine how processing power translates into application performance. MArkeT TreNDS: DrIvINg The ShIFT TO ACCelerATeD COMPuTINg Several market trends are driving the shift to accelerated ACCelerAted ComPutIng: the ComIng PArAdIgm shIft computing: Accelerated computing is a framework that improves how > Demand for excellent application performance on hardware and software interact to support emerging application energy-efficient, competitively priced computer areas and evolving workloads while providing energy efficiency, systems. great performance, and low cost. In the future, with accelerated > The equalization of hardware and software costs. computing: 3 AMD White Paper: The Industry-Changing Impact of Accelerated Computing
  • 4. > The move to virtualization in server and data center uses a loosely coupled connection between the CPu, the computing. operating system, a device driver, and the accelerator. In a virtualized environment, using traditional device drivers that > Diverging system requirements and unique needs. rely on dedicated system resources is not an option. All virtual machines must have access to all accelerators in the system, Performance, Energy Efficiency, and Cost which means that the accelerators must be virtualized and Commercial and consumer users alike have come to expect look like a peer processor rather than a device. This requires high-performance computing experiences with the lowest a closer relationship between the CPus and the accelerators. possible cost and highest energy efficiency. These expectations The accelerated computing framework facilitates the move to have been fueled by the consumer electronics industry, which tightly coupled silicon in virtual environments and creates greater consistently delivers devices that provide these benefits. For flexibility and better energy efficiency in data centers. example, an emerging grass-roots metric for laptop performance is the number of movies a person can watch on one battery. Diverging System Requirements Consumers want to watch multiple movies with a high-quality Computing environments today are diverging in several areas, picture on one battery, and they don’t want to spend $2,500 on making integrated solutions unfeasible: the laptop to get that kind of quality and performance. > DIvergINg WOrklOADS: The workloads in Commercial users are also driving the demand for low-cost, today’s data centers are rapidly splitting into energy-efficient systems that can support high-performing two categories: basic productivity applications virtualized environments as well as applications that deliver and functions (like file, print, and storage) and media-rich content. As these concepts become significant performance-sensitive mission-critical applications drivers in the computing industry, using specialized hardware to (like financial transaction processing and technical improve performance while keeping costs and energy use low computing). Administrators can increase the will become increasingly important. performance of mission-critical applications by using hardware with specialized accelerators. Equalizing Hardware and Software Costs > POWer AND FOrM FACTOr requIreMeNTS: In the past, hardware development costs were much greater From handhelds to servers, power and form than software development. however, software development factor requirements are diverging rapidly. using costs (e.g., salaries, testing, and support) have increased to appropriate accelerators can increase performance the point where costs in both areas are roughly the same. on many of these platforms. For example, placing Consequently, attempts to improve hardware while ignoring a video accelerator in a laptop would let the user software overlook a large component of the total system cost. watch a high-definition movie without consuming Accelerated computing addresses this trend by providing a all of the CPu and graphics processing unit (gPu) framework that supports faster development of applications that cycles. Also, using special-purpose accelerators to can take advantage of hardware accelerators. process high-bandwidth network communications (e.g., 10 gbe) in data centers would be more efficient The Move to Virtualization and create better performance than passing that early efforts in virtualization focused on server consolidation to traffic through CPu cores. achieve better returns on investment in system management > DeNSITy: Some newer computing approaches, and resource utilization in data centers. virtualization is rapidly including cloud computing, provide value around becoming a mainstay of enterprise computing as its uses optimized packaging and cooling for applications expand into storage, application delivery, and other areas. and hardware. Accelerated computing supports The trend toward virtualization is changing the data-center these approaches by providing integrated computing environment, with storage hosted in one node, processors that help further optimize the use of workloads on another, and communication overhead distributed computing resources. across the network. Accelerated computing provides a general platform that allows virtualization is actually a key component of an intelligent off-core chips like network and media accelerators to be added accelerated computing infrastructure because it supports to general-purpose multicore systems. The approach reduces a tight coupling between CPus and hardware accelerators. the cost of individualized, integrated solutions. The traditional approach to offloading work to an accelerator 4 AMD White Paper: The Industry-Changing Impact of Accelerated Computing
  • 5. Amd: At the forefront of ACCelerAted ComPutIng hardware accelerators, building developer tools, and driving thought leadership by participating in standards-development All of these market trends are driving the move toward bodies. We believe that accelerated computing will be the result accelerated computing, and AMD is committed to influencing of the coming paradigm shift, and we intend to offer commercial the direction of this movement. We have always maintained a and consumer users flexible solutions that meet their computing forward-thinking approach to computing, which has created needs based on the components available in the growing a track record of wise choices regarding future direction and ecosystem. development: The rest of this paper provides a closer look at accelerated > evolving from 32-bit to 64-bit processing with computing, including a brief history of innovations in processing AMD64. power to improve application performance, different approaches > Developing the Direct Connect Architecture to to accelerated computing, and the value that this framework reduce bottlenecks and increase processing provides to commercial and consumer users alike. throughput with integrated memory controllers and hyperTransport™ technology. > Designing and producing dual and multicore x86 ProCessor generAtIons And APPlICAtIon AMD Opteron™ processors. PerformAnCe For the last 20 years, the focus of general-purpose processing In keeping with that approach, we recognized early on that (the x86 platform) has been on improving application silicon and software development costs would be too high performance through technological advancements in processor to support individualized processing solutions for emerging frequency, core microarchitecture, and multiple cores. The application areas. Consequently, we have invested considerable baseline assumption for defining “performance” and measuring time and resources into understanding the factors leading to the increases has been that frequency times the amount of current inflection point. work per clock cycle equals application performance. As the computing industry has matured, vendors have taken different We have been building expertise and capability in general- approaches to improving application performance using this and specific-purpose computing for the last several years. assumption (Figure 1). For example, our initiative code-named Torrenza and our Stream Computing work produced important research findings geNerATION 1: A FOCuS ON FrequeNCy AND ArChITeCTure about how to effectively couple general and specific purpose In the 1980s and 1990s, vendors focused on designing single- processors, which built a solid foundation for further research core CPus that would run single-threaded applications. At the and development in accelerated computing. time, increasing the frequency of the CPu was the clearest path to enhancing application performance because the higher The key to accelerated computing lies in effectively leveraging the frequency, the more work that could be done per second. processing power from general-purpose CPus and specific- vendors also worked on improving the core microarchitecture purpose processors like gPus, media accelerators, and others. of their CPus as a method to deliver better application The computing industry knows that gPus will eventually be performance. These architectural improvements were driven in integrated with CPus, and we acted on that future trend by part by the realization that continuing to increase a processor’s acquiring ATI Technologies. This move added significant gPu and frequency to enhance performance was not a viable option. multimedia expertise to our already extensive experience with Focusing on frequency alone led to increased design complexity general-purpose processors, creating a platform from which to and size as well as unacceptable thermal limits and power drive innovation and development for future computing needs consumption. This processor generation ended with a dramatic around data, voice, photo, video, and face recognition, among increase in power consumption and a significant decrease in the others. potential for future frequency increases (this was due to pushing the underlying microarchitecture to its limits and the declining Just as importantly, we are actively collaborating with large contributions for frequency in new process technologies). research institutions, independent software vendors (ISvs), ultimately, these factors set an impending limit on single- the open source community, and other organizations to threaded application performance and opened the door for a develop an open ecosystem around accelerated computing. new era of parallel computing. Our commitment to accelerated computing includes designing 5 AMD White Paper: The Industry-Changing Impact of Accelerated Computing
  • 6. 2000 2008 1985 1990 Generation One: Generation Two: Generation Three: Frequency and Homogeneous Heterogeneous Architecture Multicores Multicores Best hardware for the task = Improved throughput Higher frequency = better performance, energy = better performance better performance efficiency, and lower cost FIgure 1. x86 PrOCeSSOr geNerATIONS FOCuSeD ON DIFFereNT MeThODS FOr IMPrOvINg APPlICATION PerFOrMANCe geNerATION 2: hOMOgeNeOuS MulTICOreS applications fall into two performance categories: In the early 2000s, the industry began to introduce > COMPuTe-PerFOrMANCe applications involve homogeneous multicore computer systems—multiple cores the collection and manipulation of large volumes of the same type built on one processor die. The goal of of complex data in a single process (i.e., how fast this processor generation was to create better application a task can be turned around). examples of these performance through improved throughput. The hardware applications include oil and gas exploration, drug technology that enabled multicore processors has evolved to development, and contextual search. the point where computer systems can now scale up to a very > ThrOughPuT-PerFOrMANCe applications large number of CPu cores, although this generally results in process massive amounts of data per second from untenable power envelopes and very high solution costs. transactions that generally have short life spans, follow predictable processes, and contain standard The Achilles’ Heels of General Purpose Multicore Systems data elements. Throughput performance (i.e., While adding more cores to a system is one option for continuing how many tasks can be completed within a give to improve application performance, it isn’t always the best one. time span) is vital for e-commerce sites, financial When it comes to enhancing application performance, general- transactions, and mobile phone call monitoring. purpose, homogeneous multicore computer systems suffer from two Achilles’ heels: While general purpose CPus can be used in high-performance > They cannot keep up with the performance computing applications like geophysics, science, and simulation, demands of modern and emerging computing offloading specific processing tasks to specialized accelerators applications while still meeting the criteria of energy can dramatically improve power, space, and cost efficiency. efficiency and low cost. > They are limited by current serial programming Current Serial Programming Models: homogeneous multicore models that cannot take full advantage of the systems can support multi-threaded applications, but they parallel hardware architectures inherent in multicore do not necessarily enhance single-threaded application systems. performance. In fact, vendors have found that general- purpose multicore systems tend to reach a point of diminishing Characteristics of Modern and Emerging Computing: even with performance returns as they scale beyond eight cores. multiple cores, general-purpose processing is hard pressed to provide the performance, cost savings, and energy efficiency This discovery is supported by Amdahl’s law, which is used to needed for today’s data-driven computing needs. Modern find the maximum expected improvement to an overall system 6 AMD White Paper: The Industry-Changing Impact of Accelerated Computing
  • 7. when only part of the system is improved (Figure 2). In the case in the extra socket available on the motherboard. of application performance, the hardware has been improved through the addition of multiple cores with parallel processing In a similar way, this next processor generation focuses on capabilities. however, the applications themselves are limited offloading certain processing functions to specialized hardware by serial processing models and techniques. regardless of how accelerators to improve overall performance. As an integral many cores are available, many sit idle as they wait for data part of accelerated computing, these processors provide the to be processed serially. And even when certain algorithms hardware foundation for today’s data-driven and media-rich are parallelized, the majority of the application’s processing is computing experiences. Because AMD is taking a holistic still serial. The end result is that application performance on approach to accelerated computing, this processor generation homogeneous multicore systems tends to be lower than the will be designed to help provide better power efficiency and anticipated theoretical peak performance. lower costs for hardware as well as dramatically improving 140 Amdahl’s Law 120 1 Speed-up = 100 Sw + ( 1 - Sw ) N 80 Speed-up Sw = % Serial Work 60 N = Number of Processors 40 0% Serial 20 10% Serial 35% Serial 0 100% Serial 1 2 4 8 16 32 64 128 Number of CPU Cores FIgure 2. AMDAhl’S lAW ShOWS hOW APPlICATION PerFOrMANCe IMPrOveS The AMOuNT OF SerIAl WOrk DIMINISheS AND The NuMBer OF PrOCeSSOrS INCreASeS geNerATION 3: heTerOgeNeOuS MulTICOreS programmer productivity to help reduce the cost of software The next processor generation is focused on creating development. heterogeneous multicore computer systems: computers with multiple processors that have different capabilities. A heterogeneous multicore system could contain several general APProAChes to ACCelerAted ComPutIng purpose CPus plus several accelerators designed to enhance As accelerated computing gains momentum in the market, three the performance of a particular function (Figure 3). distinct approaches to using accelerators are emerging: This processor generation is at the heart of accelerated > CPu-CeNTrIC APPrOACh: This approach seeks computing. And interestingly, its emergence is analogous to to improve application performance by extending the introduction of math coprocessors in the early days of the instruction set of the general purpose x86 personal computing. When the original IBM PC was introduced architecture. While a general-purpose processor in 1979, it contained an 8088 CPu, which did not include a math can do practically any type of processing coprocessor because most software at the time was not math task, using only CPus does not provide the intensive. Anyone who needed more advanced math functions simultaneous cost, performance, and energy bought an 8087 floating point math coprocessor and installed it efficiency benefits that an accelerator can provide. 7 AMD White Paper: The Industry-Changing Impact of Accelerated Computing
  • 8. ACCelerAted ComPutIng: lIghts, CAmerA, ACtIon! CPU CPU CPU Accelerated computing boils down to finding a balance of power, performance, and cost. From commercial customers with enterprise data centers to consumers who use their laptops Accelerator GPU to watch movies, combining general-purpose processors with hardware accelerators can help create that balance. The following sections provide brief descriptions of how accelerated computing is being implemented now and how it could be used Accelerator in the future. NOW PlAyINg ON A COMPuTer NeAr yOu FIgure 3. A heTerOgeNeOuS MulTICOre PrOCeSSOr CONTAINS MulTIPle Several industries are already implementing solutions based on COreS DeSIgNeD FOr SPeCIAlIzeD TASkS accelerated computing. each industry has specific performance requirements that demand processing power only available > gPu-CeNTrIC APPrOACh: This approach seeks through hardware accelerators. to meet modern computing needs by using only gPus for all processing tasks. Although gPus are Financial Services well-designed for coarse-grained parallel tasks, One critical performance driver for the financial services industry they do not provide the broad range of capabilities is low-latency trading. The amount of available market data that that a general-purpose CPu brings to processing financial services organizations must process is overwhelming: tasks, nor do they easily support fine-grained for example, the full data feed from the Options Price reporting parallelism. Authority (OPrA) surpassed one million messages per second this year. The ability to process this data feed with the lowest > BAlANCe AND OPTIMIzATION APPrOACh: This possible latency (measured in microseconds) is key to success approach focuses on choosing the best hardware for these organizations. to optimally process a given workload. It uses CPus to orchestrate system activities, balance Companies like exegy and ACTIv Financial directly connect workloads, and determine which tasks to offload specialized off-chip accelerators called field programmable gate to appropriate accelerators. Accelerators are arrays (FPgAs) to homogeneous multicore processors using added to the system to complement existing CPu hyperTransport™ technology to create ticker plant appliances abilities and improve application performance. This that process these huge streams of financial market data. approach makes it possible for computer systems exegy’s accelerated appliance has been benchmarked at and applications to be optimized for stream, processing one million messages per second (the full OPrA parallel, sequential, or single-threaded processing, feed) with an average latency of 80 microseconds1. depending on need. Telecommunications AMD has chosen to follow the balance and optimization As the demand for video on Tvs, laptops, cell phones, and approach. Because of our extensive experience developing portable media players increases, telecommunications providers CPus, gPus, and accelerators, we believe that the best need systems that can transcode huge volumes of streaming approach to accelerated computing is to cooperatively develop video data to multiple devices simultaneously. In March 2007, the overall ecosystem (software as well as hardware), provide Sun Microsystems introduced Sun Streaming System, a video the processors that we have the most expertise in, and delivery platform that transcodes video data and delivers empower customers to choose the components that create the personalized streams to end users. This platform relies on best performance for the applications they need to run. multiple integrated network cards and general-purpose CPus to We intend to provide a set of accelerated computing solutions provide high performance and energy efficiency in this video- targeted toward different markets, including, server, desktop, processing solution2. and laptops. And our work with the Torrenza initiative has given us the ability to easily integrate external accelerators so our customers can differentiate and customize their computing solutions as needed. 1 Waters, “grid,” Bob giffords, 1 April 2008, (http://db.riskwaters.com/public/showPage.html?page=788660) 2 ziff Davis enterprise CustomSolutions, “expediting Time to Insight via Accelerated Computing,” 2007 8 AMD White Paper: The Industry-Changing Impact of Accelerated Computing
  • 9. Security 10-, 40-, or 100-gB ethernet will require another approach. Accelerated computing provides the processing power Network packet accelerators can provide an alternative model needed for advanced security functions such as deep content that balances power and performance for the data center. inspection, which is used to analyze data in intrusion prevention Video Acceleration and antivirus applications. For example, antivirus engines can Many consumer computer users use the number of movies they be required to inspect all data that enters a computer and then can watch on a single battery as a criterion for evaluating laptop compare their findings against databases with many thousands performance. The accelerated computing framework provides of threats. using specialized hardware accelerators that are several options for meeting this performance requirement, tightly coupled with multicore CPus can increase performance including using video accelerators to meet the criteria of high for deep-packet inspection and other advanced security performance, energy efficiency, and low cost. activities. Video Conferencing Systems COMINg SOON TO A COMPuTer NeAr yOu high-end video conferencing systems available today deliver The emergence of media-rich applications, as well as an experience so realistic that it looks like the person you are increasingly data-driven computing tasks, are creating many conferencing with is actually sitting on the other side of the table. new opportunities to implement the accelerated computing however, this type of conferencing capability is generally limited framework. here is a brief sampling of the broad spectrum of to enterprise organizations with financial resources that can possible ways that accelerated computing could be applied in support this level of conferencing. The accelerated computing the near- to mid-term. framework provides a pathway toward creating high-quality video conferencing on the desktop at a reasonable cost to end Future High Bandwidth Data Centers users. As very high bandwidth ethernet becomes a reality, current processing methods will not be able to keep up with network Image Recognition communication. For example, a server running the full TCP/ As our society produces increasing quantities of digital images— IP stack uses most of its CPu power to process the traffic both still and video—the need to automatically recognize and generated on a 1-gB ethernet network. Because the network classify images is becoming more urgent. For example, right now processing alone consumes so many CPu cycles, implementing it is difficult to find images with specific content unless someone the loomIng CrIsIs While some data parallel applications programming tools and providing are being written to address educational opportunities so that The growing gap between hardware the increasing role of media- current developers can more easily and software parallel processing rich applications and virtualized create parallel programs. PPl is a capabilities is creating a looming crisis in environments, developers have combined effort between top Stanford the computer industry. While hardware realized that writing these kinds of researchers in applications, languages, technology continues to advance in applications at a low level is very systems software, and computer the area of data parallel processing, the challenging—especially in the areas of architecture and leading companies parallel programming tools and skilled synchronization, communication, and in computer systems and software. developers needed to extract maximum scheduling. And for the moment, higher- In addition to developing parallel performance from multicore systems level tools and languages to support algorithms, development environments, are not readily available. According to data parallel development efforts do and runtime systems that scale Stanford university, by 2010 software not exist. Another complication is to 1000s of hardware threads, PPl developers will face systems with that the computer science education members are also working on making heterogeneous multicore processors system is not prepared to teach parallel parallel programming a core component and application specific accelerators computing skills. of undergraduate computer science that run hundreds of hardware threads education. This educational goal will and leverage deep memory hierarchies given this state of affairs, Stanford develop a new class of scientists and to create over one teraflop of computing university has created the Pervasive engineers who can close the current power. Parallelism lab (PPl). This project parallelism gap and avert the looming focuses on creating high-level crisis. 9 AMD White Paper: The Industry-Changing Impact of Accelerated Computing
  • 10. has manually tagged the images and added the content to a possible for them to put different capabilities (general- and searchable database. large image sites pay people to do this to specific-purpose processors, for example) on the same piece of provide a better customer experience, but the average person silicon, which increases processor utilization, enhances energy with thousands of digital pictures does not take the time to tag all efficiency, and assimilates capabilities that used to reside on of their photos for easy searching. separate chips. The same is true of video: the only way to find your favorite This integration passes on performance improvements to actor in a given movie and scene is to consult a source that applications that support media-rich experiences for commercial has viewed the movie and manually recorded the information. and consumer users. And in the data center, it supports the trend Automating this type of image and video indexing would be toward virtualization as a means of increasing resource utilization a welcome addition to modern computing, and putting the and reducing management costs. This entire trend focuses on capability in a $50 application would make commercial and extracting more processing power per unit (e.g., performance per consumer computer users alike very happy. dollar) from all areas of the computing system, which translates into excellent performance for emerging applications that require Of course, this type of software is computationally complex and more processing power. involves an incredible amount of processing. Applications that can process images and video, look for patterns, and create Accelerated computing also makes the x86 instruction set more metadata that represents what is contained in the images will affordable and applicable to many different uses by allowing require dedicated hardware accelerators to handle these tasks computer systems to split and balance workloads between without bogging down the entire computer system. CPus and accelerators. For example, CPus excel at coordinating and processing linear tasks, while gPus are designed to process Healthcare and Network Security large amounts of data in parallel. The more media-rich or data- The added emphasis on information privacy in the healthcare intense the workload, the more parallel processing is needed industry has prompted new levels of network security. Instead to provide a great computing experience, so the presence of of simply controlling who has access to healthcare information, dedicated gPus to improve performance is critical. Accelerated IT departments in hospitals, insurance companies, and other computing provides the framework for easily creating or healthcare institutions must conduct analyses of the information modifying applications so their workloads can be split into serial that comes into their networks. Common types of security and parallel tasks and then sent to the processor that can analysis include packet filtering, but more advanced techniques provide the best performance. like deep packet analysis, which looks beyond the packet header to analyze the payload for potential threats, take many The trend in modern computing toward visually rich content processor cycles. By offloading deep packet and other security spans all levels of computing because people tend to remember analyses to a specialized hardware accelerator, healthcare visual content better. high-end workstations are by nature IT departments could significantly increase application graphic-intense, and standard client computer systems in performance without reducing available system resources and commercial business are being called on to present more still meet energy efficiency and cost criteria. media-rich content, including presentations and data mining applications that present data visually. From a consumer computing perspective, we are experiencing a distinct shift from the CAse for ACCelerAted ComPutIng now text to multimedia content. And finally, current and emerging operating system user interfaces are becoming more graphically Accelerated computing is not some passing fancy that will rich. disappear as quickly as it appeared. quite the opposite: accelerated computing is a continuation of the current trend In full bloom, accelerated computing will create many toward integration at the level of silicon, consumer computers, opportunities to customize computer systems by providing and data centers. In terms of silicon integration, advancing targeted, precise solutions for different kinds of workloads. And process technology is letting vendors place increasingly it will create those opportunities while satisfying the need for complex designs on a single piece of silicon. This makes it application performance, energy efficiency, and low cost. Advanced Micro Devices © 2008 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, and combinations thereof are trademarks of One AMD Place Advanced MicroDevices, Inc. Other names are for informational Sunnyvale, CA 94088-3453 purposes only and may be trademarks of their respective owners. www.amd.com 46125B