EFFICIENT POWER MANAGEMENT TECHNIQUES SUCH AS SKIN TEMPERATURE AWARE POWER MANAGEMENT AND BATTERY BOOST FOR IMPROVED ENERGY EFFICIENCY [PERFORMANCE/WATT]
DEVELOPING PERFORMANCE ANALYSIS ENVIRONMENT BY REUSING EXISTING VERIFICATION ENVIRONMENT
HOLISTIC VIEW OF SOC VERIFICATION :
EVOLUTION OF UVM METHODOLOGY, UVM 1.2 AND CHALLENGES WITH MULTI LANGUAGE SUPPORT/AMS SUPPORT.
EDA INDUSTRY/TOOL CHALLENGES WITH HW-SW DEBUG, VP MODEL VERIFICATION.
H/W ASSISTED SIMULATION ACCELERATION, CHOOSING EMULATION CONFIGURATION FOR YOUR DESIGN.
OLA Conf 2002 - OLA in SoC Design Environment - paper - Tim55Ehrler
The integration of Open Library Architecture (OLA) libraries within nano-technology design environments can positively impact SoC design cycle times. Consistent calculation of desired information across a standard application programming interface (API) ensures analysis convergence among tools, eliminates data exchange processing and storage requirements, and significantly reduces iterations through design process steps.
Bringing Engineering Analysis Codes Into Real-Time Full-Scope Simulators - GSE Systems, Inc.
Presented at the 2013 Nuclear Simulation and Training China Forum in Beijing. For more information on GSE's real-time simulators and engineering capabilities, go to www.gses.com, follow GSE on Twitter @GSESystems and connect on Facebook.com/GSESystems
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael W... - AMD Developer Central
Presentation HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael Wootton at the AMD Developer Summit (APU13) November 11-13, 2013.
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G... - Christopher Diamantopoulos
The implemented DSP system communicates over TCP sockets. On receiving a message, it selects the process to execute from one of the following cases:
1) image capture
2) image transfer
3) image processing
4) sensor calibration
A user-friendly MATLAB GUI, named DIPeth, facilitates the system's control.
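The message-driven selection described above can be sketched as a small dispatch table. This is only an illustration in Python, not the DSP firmware: the single-byte command codes and the handler names are hypothetical, and in the real system the message would arrive over a TCP socket rather than as a function argument.

```python
from typing import Callable, Dict


# Hypothetical handlers: each one stands in for a real processing routine.
def capture_image() -> str:
    return "image capture"


def transfer_image() -> str:
    return "image transfer"


def process_image() -> str:
    return "image processing"


def calibrate_sensor() -> str:
    return "sensor calibration"


# Hypothetical command codes; the abstract names the four cases
# but does not specify the message format.
HANDLERS: Dict[bytes, Callable[[], str]] = {
    b"1": capture_image,
    b"2": transfer_image,
    b"3": process_image,
    b"4": calibrate_sensor,
}


def dispatch(message: bytes) -> str:
    """Select and run the handler that matches the received message."""
    handler = HANDLERS.get(message.strip())
    if handler is None:
        return "unknown command"
    return handler()
```

Keeping the cases in a table means a fifth command could be added without touching the dispatch logic.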
ScicomP 2015 presentation discussing best practices for debugging CUDA and OpenACC applications with a case study on our collaboration with LLNL to bring debugging to the OpenPOWER stack and OMPT.
Preparing Codes for Intel Knights Landing (KNL) - AllineaSoftware
Getting ready for the next generation of Intel Xeon Phi processors: we outline the steps to tune, profile and then optimize applications to target many core
LAS16-105: Walkthrough of the EAS kernel adaptation to the Android Common Kernel - Linaro
LAS16-105: Walkthrough of the EAS kernel adaptation to the Android Common Kernel
Speakers: Juri Lelli
Date: September 26, 2016
★ Session Description ★
Walkthrough of the EAS kernel adaptation to the Android Common Kernel.
★ Resources ★
Etherpad: pad.linaro.org/p/las16-105
Presentations & Videos: http://connect.linaro.org/resource/las16/las16-105/
★ Event Details ★
Linaro Connect Las Vegas 2016 – #LAS16
September 26-30, 2016
http://www.linaro.org
http://connect.linaro.org
- What we mean by EAS core and how it's distinct from the other components - also why it's so difficult to get it merged. (This is driven by key partner concerns).
- An update on misc work that's underway to resolve the upstreaming.
- Misc load balance pathway enhancements
- Wakeup pathway mods (cleanups, basic big.LITTLE capacity awareness etc)
- Periodic load balancer mods.
- Energy model expression (why this is important, partner perspectives/experience and bottlenecks)
- Proposals to get an expression into the mainline
- Optional boot-time auto-detection of capacity over-ridable by sysfs
- Leveraging the merged power coefficient bindings
- Leveraging the OPP bindings
.. to effectively get to EAS' struct sched_group_energy.
- How we are structuring things to ease upstream acceptance. What's helping, what's not, where partners can help.
LAS16-400: Mini Conference 3 AOSP (Session 1) - Linaro
LAS16-400: Mini Conference 3 AOSP (Session 1)
Speakers: Thomas Gall, Bernhard Rosenkränzer
Date: September 29, 2016
★ Session Description ★
The Android Open Source Project is one community which is strategic to Linaro and its members. The purpose of this mini conference is to gather fellow Android engineers together from the community, member companies, and Linaro to discuss engineering activities and improve collaboration across different groups.
Within this mini conference we encourage discussion and presentations to advance engineering topics, forge consensus and educate each other.
The tentative agenda for this mini conference includes :
- Quick introduction
- Filesystems - Between requirements for encryption and standing concerns about performance degrading as an Android file system ages, let’s discuss current data, known issues and possible improvements in this area for Android.
- HAL consolidation - Review current status and discuss next steps to work on.
One build for many devices: device/build configuration. Next features and platforms to add. Gaps in HiKey support vs. AOSP build.
- Graphics - YUV support in mesa and hwc.
- WiFi and sensor HAL status and next steps
- New developments with AOSP + the Kernel - With regards to the Google Common Kernel tree and upstream Linux kernel activities related to Android, there are a few topics up for discussion:
- - Updates on HiKey in AOSP
- - EAS in common.git & integration with AOSP userspace
- - New Sync API in 4.6+ kernels, and how it will affect graphics drivers
- AOSP transition to clang - As everyone knows GCC in AOSP has been deprecated. Let’s cover current status, issues and next steps. Let’s also discuss the elephant in the room, building the kernel with clang.
- Out of tree AOSP User space Patches - This is a discussion with the goal of organized action to see forward progress on AOSP user space patches that aren’t in AOSP for whatever reason.
- Android is used in some environments where booting can be frequent and affect the product experience. Do you want to wait for a minute while your car boots? We’ll spend time brainstorming on improving Android boot time.
★ Resources ★
Etherpad: pad.linaro.org/p/las16-400
Presentations & Videos: http://connect.linaro.org/resource/las16/las16-400/
★ Event Details ★
Linaro Connect Las Vegas 2016 – #LAS16
September 26-30, 2016
http://www.linaro.org
http://connect.linaro.org
The Microarchitecture Of FPGA Based Soft Processor - Deepak Tomar
This presentation is on the paper "The Microarchitecture Of FPGA Based Soft Processor" by Peter Yiannacouras, Jonathan Rose and J Gregory Steffan, Dept. of Electrical and Computer Engineering, University of Toronto.
Accelerate Big Data Processing with High-Performance Computing Technologies - Intel® Software
Learn about opportunities and challenges for accelerating big data middleware on modern high-performance computing (HPC) clusters by exploiting HPC technologies.
Implementation of Soft-core Processor on FPGA - Deepak Kumar
We can add a soft-core processor to an FPGA-based system after it's already designed. However, adding a hard-core processor requires either a different FPGA, or an additional chip on the board.
It’s surprisingly straightforward to migrate feature code from the CPU to the DSP – and determine the resulting benefits to the end application. In this session we’ll demonstrate Qualcomm® Hexagon™ SDK installation, code generation, profiling and execution of dynamic code modules on a Qualcomm® Snapdragon™ hardware target, and you’ll learn how to analyze the resulting performance benefits. Qualcomm Snapdragon and Qualcomm Hexagon are products of Qualcomm Technologies, Inc.
Learn more about Hexagon SDK: https://developer.qualcomm.com/hexagon
Watch this presentation on YouTube:
https://www.youtube.com/watch?v=x6mKEWLzJM0
Learn more about DirectGMA in this blog post: bit.ly/AMDDirectGMA
AMD has introduced Direct Graphics Memory Access (DirectGMA), which:
‒ Makes a portion of the GPU memory accessible to other devices
‒ Allows devices on the bus to write directly into this area of GPU memory
‒ Allows GPUs to write directly into the memory of remote devices on the bus supporting DirectGMA
‒ Provides a driver interface to allow 3rd party hardware vendors to support data exchange with an AMD GPU using DirectGMA
‒ and more
View the accompanying blog post here: bit.ly/AMDDirectGMA
Session ID: SFO17-307
Session Name: WALT vs PELT : Redux
- SFO17-307
Speaker: Pavan Kumar Kondeti
Track: LMG
★ Session Summary ★
New data on the comparison of the WALT and PELT load tracking schemes in the scheduler
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/sfo17/sfo17-307/
Presentation:
Video: https://www.youtube.com/watch?v=r3QKEYpyetU
---------------------------------------------------
★ Event Details ★
Linaro Connect San Francisco 2017 (SFO17)
25-29 September 2017
Hyatt Regency San Francisco Airport
---------------------------------------------------
Keyword:
'http://www.linaro.org'
'http://connect.linaro.org'
Functional verification is one of the key bottlenecks in the rapid design of integrated circuits. It is estimated that verification in its entirety accounts for up to 60% of design resources, including duration, computer resources and total personnel. The three primary tools used in logic and functional verification of commercial integrated circuits are simulation (at various levels), emulation at the chip level, and formal verification.
UVM is a standardized methodology for verifying complex IP and SoCs in the semiconductor industry. It is an Accellera standard, developed with support from multiple vendors: Aldec, Cadence, Mentor, and Synopsys. UVM 1.0 was released on 28 Feb 2011 and is widely accepted by verification engineers across the world. UVM has since evolved through a series of minor releases, which introduced new features.
UVM provides the standard structure for creating test benches and UVCs. The following features are provided by UVM:
• Separation of tests from test bench
• Transaction-level communication (TLM)
• Sequences
• Factory and configuration
• Message reporting
• End-of-test mechanism
• Register layer
One of the biggest issues for a developer – whether they are an engineer at an OEM or working for a mobile AI application startup – is that their apps are at the mercy of pre-set power and performance settings as defined by OEMs or Silicon vendors. So how can a developer break through that barrier when it seems their hands are tied behind their backs? The Snapdragon Power Optimization SDK allows developers to control the CPU and GPU frequency much more finely from their own application logic. This provides developers with more control within the bounds of the power/thermal framework.
This talk was given at GTC16 by James Beyer and Jeff Larkin, both members of the OpenACC and OpenMP committees. It's intended to be an unbiased discussion of the differences between the two languages and the tradeoffs to each approach.
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar - AMD Developer Central
This deck presents highlights from the Introduction to OpenCL™ Programming Webinar presented by Acceleware & AMD on Sept. 17, 2014. Watch a replay of this popular webinar on the AMD Dev Central YouTube channel here: https://www.youtube.com/user/AMDDevCentral or here for the direct link: http://bit.ly/1r3DgfF
In the design of electronics and semiconductors, challenges are compounded by the integration of AI, multi-core, real-time software, network, connectivity, diagnostics, and security. Performance limits, battery life, and cost are adoption barriers. It is extremely important to have tools and processes that deliver efficiency throughout the design cycle.
Continuous verification from planning to development addresses the multi-discipline needs of hardware, software, and networks. This unique approach accelerates the design phase, defines the test efforts, and finds defects during specification. Architecture modeling is required to meet timing deadlines, achieve the lowest power consumption, and attain the highest Quality-of-Service, as well as to optimize the electronic design system and the design of custom components.
Accelerating Real Time Applications on Heterogeneous Platforms - IJMER
In this paper we describe novel implementations of depth estimation from stereo images using feature extraction algorithms that run on the graphics processing unit (GPU), which is suitable for real-time applications such as analyzing video in real-time vision systems. Modern graphics cards contain a large number of parallel processors and high-bandwidth memory for accelerating data computation operations. In this paper we give a general idea of how to accelerate real-time applications using heterogeneous platforms. We propose to use some added resources to support more computationally involved optimization methods. This proposed approach will indirectly accelerate a database by producing better plan quality.
Best Practices for performance evaluation and diagnosis of Java Applications ... - IndicThreads
Session Presented at 5th IndicThreads.com Conference On Java held on 10-11 December 2010 in Pune, India
WEB: http://J10.IndicThreads.com
------------
Enterprise applications typically comprise multi-layered stacks including the application modules, application servers, the Java Virtual Machine and the underlying Operating System. Consequently, the performance of these applications is a factor of these different layers. In the event of a performance problem, it is often difficult to determine the starting point for diagnosis. The Java Virtual Machine is the ‘engine’ for most of the applications. It is broadly responsible for efficient execution and memory management of applications. End users have difficulty attributing the effect of the JVM on the performance of the application. This is because the JVM is usually viewed as a ‘black box’.
This talk provides an insight into the key subsystems of the JVM by looking under the hood of a high performance JVM. It goes on to discuss approaches and techniques for analyzing performance issues. It concludes by introducing the audience to a tool called the “Health Center”, which is useful for evaluating and comprehending the JVM behavior of a running application in an unobtrusive, lightweight manner.
Takeaways for the Audience: A better understanding of key JVM components, approaches and techniques to diagnose performance issues, and performance evaluation using the Health Center.
This project deals with the warehouse scale computers (WSCs) that power the internet services we use today. It covers the hardware blocks used in a Google WSC and the architecture of hardware accelerators such as the Graphical Processing Unit and the Tensor Processing Unit, which allow warehouse scale machines to run heavy tasks and to support application-specific machine learning and deep learning tasks. The project also explains the energy efficiency of the processors used by the Google WSC to achieve high performance, and the performance enhancement mechanisms used by the Google WSC.
Despite the increase in deep learning practitioners and researchers, many of them do not use GPUs. This may lead to long training/evaluation cycles and impractical research.
In his talk, Lior shares how to get started with GPUs and some of the best practices that helped him during research and work. The talk is for everyone who works with machine learning (deep learning experience is NOT mandatory!). It covers the very basics of how a GPU works, CUDA drivers, IDE configuration, training, inference, and multi-GPU training.
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS - cscpconf
The main aim of our research is to find the limit of Amdahl's Law for multicore processors, that is, the number of cores that gives the most efficiency to the overall architecture of the CMP (Chip Multi Processor, a.k.a. Multicore Processor). We expect this limit to lie either in the architecture of the Multicore Processor or in the programming. We surveyed the architecture of the Multicore processors of various chip manufacturers, namely INTEL™, AMD™, IBM™ etc., and the various techniques they follow for improving the performance of Multicore Processors. We conducted cluster experiments to find this limit. In this paper we propose an alternate design of Multicore processor based on the results of our cluster experiment.
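As background for the limit discussed above, Amdahl's Law bounds the speedup on n cores by 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel. The sketch below is only a minimal illustration of that formula in Python. The function names and the sample values of p and n are assumptions for illustration, and it does not reproduce the paper's cluster experiments.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's Law for a given parallel fraction and core count."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the core count grows without bound."""
    if not 0.0 <= parallel_fraction < 1.0:
        raise ValueError("parallel_fraction must be in [0, 1)")
    return 1.0 / (1.0 - parallel_fraction)


def efficiency(parallel_fraction: float, cores: int) -> float:
    """Speedup per core; it falls as cores are added whenever some work is serial."""
    return amdahl_speedup(parallel_fraction, cores) / cores


if __name__ == "__main__":
    # Illustrative values only: 90% parallel work on increasing core counts.
    for n in (1, 2, 4, 8, 16, 64):
        print(n, round(amdahl_speedup(0.9, n), 3), round(efficiency(0.9, n), 3))
```

With any serial portion at all, the speedup stays below `amdahl_limit(p)` while the per-core efficiency keeps dropping as cores are added.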
Affect of parallel computing on multicore processors - csandit
The main aim of our research is to find the limit of Amdahl's Law for multicore processors, that is, the number of cores that gives the most efficiency to the overall architecture of the CMP (Chip Multi Processor, a.k.a. Multicore Processor). We expect this limit to lie either in the architecture of the Multicore Processor or in the programming. We surveyed the architecture of the Multicore processors of various chip manufacturers, namely INTEL™, AMD™, IBM™ etc., and the various techniques they follow for improving the performance of Multicore Processors.
We conducted cluster experiments to find this limit. In this paper we propose an alternate design of Multicore processor based on the results of our cluster experiment.
An Approach to Overcome Modeling Inaccuracies for Performance Simulation Sig... - Pankaj Singh
RNM is gaining prominence in functional verification signoff. However, there is a clear modeling gap when it comes to performance simulation of high-speed SerDes. Sometimes pre-silicon simulation passes the jitter tolerance (JTOL) specification while the actual silicon validation results do not match. These performance issues arise from model inaccuracies, where the model does not capture the actual circuit behavior. There is no clear methodology to overcome these model gaps for performance simulation signoff.
This paper discusses in detail the techniques used to accurately model and verify high-speed SerDes systems for performance simulation.
Overcoming challenges of verifying complex mixed signal designs - Pankaj Singh
An efficient and innovative Digital Mixed-Signal (DMS) verification methodology is required to enable effective verification of the RX path of a SERDES. This presentation describes the use of real value models and the Capture-Verify approach to verify complex high-speed mixed-signal designs.
Real value models are the backbone of DMS methodology. Real value models are created for all critical modules in the receive path, such as the Equalizer and Sampler, and their associated peripheral modules. It is critical to make sure the created models are functionally equivalent to the respective designs. This is achieved by verifying each created model against its design in all functional modes. While real value models are effective in overcoming the simulation performance bottleneck, achieving 10x faster simulation time, the nonlinearity of the front-end design is not represented accurately in discrete-domain real value models for the next generation of SerDes designs at very high data rates.
To overcome this problem, a novel approach called ‘capture and verify’ is used for verifying the jitter tolerance and eye parameters. In this approach, waveforms from spice level verification of Equalizer for different functional modes are captured and stored. These stored waveforms are used to generate run time table-based models to accurately represent the analog modules. These run time models are used in top-level simulations along with real value models thereby achieving required goal of simulation performance without compromising on accuracy of results.
The complete Design Verification (DV) environment is developed using UVM-e Methodology. Verification environment contains model for transmitter with all de-emphasis settings along with protocol compliant channels with multiple attenuations. DV infrastructure has hooks to plug-in required channel models to verify SERDES. This verification environment is also capable of verifying the clock data recovery (CDR) path of the design using protocol compliant jitter and Spread-Spectrum Clocking (SSC) stimulus.
Real value modelling bridges the gap between the performance requirements of the simulation and the accuracy limitations of the design. A significant speed-up in simulation performance (almost 10X in this case) is achieved by replacing mixed-signal designs with functionally equivalent real value models. Using the Capture and Verify methodology with SPICE simulation waveforms for critical blocks ensures that the non-linearity of next-generation high-speed SerDes designs is well captured in simulation, providing a comprehensive solution for high-speed mixed-signal designs.
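The idea of turning stored waveforms into run-time table-based models can be sketched as a lookup table with interpolation between captured samples. This is a minimal Python illustration under stated assumptions, not the flow from the presentation: the class name `TableModel`, the use of linear interpolation, and the clamping outside the captured range are all hypothetical choices.

```python
from bisect import bisect_right
from typing import Sequence


class TableModel:
    """Table-based model built from captured (time, value) waveform samples."""

    def __init__(self, times: Sequence[float], values: Sequence[float]) -> None:
        if len(times) != len(values) or len(times) < 2:
            raise ValueError("need at least two samples of equal length")
        if any(t1 <= t0 for t0, t1 in zip(times, times[1:])):
            raise ValueError("times must be strictly increasing")
        self.times = list(times)
        self.values = list(values)

    def sample(self, t: float) -> float:
        """Return the modelled value at time t.

        Assumption: linear interpolation between samples, and the first or
        last captured value is held outside the captured range.
        """
        if t <= self.times[0]:
            return self.values[0]
        if t >= self.times[-1]:
            return self.values[-1]
        i = bisect_right(self.times, t) - 1  # index of the sample at or before t
        t0, t1 = self.times[i], self.times[i + 1]
        v0, v1 = self.values[i], self.values[i + 1]
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
```

A top-level simulation could then query such a model at each time step instead of running the detailed analog block, which is where the speed-up would come from.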
Qualifying a high performance memory subsystem for Functional Safety - Pankaj Singh
Addressing the Challenges of Safety verification for LPDDR4.
✓ Avoid the traditional approach of starting functional safety after functional verification, which leads to an iterative and expensive development phase:
1. Functional safety needs to be architected, not added later.
2. Safety analysis must start prior to implementation: ‘Design for safety/verification’.
3. Reuse and synergize nominal and functional safety verification.
✓ Fault optimization with formal and other techniques is necessary to overcome challenges with scaling simulation and analysis.
✓ An integrated push-button fault simulation flow is the need of the hour and saves verification engineers time.
✓ Analog defect modelling and coverage can be performed based on IEEE P2427.
Safety Verification and Software aspects of Automotive SoC - Pankaj Singh
IP-SoC Conference 2017 Grenoble
The automotive industry has evolved over the last 100 years. Electronic systems were introduced into the automotive industry in 1960. Since then the complexity has grown manyfold, and today’s automobiles have as many as 150 programmable computing elements or Electronic Control Units (ECUs) with several wiring connections. The software content has also increased significantly, with today’s car having more than 100 million lines of software code.
This increased hardware and software complexity increases the risk of failures that could negatively impact vehicle safety. This has led to concerns regarding the validation of failure modes and the detection mechanisms. Car makers and suppliers need to prove that, despite increasing complexity, their electronic systems will deliver the required functionality safely and reliably.
This presentation describes the challenges and methodology related to safety verification and software development aspects of an Automotive Microcontroller SoC.
1.Car Security
Understanding the Car Onboard Communication / Connection and inherent Security Weakness
2.Addressing the Security Concerns : System’s Viewpoint
Hardware Security Module & Secure Hardware Extension
Look at Software Principle of MAC and Associated Hardware
3.Achieving Security implementation checks via Software and Addressing the Hardware Safety aspect.
Closing the Loop for Security Safeness: Complete Solution to Ensure Security/Safety Compliance with Software
Panel:The secret of Indian leadership in Electronic Design skill... From Desi...Pankaj Singh
Panel Discussion: D&R IP-SoC, Bangalore 2015. Topic:The secret of Indian leadership in Electronic Design skill... From Design to Services to Embedded Software
Power Optimization with Efficient Test Logic Partitioning for Full Chip DesignPankaj Singh
This paper introduces efficient test logic partitioning to not only optimize and reduce the overall test power during silicon validation but also reduce power in functional mode by shutting off test logic. Approach used in optimizing test power has been successful in reducing overall functional mode leakage power by 50% without any additional area overhead or test time increase. Results shared are based on WIMAX full chip SoC design.
AMD_11th_Intl_SoC_Conf_UCI_Irvine
1. Platform Coherency and SoC Verification Challenges
PANKAJ SINGH, CHETHAN-RAJ M , PRAKASH RAGHAVENDRA, ANINDYASUNDAR NANDI, DIBYENDU DAS AND TONY TYE
THE 11TH INTERNATIONAL SYSTEM-ON-CHIP (SOC) CONFERENCE, EXHIBIT, AND WORKSHOPS, OCTOBER 2013, IRVINE, CALIFORNIA
WWW.SOCCONFERENCE.COM
ACKNOWLEDGEMENTS:
PHIL ROGERS AMD CORPORATE FELLOW , ROY JU & BEN SANDER SR FELLOW
NARENDRA KAMAT, PRAVEEN DONGARA AND LEE HOWES
2. TODAY’S TOPICS
A New Parallel Computing Platform
– Heterogeneous System Architecture
Opportunities, Benefits and Feature Roadmap
Kaveri Platform Coherency
Shared memory, Platform atomics
Kaveri Verification Approach
SoC Verification Challenges and Solutions
Agenda: 1. HSA, 2. Kaveri Platform Coherency, 3. Kaveri Verification, 4. SoC Verification
| 11th Intl. SoC Conference| Oct 23rd,24th, 2013
3. A New Parallel Computing Platform – Heterogeneous System Architecture (HSA)
4. APU: ACCELERATED PROCESSING UNIT
The APU is a great advance compared to previous platforms
Combines scalar processing on CPU with parallel processing on the GPU and high-bandwidth access to memory
[Figure labels: CPU pair, GPU SIMD]
Challenge: How do we make it even better going forward?
Easier to program
Easier to optimize
Easier to load balance
Higher performance
Lower power
5. THE HSA OPPORTUNITY ON MODERN APPLICATIONS
PROBLEM
Historically, developers program CPUs
GPU/HW blocks hard to program
Not all workloads accelerate
SOLUTION
HSA + Libraries = productivity & performance with low power
[Chart: Developer Return (differentiation in performance, reduced power, features, time to market) versus Developer Investment (effort, time, new skills)]
CPU coders: ~20+M* coders, ~4M apps, good user experiences
GPU coders: tens of Ks of coders, a few hundred apps, significant niche value
HSA coders: a few M coders, a few 100Ks of apps, wide range of differentiated experiences
*IDC
6. HSA AND ITS BENEFITS
HSA IS A COMPUTING PLATFORM THAT DRIVES NEW CLASS OF APPLICATIONS
App-Accelerated
Software Applications
Graphics Workloads
Data-Parallel Workloads
Serial and Task-Parallel Workloads
HSA is an enabler of APU’s higher performance and power efficiency
Our industry-leading APUs speed up applications beyond graphics
CPU and GPU (APUs) work cooperatively together directly in system memory
Makes programming the APU as easy as C++
Improves Performance per watt
Ref [1]
7. HSA EFFICIENCY IMPROVEMENT (AN EXAMPLE)
Improves Power and Performance: Move application from CPU to GPU, remove data copies, and reduce launch time
[Chart: measured power (0 W to 35 W; CPU Cores, NB+GPU, DRAM) and measured performance (0 fps to 25 fps) for CPU versus CPU+GPU]
Simulate removing memory copies: 1.32 X
1.11 * 2.88 * 1.32 = 4.22 X Better Energy Efficiency
Easier to Program + Remove Copies
ENERGY COMPUTATION BREAKDOWN: MOTIONDSP 720P VIDEO CLEAN-UP
Ref [1]
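The headline figure on this slide is a product of three improvement factors. The short check below simply multiplies them; only the 1.32X factor is labelled in the transcript (simulated removal of memory copies), so the other two are treated here as unnamed factors rather than given a meaning.

```python
import math

# Factors exactly as printed on the slide. Only 1.32 is labelled there
# (simulated removal of memory copies); the other two are left unnamed.
factors = [1.11, 2.88, 1.32]

# Independent improvement factors compound multiplicatively.
combined = math.prod(factors)

print(f"combined improvement: {combined:.2f}X")
```

The unrounded product is about 4.2198, which the slide reports as 4.22X.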
8. HETEROGENEOUS SYSTEM ARCHITECTURE FEATURE ROADMAP
Physical Integration: Integrate CPU & GPU in silicon; Unified Memory Controller; Common Manufacturing Technology
Optimized Platforms: GPU Compute C++ support; User Mode Scheduling; Bi-Directional Power Mgmt between CPU and GPU
Architectural Integration: Unified Address Space for CPU and GPU; GPU uses pageable system memory via CPU pointers; Fully coherent memory between CPU & GPU
System Integration: GPU compute context switch; GPU graphics pre-emption; Quality of Service
10. KAVERI SOC – ENABLING SHARED MEMORY AND PLATFORM ATOMICS
Shared memory accesses between the CPU and GPU happen via ‘system memory’.
– Corresponds to the notion of shared virtual memory (SVM) in OpenCL 2.0, available via the clSVMAlloc() call. With SVM, CPUs and GPUs can share an address space and share the pointer to the same memory location.
– The compiler supports SVM and atomics calls that work across the CPU-GPU boundary.
– System-memory accesses may take one of three paths:
If coherence with CPU is not required: GARLIC path
If kernel-granularity coherence with CPU is required: ONION bus path
If instruction-granularity coherence with CPU is required: Bypass L2 via ONION+ bus (required by atomics)
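The three-way path choice can be restated as a small lookup. This is only an illustrative sketch: the `Coherence` enum, the `select_path` function, and the `is_atomic` flag are hypothetical names, and on real hardware the choice is not made by software like this.

```python
from enum import Enum


class Coherence(Enum):
    """Coherence level an access needs with respect to the CPU."""
    NONE = "none"
    KERNEL = "kernel-granularity"
    INSTRUCTION = "instruction-granularity"


# Mapping taken from the slide text.
PATHS = {
    Coherence.NONE: "GARLIC",
    Coherence.KERNEL: "ONION",
    Coherence.INSTRUCTION: "ONION+ (bypass L2)",
}


def select_path(coherence, is_atomic=False):
    """Return the memory path for an access; atomics require ONION+."""
    if is_atomic:
        coherence = Coherence.INSTRUCTION
    return PATHS[coherence]


print(select_path(Coherence.KERNEL))
print(select_path(Coherence.NONE, is_atomic=True))
```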
11. CONCURRENT STACK PUSH USING ATOMIC COMPARE-AND-EXCHANGE (AN EXAMPLE)
Each CPU thread and each GPU workitem execute the following code concurrently:
The code shows an example implementation of a concurrent stack’s “push” operation.
The “compare_exchange_strong” is an atomic call that ensures only one of the CPU/GPU
thread/workitem succeeds in updating the “head” pointer of the stack stored in list[0]
do {
head = list[0]; //redundant because the atomic call updates head on failure
list[i] = head;
} while (!atomic_compare_exchange_strong(&list[0], &head,i));
[Figure: contents of list[] before and after the atomic compare-and-exchange (ACE)]
i=2 and i=4 contest for ACE (List: 3 (head)->5->-1)
List after i=2 wins! (List: 2 (head)->3->5->-1)
Time instant "Before ACE": workitem i=2 has head=3, list[2]=3; workitem i=4 has head=3, list[4]=3
Time instant "ACE": workitem i=2 wins; workitem i=4 loses and goes back & retries
Time instant "After ACE completes": both workitems see list[0]=2
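The contest between workitems i=2 and i=4 can be traced with a single-threaded model. This is a sketch, not real atomics: `compare_exchange_strong` below is a hypothetical Python stand-in that mimics the call's semantics (on failure it reports the value it observed), and the interleaving is fixed by hand to match the slide.

```python
def compare_exchange_strong(mem, addr, expected, desired):
    """Single-threaded stand-in for an atomic compare-and-exchange.

    Returns (success, observed). On failure, `observed` is the current
    value, mirroring how the real call rewrites `expected`.
    """
    observed = mem[addr]
    if observed == expected:
        mem[addr] = desired
        return True, observed
    return False, observed


def walk(mem):
    """Follow the chain from the head stored in mem[0] down to -1."""
    chain, node = [], mem[0]
    while node != -1:
        chain.append(node)
        node = mem[node]
    return chain


# State from the slide: 3 (head) -> 5 -> -1. Unused slots hold None.
mem = [3, None, None, 5, None, -1]

# Both workitems execute the loop body before either reaches the CAS.
head_2 = mem[0]
mem[2] = head_2          # workitem i=2 sees head=3
head_4 = mem[0]
mem[4] = head_4          # workitem i=4 sees head=3

won_2, _ = compare_exchange_strong(mem, 0, head_2, 2)       # i=2 wins
won_4, head_4 = compare_exchange_strong(mem, 0, head_4, 4)  # i=4 loses
after_first_round = walk(mem)

# Workitem i=4 retries: relink to the new head, then try again.
mem[4] = head_4
retry_4, _ = compare_exchange_strong(mem, 0, head_4, 4)
final_chain = walk(mem)

print(after_first_round, final_chain)
```

After the first round the chain matches the slide (2, 3, 5); once workitem i=4 retries with the refreshed head, it links in front of 2.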
12. IMPLEMENTING PLATFORM ATOMICS FOR KAVERI
The compiler has implemented these atomics (per OpenCL 2.0 standards) for Kaveri.
The key issue in implementing these atomics is to make sure that both CPU and GPU see
the shared memory in “coherent” state.
The coherency is implemented using the ONION+ memory path and using the GPU ISA
instructions, which can invalidate/bypass L1/L2 caches selectively from the GPU side and
snoop to invalidate the CPU caches. This support is provided in the KV SOC.
For example, atomic_load with acquire semantics generates the first GPU code sequence listed here (in Kaveri, L2 is always bypassed for coherent access). Similarly, atomic_store with release semantics generates the second sequence.
atomic_load with acquire semantics:
1. load with glc=1 // bypass the L1 cache
2. s_waitcnt 0 // wait for the load to complete
3. buffer_wbinv_vol // invalidate L1 so that any following load reads from memory
atomic_store with release semantics:
1. s_waitcnt 0 // wait for any previous memop to complete
2. store with glc=0 // L1 is a write-through cache, so write onto memory as L2 is bypassed
3. s_waitcnt 0; // prevent any following memop to move up
OpenCL 2.0 and C11 atomics support various kinds of memory_scope & memory_ordering
14. TRADITIONAL VERIFICATION AND SOC CHALLENGE
[Block diagram labels: CPU, NorthBridge, DRAM Model, Graphics model, GFX, SouthBridge, BFM]
CPU-BASED VERIFICATION
Assembly-based input
Memory image of x86 machine code is preloaded into the DRAM model
CPU fetches instructions from DRAM and executes them
GPU-BASED VERIFICATION
Higher-level language (C/C++)
BFM model used across a PCIe-based interface to inject data
GPU sends requests to DRAM over 2 paths: coherent and non-coherent
SoC Verification Challenge
The HSA coherency environment adds a layer of complexity.
The SoC GPU needs to be programmed, which requires a host.
The SoC CPU can be used as the host. However, running the same host software stack results in huge simulation time.
One approach is Mailbox: inefficient due to lack of CPU-GPU interaction, with longer run time.
GPU-focused verification is not suitable for CPU-GPU interaction (HSA).
15. SOC VERIFICATION METHODOLOGY: TEST FLOW
Running driver code on a simulated CPU is impossible due to simulation run-times.
Intent Capture is a mechanism to allow existing discrete GPU graphics tests to execute on the CPU in a heterogeneous APU simulation.
[Flow diagram labels: GPU Test (OpenCL), CPU Test, One Thread [Driver CPU], Other Threads, Intent Capture, Capture, sp3 shader, Output, Replay(), CX Shell, .sim memory image, APU RTL Sim, Runs, Test Output]
The memory accesses and configuration writes from the test are extracted into C function calls.
Intent Capture performs this activity and encapsulates the GPU test into a function called Replay.
On the CPU side, one thread runs the Replay function while other threads execute the CPU side of the test.
The composite test (CPU test + generated FusionReplay function) is compiled using cxshell to generate a .sim memory image.
Ref [4]
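The capture-then-replay idea can be sketched as a recorder that logs a test's memory and configuration writes and renders them as the text of a C function. Everything here is hypothetical: the `IntentCapture` class, the `mem_write`/`cfg_write` call names, and the addresses are placeholders, not the internal tool's interface.

```python
class IntentCapture:
    """Records the accesses a GPU test makes so they can be replayed later."""

    def __init__(self):
        self.ops = []

    def mem_write(self, addr, data):
        self.ops.append(("mem_write", addr, data))

    def cfg_write(self, reg, value):
        self.ops.append(("cfg_write", reg, value))

    def emit_replay(self, name="Replay"):
        """Render the recorded accesses as the text of a C function."""
        lines = [f"void {name}(void) {{"]
        for op, target, value in self.ops:
            lines.append(f"    {op}(0x{target:08x}, 0x{value:08x});")
        lines.append("}")
        return "\n".join(lines)


capture = IntentCapture()
capture.cfg_write(0x1000, 0x1)          # hypothetical register and value
capture.mem_write(0x80000000, 0xCAFE)   # hypothetical address and data
replay_source = capture.emit_replay()
print(replay_source)
```

In the flow described on the slide, generated text of this kind would be compiled together with the CPU side of the test; the sketch stops at producing the function body.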
16. POWER MANAGEMENT: BAPM
[Figure: multiple boost P-states. HW view: Pb0 ... Pbx, P0/Pbase, P1, ...; SW/OS view: SWP0, SWP1, ... Bars show core power and rest-of-APU power against die temperature and APU power for core power at Pbase and for App1, App2, and App3, which differ in CAC (low, medium, high) and in whether all or half of the cores are active.] Ref [2]
ILLUSTRATION WITH CPU-CENTRIC SCENARIO
[Flow diagram labels: CPU Core1, CPU Core2, GPU Core1, GPU Core2]
Compute unit power monitor calculates CPU power; GPU power monitor calculates GPU power
Firmware converts power into temperature estimates
Compare temperature to limit & adjust voltage/frequency
If Temp > Limit, reduce power allocation
If Temp < Limit, increase power allocation
In a multi-core design, apps running on CPU/GPU cores may consume less power
Power-efficient algorithms exploit this power headroom for performance
The GPU can borrow power credit from the CPU in GPU-centric scenarios and vice versa
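The compare-and-adjust loop and the credit borrowing can be sketched as two small functions. The linear power-to-temperature model, the step size, and all names and numbers below are assumptions for illustration; the slides do not describe the firmware's actual model.

```python
def estimate_temp(power_w, ambient_c=45.0, c_per_watt=2.0):
    """Assumed linear power-to-temperature model (not from the slides)."""
    return ambient_c + c_per_watt * power_w


def adjust_allocation(alloc_w, measured_w, limit_c, step_w=1.0):
    """One control step: shrink the budget when hot, grow it when cool."""
    temp = estimate_temp(measured_w)
    if temp > limit_c:
        return alloc_w - step_w
    if temp < limit_c:
        return alloc_w + step_w
    return alloc_w


def transfer_headroom(donor_alloc_w, donor_used_w, receiver_alloc_w):
    """Move the donor's unused budget to the receiver (credit borrowing)."""
    headroom = max(0.0, donor_alloc_w - donor_used_w)
    return donor_alloc_w - headroom, receiver_alloc_w + headroom


# GPU-centric case: the CPU uses 12 W of a 20 W budget, so the GPU borrows 8 W.
cpu_alloc, gpu_alloc = transfer_headroom(20.0, 12.0, 15.0)
print(cpu_alloc, gpu_alloc)
```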
17. BAPM VERIFICATION APPROACH @ SOC
[Block diagram labels: CPU Core1 and CPU Core2, each with a CPU Power Monitor; GPU Core1 and GPU Core2, each with a GPU Power Monitor; NB; CAC Manager; SMU F/W; Multiple Boost Pstates; CPU-centric]
• Developed high and low power consuming CPU patterns based on micro-architecture and power analysis.
• Interleaved high and low power patterns in random stimulus.
• Used an irritator to manipulate the credits sent to the CAC manager at times to hit corner cases like back-to-back boost/throttle.
• Modeled the F/W algorithm using a simple BFM.
• Added a CSR framework to drive reads/writes to the CAC manager.
• Ran a very few sanity tests with real F/W loaded through backdoor to check the end-to-end flow.
• Used irritators to model GPU power credit reporting instead of running GPU applications.
• GPU power monitor verified at GPU IP level.
Efficient coverage-driven random verification
CPU boosted because of GPU giving away credits and vice versa
Crosses of CPU/GPU events and effect on BAPM
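Interleaving high- and low-power patterns and then checking which adjacent pairs occurred can be sketched as follows. The pattern labels, the `p_high` knob, and the idea of using adjacent pairs as a proxy for back-to-back boost/throttle coverage are assumptions made for illustration.

```python
import random


def interleave_patterns(n, rng, p_high=0.5):
    """Build a random sequence of 'high' and 'low' power patterns."""
    return ["high" if rng.random() < p_high else "low" for _ in range(n)]


def transitions(seq):
    """Count adjacent pattern pairs, a rough proxy for transition coverage."""
    counts = {}
    for prev, cur in zip(seq, seq[1:]):
        counts[(prev, cur)] = counts.get((prev, cur), 0) + 1
    return counts


stimulus = interleave_patterns(20, random.Random(1))
coverage = transitions(stimulus)
missing = [pair for pair in [("high", "low"), ("low", "high")]
           if pair not in coverage]
print(coverage, missing)
```

Pairs that never occur point at corner cases the random stimulus has not yet reached, which is where a directed irritator would be aimed.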
19. TEST STIMULUS REUSE AND PORTING TO SOC
Tool and flow differences and set-up across IP and SoC make stimulus reuse difficult.
Using a functional model to simulate IP [RTL] in an SoC scenario for IP test development and easy porting to SoC
A simple HSA SoC test with 1 Rd-Wr in RTL takes about 18 hours, whereas it is <1 hour on the heterogeneous C model
Intent Capture and Playback methodology
[Diagram labels: DV Test, GPU C Model, CPU C Model/RTL, Bus Unit, cMemory, Memory Model, MPMM, MEMIO, APU, Capture, Output, Replay, Test Output]
Test setup update @ IP level to support test run with SoC as a new target
Test setup updates such as configuration changes and test stimulus defines allowed IP tests to be reused.
[Flow diagram labels: Export suite, test key; IP2SoC script; Create job spec [ip2soc –merge]; Run/Execute Regression]
[Option groups: Common test options (reports, sim output, run_job command-line options: directories); GNB, XNB, UNB; UNB Perf options; CPU to GPU access; NB/DCT prog. options; Memory config; Perf_options.yml]
Goal: Improve Quality, Reduce development time
20. HW-SW INTERACTION: MODELING AND ABSTRACTION
Complex and evolving logic moving from hardware to firmware for better controllability. Challenges:
Firmware algorithms are compute-intensive and often developed late in design cycle.
Additional challenge to Verification in terms of load and execute time of the software.
Connected Standby Verification Approach
Model the relevant section of the software using BFM with proper interface to the hardware
Add sufficient controllability to stress different paths of the BFM model - find coverage
Adaptive stimulus based on coverage of the BFM/state-machine
Goals: Improve Quality, Reduce development time
21. ADAPTIVE STIMULUS
Typically, power management transitions kick off after active code execution stops. This results in deeper corner cases associated with thread-level coordination in multi-core design.
Predicting occurrences of deeper phases and targeting those by code/stimulus is difficult.
Define the power management modes as state machines, each state having granular phases including thread-specific information.
Dynamic irritator monitors these state transitions, inserts random/directed asynchronous events (like different sorts of interrupts, probes, warmreset) and updates a scoreboard.
Events are generated very close to the relevant points, which provides great controllability.
Dynamic irritator adapts based on scoreboard statistics, eventually putting more weight on the less frequently covered <state> X <event> buckets.
Goals: Improve Quality, Reduce development time
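The adaptive weighting can be sketched as an irritator that picks events with probability inversely related to how often each <state> X <event> bucket has been hit. The state names, the inverse-count weighting formula, and the function names are assumptions; the slides say only that less-covered buckets receive more weight.

```python
import random
from collections import Counter
from itertools import product

STATES = ["entry", "flush", "gated"]          # hypothetical phase names
EVENTS = ["interrupt", "probe", "warmreset"]  # event kinds named on the slide


def bucket_weights(scoreboard):
    """Weight each <state> x <event> bucket inversely to its hit count."""
    return {b: 1.0 / (1 + scoreboard[b]) for b in product(STATES, EVENTS)}


def pick_event(state, scoreboard, rng):
    """Choose the event to inject in `state`, favouring rare buckets."""
    weights = bucket_weights(scoreboard)
    event = rng.choices(EVENTS,
                        weights=[weights[(state, e)] for e in EVENTS])[0]
    scoreboard[(state, event)] += 1   # the scoreboard drives later picks
    return event


rng = random.Random(0)
# Two buckets are already well covered; warmreset in "gated" is not.
scoreboard = Counter({("gated", "interrupt"): 50, ("gated", "probe"): 50})
picks = Counter(pick_event("gated", scoreboard, rng) for _ in range(30))
print(picks)
```

Because every pick updates the scoreboard, the bias toward a rare bucket fades as that bucket fills, which is the adaptive behaviour the slide describes.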
22. CONSTRAINED RANDOM STIMULUS AND RANDOMIZATION AT SOC
Complex SoC requires randomization at different levels
[Figure: random initial states (S11, S21, S23, ...)] Ref[3]
[Flow diagram labels: SOC Constraints; IP Constraints; Register; Fuse; Modes: LFBR, BfD, long_init/unfused test; Package level info; CMD line options; Randomization utility; Build; RandomConfig executable; Run; Time t=0 [config values; Import value after reset]
Goals: Improve Quality, Reduce development time
23. OVERCOMING LIMITATIONS OF GATE-LEVEL SIMULATION
Challenges with netlist simulation:
Longer run-times
Longer debug times
Approach to minimize runtime: Compute-intensive RTL and associated verification components must be replaced with a less intensive test-vector applicator: apply test vectors directly from the FSDB file.
[Flow diagram: Run RTL simulation, get FSDB; Create Gatesim files (gatesim.v, forces.v); Build with Netlist + Gatesim files + TB to drive stimulus from FSDB; Run Netlist sims (with FSDB dump)]
10x runtime optimization over traditional approach.
Approach to minimize debug effort: Verdi NPI based methodology to automate debug. Ref [5]
Goals: Improve Quality, Reduce development time
25. REFERENCES
[1] A New Parallel Computing Platform – HSA, CTHPC 2013 Keynote Speech, Roy Ju, AMD Senior Fellow.
[2] AMD APUs: Dynamic Power Management Techniques, DAC 2013, Praveen Dongara, System Architect.
[3] Wilson Research Group-MGC 2013.
[4] Kaveri DTP. Internal Document.
[5] Innovative Approach to Overcome Limitations of Netlist Simulation, SUNG 2013, Prodip K, Pankaj S, Meera M, Narendran K.
26. GLOSSARY
GPU -- Graphics Processing Unit
APU -- Accelerated Processing Unit
OpenCL™ -- Open Computing Language
TDP -- Thermal Design Power, a measure of a design infrastructure’s ability to cool a device; represents the average thermal dissipation power required to cool the design
AMD Turbo Core Technology -- AMD boost mechanism
BIAPM -- Bi-directional Application Power Management
Cac -- Capacitance AC switching, measures switching activity of a cluster
Pstate -- Processor performance state
GARLIC -- Graphic Accelerated Reduced Latency Integrated Channel
ONION -- On-chip Northbridge to I/O Noncoherent bus
FSDB -- Fast Signal Database
28. DYNAMIC FINE-GRAINED POWER TRANSFERS
The dynamically calculated temperature of each core and the GPU enables the operating point of each to be dynamically balanced in order to maximize performance within temperature limits.
Low activity in one core enables it to be a thermal sink for a more active core
[Figure: three panels, GPU-centric, Balanced, and CPU-centric, each with a scale from 75.0 to 100.0] Ref [2]