The document discusses OpenCL for accelerating FPGA designs. It provides an overview of technology trends favoring parallelism and programmability. OpenCL is presented as a solution to bring FPGA design closer to software development by providing a standard programming model and faster compilation. The document describes how OpenCL maps to FPGAs by compiling kernels to hardware pipelines and discusses examples accelerated using OpenCL on FPGAs, including AES encryption, option pricing, document filtering, and video compression.
Design of LDPC Decoder Based On FPGA in Digital Image Watermarking TechnologyTELKOMNIKA JOURNAL
LDPC code and digital image watermarking technology, which is an effective method of digital copyright protection and information security, has been widely used. But this is a multi-disciplinary, multi technology application scheme. In order to realize FPGA design of LDPC decoder in the application scheme, an effective implementation method of digital watermarking application system must be found. In this paper, MATLAB software and Qt development environment are combined to achieve the digital watermarking application software design. It could get real-time input data for the LDPC decoder. Then the hardware of the LDPC decoder is primarily implemented by FPGA in the digital image watermarking system. And the serial port is used to make the output data of the decoder back to computer for verification. Through the simulation results, the Modelsim time simulation diagram is given, and the watermark image compared with the original image is got. The results show that the resource usage of our system is few, and the decoding rate is fast. It has a certain practical value.
Implementation of Soft-core processor on FPGA (Final Presentation)Deepak Kumar
Implementation of Soft-core processor(PicoBlaze) on FPGA using Xilinx.
Establishing communication between two PicoBlaze processors.
Creating an application using the multi-core processor.
This presentation contains the basics of FPGA design, what are HDL [hardware description languages], how VHDL design works on FPGA, what are the high tech applications, FPGA R & D opportunities, latest FPGA tools, resources and the National Activities on FPGA happened at Nepal.
A review of the history of digital design throughout the years until the era of programmable logic, and a detailed exploration of the architecture of FPGA chips, followed by an introduction to SoC FPGAs and some of their benefits.
Design of LDPC Decoder Based On FPGA in Digital Image Watermarking TechnologyTELKOMNIKA JOURNAL
LDPC code and digital image watermarking technology, which is an effective method of digital copyright protection and information security, has been widely used. But this is a multi-disciplinary, multi technology application scheme. In order to realize FPGA design of LDPC decoder in the application scheme, an effective implementation method of digital watermarking application system must be found. In this paper, MATLAB software and Qt development environment are combined to achieve the digital watermarking application software design. It could get real-time input data for the LDPC decoder. Then the hardware of the LDPC decoder is primarily implemented by FPGA in the digital image watermarking system. And the serial port is used to make the output data of the decoder back to computer for verification. Through the simulation results, the Modelsim time simulation diagram is given, and the watermark image compared with the original image is got. The results show that the resource usage of our system is few, and the decoding rate is fast. It has a certain practical value.
Implementation of Soft-core processor on FPGA (Final Presentation)Deepak Kumar
Implementation of Soft-core processor(PicoBlaze) on FPGA using Xilinx.
Establishing communication between two PicoBlaze processors.
Creating an application using the multi-core processor.
This presentation contains the basics of FPGA design, what are HDL [hardware description languages], how VHDL design works on FPGA, what are the high tech applications, FPGA R & D opportunities, latest FPGA tools, resources and the National Activities on FPGA happened at Nepal.
A review of the history of digital design throughout the years until the era of programmable logic, and a detailed exploration of the architecture of FPGA chips, followed by an introduction to SoC FPGAs and some of their benefits.
This is a presentation I gave on last GPGPU workshop we did on April 2013.
The usage of GPGPU is expanding, and creates a continuum from Mobile to HPC. At the same time, question is whether the GPGPU languages are the right ones (well, no) and aren't we wasting resources on re-developing the same SW stack instead of converging.
This is a presentation I gave on last GPGPU workshop we did on April 2013.
The usage of GPGPU is expanding, and creates a continuum from Mobile to HPC. At the same time, question is whether the GPGPU languages are the right ones (well, no) and aren't we wasting resources on re-developing the same SW stack instead of converging.
ScicomP 2015 presentation discussing best practices for debugging CUDA and OpenACC applications with a case study on our collaboration with LLNL to bring debugging to the OpenPOWER stack and OMPT.
LCU14 310- Cisco ODP
---------------------------------------------------
Speaker: Robbie King
Date: September 17, 2014
---------------------------------------------------
★ Session Summary ★
Cisco to present their experience using ODP to provide portable accelerated access to crypto functions on various SoCs.
---------------------------------------------------
★ Resources ★
Zerista: http://lcu14.zerista.com/event/member/137757
Google Event: https://plus.google.com/u/0/events/ckmld1hll5jjijq11frbqmptet8
Video: https://www.youtube.com/watch?v=eFlTmslVK-Y&list=UUIVqQKxCyQLJS6xvSmfndLA
Etherpad: http://pad.linaro.org/p/lcu14-310
---------------------------------------------------
★ Event Details ★
Linaro Connect USA - #LCU14
September 15-19th, 2014
Hyatt Regency San Francisco Airport
---------------------------------------------------
http://www.linaro.org
http://connect.linaro.org
Introducing Container Technology to TSUBAME3.0 SupercomputerAkihiro Nomura
Invited Talk in ISC High Performance 2019 Focus Session "Containers for Acceleration and Accessibility in HPC and Cloud Ecosystems" https://2019.isc-program.com/presentation/?id=inv_sp183&sess=sess177
Ariel Waizel discusses the Data Plane Development Kit (DPDK), an API for developing fast packet processing code in user space.
* Who needs this library? Why bypass the kernel?
* How does it work?
* How good is it? What are the benchmarks?
* Pros and cons
Ariel worked on kernel development at the IDF, Ben Gurion University, and several companies. He is interested in networking, security, machine learning, and basically everything except UI development. Currently a Solution Architect at ConteXtream (an HPE company), which specializes in SDN solutions for the telecom industry.
OpenPOWER Acceleration of HPCC SystemsHPCC Systems
JT Kellington, IBM and Allan Cantle, Nallatech present at the 2015 HPCC Systems Engineering Summit Community Day about porting HPCC Systems to the POWER8-based ppc64el architecture.
This presentation is about OpenDataPlane project: an open-source, cross-platform set of application programming interfaces (APIs) for the networking data plane. This set of slides describes the project status and basic design principles, as well as contains an overview of the main API groups.
The presentation was delivered by Taras Kondratiuk (Software Engineer, Cisco) at GlobalLogic Embedded TechTalk Kyiv on July 22, 2015.
Agile Oracle to PostgreSQL migrations (PGConf.EU 2013)Gabriele Bartolini
Migrating an Oracle database to Postgres is never an automated operation. And it rarely (never?) involve just the database. Experience brought us to develop an agile methodology for the migration process, involving schema migration, data import, migration of procedures and queries up to the generation of unit tests for QA.
Pitfalls, technologies and main migration opportunities will be outlined, focusing on the reduction of total costs of ownership and management of a database solution in the middle-long term (without reducing quality and business continuity requirements).
Using a Field Programmable Gate Array to Accelerate Application PerformanceOdinot Stanislas
Intel s'intéresse tout particulièrement aux FPGA et notamment au potentiel qu'ils apportent lorsque les ISV et développeurs ont des besoins très spécifiques en Génomique, traitement d'images, traitement de bases de données, et même dans le Cloud. Dans ce document vous aurez l'occasion d'en savoir plus sur notre stratégie, et sur un programme de recherche lancé par Intel et Altera impliquant des Xeon E5 équipés... de FPGA
Intel is looking at FPGA and what they bring to ISVs and developers and their very specific needs in genomics, image processing, databases, and even in the cloud. In this document you will have the opportunity to learn more about our strategy, and a research program initiated by Intel and Altera involving Xeon E5 with... FPGA inside.
Auteur(s)/Author(s):
P. K. Gupta, Director of Cloud Platform Technology, Intel Corporation
Topics of this presentation:
- Basics and best practices of developing single-page applications (SPA) and Web API Services on Microsoft .NET -
- Core with Docker and Linux.
- PowerShell Core automated builds.
- Markdown/PDF documentation.
- Documentation of public interfaces with Swagger/OAS/YAML.
- Automated testing of SPA on Protractor and testing the Web API on Postman/Newman.
This presentation by Sergii Fradkov (Consultant, Engineering), Andrii Zarharov (Lead Software Engineer, Consultant), Igor Magdich (Lead Test Engineer, Consultant) was delivered at GlobalLogic Kharkiv .NET TechTalk #1 on May 24, 2019.
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/luxoft/embedded-vision-training/videos/pages/may-2016-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Alexey Rybakov, Senior Director at LUXOFT, presents the "Making Computer Vision Software Run Fast on Your Embedded Platform" tutorial at the May 2016 Embedded Vision Summit.
Many computer vision algorithms perform well on desktop class systems, but struggle on resource constrained embedded platforms. This how-to talk provides a comprehensive overview of various optimization methods that make vision software run fast on low power, small footprint hardware that is widely used in automotive, surveillance, and mobile devices. The presentation explores practical aspects of deep algorithm and software optimization such as thinning of input data, using dynamic regions of interest, mastering data pipelines and memory access, overcoming compiler inefficiencies, and more.
There are many challenges on FPGA design such as: FPGA Selection, System Design Challenges, Power and Resource optimization, Verification of Design etc.
Each and every FPGA Engineer face this challenges, so if they prepare for such challenges then they can accomplish and optimize FPGA based project or design in time and within budget.
For more details and consultation: www.digitronixnepal.com, email: digitronixnepali@gmail.com
Similar to TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design productivity/ Liad Weinberger (20)
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design productivity/ Liad Weinberger
1. May 1, 2013 1
OpenCL for ALTERA FPGAs
Accelerating performance and design
productivity
Liad Weinberger – Appilo
May 1st, 2013
2. May 1, 2013 2
Technology trends
• Over the past years
– Technology scaling favors programmability and parallelism
Fine-Grained
Massively
Parallel
Arrays
Single Cores Coarse-Grained
Massively
Parallel
Processor
Arrays
Multi-Cores
Coarse-Grained
CPUs and DSPs
CPUs DSPs Multi-Cores Array GPGPUs FPGAs
3. May 1, 2013 3
Technology trends
0
20
40
60
80
100
120
140
2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2022
Process node (nm)
• Moore’s law still in effect
– More FPGA real-estate
• More potential for parallelism – an extremely good thing!
• Designs that utilize this real-estate, becomes harder to
manage and maintain – this is not so good...
4. May 1, 2013 4
Technology trends
2007 2008 2009 2010 2011 2012 2013
Google trends
Worldwide Interest over the years
Verilog + VHDL
• Decreased interest
– Number of Google searches for VHDL or
Verilog in decline
5. May 1, 2013 5
Technology trends
2007 2008 2009 2010 2011 2012 2013
Google trends
Interest over the years
Verilog + VHDL
Python
• Software development keeps momentum
– Number of Google searches for Python (as a
representing language)
6. May 1, 2013 6
FPGA (hardware) development
• Design (programming) is complex
– Define state machine, data-paths, arbitration, IP interfaces, etc.
– Sophisticated iterative compilation process
• Synthesis, technology mapping, clustering, placement and routing, timing closure
• Leads to long compilation times (hours vs. minutes in software)
– Debug process is also very time-consuming
• Code is not portable
– Written in Verilog / VHDL
• Can’t re-target for CPUs, GPUs, DSPs, etc.
• Not scalable
Compilation
HDL
Timing
Closure
Set
Constraints
7. May 1, 2013 7
Software development
• Programming is straight-forward
– Ideas are expressed in languages such as C/C++/Python/etc.
• Typically, start with simple sequential implementation
• Use parallel APIs / language extensions, in order to exploit multi-core
architectures for additional performance
– Compilation times are usually reasonably short
• Simple straight-forward compilation/linking process
– Immediate feedback when debugging/profiling
• An assortment of tools available for both debugging and profiling
• Portability is still an issue
– Possible, but require pre-planning
Compiler
&Linker
C/C++
Python
etc.
C/C++
Python
etc.
C/C++/
Python/
etc.
8. May 1, 2013 8
Product development point-of-view
• Product producers want:
– Lower development and maintenance costs
– Competitive edge
• Higher performance
• Short time-in-market, and short time-to-market
– Agile development methods are becoming more and more popular
– Can’t afford long development cycles
– Trained developers with established experience
• Or cost-effective path for training new developers
– Flexibility
• No vendor-locking is preferred
• Ability to rapidly adapt product to market requirement changes
9. May 1, 2013 9
Our challenge
• How do we bring FPGA design process closer to the
software development model?
– Need to make FPGAs more accessible to the software development
community
• Change in mind-set: look at FPGAs as massively multi-core devices that
could be used in order to accelerate parallel applications
• A programming model that allows that
• Shorter compilation times and faster feedback for debugging and profiling
the design
10. May 1, 2013 10
An ideal programming environment...
• Based on a standard programming model
– Rather than something which is FPGA-specific
• Abstracts away the underlying details of the hardware
– VHDL / Verilog are similar to “assembly language” programming
– Useful in rare circumstances where the highest possible efficiency is needed
• The price of abstraction is not too high
– Still need to efficiently use the FPGA’s resources to achieve high throughput / low
area
• Allows for software-like compilation & debug cycles
– Faster compile times
– Profiling & user feedback
11. May 1, 2013 11
Introducing OpenCL
Parallel heterogeneous computing
12. May 1, 2013 12
A case for OpenCL
• What is OpenCL?
– An open, royalty-free standard for cross-platform parallel software programming of
heterogeneous systems
• CPU + DSPs
• CPU + GPUs
• CPU + FPGAs
– Maintained by KHRONOS group
• An industry consortium creating open, royalty-free standards
• Comprised of hardware and software vendors
– Enables software to leverage silicon acceleration
• Consists of two major parts:
– Application Programming Interface (API) for device management
– Device programming language based on C99 with
some restrictions and extensions to support explicit parallelism
Or maybe all together
13. May 1, 2013 13
Benefits of OpenCL
• Cross-vendor software portability
– Functional portability—Same code would normally execute on
different hardware, by different vendors
– Not performance portable—Code still needs to be optimized to
specific device (at least a device class)
• Allows for the management of available computational
resources under a single framework
– Views CPUs, GPUs, FPGAs, and other accelerators as devices that
could carry the computational needs of the application
14. May 1, 2013 14
OpenCL program structure
• Separation between managerial and computational code bases
– Managerial code executes on a host CPU
• Any type of conventional micro-processor
• Written in any language that has bindings for the OpenCL API
– The API is in ANSI-C
– There is a formal C++ binding
– Other bindings may exist
– Computational code executes on the compute devices (accelerators)
• Written in a language called OpenCL C
– Based on C99
– Adds restrictions and extensions for explicit parallelism
• Can be compiled either offline, or online, depending on implementation
• Will most likely consist only of those portions of the application we want to accelerate
16. May 1, 2013 16
OpenCL host application
• Communicates with the Accelerator Device via a set of
library routines
– Abstracts away host processor to HW accelerator communication via
a set of API calls
main() {
read_data( … );
maninpulate( … );
clEnqueueWriteBuffer( … );
clEnqueueNDRangeKernel(…,sum,…);
clEnqueueReadBuffer( … );
display_result( … );
}
Copy data
Host FPGA
Ask the FPGA to run
a particular kernel
Copy data
FPGA Host
17. May 1, 2013 17
OpenCL kernels
• Data-parallel function
– Executes by many parallel
threads
• Each thread has an identifier
which could be obtained with
a call to the get_global_id()
built-in function
• Uses qualifiers to define
where memory buffers reside
• Executed by a
compute device
– CPU
– GPU
– FPGA
– Other accelerator
float *a =
float *b =
float *y =
0 1 2 3 4 5 6 7
7 6 5 4 3 2 1 0
7 7 7 7 7 7 7 7
__kernel void
sum(__global float *a,
__global float *b,
__global float *y)
{
int gid = get_global_id(0);
y[gid] = a[gid] + b[gid];
}
__kernel void sum( … );
18. May 1, 2013 18
OpenCL on FPGAs
How does it map?
19. May 1, 2013 19
Compiling OpenCL to FPGAs
x86
PCIe
SOF X86 binary
ACL
Compiler
Standard
C Compiler
OpenCL
Host Program + Kernels
__kernel void
sum(__global float *a,
__global float *b,
__global float *y)
{
int gid = get_global_id(0);
y[gid] = a[gid] + b[gid];
}
Kernel Programs Host Program
main() {
read_data( … );
maninpulate( … );
clEnqueueWriteBuffer( … );
clEnqueueNDRangeKernel(…,sum,…);
clEnqueueReadBuffer( … );
display_result( … );
}
20. May 1, 2013 20
Compiling OpenCL to FPGAs
Load Load
Store
Load Load
Store
Load Load
Store
Load Load
Store
Load Load
Store
Load Load
Store
PCIe
DDRx
__kernel void
sum(__global float *a,
__global float *b,
__global float *y)
{
int gid = get_global_id(0);
y[gid] = a[gid] + b[gid];
}
Kernel Programs
Custom Hardware for Your Kernels
21. May 1, 2013 21
FPGA architecture for OpenCL
FPGA
Kernel
Pipeline
Kernel
Pipeline
Kernel
Pipeline
PCIe
DDR*
x86 /
External
Processor
External
Memory
Controller
& PHY
Memory
Memory
Memory
Memory
Memory
Memory
Global Memory Interconnect
Local Memory Interconnect
External
Memory
Controller
& PHY
Kernel System
22. May 1, 2013 22
Mapping multithreaded kernels to FPGAs
• Simplest way of mapping kernel functions to FPGAs is
to replicate hardware for each thread
– Inefficient and wasteful
• Technique: deep pipeline parallelism
– Attempt to create a deeply pipelined representation of a kernel
– On each clock cycle, we attempt to send in input data for a new
thread
– Method of mapping coarse grained thread parallelism to fine-grained
FPGA parallelism
23. May 1, 2013 23
Example pipeline for vector add
• On each cycle, the portions of
the pipeline are processing
different threads
• While thread 2 is being loaded,
thread 1 is being added, and
thread 0 is being stored
Load Load
Store
0 1 2 3 4 5 6 7
8 threads for vector add example
Thread IDs
+
24. May 1, 2013 24
Example pipeline for vector add
• On each cycle, the portions of
the pipeline are processing
different threads
• While thread 2 is being loaded,
thread 1 is being added, and
thread 0 is being stored
Load Load
Store
1 2 3 4 5 6 7
0
8 threads for vector add example
Thread IDs
+
25. May 1, 2013 25
Example pipeline for vector add
• On each cycle, the portions of
the pipeline are processing
different threads
• While thread 2 is being loaded,
thread 1 is being added, and
thread 0 is being stored
Load Load
Store
2 3 4 5 6 7
0
1
8 threads for vector add example
Thread IDs
+
26. May 1, 2013 26
Example pipeline for vector add
• On each cycle, the portions of
the pipeline are processing
different threads
• While thread 2 is being loaded,
thread 1 is being added, and
thread 0 is being stored
Load Load
Store
3 4 5 6 7
1
2
8 threads for vector add example
Thread IDs
+
0
27. May 1, 2013 27
Example pipeline for vector add
• On each cycle, the portions of
the pipeline are processing
different threads
• While thread 2 is being loaded,
thread 1 is being added, and
thread 0 is being stored
Load Load
Store
4 5 6 7
0
2
3
8 threads for vector add example
Thread IDs
+
1
28. May 1, 2013 28
Some examples
Using ALTERA’s OpenCL solution
29. May 1, 2013 29
AES encryption
• Counter (CTR) based encryption/decryption
– 256-bit key
• Advantage FPGA
– Integer arithmetic
– Coarse grain bit operations
– Complex decision making
• Results Platform Throughput (GB/s)
E5503 Xeon Processor 0.01 (single core)
AMD Radeon HD 7970 0.33
PCIe385 A7 Accelerator 5.20
42% utilization (2 kernels)
•Power conservation
•Fill up for even higher performance
30. May 1, 2013 30
Multi-asset barrier option pricing
• Monte-carlo simulation
– Heston model
– ND range
• Assets x paths (64x1000000)
• Advantage FPGA
– Complex control flow
• Results
tttt
S
ttttt
dWdtd
dWSdtSdS
Platform
Power
(W)
Performance
(Msims/s)
Msims/W
W3690 Xeon Processor 130 32 0.25
nVidia Tesla C2075 225 63 0.28
PCIe385 D5 Accelerator 23 170 7.40