Amazon EC2 F1 is a new compute instance with programmable hardware for application acceleration. With F1, you can directly access custom FPGA hardware on the instance in a few clicks.
Learning Objectives:
• Learn about the capabilities, features, and benefits of the new F1 instances
• Develop your FPGA using the F1 Hardware Developer Kit and FPGA Developer AMI
• Deploy your FPGA acceleration code using F1 instances
• Use F1 instances for hardware acceleration in your applications
• Learn how to offer pre-packaged Amazon FPGA Machine Images (AFIs) to your customers through the AWS Marketplace
5. P2: GPU-accelerated computing (NVIDIA Tesla GPU card)
• Enabling a high degree of parallelism – each GPU has thousands of cores
• Consistent, well documented set of APIs (CUDA, OpenACC, OpenCL)
• Supported by a wide variety of ISVs and open source frameworks
F1: FPGA-accelerated computing (Xilinx UltraScale+ FPGA)
• Massively parallel – each FPGA includes millions of parallel system logic cells
• Flexible – no fixed instruction set, can implement wide or narrow datapaths
• Programmable using available, cloud-based FPGA development tools
GPU and FPGA for Accelerated Computing
6. CPU: High speed, lower efficiency
GPU/FPGA: High throughput, higher efficiency
GPUs and FPGAs can provide massive parallelism and higher efficiency than CPUs for certain categories of applications
Accelerated Computing Concepts
More parallelism for higher throughput…
7. A GPU is effective at processing the same set of operations in parallel – single instruction, multiple data
(SIMD). A GPU has a well-defined instruction-set, and fixed word sizes – for example single, double, or
half-precision integer and floating point values.
An FPGA is effective at processing the same or different operations in parallel – multiple
instructions, multiple data (MIMD). An FPGA does not have a predefined instruction-set, or a
fixed data width.
[Diagram: a CPU (one core) with control logic, a few ALUs, cache, and DRAM; a GPU with DRAM, where each GPU in P2 has 2880 cores; an FPGA with BlockRAM and DRAM, where each FPGA in F1 has more than 2M logic cells]
Parallel Processing in GPUs and FPGAs
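The SIMD/MIMD contrast on this slide can be modelled in a few lines of software. This is a minimal Python sketch, not GPU or FPGA code; the function names and the sample operations are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence


def simd_apply(op: Callable[[int], int], data: Sequence[int]) -> List[int]:
    """SIMD-style: one operation applied to every data element."""
    return [op(x) for x in data]


def mimd_apply(ops: Sequence[Callable[[int], int]], data: Sequence[int]) -> List[int]:
    """MIMD-style: each lane runs its own operation on its own data element."""
    if len(ops) != len(data):
        raise ValueError("one operation per data element is required")
    return [op(x) for op, x in zip(ops, data)]


samples = [1, 2, 3, 4]

# Same instruction on all lanes (the GPU model described above).
doubled = simd_apply(lambda x: x * 2, samples)

# A different instruction on each lane (the FPGA model described above).
mixed = mimd_apply(
    [lambda x: x + 10, lambda x: x * x, lambda x: -x, lambda x: x >> 1],
    samples,
)

print(doubled)  # [2, 4, 6, 8]
print(mixed)    # [11, 4, -3, 2]
```

In real hardware the lanes run concurrently; the list comprehensions here only show which operation is paired with which data element.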
8. module filter1 (clock, rst, strm_in, strm_out);
  integer i, j; // loop indices
  always @(posedge clock)
    for (i = 0; i < NUMUNITS; i = i + 1)
      tmp_kernel[j] = k[i*OFFSETX];
FPGA handles compute-intensive, deeply pipelined, hardware-accelerated operations
CPU handles the rest of the application
How FPGA Acceleration Works
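To make the "deeply pipelined" part concrete, here is a minimal software model of the kind of streaming operation that is typically offloaded to an FPGA. It is a sketch under stated assumptions: the choice of a FIR filter, the function name `pipelined_fir`, and the tap values are all hypothetical, and this is not the `filter1` module shown on the slide.

```python
from collections import deque
from typing import Iterable, Iterator, Sequence


def pipelined_fir(stream: Iterable[int], taps: Sequence[int]) -> Iterator[int]:
    """Model a streaming FIR filter: one sample in, one result out per 'clock'."""
    # The shift register holds the most recent len(taps) samples, newest first.
    shift_reg = deque([0] * len(taps), maxlen=len(taps))
    for sample in stream:
        shift_reg.appendleft(sample)  # the oldest sample falls off the far end
        # In hardware, these multiply-accumulates would all run in parallel.
        yield sum(t * s for t, s in zip(taps, shift_reg))


# Feeding a unit impulse through the filter reproduces the tap values.
impulse = [1, 0, 0, 0]
impulse_response = list(pipelined_fir(impulse, taps=[3, 2, 1]))
print(impulse_response)  # [3, 2, 1, 0]
```

In this model the host application would only produce the input stream and consume the results; the per-sample arithmetic is the part an FPGA would take over.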
10. An FPGA is effective at processing data of many types in parallel, for example creating a complex pipeline of parallel, multistage operations on a video stream, or performing massive numbers of dependent or independent calculations for a complex financial model…
An FPGA does not have an instruction-set!
Data can be any bit-width (9-bit integer? No problem!)
Complex control logic (such as a state machine) is easy to implement in an FPGA
Each FPGA in F1 has more than 2M of these cells
Parallel Processing in FPGAs
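The "9-bit integer" remark can be illustrated in software by masking values to an arbitrary width. This is a minimal Python sketch, assuming two's-complement representation for signed values; the helper names are hypothetical.

```python
def wrap_unsigned(value: int, width: int) -> int:
    """Keep only the low `width` bits, as a hardware register of that width would."""
    return value & ((1 << width) - 1)


def to_signed(value: int, width: int) -> int:
    """Interpret the low `width` bits of `value` as a two's-complement integer."""
    value = wrap_unsigned(value, width)
    sign_bit = 1 << (width - 1)
    return value - (1 << width) if value & sign_bit else value


WIDTH = 9  # 9-bit datapath: unsigned range 0..511, signed range -256..255

print(wrap_unsigned(500 + 20, WIDTH))  # 8 (520 wraps around at 512)
print(to_signed(0b111111111, WIDTH))   # -1
print(to_signed(255 + 1, WIDTH))       # -256 (signed overflow)
```

A CPU or GPU would have to emulate this width with masking on a wider word, as the sketch does; in an FPGA the datapath can simply be built 9 bits wide.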
12. Make FPGA acceleration available to a larger community of
developers, and to millions of potential end-customers
Provide dedicated and large amounts of FPGA logic in a
single EC2 instance, using multiple FPGAs
Simplify the development process by providing cloud-based
FPGA development tools
Allow developers to focus on algorithm design, by
abstracting FPGA I/O using well-defined interfaces
Provide access to a growing ecosystem of FPGA
programming tools and applications
Provide a Marketplace for FPGA applications, providing more
choice and easy access for all AWS customers
FPGA Acceleration in the AWS Cloud: Goals
13. New EC2 FPGA instance type for accelerated computing
Up to 8 Xilinx UltraScale+ 16nm VU9P FPGA devices in a single instance
The f1.16xlarge size provides:
8 FPGAs, each with over 2 million customer-accessible FPGA
programmable logic cells and over 5000 programmable DSP blocks
Each of the 8 FPGAs has 4 DDR-4 interfaces, with each interface
accessing a 16GiB, 72-bit wide, ECC-protected memory
Instance Size | FPGAs | DDR-4 (GiB) | FPGA Link | FPGA Direct | vCPUs | Instance Memory (GiB) | NVMe Instance Storage (GB) | Network Bandwidth*
f1.2xlarge    | 1     | 4 x 16      | -         | -           | 8     | 122                   | 1 x 480                    | 10 Gbps Peak
f1.16xlarge   | 8     | 32 x 16     | Y         | Y           | 64    | 976                   | 4 x 960                    | 30 Gbps
*In a placement group
F1 FPGA Instance Types on AWS
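The DDR-4 column follows directly from the per-FPGA figures on this slide (4 interfaces per FPGA, 16 GiB per interface). A small Python sketch of that arithmetic, with hypothetical constant and function names:

```python
DDR4_INTERFACE_GIB = 16   # each DDR-4 interface accesses 16 GiB
INTERFACES_PER_FPGA = 4   # each FPGA has 4 DDR-4 interfaces
MIN_LOGIC_CELLS = 2_000_000  # "over 2 million" per FPGA, so a lower bound

FPGAS_PER_SIZE = {"f1.2xlarge": 1, "f1.16xlarge": 8}


def fpga_ddr4_gib(fpga_count: int) -> int:
    """Total FPGA-attached DDR-4 memory, in GiB, for an instance."""
    return fpga_count * INTERFACES_PER_FPGA * DDR4_INTERFACE_GIB


for size, fpgas in FPGAS_PER_SIZE.items():
    print(size, fpga_ddr4_gib(fpgas), "GiB DDR-4,",
          "at least", fpgas * MIN_LOGIC_CELLS, "logic cells")
```

This gives 64 GiB of FPGA-attached memory for f1.2xlarge and 512 GiB for f1.16xlarge, matching the 4 x 16 and 32 x 16 entries in the table.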
14. System Logic Block: Each FPGA in F1 provides over 2M of these logic blocks
DSP (Math) Block: Each FPGA in F1 has more than 5000 of these blocks
I/O Blocks: Used to communicate externally, for example to DDR-4, PCIe, or ring
Block RAM: Each FPGA in F1 has over 60Mb of internal Block RAM, and over 230Mb of embedded UltraRAM
[Diagram: FPGA with BlockRAM and I/O blocks connected to four DDR-4 interfaces, PCIe, and FPGALink]
What’s Inside the F1 FPGA?
15. AWS FPGA Shell
FPGA I/O is provided using pre-configured, pre-tested, and secure I/O components, allowing FPGA developers to focus on their differentiating value
The FPGA Shell allows for faster coding of core acceleration functions by removing the need to develop I/O related FPGA hardware
[Diagram: FPGA with BlockRAM, four DDR-4 interfaces, FPGALink, and PCIe]
Abstracting FPGA I/O
16. [Diagram: an Amazon Machine Image (AMI) and an Amazon FPGA Image (AFI) are used to launch an EC2 F1 instance and load the AFI. The CPU runs the application on F1 and connects to the FPGAs over PCIe; the FPGAs are connected by FPGA Link, and DDR controllers connect them to eight blocks of DDR-4 attached memory.]
An F1 instance can have any number of AFIs
An AFI can be loaded into the FPGA in less than 1 second
FPGA Acceleration Using F1
18. Highly Efficient
• Algorithms Implemented in Hardware
• Gate-Level Circuit Design
• No Instruction Set Overhead
Massively Parallel
• Massively Parallel Circuits
• Multiple Compute Engines
• Rapid FPGA Reconfigurability
FPGA
Speeds Analysis of Whole Human Genomes from Hours to Minutes
Unprecedented Low Cost for Compute and Compressed Storage
F1 for Genomics Processing
19. F1 for Financial Computing
Modeling Counterparty Risk (CVA) and Regulatory Capital Requirements
20. F1 for Video Processing
Next Generation Video Compression for Broadcast Quality 4K content
Successfully ported to F1 in just 3 weeks
21. F1 for Accelerated Analytics
Heterogeneous Compute Acceleration for Faster Data Discovery
23. Development steps
1) Launch the AWS-provided FPGA Developer AMI, which includes all needed FPGA design and programming software, as well as the AWS FPGA Hardware Development Kit (HDK)
2) Use Xilinx Vivado or SDAccel software and a hardware description language (Verilog or VHDL) or OpenCL with the HDK to describe and simulate your custom FPGA logic
3) After successful simulation, use Vivado or SDAccel to synthesize and place/route the FPGA logic to create an FPGA Design Check Point (DCP), encrypt it, and generate an Amazon FPGA Image (AFI)
4) Launch an F1 instance and load the AFI to the FPGA, using AFI management tools provided by AWS
Developing Applications for F1
24. FPGA Logic Design using Xilinx Vivado on C4 or M4 instance
FPGA Place-and-Route using Xilinx Vivado on C4 or M4 instance
Generate an Amazon FPGA Image (AFI)
Securely deploy AFI on one or more F1 instances
Developing Applications for F1
25. Choose and launch the AWS-provided FPGA Developer AMI, which includes all
needed FPGA design and programming software, as well as the AWS FPGA
Hardware Development Kit (HDK)
Developing Applications for F1
27. Use Xilinx Vivado or SDAccel software and a hardware description language (Verilog or VHDL) or OpenCL with the HDK to describe and simulate your custom FPGA logic
After successful simulation, use scripts provided with the HDK to encrypt, synthesize, and place/route the FPGA logic to create a final FPGA Design Check Point (DCP) and generate a secure, encrypted Amazon FPGA Image (AFI)
Developing Applications for F1
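The AFI generation step can also be driven programmatically. The following is a hedged Python sketch, assuming the EC2 `CreateFpgaImage` operation as exposed by boto3's `create_fpga_image`, which these slides do not mention (they refer only to HDK scripts). The AFI name, bucket, and S3 keys are hypothetical placeholders, and the DCP tarball is assumed to have been uploaded to S3 already.

```python
from typing import Any, Dict


def build_create_fpga_image_request(
    name: str, bucket: str, dcp_key: str, logs_prefix: str
) -> Dict[str, Any]:
    """Assemble keyword arguments for the EC2 CreateFpgaImage operation."""
    return {
        "Name": name,
        "Description": f"AFI built from s3://{bucket}/{dcp_key}",
        "InputStorageLocation": {"Bucket": bucket, "Key": dcp_key},
        "LogsStorageLocation": {"Bucket": bucket, "Key": logs_prefix},
    }


def create_afi(request: Dict[str, Any], region: str = "us-east-1") -> Dict[str, str]:
    """Submit the request. Needs boto3, AWS credentials, and the DCP in S3."""
    import boto3  # imported lazily so the request builder works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_fpga_image(**request)
    return {
        "afi_id": response["FpgaImageId"],         # regional ID (afi-...)
        "agfi_id": response["FpgaImageGlobalId"],  # global ID (agfi-...)
    }


request = build_create_fpga_image_request(
    name="my-filter-afi",         # hypothetical AFI name
    bucket="my-fpga-bucket",      # hypothetical S3 bucket
    dcp_key="dcp/my_design.tar",  # hypothetical key of the uploaded DCP tarball
    logs_prefix="logs/",          # hypothetical prefix for AFI creation logs
)
print(request["InputStorageLocation"])
```

Only the request builder runs here; calling `create_afi(request)` would contact AWS and start AFI creation.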
28. Launch an F1 instance and download the AFI to the FPGA, using AFI management tools provided by AWS
Generate an Amazon FPGA Image (AFI)
Deploy AFI on one or more F1 instances
Developing Applications for F1
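Loading an AFI on a running F1 instance is typically a one-line command. The sketch below assumes the FPGA management tools from the AWS FPGA SDK (`fpga-load-local-image` and `fpga-describe-local-image`); the slides only say "AFI management tools provided by AWS", so treat the tool names as an assumption. The helper functions are hypothetical, and the AFI ID is a placeholder. The code only builds the command lines and does not execute them.

```python
import shlex
from typing import List


def load_afi_command(slot: int, agfi_id: str) -> List[str]:
    """Build the fpga-load-local-image command for one FPGA slot."""
    if slot < 0:
        raise ValueError("FPGA slot must be non-negative")
    if not agfi_id.startswith("agfi-"):
        raise ValueError("expected a global AFI ID starting with 'agfi-'")
    return ["sudo", "fpga-load-local-image", "-S", str(slot), "-I", agfi_id]


def describe_afi_command(slot: int) -> List[str]:
    """Build the command that reports which AFI is loaded in a slot."""
    return ["sudo", "fpga-describe-local-image", "-S", str(slot), "-H"]


cmd = load_afi_command(0, "agfi-0123456789abcdef0")  # placeholder AFI ID
print(shlex.join(cmd))
print(shlex.join(describe_afi_command(0)))
```

On an F1 instance with the tools installed, the list could be passed to `subprocess.run(cmd, check=True)`; on f1.16xlarge the slot index would range over the eight FPGAs.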
29. Amazon EC2 FPGA Deployment via Marketplace
[Diagram: an Amazon Machine Image (AMI) and an Amazon FPGA Image (AFI) delivered to customers through AWS Marketplace]
AFI is secured, encrypted, dynamically loaded into the FPGA - can’t be copied or downloaded
Delivering FPGA Partner Solutions on AWS via AWS Marketplace
30. Delivering FPGA Partner Solutions on AWS
AWS Marketplace Benefits
• Streamlined delivery of FPGA-accelerated solutions: Offer software as a
managed Amazon Machine Image (AMI) and one or more Amazon FPGA Images
(AFI), with secure 1-click purchasing.
• Discover new customers: Allow customers to launch directly from AWS
Marketplace, decreasing the length of sales cycles. Sellers can also offer free
trials with no additional engineering effort.
• Simplified billing & payments: Customers pay for AWS Marketplace software
as part of the regular AWS billing cycle. AWS manages the complexity of AMI and
AFI security, metering, billing, payment collection, and financial reporting.
• Secure your FPGA-based products: FPGA custom logic is deployed to
customers in a secure way, with no ability to view, copy, or edit the AFI logic.
• Provide Seamless Product Support: AWS Marketplace Product Support
Connection makes it easy to support your customers on AWS Marketplace.
31. FPGA: A Field Programmable Gate Array is a device that consists of very large numbers of
configurable logic and memory elements interconnected by configurable routing resources.
FPGAs differ from CPUs and GPUs by having no fixed instruction set, and in their ability to
implement operations and processes that are pipelined and parallelized in an almost unlimited
number of ways, using arbitrarily sized bit-widths.
AFI (Amazon FPGA Image): a file containing the binary image for an FPGA bitstream.
Loading an AFI onto an FPGA “programs” that device, within seconds, to perform one or more
application-specific functions.
HDL (Hardware Description Language): a low-level programming language designed for
describing logic functions for the purpose of simulation and for conversion (via synthesis) to
an FPGA or ASIC.
Vivado and SDAccel: a set of design tools produced by Xilinx (provider of the F1 FPGA
devices) for development of FPGA logic, pre-integrated and provided at no charge by AWS.
Verilog: a commonly-used HDL for FPGA design and simulation, supported by Vivado.
VHDL: another commonly-used HDL for FPGA design, also supported by Vivado.
F1 Glossary
32. OpenCL (Open Computing Language): a higher-level alternative to HDL programming
based on C-language, and supported in the Xilinx SDAccel design tools. OpenCL can be
used to target either FPGAs or GPUs.
HDK (Hardware Development Kit): a set of tools, documentation, and associated FPGA
libraries provided by AWS to assist FPGA developers with more rapid FPGA development, in
particular to simplify the use of I/O from the FPGA to the host EC2 instance via PCIe, from
FPGA to memory, and from FPGA to FPGA.
AXI: an FPGA-internal bus format providing standardized interfaces for memory-mapped
communications and for high-speed streaming data. AXI is used in the F1 HDK to define
interfaces between AWS-provided interface logic, and custom logic provided by FPGA
developers.
Developer AMI: a preconfigured AMI, available in the AWS Marketplace, that includes all
necessary software and libraries for FPGA development, including the Vivado software and
the HDK libraries enabling HDL design and simulation.
F1 Glossary (cont)
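For the streaming side of AXI, data moves only in a clock cycle where the source signals that data is valid and the sink signals that it is ready. The following Python sketch models that handshake cycle by cycle; it is a simplified illustration with hypothetical names, not the HDK's interface definitions.

```python
from typing import List, Sequence


def axi_stream_transfer(data: Sequence[int], ready_pattern: Sequence[bool]) -> List[int]:
    """Simulate a source driving valid/data into a sink that drives ready.

    A beat is transferred only in a cycle where both valid and ready are high.
    """
    received: List[int] = []
    index = 0
    for ready in ready_pattern:      # one loop iteration per clock cycle
        valid = index < len(data)    # source asserts valid while it has data
        if valid and ready:
            received.append(data[index])
            index += 1               # source advances only after a handshake
    return received


all_beats = axi_stream_transfer([10, 20, 30], [True, False, True, True, False])
stalled = axi_stream_transfer([10, 20, 30], [False, True, False, True])
print(all_beats)  # [10, 20, 30]
print(stalled)    # [10, 20]
```

The second call shows back-pressure: the sink is ready in only two of the four cycles, so the third beat stays with the source.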
33. Synthesis: the process, using software tools provided with Vivado, of converting an HDL or
OpenCL application into a lower-level format (sometimes referred to as a “netlist”)
representing the individual logic elements of the application, for example AND, OR, XOR
gates, adders and multipliers, shift registers, etc. This “netlist” must be further processed,
using place-and-route software, to create a downloadable bitstream.
Place-and-Route: the process, using software tools provided with Vivado, of mapping
individual logic elements to precise locations in the target FPGA, and specifying their
interconnections. Place-and-route is an iterative process that can require hours to complete
for larger applications and larger FPGAs.
Bitstream: a binary format representing the synthesized, placed, and routed FPGA
application ready for downloading to an FPGA.
Design Check Point (DCP): a binary file format containing the synthesized, placed, and routed
FPGA design, ready for ingestion during the creation of an Amazon FPGA Image (AFI).
F1 Glossary (cont)
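To show what "individual logic elements" in a netlist can look like, here is a minimal Python sketch of an adder expressed purely in terms of XOR, AND, and OR gates. It is an illustration of the idea only: the function names are hypothetical, and this is not the output format of any synthesis tool.

```python
from typing import Tuple


def full_adder(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """1-bit full adder built only from XOR, AND, and OR gates."""
    partial = a ^ b
    total = partial ^ carry_in
    carry_out = (a & b) | (partial & carry_in)
    return total, carry_out


def ripple_carry_add(x: int, y: int, width: int) -> int:
    """Chain `width` full adders; the final carry is dropped, as in a fixed-width adder."""
    result, carry = 0, 0
    for bit in range(width):
        s, carry = full_adder((x >> bit) & 1, (y >> bit) & 1, carry)
        result |= s << bit
    return result


print(ripple_carry_add(5, 6, width=4))   # 11
print(ripple_carry_add(12, 7, width=4))  # 3 (19 wraps at 16)
```

Synthesis produces a description at roughly this level of detail; place-and-route then decides where each gate-level element sits in the FPGA and how the connections between them are routed.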