SlideShare a Scribd company logo
1
Lessons of last 50 years of Computer Architecture
1. Raising the hardware/software interface creates
opportunities for architecture innovation
○ e.g., C, Python, TensorFlow, PyTorch
2. Ultimately benchmarks and the marketplace
settles architecture debates
○ e.g., SPEC, TPC, MLPerf, ...
2
Instruction Set Architecture?
• Software talks to hardware using a vocabulary
– Words called instructions
– Vocabulary called
instruction set architecture
(ISA)
• Most important interface since
determines software that can run on hardware
– Software is distributed as instructions
33
IBM Compatibility Problem in Early 1960s
By early 1960’s, IBM had 4 incompatible lines of computers!
701 ➡ 7094
650 ➡ 7074
702 ➡ 7080
1401 ➡ 7010
Each system had its own:
▪ Instruction set architecture (ISA)
▪ I/O system and Secondary Storage:
magnetic tapes, drums and disks
▪ Assemblers, compilers, libraries,...
▪ Market niche: business, scientific, real time, ...
IBM System/360 – one ISA to rule them all 4
Control versus Datapath
▪ Processor designs split between datapath, where numbers are stored and
arithmetic operations computed, and control, which sequences operations on
datapath
▪ Biggest challenge for computer designers was getting control correct
▪ Maurice Wilkes invented the
idea of microprogramming to
design the control unit of a
processor*
▪ Logic expensive vs. ROM or RAM
▪ ROM cheaper and faster than RAM
▪ Control design now programming
5
* "Micro-programming and the design of the control circuits in an electronic digital computer,"
M. Wilkes, and J. Stringer. Mathematical Proc. of the Cambridge Philosophical Society, Vol. 49, 1953.
Microprogramming in IBM 360
Model M30 M40 M50 M65
Datapath width 8 bits 16 bits 32 bits 64 bits
Microcode size 4k x 50 4k x 52 2.75k x 85 2.75k x 87
Clock cycle time (ROM) 750 ns 625 ns 500 ns 200 ns
Main memory cycle time 1500 ns 2500 ns 2000 ns 750 ns
Price (1964 $) $192,000 $216,000 $460,000 $1,080,000
Price (2018 $) $1,560,000 $1,760,000 $3,720,000 $8,720,000
6Fred Brooks, Jr.
IC Technology, Microcode, and CISC
▪ Logic, RAM, ROM all implemented using same transistors
▪ Semiconductor RAM ≈ same speed as ROM
▪ With Moore’s Law, memory for control store could grow
▪ Since RAM, easier to fix microcode bugs
▪ Allowed more complicated ISAs (CISC)
▪ Minicomputer (TTL server) example:
-Digital Equipment Corp. (DEC)
-VAX ISA in 1977
▪ 5K x 96b microcode
7
Microprocessor Evolution
▪ Rapid progress in 1970s, fueled by advances in MOS technology,
imitated minicomputers and mainframe ISAs
▪ “Microprocessor Wars”: compete by adding instructions (easy for microcode),
justified given assembly language programming
▪ Intel iAPX 432: Most ambitious 1970s micro, started in 1975
▪ 32-bit capability-based, object-oriented architecture, custom OS written in Ada
▪ Severe performance, complexity (multiple chips), and usability problems; announced 1981
▪ Intel 8086 (1978, 8MHz, 29,000 transistors)
▪ “Stopgap” 16-bit processor, 52 weeks to new chip
▪ ISA architected in 3 weeks (10 person weeks) assembly-compatible with 8 bit 8080
▪ IBM PC 1981 picks Intel 8088 for 8-bit bus (and Motorola 68000 was late)
8
▪ Estimated PC sales: 250,000
▪ Actual PC sales: 100,000,000 ⇒ 8086 “overnight” success
▪ Binary compatibility of PC software ⇒ bright future for 8086
Analyzing Microcoded Machines 1980s
▪ HW/SW interface rises from assembly to HLL programming
▪ Compilers now source of measurements
▪ John Cocke group at IBM
▪ Worked on a simple pipelined processor, 801 minicomputer
(ECL server), and advanced compilers inside IBM
▪ Ported their compiler to IBM 370, only used
simple register-register and load/store instructions (similar to 801)
▪ Up to 3X faster than existing compilers that used full 370 ISA!
▪ Emer and Clark at DEC in early 1980s*
▪ Found VAX 11/780 average clock cycles per instruction (CPI) = 10!
▪ Found 20% of VAX ISA ⇒ 60% of microcode, but only 0.2% of execution time!
9
* "A Characterization of Processor Performance in the VAX-11/780," J. Emer and D.Clark, ISCA, 1984.
John Cocke
From CISC to RISC
▪ Use RAM for instruction cache of user-visible instructions
▪ Software concept: Compiler vs. Interpreter
▪ Contents of fast instruction memory change to what application needs now
vs. ISA interpreter
▪ Use simple ISA
▪ Instructions as simple as microinstructions, but not as wide
▪ Enable pipelined implementations
▪ Compiled code only used a few CISC instructions anyways
▪ Chaitin’s register allocation scheme* benefits load-store ISAs
10
*Chaitin, Gregory J., et al. "Register allocation via coloring." Computer languages 6.1 (1981), 47-57.
Berkeley and Stanford RISC Chips
11
Fitzpatrick, Daniel, John Foderaro,
Manolis Katevenis, Howard Landman,
David Patterson, James Peek, Zvi
Peshkess, Carlo Séquin, Robert
Sherburne, and Korbin Van Dyke. "A
RISCy approach to VLSI." ACM
SIGARCH Computer Architecture News
10, no. 1 (1982)
Hennessy, John, Norman Jouppi, Steven
Przybylski, Christopher Rowen, Thomas
Gross, Forest Baskett, and John Gill.
"MIPS: A microprocessor architecture." In
ACM SIGMICRO Newsletter, vol. 13, no.
4, (1982).
Reduced Instruction Set Computer?
• Reduced Instruction Set Computer (RISC)
vocabulary uses simple words (instructions)
• RISC reads 25% more
instructions since simple vs.
Complex Instruction Set
Computer (CISC)
e.g., Intel 80x86
• But RISC reads them 5 times faster
• Net is 4 times faster
1212
▪ CISC executes fewer instructions /
program (≈ 3/4X instructions)
but many more clock cycles per
instruction (≈ 6X CPI)
⇒ RISC ≈ 4X faster than CISC
“Performance from architecture: comparing a RISC
and a CISC with similar hardware organization,”
Dileep Bhandarkar and Douglas Clark, Proc.
Symposium, ASPLOS, 1991.
Time
=
Instructions Clock cycles __Time___
Program Program * Instruction * Clock cycle
“Iron Law” of Processor Performance: How RISC can win
13
How to Measure Performance?
14
▪ Instruction rate (MIPS, millions of instructions per second)
+ Easy to understand, bigger is better
- But can’t compare different ISAs, higher MIPS can be slower
▪ Time to run toy program (puzzle)
+ Can compare different ISAs, shorter time always faster
- But not representative of real programs
▪ Synthetic programs (Whetstone, Dhrystone)
+ Tries to match characteristics of real programs
- Compilers can remove most code, less realistic over time
▪ Benchmark suite relative to reference computer (SPEC)
+ Real programs, bigger is better, geometric mean fair
- Must update every 2-3 years to stay uptodate ⇒ organization
CISC vs. RISC Today
PC Era
▪ Hardware translates x86
instructions into internal
RISC instructions
(Compiler vs Interpreter)
▪ Then use any RISC
technique inside MPU
▪ > 350M / year !
▪ x86 ISA eventually
dominates servers as well
as desktops
PostPC Era: Client/Cloud
▪ IP in SoC vs. MPU
▪ Value die area, energy as much as
performance
▪ > 20B total / year in 2017
▪ 99% Processors today are RISC
▪ Marketplace settles debate
15
Lessons from RISC vs CISC
● Less is More
○ It’s harder to come up with simple solutions, but they accelerate progress
● Importance of the software stack vs the hardware
○ If compiler can’t generate it, who cares?
● Importance of good benchmarks
○ Hard to make progress if you can’t measure it
○ For better or for worse, benchmarks shape a field
● Take the time for a quantitative approach vs rely on intuition to
start quickly
16
Moore’s Law Slowdown in Intel Processors
17
Moore, Gordon E. "No exponential is forever: but ‘Forever’ can be delayed!"
Solid-State Circuits Conference, 2003.
15XWe’re now in the
Post Moore’s Law Era
Technology & Power: Dennard Scaling
Power consumption
based on models in
Esmaeilzadeh
18
Energy scaling for fixed task is better, since more and faster transistors
Power consumption
based on models in
“Dark Silicon and the
End of Multicore
Scaling,” Hadi
Esmaelizadeh, ISCA,
2011
End of Growth of Single Program Speed?
19
End of
the
Line?
2X /
20 yrs
(3%/yr)
RISC
2X / 1.5 yrs
(52%/yr)
CISC
2X / 3.5 yrs
(22%/yr)
End of
Dennard
Scaling
⇒
Multicore
2X / 3.5
yrs
(23%/yr)
Am-
dahl’s
Law
⇒
2X /
6 yrs
(12%/yr)
Based on SPECintCPU. Source: John Hennessy and David Patterson, Computer Architecture: A Quantitative Approach, 6/e. 2018
Domain Specific Architectures (DSAs)
• Achieve higher efficiency by tailoring the architecture to
characteristics of the domain
• Not one application, but a domain of applications
• Different from strict ASIC since still runs software
20
Why DSAs Can Win (no magic)
Tailor the Architecture to the Domain
• More effective parallelism for a specific domain:
• SIMD vs. MIMD
• VLIW vs. Speculative, out-of-order
• More effective use of memory bandwidth
• User controlled versus caches
• Eliminate unneeded accuracy
• IEEE replaced by lower precision FP
• 32-64 bit integers to 8-16 bit integers
• Domain specific programming language provides path for
software
21
Deep learning is causing
a machine learning revolution
From “A New Golden Age in
Computer Architecture:
Empowering the
Machine-Learning
Revolution.” Dean, J.,
Patterson, D., & Young, C.
(2018). IEEE Micro, 38(2),
21-29.
Tensor Processing Unit v1 (Announced May 2016)
Google-designed chip for neural net inference
In production use for 3 years: used by billions on
search queries, for neural machine translation,
for AlphaGo match, …
A Domain-Specific Architecture for Deep Neural Networks, Jouppi,
Young, Patil, Patterson, Communications of the ACM, September 2018
TPU: High-level Chip Architecture
▪ The Matrix Unit: 65,536 (256x256) 8-bit
multiply-accumulate units
▪ 700 MHz clock rate
▪ Peak: 92T operations/second
▪ 65,536 * 2 * 700M
▪ >25X as many MACs vs GPU
▪ >100X as many MACs vs CPU
▪ 4 MiB of on-chip Accumulator memory
+ 24 MiB of on-chip Unified Buffer
(activation memory)
▪ 3.5X as much on-chip memory vs GPU
▪ 8 GiB of off-chip weight DRAM memory
24
Perf/Watt TPU vs CPU & GPU
25
83
29
Using production applications vs
contemporary CPU and GPU
⇒
⇒
⇒
⇒
Reasons for TPUv1 Success
The Launching of “1000 Chips”
● Intel acquires DSA chip companies
● Nervana: ($0.4B) August 2016
● Movidius: ($0.4B) September 2016
● MobilEye: ($15.3B) March 2017
● Habana: ($2.0B) December 2019
● Alibaba, Amazon inference chips
● >100 startups ($2B) launch on own bets
● Dataflow architecture: Graphcore, ...
● Asynchronous logic: Wave Computing, ...
● Analog computing: Mythic, …
● Wafer Scale computer: Cerebras
● Coarse-Grained Reconfigurable Arch: SambaNova, ... 27
Helen of Troy by Evelyn De Morgan
How to Measure ML Performance?
28
Operation rate (GOPS, billions of operations per second)
Easy to understand, bigger is better
But peak rates not for same program
Operations can vary between DSAs (FP vs int, 4b/8b/16b/32b)
Time to run old DNN (MNIST, AlexNet)
Can compare different ISAs, shorter time always faster
But not representative of today’s DNNs
Benchmark suite relative to reference computer (MLPerf)
Real programs, bigger is better, same DNN model, same data set, geometric mean
fair comparison, batch size ranges set
Must update every 1-2 years to stay uptodate ⇒ organization
Embedded Computing and ML
● ML becoming one of the most important workloads
● But lots of applications don’t need highest performance
○ For many, just enough at low cost
● Microcontrollers most popular processors
○ Cheap, Low Power, fast enough for many apps
● Despite importance, no good microprocessor benchmarks
○ Still quote synthetic programs: Dhrystone, CoreMarks
● Decided to try to fix
● EmBench: better for all embedded, includes ML benchmarks also
7 Lessons for Embench
1. Embench must be free
2. Embench must be easy to port and run
3. Embench must be a suite of real programs
4. Embench must have a supporting organization to maintain it
5. Embench must report a single summarizing score
6. Embench should summarize using geometric mean and std. dev.
7. Embench must involve both academia and industry
The Plan
● Jan - Jun 2019: Small group created the initial version
− Dave Patterson, Jeremy Bennett, Palmer Dabbelt, Cesare Garlati
− mostly face-to-face
● Jun 2019 – Feb 2020: Wider group open to all
− under FOSSi, with mailing list and monthly conference call
− see www.embench.org
● Feb 2020: Launch Embench 0.5 at Embedded World
● Present: Working on Embench 0.6
Baseline Data
Name Comments Orig Source C LOC code size data size time (ms) branch memory compute
aha-mont64 Montgomery multiplication AHA 162 1,052 0 4,000 low low high
crc32 CRC error checking 32b MiBench 101 230 1,024 4,013 high med low
cubic Cubic root solver MiBench 125 2,472 0 4,140 low med med
edn More general filter WCET 285 1,452 1,600 3,984 low high med
huffbench Compress/Decompress Scott Ladd 309 1,628 1,004 4,109 med med med
matmult-int Integer matrix multiply WCET 175 420 1,600 4,020 med med med
minver Matrix inversion WCET 187 1,076 144 4,003 high low med
nbody Satellite N body, large data CLBG 172 708 640 3,774 med low high
nettle-aes Encrypt/decrypt Nettle 1,018 2,880 10,566 3,988 med high low
nettle-sha256 Crytographic hash Nettle 349 5,564 536 4,000 low med med
nsichneu Large - Petri net WCET 2,676 15,042 0 4,001 med high low
picojpeg JPEG MiBench2 2,182 8,036 1,196 3,748 med med high
qrduino QR codes Github 936 6,074 1,540 4,210 low med med
sglib-combined Simple Generic Library for C SGLIB 1,844 2,324 800 4,028 high high low
slre Regex SLRE 506 2,428 126 3,994 high med med
st Statistics WCET 117 880 0 4,151 med low high
statemate State machine (car window) C-LAB 1,301 3,692 64 4,000 high high low
ud LUD composition Int WCET 95 702 0 4,002 med low high
Public Repository
What Affects Embench Results?
● Instruction Set Architecture: Arm, ARC, RISC-V, AVR, ...
− extensions: ARM: v7, Thumb2, …, RV32I, M, C, ...
● Compiler: open (GCC, LLVM) and proprietary (IAR, …)
− which optimizations included: Loop unrolling, inlining procedures,
minimize code size, …
− older ISAs likely have more mature and better compilers?
● Libraries
− open (GCC, LLVM) and proprietary (IAR, Sega, ...)
● Embench excludes libraries when sizing
− they can swamp code size for embedded benchmark
Impact of optimizations of GCC on RISC-V: Speed
● -msave-restore
invokes functions to
save and restore
registers at procedure
entry and exit instead
of inline code of stores
and loads
− ISA Alternative
would be Store
Multiple instruction
and Load Multiple
instruction
PULP RI5CY RV32IMC GCC 10.1.0 (higher is faster)
Impact of optimizations of GCC on RISC-V: Size
● -msave-restore
invokes functions to
save and restore
registers at
procedure entry and
exit instead of inline
code of stores and
loads
● ISA Alternative
would be Store
Multiple instruction
and Load Multiple
instruction
PULP RI5CY RV32IMC GCC 10.1.0 (lower is smaller)
Comparing Architectures with GCC: Speed
● GCC 10.2.0
− higher is faster
Arm Cortex-M4, no FPU
PULP RI5CY RV32IMC GCC 10.2.0 (soft core in FPGA
Comparing Architectures with GCC: Size
● GCC 10.2.0
− lower is smaller
Arm Cortex-M4, no FPU
PULP RI5CY RV32IMC GCC 10.2.0 (soft core in FPGA)
Comparing Compilers GCC v LLVM: Speed
● PULP RI5CY RV32IMC
− higher is faster
● Clang/LLVM variations
− -msave-restore
enabled by default
with ‑Os
− -Oz for further code
size optimization
GCC 10.2.0
Clang/LLVM 11.0.0 rc
Comparing Compilers GCC v LLVM: Size
● PULP RI5CY RV32IMC
− lower is smaller
● Clang/LLVM variations
− -msave-restore
enabled by default
with ‑Os
− -Oz for further code
size optimization
GCC 10.2.0
Clang/LLVM 11.0.0 rc
Code Size over GCC versions
Lots More to Explore with Embench
● More compilers: LLVM, IAR, …
− and more optimizations
● More architectures: MIPS, Tensilica, ARMv8, RV64I, ...
− and more instruction extensions: bit manipulation, vector, floating point, …
● More processors: ARM M7, M33, M24, RISC-V Rocket, BOOM, ...
● Context switch times
● In later versions of Embench: Interrupt Latency
− floating point programs for larger machines in Embench 0.6
● Published results in embench-iot-results repository
● Want to help? Email info@embench.org
Benchmarking Lessons?
1) Must show code size with performance so as to get meaningful
results
2) Importance of geometric standard deviation as well as geometric
mean
3) More mature architecture have more mature compilers
Conclusions
● End of Dennard Scaling, slowing of Moore’s Law ⇒ DSA
● ML DSAs need HW/SW codesign
● To measure progress, need good benchmarks,
● MLPerf for data center and high end edge
● For microcontrollers, Embench 0.5 suite is already better than
synthetic programs Dhrystone and CoreMark, and will get better
− Many more studies: more ISAs, more compilers, more cores,
● Let us know if you’d like to help: Email info@embench.org

More Related Content

What's hot

HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
Linaro
 
Tech talk with Antmicro - Building an open source system verilog ecosystem
Tech talk with Antmicro - Building an open source system verilog ecosystemTech talk with Antmicro - Building an open source system verilog ecosystem
Tech talk with Antmicro - Building an open source system verilog ecosystem
RISC-V International
 
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta..."The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
Edge AI and Vision Alliance
 
Design and Testing Challenges for Chiplet Based Design: Assembly and Test View
Design and Testing Challenges for Chiplet Based Design: Assembly and Test ViewDesign and Testing Challenges for Chiplet Based Design: Assembly and Test View
Design and Testing Challenges for Chiplet Based Design: Assembly and Test View
ODSA Workgroup
 
AI on the Edge
AI on the EdgeAI on the Edge
AI on the Edge
Jared Rhodes
 
RISC-V Online Tutor
RISC-V Online TutorRISC-V Online Tutor
RISC-V Online Tutor
RISC-V International
 
Artificial intelligence on the Edge
Artificial intelligence on the EdgeArtificial intelligence on the Edge
Artificial intelligence on the Edge
Usman Qayyum
 
Intel Corporation - BA401
Intel Corporation - BA401Intel Corporation - BA401
Intel Corporation - BA401
guest3ea4529f
 
4th generation computer
4th generation computer4th generation computer
4th generation computer
Sohag Babu
 
RIOT: towards open source, secure DevOps on microcontroller-based IoT
RIOT: towards open source, secure DevOps on microcontroller-based IoTRIOT: towards open source, secure DevOps on microcontroller-based IoT
RIOT: towards open source, secure DevOps on microcontroller-based IoT
Alexandre Abadie
 
Webinar: NVIDIA JETSON – A Inteligência Artificial na palma de sua mão
Webinar: NVIDIA JETSON – A Inteligência Artificial na palma de sua mãoWebinar: NVIDIA JETSON – A Inteligência Artificial na palma de sua mão
Webinar: NVIDIA JETSON – A Inteligência Artificial na palma de sua mão
Embarcados
 
GTC 2018 で発表された自動運転最新情報
GTC 2018 で発表された自動運転最新情報GTC 2018 で発表された自動運転最新情報
GTC 2018 で発表された自動運転最新情報
NVIDIA Japan
 
JETSON : AI at the EDGE
JETSON : AI at the EDGEJETSON : AI at the EDGE
JETSON : AI at the EDGE
Skolkovo Robotics Center
 
vlsi design summer training ppt
vlsi design summer training pptvlsi design summer training ppt
vlsi design summer training ppt
Bhagwan Lal Teli
 
Re-Vision stack presentation
Re-Vision stack presentationRe-Vision stack presentation
Re-Vision stack presentation
Sundance Multiprocessor Technology Ltd.
 
AI talk at CogX 2018
AI talk at CogX 2018AI talk at CogX 2018
AI talk at CogX 2018
Alison B. Lowndes
 
What I learned building a parallel processor from scratch
What I learned building a parallel processor from scratchWhat I learned building a parallel processor from scratch
What I learned building a parallel processor from scratch
Andreas Olofsson
 
Introduction to VLSI
Introduction to VLSI Introduction to VLSI
Introduction to VLSI
illpa
 
“Streamlining Development of Edge AI Applications,” a Presentation from NVIDIA
“Streamlining Development of Edge AI Applications,” a Presentation from NVIDIA“Streamlining Development of Edge AI Applications,” a Presentation from NVIDIA
“Streamlining Development of Edge AI Applications,” a Presentation from NVIDIA
Edge AI and Vision Alliance
 
Resent intel microprocessor
Resent intel microprocessorResent intel microprocessor
Resent intel microprocessor
Kartik Kalpande Patil
 

What's hot (20)

HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
 
Tech talk with Antmicro - Building an open source system verilog ecosystem
Tech talk with Antmicro - Building an open source system verilog ecosystemTech talk with Antmicro - Building an open source system verilog ecosystem
Tech talk with Antmicro - Building an open source system verilog ecosystem
 
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta..."The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
 
Design and Testing Challenges for Chiplet Based Design: Assembly and Test View
Design and Testing Challenges for Chiplet Based Design: Assembly and Test ViewDesign and Testing Challenges for Chiplet Based Design: Assembly and Test View
Design and Testing Challenges for Chiplet Based Design: Assembly and Test View
 
AI on the Edge
AI on the EdgeAI on the Edge
AI on the Edge
 
RISC-V Online Tutor
RISC-V Online TutorRISC-V Online Tutor
RISC-V Online Tutor
 
Artificial intelligence on the Edge
Artificial intelligence on the EdgeArtificial intelligence on the Edge
Artificial intelligence on the Edge
 
Intel Corporation - BA401
Intel Corporation - BA401Intel Corporation - BA401
Intel Corporation - BA401
 
4th generation computer
4th generation computer4th generation computer
4th generation computer
 
RIOT: towards open source, secure DevOps on microcontroller-based IoT
RIOT: towards open source, secure DevOps on microcontroller-based IoTRIOT: towards open source, secure DevOps on microcontroller-based IoT
RIOT: towards open source, secure DevOps on microcontroller-based IoT
 
Webinar: NVIDIA JETSON – A Inteligência Artificial na palma de sua mão
Webinar: NVIDIA JETSON – A Inteligência Artificial na palma de sua mãoWebinar: NVIDIA JETSON – A Inteligência Artificial na palma de sua mão
Webinar: NVIDIA JETSON – A Inteligência Artificial na palma de sua mão
 
GTC 2018 で発表された自動運転最新情報
GTC 2018 で発表された自動運転最新情報GTC 2018 で発表された自動運転最新情報
GTC 2018 で発表された自動運転最新情報
 
JETSON : AI at the EDGE
JETSON : AI at the EDGEJETSON : AI at the EDGE
JETSON : AI at the EDGE
 
vlsi design summer training ppt
vlsi design summer training pptvlsi design summer training ppt
vlsi design summer training ppt
 
Re-Vision stack presentation
Re-Vision stack presentationRe-Vision stack presentation
Re-Vision stack presentation
 
AI talk at CogX 2018
AI talk at CogX 2018AI talk at CogX 2018
AI talk at CogX 2018
 
What I learned building a parallel processor from scratch
What I learned building a parallel processor from scratchWhat I learned building a parallel processor from scratch
What I learned building a parallel processor from scratch
 
Introduction to VLSI
Introduction to VLSI Introduction to VLSI
Introduction to VLSI
 
“Streamlining Development of Edge AI Applications,” a Presentation from NVIDIA
“Streamlining Development of Edge AI Applications,” a Presentation from NVIDIA“Streamlining Development of Edge AI Applications,” a Presentation from NVIDIA
“Streamlining Development of Edge AI Applications,” a Presentation from NVIDIA
 
Resent intel microprocessor
Resent intel microprocessorResent intel microprocessor
Resent intel microprocessor
 

Similar to “A New Golden Age for Computer Architecture: Processor Innovation to Enable Ubiquitous AI,” a Keynote Presentation from David Patterson

CSE675_01_Introduction.ppt
CSE675_01_Introduction.pptCSE675_01_Introduction.ppt
CSE675_01_Introduction.ppt
AshokRachapalli1
 
CSE675_01_Introduction.ppt
CSE675_01_Introduction.pptCSE675_01_Introduction.ppt
CSE675_01_Introduction.ppt
AshokRachapalli1
 
Industrial trends in heterogeneous and esoteric compute
Industrial trends in heterogeneous and esoteric computeIndustrial trends in heterogeneous and esoteric compute
Industrial trends in heterogeneous and esoteric compute
Perry Lea
 
software engineering CSE675_01_Introduction.ppt
software engineering CSE675_01_Introduction.pptsoftware engineering CSE675_01_Introduction.ppt
software engineering CSE675_01_Introduction.ppt
SomnathMule5
 
Risc and cisc eugene clewlow
Risc and cisc   eugene clewlowRisc and cisc   eugene clewlow
Risc and cisc eugene clewlow
karan saini
 
Risc and cisc eugene clewlow
Risc and cisc   eugene clewlowRisc and cisc   eugene clewlow
Risc and cisc eugene clewlow
Chaudhary Manzoor
 
Risc cisc Difference
Risc cisc DifferenceRisc cisc Difference
Risc cisc Difference
Sehrish Asif
 
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
zionsaint
 
Risc and cisc eugene clewlow
Risc and cisc   eugene clewlowRisc and cisc   eugene clewlow
Risc and cisc eugene clewlow
Manish Prajapati
 
Lect-3 Evaluation of computer architecture.pptx.pdf
Lect-3 Evaluation of computer architecture.pptx.pdfLect-3 Evaluation of computer architecture.pptx.pdf
Lect-3 Evaluation of computer architecture.pptx.pdf
harm4202
 
Unit I_MT2301.pdf
Unit I_MT2301.pdfUnit I_MT2301.pdf
Unit I_MT2301.pdf
Kannan Kanagaraj
 
CISC.pptx
CISC.pptxCISC.pptx
CISC.pptx
UmaimaAsif3
 
Comparison between computers of past and present
Comparison between computers of past and presentComparison between computers of past and present
Comparison between computers of past and present
Muhammad Danish Badar
 
Risc processors
Risc processorsRisc processors
Risc processors
Ganesh Rocky
 
Sistem mikroprosessor
Sistem mikroprosessorSistem mikroprosessor
Sistem mikroprosessor
fahmihafid
 
Computer Evolution
Computer EvolutionComputer Evolution
Computer Evolution
Education Front
 
System On Chip
System On ChipSystem On Chip
System On Chip
A B Shinde
 
Dsdco IE: RISC and CISC architectures and design issues
Dsdco IE: RISC and CISC architectures and design issuesDsdco IE: RISC and CISC architectures and design issues
Dsdco IE: RISC and CISC architectures and design issues
Home
 
Cell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology GroupCell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology Group
Slide_N
 
Fundamentals.pptx
Fundamentals.pptxFundamentals.pptx
Fundamentals.pptx
dhivyak49
 

Similar to “A New Golden Age for Computer Architecture: Processor Innovation to Enable Ubiquitous AI,” a Keynote Presentation from David Patterson (20)

CSE675_01_Introduction.ppt
CSE675_01_Introduction.pptCSE675_01_Introduction.ppt
CSE675_01_Introduction.ppt
 
CSE675_01_Introduction.ppt
CSE675_01_Introduction.pptCSE675_01_Introduction.ppt
CSE675_01_Introduction.ppt
 
Industrial trends in heterogeneous and esoteric compute
Industrial trends in heterogeneous and esoteric computeIndustrial trends in heterogeneous and esoteric compute
Industrial trends in heterogeneous and esoteric compute
 
software engineering CSE675_01_Introduction.ppt
software engineering CSE675_01_Introduction.pptsoftware engineering CSE675_01_Introduction.ppt
software engineering CSE675_01_Introduction.ppt
 
Risc and cisc eugene clewlow
Risc and cisc   eugene clewlowRisc and cisc   eugene clewlow
Risc and cisc eugene clewlow
 
Risc and cisc eugene clewlow
Risc and cisc   eugene clewlowRisc and cisc   eugene clewlow
Risc and cisc eugene clewlow
 
Risc cisc Difference
Risc cisc DifferenceRisc cisc Difference
Risc cisc Difference
 
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
DARPA ERI Summit 2018: The End of Moore’s Law & Faster General Purpose Comput...
 
Risc and cisc eugene clewlow
Risc and cisc   eugene clewlowRisc and cisc   eugene clewlow
Risc and cisc eugene clewlow
 
Lect-3 Evaluation of computer architecture.pptx.pdf
Lect-3 Evaluation of computer architecture.pptx.pdfLect-3 Evaluation of computer architecture.pptx.pdf
Lect-3 Evaluation of computer architecture.pptx.pdf
 
Unit I_MT2301.pdf
Unit I_MT2301.pdfUnit I_MT2301.pdf
Unit I_MT2301.pdf
 
CISC.pptx
CISC.pptxCISC.pptx
CISC.pptx
 
Comparison between computers of past and present
Comparison between computers of past and presentComparison between computers of past and present
Comparison between computers of past and present
 
Risc processors
Risc processorsRisc processors
Risc processors
 
Sistem mikroprosessor
Sistem mikroprosessorSistem mikroprosessor
Sistem mikroprosessor
 
Computer Evolution
Computer EvolutionComputer Evolution
Computer Evolution
 
System On Chip
System On ChipSystem On Chip
System On Chip
 
Dsdco IE: RISC and CISC architectures and design issues
Dsdco IE: RISC and CISC architectures and design issuesDsdco IE: RISC and CISC architectures and design issues
Dsdco IE: RISC and CISC architectures and design issues
 
Cell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology GroupCell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology Group
 
Fundamentals.pptx
Fundamentals.pptxFundamentals.pptx
Fundamentals.pptx
 

More from Edge AI and Vision Alliance

“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
“Silicon Slip-Ups: The Ten Most Common Errors Processor Suppliers Make (Numbe...
“Silicon Slip-Ups: The Ten Most Common Errors Processor Suppliers Make (Numbe...“Silicon Slip-Ups: The Ten Most Common Errors Processor Suppliers Make (Numbe...
“Silicon Slip-Ups: The Ten Most Common Errors Processor Suppliers Make (Numbe...
Edge AI and Vision Alliance
 
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
Edge AI and Vision Alliance
 
“How Arm’s Machine Learning Solution Enables Vision Transformers at the Edge,...
“How Arm’s Machine Learning Solution Enables Vision Transformers at the Edge,...“How Arm’s Machine Learning Solution Enables Vision Transformers at the Edge,...
“How Arm’s Machine Learning Solution Enables Vision Transformers at the Edge,...
Edge AI and Vision Alliance
 
“Nx EVOS: A New Enterprise Operating System for Video and Visual AI,” a Prese...
“Nx EVOS: A New Enterprise Operating System for Video and Visual AI,” a Prese...“Nx EVOS: A New Enterprise Operating System for Video and Visual AI,” a Prese...
“Nx EVOS: A New Enterprise Operating System for Video and Visual AI,” a Prese...
Edge AI and Vision Alliance
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
"OpenCV for High-performance, Low-power Vision Applications on Snapdragon," a...
"OpenCV for High-performance, Low-power Vision Applications on Snapdragon," a..."OpenCV for High-performance, Low-power Vision Applications on Snapdragon," a...
"OpenCV for High-performance, Low-power Vision Applications on Snapdragon," a...
Edge AI and Vision Alliance
 
“Deploying Large Models on the Edge: Success Stories and Challenges,” a Prese...
“Deploying Large Models on the Edge: Success Stories and Challenges,” a Prese...“Deploying Large Models on the Edge: Success Stories and Challenges,” a Prese...
“Deploying Large Models on the Edge: Success Stories and Challenges,” a Prese...
Edge AI and Vision Alliance
 
“Scaling Vision-based Edge AI Solutions: From Prototype to Global Deployment,...
“Scaling Vision-based Edge AI Solutions: From Prototype to Global Deployment,...“Scaling Vision-based Edge AI Solutions: From Prototype to Global Deployment,...
“Scaling Vision-based Edge AI Solutions: From Prototype to Global Deployment,...
Edge AI and Vision Alliance
 
“What’s Next in On-device Generative AI,” a Presentation from Qualcomm
“What’s Next in On-device Generative AI,” a Presentation from Qualcomm“What’s Next in On-device Generative AI,” a Presentation from Qualcomm
“What’s Next in On-device Generative AI,” a Presentation from Qualcomm
Edge AI and Vision Alliance
 
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
Edge AI and Vision Alliance
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
Edge AI and Vision Alliance
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
Edge AI and Vision Alliance
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
Edge AI and Vision Alliance
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
Edge AI and Vision Alliance
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
Edge AI and Vision Alliance
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
Edge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
Edge AI and Vision Alliance
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
“Silicon Slip-Ups: The Ten Most Common Errors Processor Suppliers Make (Numbe...
“Silicon Slip-Ups: The Ten Most Common Errors Processor Suppliers Make (Numbe...“Silicon Slip-Ups: The Ten Most Common Errors Processor Suppliers Make (Numbe...
“Silicon Slip-Ups: The Ten Most Common Errors Processor Suppliers Make (Numbe...
 
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
 
“How Arm’s Machine Learning Solution Enables Vision Transformers at the Edge,...
“How Arm’s Machine Learning Solution Enables Vision Transformers at the Edge,...“How Arm’s Machine Learning Solution Enables Vision Transformers at the Edge,...
“How Arm’s Machine Learning Solution Enables Vision Transformers at the Edge,...
 
“Nx EVOS: A New Enterprise Operating System for Video and Visual AI,” a Prese...
“Nx EVOS: A New Enterprise Operating System for Video and Visual AI,” a Prese...“Nx EVOS: A New Enterprise Operating System for Video and Visual AI,” a Prese...
“Nx EVOS: A New Enterprise Operating System for Video and Visual AI,” a Prese...
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
"OpenCV for High-performance, Low-power Vision Applications on Snapdragon," a...
"OpenCV for High-performance, Low-power Vision Applications on Snapdragon," a..."OpenCV for High-performance, Low-power Vision Applications on Snapdragon," a...
"OpenCV for High-performance, Low-power Vision Applications on Snapdragon," a...
 
“Deploying Large Models on the Edge: Success Stories and Challenges,” a Prese...
“Deploying Large Models on the Edge: Success Stories and Challenges,” a Prese...“Deploying Large Models on the Edge: Success Stories and Challenges,” a Prese...
“Deploying Large Models on the Edge: Success Stories and Challenges,” a Prese...
 
“Scaling Vision-based Edge AI Solutions: From Prototype to Global Deployment,...
“Scaling Vision-based Edge AI Solutions: From Prototype to Global Deployment,...“Scaling Vision-based Edge AI Solutions: From Prototype to Global Deployment,...
“Scaling Vision-based Edge AI Solutions: From Prototype to Global Deployment,...
 
“What’s Next in On-device Generative AI,” a Presentation from Qualcomm
“What’s Next in On-device Generative AI,” a Presentation from Qualcomm“What’s Next in On-device Generative AI,” a Presentation from Qualcomm
“What’s Next in On-device Generative AI,” a Presentation from Qualcomm
 
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 

Recently uploaded

Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 

Recently uploaded (20)

Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 

“A New Golden Age for Computer Architecture: Processor Innovation to Enable Ubiquitous AI,” a Keynote Presentation from David Patterson

  • 1. 1
  • 2. Lessons of last 50 years of Computer Architecture 1. Raising the hardware/software interface creates opportunities for architecture innovation ○ e.g., C, Python, TensorFlow, PyTorch 2. Ultimately benchmarks and the marketplace settles architecture debates ○ e.g., SPEC, TPC, MLPerf, ... 2
  • 3. Instruction Set Architecture? • Software talks to hardware using a vocabulary – Words called instructions – Vocabulary called instruction set architecture (ISA) • Most important interface since determines software that can run on hardware – Software is distributed as instructions 33
  • 4. IBM Compatibility Problem in Early 1960s By early 1960’s, IBM had 4 incompatible lines of computers! 701 ➡ 7094 650 ➡ 7074 702 ➡ 7080 1401 ➡ 7010 Each system had its own: ▪ Instruction set architecture (ISA) ▪ I/O system and Secondary Storage: magnetic tapes, drums and disks ▪ Assemblers, compilers, libraries,... ▪ Market niche: business, scientific, real time, ... IBM System/360 – one ISA to rule them all 4
  • 5. Control versus Datapath ▪ Processor designs split between datapath, where numbers are stored and arithmetic operations computed, and control, which sequences operations on datapath ▪ Biggest challenge for computer designers was getting control correct ▪ Maurice Wilkes invented the idea of microprogramming to design the control unit of a processor* ▪ Logic expensive vs. ROM or RAM ▪ ROM cheaper and faster than RAM ▪ Control design now programming 5 * "Micro-programming and the design of the control circuits in an electronic digital computer," M. Wilkes, and J. Stringer. Mathematical Proc. of the Cambridge Philosophical Society, Vol. 49, 1953.
  • 6. Microprogramming in IBM 360 Model M30 M40 M50 M65 Datapath width 8 bits 16 bits 32 bits 64 bits Microcode size 4k x 50 4k x 52 2.75k x 85 2.75k x 87 Clock cycle time (ROM) 750 ns 625 ns 500 ns 200 ns Main memory cycle time 1500 ns 2500 ns 2000 ns 750 ns Price (1964 $) $192,000 $216,000 $460,000 $1,080,000 Price (2018 $) $1,560,000 $1,760,000 $3,720,000 $8,720,000 6Fred Brooks, Jr.
  • 7. IC Technology, Microcode, and CISC ▪ Logic, RAM, ROM all implemented using same transistors ▪ Semiconductor RAM ≈ same speed as ROM ▪ With Moore’s Law, memory for control store could grow ▪ Since RAM, easier to fix microcode bugs ▪ Allowed more complicated ISAs (CISC) ▪ Minicomputer (TTL server) example: -Digital Equipment Corp. (DEC) -VAX ISA in 1977 ▪ 5K x 96b microcode 7
  • 8. Microprocessor Evolution ▪ Rapid progress in 1970s, fueled by advances in MOS technology, imitated minicomputers and mainframe ISAs ▪ “Microprocessor Wars”: compete by adding instructions (easy for microcode), justified given assembly language programming ▪ Intel iAPX 432: Most ambitious 1970s micro, started in 1975 ▪ 32-bit capability-based, object-oriented architecture, custom OS written in Ada ▪ Severe performance, complexity (multiple chips), and usability problems; announced 1981 ▪ Intel 8086 (1978, 8MHz, 29,000 transistors) ▪ “Stopgap” 16-bit processor, 52 weeks to new chip ▪ ISA architected in 3 weeks (10 person weeks) assembly-compatible with 8 bit 8080 ▪ IBM PC 1981 picks Intel 8088 for 8-bit bus (and Motorola 68000 was late) 8 ▪ Estimated PC sales: 250,000 ▪ Actual PC sales: 100,000,000 ⇒ 8086 “overnight” success ▪ Binary compatibility of PC software ⇒ bright future for 8086
  • 9. Analyzing Microcoded Machines 1980s ▪ HW/SW interface rises from assembly to HLL programming ▪ Compilers now source of measurements ▪ John Cocke group at IBM ▪ Worked on a simple pipelined processor, 801 minicomputer (ECL server), and advanced compilers inside IBM ▪ Ported their compiler to IBM 370, only used simple register-register and load/store instructions (similar to 801) ▪ Up to 3X faster than existing compilers that used full 370 ISA! ▪ Emer and Clark at DEC in early 1980s* ▪ Found VAX 11/780 average clock cycles per instruction (CPI) = 10! ▪ Found 20% of VAX ISA ⇒ 60% of microcode, but only 0.2% of execution time! 9 * "A Characterization of Processor Performance in the VAX-11/780," J. Emer and D.Clark, ISCA, 1984. John Cocke
  • 10. From CISC to RISC ▪ Use RAM for instruction cache of user-visible instructions ▪ Software concept: Compiler vs. Interpreter ▪ Contents of fast instruction memory change to what application needs now vs. ISA interpreter ▪ Use simple ISA ▪ Instructions as simple as microinstructions, but not as wide ▪ Enable pipelined implementations ▪ Compiled code only used a few CISC instructions anyways ▪ Chaitin’s register allocation scheme* benefits load-store ISAs 10 *Chaitin, Gregory J., et al. "Register allocation via coloring." Computer languages 6.1 (1981), 47-57.
  • 11. Berkeley and Stanford RISC Chips 11 Fitzpatrick, Daniel, John Foderaro, Manolis Katevenis, Howard Landman, David Patterson, James Peek, Zvi Peshkess, Carlo Séquin, Robert Sherburne, and Korbin Van Dyke. "A RISCy approach to VLSI." ACM SIGARCH Computer Architecture News 10, no. 1 (1982) Hennessy, John, Norman Jouppi, Steven Przybylski, Christopher Rowen, Thomas Gross, Forest Baskett, and John Gill. "MIPS: A microprocessor architecture." In ACM SIGMICRO Newsletter, vol. 13, no. 4, (1982).
  • 12. Reduced Instruction Set Computer? • Reduced Instruction Set Computer (RISC) vocabulary uses simple words (instructions) • RISC reads 25% more instructions since simple vs. Complex Instruction Set Computer (CISC) e.g., Intel 80x86 • But RISC reads them 5 times faster • Net is 4 times faster 1212
  • 13. ▪ CISC executes fewer instructions / program (≈ 3/4X instructions) but many more clock cycles per instruction (≈ 6X CPI) ⇒ RISC ≈ 4X faster than CISC “Performance from architecture: comparing a RISC and a CISC with similar hardware organization,” Dileep Bhandarkar and Douglas Clark, Proc. Symposium, ASPLOS, 1991. Time = Instructions Clock cycles __Time___ Program Program * Instruction * Clock cycle “Iron Law” of Processor Performance: How RISC can win 13
  • 14. How to Measure Performance? 14 ▪ Instruction rate (MIPS, millions of instructions per second) + Easy to understand, bigger is better - But can’t compare different ISAs, higher MIPS can be slower ▪ Time to run toy program (puzzle) + Can compare different ISAs, shorter time always faster - But not representative of real programs ▪ Synthetic programs (Whetstone, Dhrystone) + Tries to match characteristics of real programs - Compilers can remove most code, less realistic over time ▪ Benchmark suite relative to reference computer (SPEC) + Real programs, bigger is better, geometric mean fair - Must update every 2-3 years to stay uptodate ⇒ organization
  • 15. CISC vs. RISC Today PC Era ▪ Hardware translates x86 instructions into internal RISC instructions (Compiler vs Interpreter) ▪ Then use any RISC technique inside MPU ▪ > 350M / year ! ▪ x86 ISA eventually dominates servers as well as desktops PostPC Era: Client/Cloud ▪ IP in SoC vs. MPU ▪ Value die area, energy as much as performance ▪ > 20B total / year in 2017 ▪ 99% Processors today are RISC ▪ Marketplace settles debate 15
  • 16. Lessons from RISC vs CISC ● Less is More ○ It’s harder to come up with simple solutions, but they accelerate progress ● Importance of the software stack vs the hardware ○ If compiler can’t generate it, who cares? ● Importance of good benchmarks ○ Hard to make progress if you can’t measure it ○ For better or for worse, benchmarks shape a field ● Take the time for a quantitative approach vs rely on intuition to start quickly 16
  • 17. Moore’s Law Slowdown in Intel Processors 17 Moore, Gordon E. "No exponential is forever: but ‘Forever’ can be delayed!" Solid-State Circuits Conference, 2003. 15XWe’re now in the Post Moore’s Law Era
  • 18. Technology & Power: Dennard Scaling Power consumption based on models in Esmaeilzadeh 18 Energy scaling for fixed task is better, since more and faster transistors Power consumption based on models in “Dark Silicon and the End of Multicore Scaling,” Hadi Esmaelizadeh, ISCA, 2011
  • 19. End of Growth of Single Program Speed? 19 End of the Line? 2X / 20 yrs (3%/yr) RISC 2X / 1.5 yrs (52%/yr) CISC 2X / 3.5 yrs (22%/yr) End of Dennard Scaling ⇒ Multicore 2X / 3.5 yrs (23%/yr) Am- dahl’s Law ⇒ 2X / 6 yrs (12%/yr) Based on SPECintCPU. Source: John Hennessy and David Patterson, Computer Architecture: A Quantitative Approach, 6/e. 2018
  • 20. Domain Specific Architectures (DSAs) • Achieve higher efficiency by tailoring the architecture to characteristics of the domain • Not one application, but a domain of applications • Different from strict ASIC since still runs software 20
  • 21. Why DSAs Can Win (no magic) Tailor the Architecture to the Domain • More effective parallelism for a specific domain: • SIMD vs. MIMD • VLIW vs. Speculative, out-of-order • More effective use of memory bandwidth • User controlled versus caches • Eliminate unneeded accuracy • IEEE replaced by lower precision FP • 32-64 bit integers to 8-16 bit integers • Domain specific programming language provides path for software 21
  • 22. Deep learning is causing a machine learning revolution From “A New Golden Age in Computer Architecture: Empowering the Machine-Learning Revolution.” Dean, J., Patterson, D., & Young, C. (2018). IEEE Micro, 38(2), 21-29.
  • 23. Tensor Processing Unit v1 (Announced May 2016) Google-designed chip for neural net inference In production use for 3 years: used by billions on search queries, for neural machine translation, for AlphaGo match, … A Domain-Specific Architecture for Deep Neural Networks, Jouppi, Young, Patil, Patterson, Communications of the ACM, September 2018
  • 24. TPU: High-level Chip Architecture ▪ The Matrix Unit: 65,536 (256x256) 8-bit multiply-accumulate units ▪ 700 MHz clock rate ▪ Peak: 92T operations/second ▪ 65,536 * 2 * 700M ▪ >25X as many MACs vs GPU ▪ >100X as many MACs vs CPU ▪ 4 MiB of on-chip Accumulator memory + 24 MiB of on-chip Unified Buffer (activation memory) ▪ 3.5X as much on-chip memory vs GPU ▪ 8 GiB of off-chip weight DRAM memory 24
  • 25. Perf/Watt TPU vs CPU & GPU 25 83 29 Using production applications vs contemporary CPU and GPU
  • 27. The Launching of “1000 Chips” ● Intel acquires DSA chip companies ● Nervana: ($0.4B) August 2016 ● Movidius: ($0.4B) September 2016 ● MobilEye: ($15.3B) March 2017 ● Habana: ($2.0B) December 2019 ● Alibaba, Amazon inference chips ● >100 startups ($2B) launch on own bets ● Dataflow architecture: Graphcore, ... ● Asynchronous logic: Wave Computing, ... ● Analog computing: Mythic, … ● Wafer Scale computer: Cerebras ● Coarse-Grained Reconfigurable Arch: SambaNova, ... 27 Helen of Troy by Evelyn De Morgan
  • 28. How to Measure ML Performance? 28 Operation rate (GOPS, billions of operations per second) Easy to understand, bigger is better But peak rates not for same program Operations can vary between DSAs (FP vs int, 4b/8b/16b/32b) Time to run old DNN (MNIST, AlexNet) Can compare different ISAs, shorter time always faster But not representative of today’s DNNs Benchmark suite relative to reference computer (MLPerf) Real programs, bigger is better, same DNN model, same data set, geometric mean fair comparison, batch size ranges set Must update every 1-2 years to stay uptodate ⇒ organization
  • 29. Embedded Computing and ML ● ML becoming one of the most important workloads ● But lots of applications don’t need highest performance ○ For many, just enough at low cost ● Microcontrollers most popular processors ○ Cheap, Low Power, fast enough for many apps ● Despite importance, no good microprocessor benchmarks ○ Still quote synthetic programs: Dhrystone, CoreMarks ● Decided to try to fix ● EmBench: better for all embedded, includes ML benchmarks also
  • 30. 7 Lessons for Embench 1. Embench must be free 2. Embench must be easy to port and run 3. Embench must be a suite of real programs 4. Embench must have a supporting organization to maintain it 5. Embench must report a single summarizing score 6. Embench should summarize using geometric mean and std. dev. 7. Embench must involve both academia and industry
  • 31. The Plan ● Jan - Jun 2019: Small group created the initial version − Dave Patterson, Jeremy Bennett, Palmer Dabbelt, Cesare Garlati − mostly face-to-face ● Jun 2019 – Feb 2020: Wider group open to all − under FOSSi, with mailing list and monthly conference call − see www.embench.org ● Feb 2020: Launch Embench 0.5 at Embedded World ● Present: Working on Embench 0.6
  • 32. Baseline Data Name Comments Orig Source C LOC code size data size time (ms) branch memory compute aha-mont64 Montgomery multiplication AHA 162 1,052 0 4,000 low low high crc32 CRC error checking 32b MiBench 101 230 1,024 4,013 high med low cubic Cubic root solver MiBench 125 2,472 0 4,140 low med med edn More general filter WCET 285 1,452 1,600 3,984 low high med huffbench Compress/Decompress Scott Ladd 309 1,628 1,004 4,109 med med med matmult-int Integer matrix multiply WCET 175 420 1,600 4,020 med med med minver Matrix inversion WCET 187 1,076 144 4,003 high low med nbody Satellite N body, large data CLBG 172 708 640 3,774 med low high nettle-aes Encrypt/decrypt Nettle 1,018 2,880 10,566 3,988 med high low nettle-sha256 Crytographic hash Nettle 349 5,564 536 4,000 low med med nsichneu Large - Petri net WCET 2,676 15,042 0 4,001 med high low picojpeg JPEG MiBench2 2,182 8,036 1,196 3,748 med med high qrduino QR codes Github 936 6,074 1,540 4,210 low med med sglib-combined Simple Generic Library for C SGLIB 1,844 2,324 800 4,028 high high low slre Regex SLRE 506 2,428 126 3,994 high med med st Statistics WCET 117 880 0 4,151 med low high statemate State machine (car window) C-LAB 1,301 3,692 64 4,000 high high low ud LUD composition Int WCET 95 702 0 4,002 med low high
  • 34. What Affects Embench Results? ● Instruction Set Architecture: Arm, ARC, RISC-V, AVR, ... − extensions: ARM: v7, Thumb2, …, RV32I, M, C, ... ● Compiler: open (GCC, LLVM) and proprietary (IAR, …) − which optimizations included: Loop unrolling, inlining procedures, minimize code size, … − older ISAs likely have more mature and better compilers? ● Libraries − open (GCC, LLVM) and proprietary (IAR, Sega, ...) ● Embench excludes libraries when sizing − they can swamp code size for embedded benchmark
  • 35. Impact of optimizations of GCC on RISC-V: Speed ● -msave-restore invokes functions to save and restore registers at procedure entry and exit instead of inline code of stores and loads − ISA Alternative would be Store Multiple instruction and Load Multiple instruction PULP RI5CY RV32IMC GCC 10.1.0 (higher is faster)
  • 36. Impact of optimizations of GCC on RISC-V: Size ● -msave-restore invokes functions to save and restore registers at procedure entry and exit instead of inline code of stores and loads ● ISA Alternative would be Store Multiple instruction and Load Multiple instruction PULP RI5CY RV32IMC GCC 10.1.0 (lower is smaller)
  • 37. Comparing Architectures with GCC: Speed ● GCC 10.2.0 − higher is faster Arm Cortex-M4, no FPU PULP RI5CY RV32IMC GCC 10.2.0 (soft core in FPGA
  • 38. Comparing Architectures with GCC: Size ● GCC 10.2.0 − lower is smaller Arm Cortex-M4, no FPU PULP RI5CY RV32IMC GCC 10.2.0 (soft core in FPGA)
  • 39. Comparing Compilers GCC v LLVM: Speed ● PULP RI5CY RV32IMC − higher is faster ● Clang/LLVM variations − -msave-restore enabled by default with ‑Os − -Oz for further code size optimization GCC 10.2.0 Clang/LLVM 11.0.0 rc
  • 40. Comparing Compilers GCC v LLVM: Size ● PULP RI5CY RV32IMC − lower is smaller ● Clang/LLVM variations − -msave-restore enabled by default with ‑Os − -Oz for further code size optimization GCC 10.2.0 Clang/LLVM 11.0.0 rc
  • 41. Code Size over GCC versions
  • 42. Lots More to Explore with Embench ● More compilers: LLVM, IAR, … − and more optimizations ● More architectures: MIPS, Tensilica, ARMv8, RV64I, ... − and more instruction extensions: bit manipulation, vector, floating point, … ● More processors: ARM M7, M33, M24, RISC-V Rocket, BOOM, ... ● Context switch times ● In later versions of Embench: Interrupt Latency − floating point programs for larger machines in Embench 0.6 ● Published results in embench-iot-results repository ● Want to help? Email info@embench.org
  • 43. Benchmarking Lessons? 1) Must show code size with performance so as to get meaningful results 2) Importance of geometric standard deviation as well as geometric mean 3) More mature architecture have more mature compilers
  • 44. Conclusions ● End of Dennard Scaling, slowing of Moore’s Law ⇒ DSA ● ML DSAs need HW/SW codesign ● To measure progress, need good benchmarks, ● MLPerf for data center and high end edge ● For microcontrollers, Embench 0.5 suite is already better than synthetic programs Dhrystone and CoreMark, and will get better − Many more studies: more ISAs, more compilers, more cores, ● Let us know if you’d like to help: Email info@embench.org