L01-1
6.5930/1
Hardware Architectures for Deep Learning
Introduction and Applications
Joel Emer and Vivienne Sze
Massachusetts Institute of Technology
Electrical Engineering & Computer Science
February 6, 2023
L01-2
ACM’s Celebration of 50 Years of the ACM Turing Award (June 2017)
“Compute has been the oxygen of deep learning”
– Ilya Sutskever, Research Director of OpenAI
L01-3
Why is Deep Learning Hot Now?
• Big Data Availability: 136,000 photos uploaded every minute; 2.5 Petabytes of customer data hourly; 500 hours of video uploaded every minute
• GPU Acceleration
• New ML Techniques
L01-4
ImageNet Challenge
• Image Classification Task: 1.2M training images, 1000 object categories
• Object Detection Task: 456k training images, 200 object categories
L01-5
ImageNet: Image Classification Task
[Chart: Top 5 Classification Error (%) of ImageNet entries from 2010 to 2015, plus human-level performance. Hand-crafted feature-based designs (2010–2011) give way to deep learning-based designs (2012 onward), with a large error rate reduction due to Deep Learning.]
[Russakovsky, IJCV 2015]
L01-6
GPU Usage for ImageNet Challenge
L01-7
Deep Learning on Images and Video
• Image Classification
• Object Detection
• Object Localization
• Image Segmentation
• Action Recognition
• Image Generation
Computer Vision (6.8301/6.8300 [6.819/6.869])
L01-8
Deep Learning for Speech and Language
• Speech Recognition
• Natural Language Processing
• Speech Translation
• Audio Generation
Speech (6.8620 [6.345]), Natural Language Processing (6.8610/6.8611 [6.864/6.806])
L01-9
Deep Learning on Games
Google DeepMind AlphaGo (2016)
Training on 160,000 games required 3 weeks on 50 GPUs!
During play (inference) AlphaGo used 40 search threads, 48 CPUs, and 8 GPUs.
(A distributed version used 40 search threads, 1,202 CPUs and 176 GPUs.)
L01-10
Deep Learning on Games
DeepMind’s AlphaStar defeats top professional players at StarCraft II (January 2019)
StarCraft, considered one of the most challenging real-time strategy (RTS) games and one of the longest-played e-sports of all time, has emerged by consensus as a “grand challenge” for AI research. It requires overcoming research challenges including game theory, imperfect information, long-term planning, real-time play, and a large action space.
https://www.deepmind.com/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii
Training required 14 days, using 16 TPUs for each agent!
Each agent experienced up to 200 years of real-time play.
L01-11
Deep Learning for Medical Applications
Cancer Detection: Brain [Jermyn, JBO 2016], Breast [Yala, Radiology 2019], Lung [Mikael, JCO 2023]
Antibiotic Discovery: [Stokes, Cell 2020]
Machine Learning for Healthcare (6.7930 [6.871])
L01-12
Deep Learning in Biology
• The structure of a protein provides insight into its function, which is important for the development of disease treatments (e.g., proteins associated with SARS-CoV-2)
• Current approaches involve years of work and multi-million-dollar equipment (e.g., cryo-electron microscopy)
• Protein Folding Problem: predict the 3D structure of a protein based on its 1D sequence of amino acids
• Estimated 10^300 possibilities → a grand challenge in biology for the past 50 years!
Image sources: Quora; Dill et al., Annu. Rev. Biophys. 2008 37:289
L01-13
Deep Learning in Biology
DeepMind’s AlphaFold2 recognized as a solution to the protein folding challenge (November 2020)
Trained on publicly available data consisting of ~170,000 protein structures together with large databases containing protein sequences of unknown structure. Training used approximately 16 TPUv3s (128 TPUv3 cores, roughly equivalent to ~100–200 GPUs) run over a few weeks.
Source: https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology
L01-14
Deep Learning in Fighting COVID-19
“A project using some of the most powerful supercomputers on the planet -- … -- is running millions of simulations, training a machine learning system to identify the factors that might make a molecule a good candidate….”
https://www.nsf.gov/discoveries/disc_summ.jsp?cntn_id=300481&org=NSF&from=news
L01-15
Large Language Models: ChatGPT
Slide credit: Liane Bernstein
L01-16
Computing Cost of ChatGPT (Training)
• ChatGPT is based on a variant of GPT-3 [Brown, NeurIPS 2020]
• GPT-3 has 96 layers and 175 billion parameters, and requires 3.14×10^23 FLOPs of computing for training
• It would take 355 years to train GPT-3 on a single Tesla V100 GPU
• It would cost ~$4,600,000 to train GPT-3 using the lowest-cost GPU cloud provider
Source: https://lambdalabs.com/blog/demystifying-gpt-3
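To make the 355-year figure concrete, here is a back-of-envelope sketch of the arithmetic. The ~28 TFLOPS sustained V100 throughput is an assumption chosen to be consistent with the estimate above, not a figure stated on the slide:

```python
# Rough check of the "355 years on one V100" claim (a sketch; the sustained
# throughput below is an assumption, not a measured number).
TRAIN_FLOPS = 3.14e23          # total training compute for GPT-3 (from the slide)
V100_FLOPS_PER_S = 28e12       # assumed sustained mixed-precision throughput
SECONDS_PER_YEAR = 365.25 * 24 * 3600

years = TRAIN_FLOPS / V100_FLOPS_PER_S / SECONDS_PER_YEAR
print(f"~{years:.0f} GPU-years")  # ~355
```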
L01-17
Computing Cost of ChatGPT (Inference)
Estimated monthly cost of $1.5 to $8 million!
https://twitter.com/sama/status/1599671496636780546
https://twitter.com/sama/status/1599669571795185665
Source: https://medium.com/swlh/3-questions-puzzled-me-about-openais-chatgpt-and-here-is-what-i-learned-1dda74b5f6db
L01-18
Compute for Training versus Inference
• Training: high cost per iteration, but low frequency
• Inference: low cost per iteration, but high frequency
• Regarding computing at Google:
– “Across all three years [2019-2021], about three-fifths of the ML energy use was for inference, and two-fifths were for training” [Patterson, Computer, 2022]
• Regarding computing at Meta:
– “…trillions of inference per day across Meta’s data centers. … we observe a rough power capacity breakdown of 10:20:70 for AI infrastructures devoted to the three key phases — Experimentation, Training, and Inference” [Wu, MLSys 2022]
L01-19
Compute Demands Growing Exponentially
L01-20
Compute Demands Growing Exponentially
Source: https://www.economist.com/technology-quarterly/2020/06/11/the-cost-of-training-machines-is-becoming-a-problem
AlexNet to AlphaGo Zero: A 300,000x Increase in Compute
L01-21
Compute Demands for Deep Neural Networks
[Strubell, ACL 2019]
L01-22
Processing at “Edge” instead of the “Cloud”
Communication, Privacy, Latency
L01-23
Deep Learning for Self-Driving Cars
L01-24
Compute Challenges for Self-Driving Cars
(Feb 2018) Cameras and radar generate ~6 gigabytes of data every 30 seconds. Prototypes use around 2,500 watts, which generates waste heat; some prototypes need water cooling!
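As a rough sense of scale, the sensor figures above imply the following sustained bandwidth (a back-of-envelope sketch using only the numbers on this slide):

```python
# Sensor bandwidth implied by ~6 GB of camera + radar data every 30 seconds.
gb_per_window = 6
window_s = 30
gbit_per_s = gb_per_window * 8 / window_s  # convert GB to Gbit, divide by time
print(f"~{gbit_per_s:.1f} Gbit/s sustained")  # ~1.6 Gbit/s
```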
L01-25
Carbon Footprint of Self-Driving Cars
[Sudhakar, IEEE Micro 2023]
L01-26
Existing Processors Consume Too Much Power
< 1 Watt vs. > 10 Watts
L01-27
Need for Specialized Computing
• Slowdown of Moore’s Law and Dennard Scaling: general-purpose microprocessors are not getting faster or more efficient
• Domain-specific hardware is needed for significant improvement in speed and energy efficiency → redesign computing hardware from the ground up!
L01-28
Challenges and Opportunities
• Define the domain and degree of flexibility required
– Requires understanding of the domain, variants of algorithms (identifying which ones are important), and the resulting workloads
• Handle heterogeneous computing at the system level
• Handle heterogeneous devices (emerging device technology)
• May have tighter resource constraints, since the hardware cannot be used for other applications
• Co-design across algorithms and hardware
• Domain-specific languages to program specialized hardware
• Tools for rapid evaluation and prototyping
L01-29
Software Companies are Building HW
“Chips Off the Old Block: Computers Are Taking Design Cues From Human Brains” (September 16, 2017)
“After training a speech-recognition algorithm, for example, Microsoft offers it up as an online service, and it actually starts identifying commands that people speak into their smartphones. G.P.U.s are not quite as efficient during this stage of the process. So, many companies are now building chips specifically to do what the other chips have learned.
Google built its own specialty chip, a Tensor Processing Unit, or T.P.U. Nvidia is building a similar chip. And Microsoft has reprogrammed specialized chips from Altera, which was acquired by Intel, so that it too can run neural networks more easily.”
L01-30
HW Beyond Cloud Computing
[Also Nvidia, Intel, Qualcomm…]
L01-31
Startups Building Custom Hardware
“Big Bets On A.I. Open a New Frontier for Chips Start-Ups, Too.” (January 14, 2018)
“Today, at least 45 start-ups are working on chips that can power tasks like speech and self-driving cars, and at least five of them have raised more than $100 million from investors. Venture capitalists invested more than $1.5 billion in chip start-ups last year, nearly doubling the investments made two years ago, according to the research firm CB Insights.”
L01-32
Class Overview
L01-33
Course Outline
• Overview of Deep Neural Networks (DNNs)
• DNN Development Resources
• DNNs on Programmable Hardware
• DNN Accelerator Architecture
• DNN Model and Hardware Co-Design
• Advanced Technologies for DNN
L01-34
Takeaways
• Know the key computations used by DNNs
• Be familiar with how DNN computations are mapped to various hardware platforms
• Understand the tradeoffs between various architectures and platforms
• Be able to evaluate different DNN accelerator implementations with benchmarks and comparison metrics
• Have an appreciation for the utility of various optimization and approximation approaches
• Be able to distill the key attributes of recent implementation trends and opportunities
L01-35
Course Objective
By the end of this course, we want you to be able to understand a new design in terms of attributes like:
– Order of computation
– Partitioning of computation
– Flow of data for computation
– Data movement in the storage hierarchy
– Data attribute specific optimizations
– Exploiting algorithm/hardware co-design
– Degree of flexibility
L01-36
Course Staff and Contact Info
• Instructors (Office hours by request)
– Joel Emer (jsemer@mit.edu)
– Vivienne Sze (sze@mit.edu)
• Teaching Assistants (Office hours TBD)
– Michael Gilbert (gilbertm@mit.edu)
– Jaeyeon Won (jaeyeon@mit.edu)
• Schedule
– Lectures: MW 1PM-2:30PM – 4-163
– Recitations: F 10AM-11AM – TBD
• Review Lectures (ML & Computer Architecture Basics) + Tools for Labs
• Course Website: http://csg.csail.mit.edu/6.5930/
Slide Contributors: Yu-Hsin Chen and Tien-Ju Yang
(http://eyeriss.mit.edu/tutorial.html)
L01-37
Course Requirements and Materials
• Pre-requisites
– 6.3000[6.003] (Signal Processing) or 6.3900[6.036] (Intro to Machine Learning)
– 6.1910[6.004] (Computation Structures)
• We will use Python and PyTorch
– PyTorch website: https://pytorch.org/
– Introduction to PyTorch Code Examples: https://cs230.stanford.edu/blog/pytorch/
• Course Textbook/Readings
– Book “Efficient Processing of Deep Neural Networks”
• https://doi.org/10.1007/978-3-031-01766-7 (download free on MIT network)
• We welcome feedback (including errata) on Piazza thread
– Selected papers published in the past 3 to 5 years.
• Course Handouts (uploaded on website)
L01-38
Related Classes
• Computer System Architecture (6.5900[6.823])
• Digital Circuits and Systems (6.6010[6.374])
• Digital Image Processing (6.7010[6.344])
• Machine Learning (6.3900[6.036]/ 6.7900[6.867])
• Computer Vision (6.8301[6.819]/6.8300[6.869])
L01-39
Survey
https://forms.gle/sMcUaxknErK7VgAu7
Due: Wed, Feb 8
L01-40
Recitations
• Machine Learning Review (Feb 10)
• Computer Architecture Review (Feb 17, Feb 24)
• Tools
– PyTorch (Feb 10)
– Accelergy (Mar 10)
– Timeloop (Mar 17)
– Sparseloop (Mar 24)
• Lab Overview (on same week as Lab released)
• Note: Above dates are tentative. Will announce updates on website and Piazza.
L01-41
Labs
• Lab 1: Inference + DNN Model Design (Release Today)
– Release: Feb 6 – Due: Feb 27 (3 weeks)
• Lab 2: Kernel + Tiling Optimization
– Release: Feb 27 – Due: March 13 (2 weeks)
• Lab 3: Hardware Design & Mapping
– Release: March 13 – Due: April 5 (2.5 weeks + vacation)
• Lab 4: Sparsity
– Release: April 5 – Due: April 19 (2 weeks)
L01-42
Questions
• For each lecture (excluding this one), we ask that each student either ask a question or answer another student’s question related to the lecture in the Piazza folder for that lecture.
• Multiple (distinct) answers to a single question are permitted.
• If you ask a question in class, please also submit the question to Piazza (for grading purposes). Note: providing the lecturer’s answer doesn’t count.
• Participation will be counted for all Q&A activity prior to the weekend following the lecture.
L01-43
ML Arxiv Papers per Year
“Number of ML Arxiv papers published per year beat Moore's Law!… ”
– Jeff Dean (Google)
L01-44
Paper Review
• We will be forming a program committee for the (fictional) Deep Learning Hardware (DLH) Conference 2023
– Learn how to read papers (critique and extract information)
– Use concepts from class to analyze and gain a deep understanding of each paper
– Enable wider coverage of papers
– Gain insight into how paper decisions are made
• Schedule
– March 10 – Papers released
– March 17 – Rank order topics of interest & enter conflicts (e.g., advisor or past co-authors)
– March 20 – Papers assigned (check for undetected conflicts)
– April 03, 19, 26, May 03, 08 – Paper review sessions
– Reviews due at 11:59 pm ET the day before the review session
L01-45
Paper Sessions
• Dataflows (April 03)
• Sparsity I (April 19)
• Sparsity II (April 26)
• Bit Precision (May 03)
• Advanced technologies and processing in memory (May 08)
L01-46
Paper Reviews
• All students (6.5930/1) will submit written reviews for 7 papers
– Three in one session (main topic)
– One for each of the other four sessions
– Participate in paper discussion during the session and vote on papers
• Students in 6.5930 (Grad Version)
– Present one paper in front of the class (lead reviewer)
– Present strengths and weaknesses for three papers
L01-47
Real World versus DLH23 PC
                           Real World        DLH23
Papers per PC member       20 – 110 papers   3 + 4
Review time                2 – 5 weeks       2 – 7 weeks
Length of PC meeting       1 to 2 days       5 classes × 1.5 hours = 7.5 hours
Papers discussed           60 – 120          25 – 30
Discussion time per paper  5 – 15 min        12 – 15 min
L01-48
Design Project
• Project
– Choose from a list of suggested projects
– May propose own project (requires formal proposal and approval by course staff)
• Use tools from labs
– PyTorch, Accelergy, Timeloop, Sparseloop
• Teams of two or three (depending on class size)
• Schedule
– March 20 – List of projects released
– April 10 – Submit project selection (or proposal)
– May 08 – Project Report Due
• Poster session or short presentation (depending on class size)
L01-49
Example Design Projects
• Hardware design study of well-known DNN accelerators (e.g., TPU, NVDLA) and analysis of architectural tradeoffs through modeling
• Analyze co-design approaches to gain a deeper understanding of the impact on accuracy and efficiency
• Evaluate the impact of new technologies (RRAM, optical)
• Extend tools for improved design-exploration and analysis capabilities
Goal: Apply concepts and tools from class!
L01-50
Assignments and Grading
• Grading
– Questions: 5%
– Four Labs: 40% [10% each]
– Paper Reviews: 20%
– Final Design Project: 35%
• All assignments are due by 11:59PM ET on the due date (submitted online)
• Late Policy for Labs:
– You should always submit your labs on time. Nonetheless, since unexpected situations like illnesses might occur, you have a budget of 5 late days throughout the semester.
– The budget is spent in increments of 1 day.
– You do not need to inform us about your use of your budget. The course staff will keep track of the days you have spent.
– You will receive zero credit once your budget is expended. (Note: please contact course staff regarding extenuating circumstances.)
• No late days for paper reviews and the project, due to the tight timeline
• Labs are to be completed individually, although discussion of course concepts covered in the laboratories is encouraged.
L01-51
Background of Deep Neural Networks
L01-52
Artificial Intelligence
“The science and engineering of creating intelligent machines”
– John McCarthy, 1956
L01-53
AI and Machine Learning
Machine Learning (a subset of Artificial Intelligence):
“Field of study that gives computers the ability to learn without being explicitly programmed”
– Arthur Samuel, 1959
L01-54
Brain-Inspired Machine Learning
Brain-Inspired (a subset of Machine Learning):
An algorithm that takes its basic functionality from our understanding of how the brain operates
L01-55
How Does the Brain Work?
• The basic computational unit of the brain is a neuron → 86B neurons in the brain
• Neurons are connected with nearly 10^14 – 10^15 synapses
• Neurons receive input signals from dendrites and produce an output signal along the axon, which interacts with the dendrites of other neurons via synaptic weights
• Synaptic weights are learnable and control influence strength
Image Source: Stanford
L01-56
Spiking-based Machine Learning
Spiking (a subset of Brain-Inspired Machine Learning)
L01-57
Spiking Architecture
• Brain-inspired: integrate and fire
• Example: IBM TrueNorth [Merolla, Science 2014; Esser, PNAS 2016]
http://www.research.ibm.com/articles/brain-chip.shtml
L01-58
Machine Learning with Neural Networks
Neural Networks (a subset of Brain-Inspired Machine Learning, alongside Spiking approaches)
L01-59
Neural Networks: Weighted Sum

$y_j = f\left(\sum_i w_i x_i + b\right)$

[Diagram: a neuron receives inputs x_0, x_1, x_2 on its dendrites through synapses with weights w_0, w_1, w_2; the weighted inputs w_0 x_0, w_1 x_1, w_2 x_2 are summed with a bias b and passed through f to produce the output y_j on the axon.]
Modified Image Source: Stanford
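A minimal PyTorch sketch of this single-neuron computation follows; the choice of ReLU for the non-linear function f is an assumption, since the slide leaves f unspecified:

```python
import torch

# One neuron: weighted sum of inputs plus bias, then a non-linearity.
x = torch.tensor([0.5, -1.0, 2.0])   # inputs x_0..x_2 (dendrites)
w = torch.tensor([0.1,  0.4, 0.3])   # synaptic weights w_0..w_2
b = torch.tensor(0.2)                # bias

y = torch.relu(torch.dot(w, x) + b)  # y_j = f(sum_i w_i * x_i + b)
print(y)                             # tensor(0.4500)
```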
L01-60
Many Weighted Sums

$y_j = f\left(\sum_{i=0}^{3} W_{ij}\, x_i\right)$

[Diagram: a fully connected layer with inputs x_0..x_3, weights W_00..W_30, and outputs y_0..y_2; each output is a weighted sum passed through a non-linear function f(・) to produce an activation.]
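Computing all the outputs at once is a single matrix–vector product, which is how frameworks implement fully connected layers. A minimal sketch (again assuming ReLU for f):

```python
import torch
import torch.nn as nn

# Fully connected layer: y_j = f(sum_i W_ij * x_i) for all j in one shot.
layer = nn.Linear(in_features=4, out_features=3, bias=False)  # weight matrix is 3x4
f = nn.ReLU()

x = torch.randn(4)  # inputs x_0..x_3
y = f(layer(x))     # activations y_0..y_2 via one matrix-vector product
print(y.shape)      # torch.Size([3])
```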
L01-61
Deep Learning
Deep Learning (a subset of Neural Networks, within Brain-Inspired Machine Learning and AI)
L01-62
Deep Learning Network
[Diagram: a deep network maps an input image through layers that extract low-level features and then high-level features, producing the output “Volvo XC90”.]
Modified Image Source: [Lee, CACM 2011]
L01-63
DNN Timeline
• 1940s: Neural networks were proposed
• 1960s: Deep neural networks were proposed
• 1989: Neural network for recognizing digits (LeNet)
• 1990s: Hardware for shallow neural nets
– Example: Intel ETANN (1992)
• 2011: Breakthrough DNN-based speech recognition
– Microsoft real-time speech translation
• 2012: DNNs for vision supplanting traditional ML
– AlexNet for image classification
• 2014+: Rise of DNN accelerator research
– Examples: Neuflow, DianNao, etc.
L01-64
Deep Learning Platforms
• CPU
– Intel, ARM, AMD…
• GPU
– NVIDIA, AMD…
• Fine Grained Reconfigurable (FPGA)
– Microsoft BrainWave
• Coarse Grained Programmable/Reconfigurable
– Wave Computing, Plasticine, Graphcore…
• Application Specific
– Neuflow, *DianNao, Eyeriss, Cnvlutin, SCNN, TPU, …
L01-65
DNN-focused Publications at Architecture Conferences
There are also new conferences focusing on compute challenges for ML/DNN, e.g., MLSys, AICAS, TinyML.
L01-66
Next Lecture:
Overview of Deep Neural Networks