Seda Ogrenci-Memik
ECE & CS
Northwestern
1
Computer Engineering Research
@Northwestern
•  Housed within the Electrical and Computer
Engineering Department, co-owned by the
Computer Science Department
–  Faculty have dual appointments
–  PhD programs in both departments have
Computer Engineering tracks
•  Design Automation / CAD and VLSI,
Computer Architecture, Internet of Things,
Embedded and Cyberphysical Systems, Big
Data & AI
2
Computer Engineering Research
@Northwestern
•  Hardware assisted ML
•  Neuromorphic computing
•  Hardware security
•  Photonic circuits in computer architectures
•  Empathic computing
•  Batteryless computing
•  Secure and verified software for automotive
applications
•  Adaptive architecture – compiler in the loop
3
My Research Interests
•  Electronic Design Automation (EDA)
•  Circuits and Systems
•  High Performance Computing (HPC)
Systems
4
Research Activities
•  Design Automation
–  High-Level Synthesis
•  Converting applications described
with programming languages to
hardware
•  Circuits and Systems
–  Thermal and Power
Management of ICs
•  Temperature Sensors, 3D
Integration, Power and
Performance Modeling using
Machine Learning
5
Research Activities
•  High Performance
Computing Systems
– FPGA-based
accelerators for HPC
applications
– Thermal and Power
Management in large
scale systems
6
HW Assisted ML
•  Design Automation for real-time AI
Cyberinfrastructure for scientists who deploy HW
systems for ML
7
[Figure: science domains (Physics, Material Science, Astrophysics) and example tasks (Particle Tracking, Image Reconstruction, Real-time control of microscopy, Accelerator Control, Semantic compression)]
Data filtering, compression, reconstruction, feedback controllers
Handling Big Data
•  Source of the computational challenge
– Particle acceleration experiment in high-energy
physics
•  “Events” happen every ~25 ns (billions in total), creating Pb/s
data rates
•  “Interesting” events (~3% of all) need to be
recognized and filtered for further processing
–  Sky survey data collected from observatories
stream in at rates ~Gb/s (or more)
•  Limited energy (generators at the South Pole) to move
data to datacenters
8
Handling Big Data
•  Source of the computational challenge
–  CPU-GPU systems perform streaming inference on
images captured by an electron microscope within
~300ms latency
–  Desired goal is to perform inference on images
captured at 10s of millions of frames per second
under 50ms latency
–  ~50ms latency could allow real-time control of the
EM during material synthesis
•  Think of making defects for a quantum material,
deposition of nano layers, etc.
9
Projects – Complete/Underway
•  READS – Accelerator Real-time Edge AI
for Distributed Systems
•  Design of a reconfigurable autoencoder
algorithm for detector front-end ASICs
10
Projects – Preliminary
•  CryoAI - 22nm testchip
•  Adaptive ML accelerators
11
READS – Accelerator Real-time
Edge AI for Distributed Systems
•  Goal: integrate ML into accelerator operations
•  Challenges: Beam Loss
–  High Energy Physics (HEP) experiments use proton
beams
–  Particles get lost through interactions with the beam
vacuum pipe
–  Intensity/pace of beam extraction affects radiation in
the environment
–  Human operators tune parameters of hundreds to
thousands of devices inside the accelerator complex
–  If beam loss (“leak”) reaches extreme levels, the system
needs to shut down, causing loss of access to the facility
12
READS – Accelerator Real-time
Edge AI for Distributed Systems
13
•  Muon Complex
•  Two rings are on top of
each other: Main
Injector Ring and the
Recycler Ring
•  Protons are accelerated
within the Booster
•  Injected into the
Recycler Ring and Main
Injector Ring
Kyle Hazelwood, Mattson Thieme | READS: Beam Loss De-blending for Main Injector and Recycler
Project Overview
11/9/21
•  Main Injector and Recycler share an enclosure
•  Both machines can, and often do, have high-intensity beam in them simultaneously
•  Both machines can generate significant beam loss
•  The “lost” portion of the beam escapes as dangerous radiation; if there is too much, it forces the facility to shut down
•  The machine origin of a beam loss is often hard to distinguish
[Figure: Main Injector tunnel, with the Recycler (top) and Main Injector (bottom)]
Project Overview
•  Using time, location and state of the machine, machine experts can sometimes attribute loss to a particular machine
–  This suggests a Machine Learning (ML) model may be trainable to automatically attribute loss and replicate or improve upon the experts' ability
•  Losses from one machine often end up tripping the machine permit of the other, resulting in unnecessary beam downtime

The project aims to deploy a machine learning model on an FPGA that, when fed streamed beam loss readings from around the Main Injector complex, will infer in real time the machine loss origin
ML Model Architecture: Overview
Objective: Assign BLM-wise probabilities for the loss originating in MI/RR
READS – Accelerator Real-time
Edge AI for Distributed Systems
•  System Design Goal: PID Controller gains
need to be optimized in real-time ~ms
•  The ML Processor receives inputs from
sensors
– beam position monitor (BPM)
– beam loss monitor (BLM)
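The design goal above (PID gains re-optimized on a millisecond timescale) can be illustrated with a minimal sketch. It assumes a textbook discrete PID loop; the class name `PID`, the method `set_gains`, and the numbers in the usage lines are hypothetical and are not taken from the READS system.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller whose gains can be replaced at run time."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def set_gains(self, kp: float, ki: float, kd: float) -> None:
        # Hypothetical hook: an ML processor could call this every few ms
        # with new gains, without resetting the controller state.
        self.kp, self.ki, self.kd = kp, ki, kd

    def step(self, setpoint: float, measurement: float, dt: float) -> float:
        error = setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


pid = PID(kp=1.0, ki=0.0, kd=0.0)
u1 = pid.step(setpoint=1.0, measurement=0.0, dt=0.001)  # proportional only
pid.set_gains(2.0, 0.0, 0.0)                             # gains swapped mid-run
u2 = pid.step(setpoint=1.0, measurement=0.5, dt=0.001)
```

Keeping the integral and previous error across a gain update is one possible design choice; a real controller may need bumpless-transfer logic that this sketch omits.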
17
READS – Accelerator Real-time
Edge AI for Distributed Systems
•  System Architecture:
18
READS – Accelerator Real-time
Edge AI for Distributed Systems
•  System Architecture:
– Edge device continuously optimizes the online
control agent
– Data streamed to a cloud system for large
scale training
19
READS – Accelerator Real-time
Edge AI for Distributed Systems
•  Algorithm-Architecture Co-design
–  A common toolchain to program FPGA devices
as well as create interfaces
20
READS – Accelerator Real-time
Edge AI for Distributed Systems
•  Algorithm-Architecture Co-design
–  Create modular neural network components
•  Regroup, recombine
–  Establish physics/science-aware methodology
•  Hardware-aware quantization and pruning techniques
•  Homogeneous versus heterogeneous quantization
•  Quantization-aware training
•  Control resource re-use trade-offs of high level
synthesis tools
•  Differentiate between the relative hardware cost of
storage, DSP, interconnect
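The difference between homogeneous and heterogeneous quantization listed above can be sketched as follows. This is an illustration only: `quantize_fixed`, the layer names, the weight values and the chosen bit widths are all assumptions, and the rounding/saturation behavior is a generic signed fixed-point scheme rather than the exact behavior of any particular HLS datatype.

```python
def quantize_fixed(x, total_bits, int_bits):
    """Round x onto a signed fixed-point grid with saturation.

    total_bits includes the sign bit; int_bits counts integer bits
    including the sign, so there are total_bits - int_bits fraction bits.
    """
    scale = 2 ** (total_bits - int_bits)
    lo = -(2 ** (total_bits - 1))
    hi = 2 ** (total_bits - 1) - 1
    code = max(lo, min(hi, round(x * scale)))
    return code / scale


def quantize_model(layers, precision):
    """Apply a per-layer (total_bits, int_bits) precision to every weight."""
    return {name: [quantize_fixed(w, *precision[name]) for w in ws]
            for name, ws in layers.items()}


def model_bits(layers, precision):
    """Total storage for the weights under a given precision assignment."""
    return sum(len(ws) * precision[name][0] for name, ws in layers.items())


# Hypothetical two-layer model with made-up weights.
layers = {"dense1": [0.3, -0.71, 1.7], "dense2": [0.05, -0.02]}
homogeneous = {name: (8, 2) for name in layers}          # same width everywhere
heterogeneous = {"dense1": (6, 2), "dense2": (10, 1)}    # width chosen per layer

q_hom = quantize_model(layers, homogeneous)
q_het = quantize_model(layers, heterogeneous)
bits_hom = model_bits(layers, homogeneous)
bits_het = model_bits(layers, heterogeneous)
```

In this toy case the heterogeneous assignment spends fewer bits overall while giving the small-valued layer more fraction bits, which is the kind of trade-off a hardware-aware flow would explore.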
21
READS – Accelerator Real-time
Edge AI for Distributed Systems
22
Target: Arria10 HPS + FPGA (on-chip RAM)
IP generated for the ML module by the hls4ml tool flow
[Block diagram; labels: HPS (three h2f_axi_master ports, h2f_irq1); hls_input (on-chip RAM); hls_output (on-chip RAM); irq (PIO); hls4ml_ctrl (VHDL) with two mm_master ports; start, done signals, etc.; plus the hls4ml testbench]
READS Project
Hardware Optimization for the ML-IP
–  The current model implemented as IP consists of
•  Dense Layer – ReLU – Dense Layer – Sigmoid
–  259 inputs and 518 outputs
•  16-bit ac_fixed values
–  Representation still under investigation
–  The HLS tool (hls4ml) was optimized to explore resource
sharing for computations such as dense layers
–  The HLS tool was not equipped with features to optimize other
layers such as Sigmoid
•  The situation changes when a model has hundreds of outputs
–  Logic Synthesis stage (Quartus)
•  Large scale models introduce clock timing violations
that were not observed in smaller models
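The layer sequence described above (Dense, ReLU, Dense, Sigmoid with 259 inputs and 518 outputs) can be written out as a small floating-point reference model. The hidden width `N_HIDDEN`, the random weights and the pairing of outputs into (MI, RR) per monitor are assumptions made for illustration; the slides do not state them.

```python
import math
import random

N_INPUTS, N_OUTPUTS = 259, 518   # from the slide
N_HIDDEN = 32                    # assumption: hidden width is not given


def dense(x, weights, bias):
    """Fully connected layer: one dot product per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def random_layer(rng, n_out, n_in):
    """Placeholder weights; a trained model would supply real ones."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, params):
    (w1, b1), (w2, b2) = params
    return sigmoid(dense(relu(dense(x, w1, b1)), w2, b2))


rng = random.Random(0)
params = (random_layer(rng, N_HIDDEN, N_INPUTS),
          random_layer(rng, N_OUTPUTS, N_HIDDEN))
readings = [rng.random() for _ in range(N_INPUTS)]   # stand-in sensor readings
probs = forward(readings, params)

# One possible reading of 518 = 2 x 259: a (MI, RR) probability pair per monitor.
pairs = [(probs[2 * i], probs[2 * i + 1]) for i in range(N_INPUTS)]
```

A reference model like this is mainly useful for checking a fixed-point hardware implementation against expected outputs.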
24
Project #2: Reconfigurable autoencoder for
detector front-end ASICs
•  Edge computing for particle collider experiments
•  Data collected from a large number of photon
detectors are compressed to representative
information of the “shape”
–  Charge measurements from the detectors are
compressed to a radiation pattern
–  6 million detector channels sending data at 40MHz
–  The data from the original space is compressed to
lower dimensionality by the edge ASIC, transmitted,
then decoded on the receiving end
25
Reconfigurable autoencoder for detector
front-end ASICs
•  System constraints
– Low power
•  Will be part of a larger system with power budget
– Radiation tolerant
– Reprogrammable weights through accessible
registers to enable updates and customization
26
Reconfigurable autoencoder for detector
front-end ASICs
•  Autoencoder:
neural network
with single
convolutional
layer followed
by a dense
layer
27
Reconfigurable autoencoder for detector
front-end ASICs
28
Glossary of Deep Learning: Autoencoder, by Jaron Collins
Reconfigurable autoencoder for detector
front-end ASICs
•  Dataflow
–  22-bit signals from 48 detector cells
29
Reconfigurable autoencoder for detector
front-end ASICs
•  Network properties
–  CNN layer: eight 3x3x3 kernel matrices resulting in 128
outputs
–  ReLU activation after CNN and after final dense layer
–  6-bit weights
–  Dense layer produces 16 10-bit outputs
–  Chip can be reconfigured to produce as few as
64 bits of output in total
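The figures quoted on the dataflow and network-properties slides imply a compression ratio that is easy to work out. The sketch below assumes the raw encoder input is the full 48 x 22-bit word and that one inference runs per 40 MHz cycle; both are readings of the slides rather than stated facts, and the helper names are hypothetical.

```python
CELLS, BITS_PER_CELL = 48, 22        # "22-bit signals from 48 detector cells"
OUTPUTS, BITS_PER_OUTPUT = 16, 10    # "16 10-bit outputs"
MIN_OUTPUT_BITS = 64                 # smallest reconfigured output
RATE_HZ = 40e6                       # assumption: one inference per 40 MHz cycle


def compression_ratio(bits_in, bits_out):
    return bits_in / bits_out


def link_rate_gbps(bits_per_inference, rate_hz):
    return bits_per_inference * rate_hz / 1e9


input_bits = CELLS * BITS_PER_CELL            # 1056 bits in
output_bits = OUTPUTS * BITS_PER_OUTPUT       # 160 bits out

default_ratio = compression_ratio(input_bits, output_bits)        # 6.6
max_ratio = compression_ratio(input_bits, MIN_OUTPUT_BITS)        # 16.5
default_rate = link_rate_gbps(output_bits, RATE_HZ)               # 6.4 Gb/s
```

Under these assumptions the default configuration compresses by a factor of 6.6, and the most aggressive configuration by 16.5.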
30
Reconfigurable autoencoder for detector
front-end ASICs
31
Reconfigurable autoencoder for detector
front-end ASICs
32
Reconfigurable autoencoder for detector
front-end ASICs
•  Chip specs
–  6b weight and bias parameters
•  Total: 2,286 = 13,724 bits
–  Parameters loaded via I2C interface
–  Decoder component implemented off-detector on
FPGA
–  Chip latency: 25 ns
–  7 nJ per inference
–  280 mW
–  Can withstand 200 Mrad of ionizing radiation
–  2.5 mm²
33
Reconfigurable autoencoder for detector
front-end ASICs
34
•  Protection against Single Event Upsets
Reconfigurable autoencoder for detector
front-end ASICs
35
•  Protection against Single Event Upsets
Partial TMR: suppresses error without correction
Full TMR: both suppresses and corrects error
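The distinction between suppressing and correcting an upset can be illustrated with a bitwise majority voter. This is a behavioral sketch, not the chip's circuit: the class `TmrRegister` is hypothetical, and mapping "suppress" to voting on read and "correct" to writing the voted value back is an interpretation of the slide.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote across three copies of a word."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(word: int, bit: int) -> int:
    """Emulate a single event upset on one bit of one copy."""
    return word ^ (1 << bit)


class TmrRegister:
    """Three redundant copies of one register value."""

    def __init__(self, value: int):
        self.copies = [value, value, value]

    def read(self) -> int:
        # Suppression: the voted output hides one corrupted copy,
        # but the corrupted copy itself stays wrong.
        return majority(*self.copies)

    def scrub(self) -> int:
        # Correction: write the voted value back into all three copies.
        voted = self.read()
        self.copies = [voted, voted, voted]
        return voted


reg = TmrRegister(0b101101)
reg.copies[1] = flip_bit(reg.copies[1], 2)   # upset in one copy
masked = reg.read()                          # still 0b101101
still_corrupt = reg.copies[1] != 0b101101    # True until scrubbed
reg.scrub()
```

Without the write-back step, a second upset in a different copy at the same bit position would defeat the vote, which is why correction matters for long-running registers such as stored weights.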
Reconfigurable autoencoder for detector
front-end ASICs
36
•  Partial TMR can also be implemented in
a few different hybrid modes
– Applied to only registers
– Include clock lines
– Apply to a sub-block
Reconfigurable autoencoder for detector
front-end ASICs
37
•  Exploring trade-offs in TMR
–  Area versus TMR coverage
–  Leakage Power versus TMR coverage
–  Fanout (routability) versus TMR coverage
–  Timing (impact on slack)
Projects – Preliminary
•  Adaptive ML accelerators
38
Adaptive ML accelerators
•  Why adaptation?
– Varying energy constraints
– Input variability
– Translation to new platform/device
– Transfer Learning
39
Adaptive ML accelerators
•  Some directions
–  Emulate loss through dropout/DropConnect
•  Techniques in ML literature equate these phenomena to
forcing weights to zero
•  Loss in (because of) hardware may look very different
–  Continuous diagnostics
•  Evaluation of network certainty
•  Concept of surprise
–  Detection of temporal dominance of classes
•  Reconfigure optimized version for dominant class
40
Adaptive ML accelerators
•  Some directions
–  Critical path based hardware-aware resource
management
•  Class-based CP: using contributions of a neuron to a
specific class
–  Mean Absolute Activation, first order Taylor Approximations
•  Generalized CP: relative participation of output channels at
the routing of the output from a layer
–  Distribute resources (e.g., total number of bits
allocated for weights) according to a criticality metric
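Distributing a bit budget according to a criticality metric might look like the following sketch. It uses mean absolute activation as the metric (the Taylor-approximation variant is not shown) and a largest-remainder split of the budget; the function names, the `min_bits` floor and the sample activations are assumptions for illustration.

```python
def mean_absolute_activation(activations):
    """activations: list of samples, each a list with one value per neuron."""
    n_samples = len(activations)
    n_neurons = len(activations[0])
    return [sum(abs(sample[j]) for sample in activations) / n_samples
            for j in range(n_neurons)]


def allocate_bits(scores, total_bits, min_bits=1):
    """Split total_bits across neurons in proportion to their scores."""
    n = len(scores)
    spare = total_bits - n * min_bits
    if spare < 0:
        raise ValueError("total_bits is too small for the min_bits floor")
    total_score = sum(scores)
    if total_score == 0:
        shares = [spare / n] * n
    else:
        shares = [spare * s / total_score for s in scores]
    bits = [min_bits + int(share) for share in shares]
    # Hand out the bits lost to truncation, largest fractional part first.
    leftover = total_bits - sum(bits)
    order = sorted(range(n), key=lambda j: shares[j] - int(shares[j]), reverse=True)
    for j in order[:leftover]:
        bits[j] += 1
    return bits


# Two made-up samples for three neurons.
samples = [[0.1, -2.0, 0.5], [-0.3, 1.0, 0.5]]
scores = mean_absolute_activation(samples)
bits = allocate_bits(scores, total_bits=18, min_bits=2)   # [3, 10, 5]
```

The most active neuron receives the widest weights, while the floor keeps low-criticality neurons from being quantized away entirely.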
41
Adaptive ML accelerators
•  Some directions
–  Characterize circuits (e.g., SRAM) to create models
•  Associate voltage drop/power outage with cell decay
•  Create characteristic bit masks for weights
– Neural network architecture search
•  Lessons learned from design space exploration in
high-level synthesis
•  Search for a new cell from basic building blocks
–  Input: a set of convolutions and pooling of varying size
–  Think of it as your module library
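The idea of characteristic bit masks for weights, mentioned under circuit characterization above, can be sketched as a mask applied to stored weight codes. This assumes 6-bit unsigned codes and that a decayed SRAM cell reads back as 0; both are simplifying assumptions, and `apply_decay_mask` is a hypothetical helper.

```python
WEIGHT_BITS = 6   # assumption: weights stored as 6-bit codes


def apply_decay_mask(code: int, decayed_bits: int) -> int:
    """Clear the bit positions that a (hypothetical) cell model marks as decayed."""
    full = (1 << WEIGHT_BITS) - 1
    return code & ~decayed_bits & full


def emulate_decay(codes, decayed_bits):
    """Apply the same characteristic mask to every stored weight code."""
    return [apply_decay_mask(c, decayed_bits) for c in codes]


stored = [0b101101, 0b111111, 0b000100]
mask = 0b000101                       # bits 0 and 2 assumed to decay
degraded = emulate_decay(stored, mask)
```

Running inference with the degraded codes gives a rough estimate of how much accuracy a given voltage-drop scenario would cost.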
42
Adaptive ML accelerators
•  Some directions
–  Partition system into two parts
•  Part 1: Fixed ML architecture
•  Part 2: ML architecture with capability to train within the edge
device
– Training consumes resources and energy on
the limited edge device
•  Need to explore the trade-off carefully
•  Co-design ML architecture and training hardware
43
Future Directions
•  Fluid implementations
–  Quickly re-targetable from software to hardware
•  Scientific domains offer a vast space of
computational challenges
–  They need interpretable systems
–  Laws of nature apparent in the system’s output
•  Multiple paths need to converge
–  Photonic circuits – not much automated
–  FPGA
–  Neuromorphic – not much automated
–  Quantum
44
Collaborators
•  Northwestern University
–  Han Liu (CS), Kristian Hahn (Physics)
–  Manuel Blanco Valentin, Rui Shi, Deniz Ulusel,
Bincong Ye, Yingyi Luo (now at Google), Sid Joshi
(now at Intel)
•  Fermi National Accelerator Laboratory
–  Nhan Tran, Farah Fahim, Christian Herwig, Kiyomi
Seiya, et al.
•  Columbia University
–  Giuseppe di Guglielmo
•  Lehigh University
–  Josh Agar
45
THANK YOU!
46

Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
PrashantGoswami42
 
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfCOLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
Kamal Acharya
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
VENKATESHvenky89705
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
Jayaprasanna4
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
MuhammadTufail242431
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
Kamal Acharya
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
AafreenAbuthahir2
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 

Recently uploaded (20)

The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
 
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfCOLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
 

Design Automation Approaches for Real-Time Edge Computing for Science Applications

  • 3. Computer Engineering Research @Northwestern •  Housed within the Electrical and Computer Engineering Department, co-owned by the Computer Science Department –  Faculty have dual appointments –  PhD programs in both departments have Computer Engineering tracks •  Design Automation / CAD and VLSI, Computer Architecture, Internet of Things, Embedded and Cyberphysical Systems, Big Data & AI 2
  • 4. Computer Engineering Research @Northwestern •  Hardware assisted ML •  Neuromorphic computing •  Hardware security •  Photonic circuits in computer architectures •  Empathic computing •  Batteryless computing •  Secure and verified software for automotive applications •  Adaptive architecture – compiler in the loop
  • 5. My Research Interests •  Electronic Design Automation (EDA) •  Circuits and Systems •  High Performance Computing (HPC) Systems 4
  • 6. Research Activities •  Design Automation –  High-Level Synthesis •  Converting applications described with programming languages to hardware •  Circuits and Systems –  Thermal and Power Management of ICs •  Temperature Sensors, 3D Integration, Power and Performance Modeling using Machine Learning 5
  • 7. Research Activities •  High Performance Computing Systems – FPGA-based accelerators for HPC applications – Thermal and Power Management in large scale systems 6
  • 8. HW Assisted ML •  Design Automation for real-time AI Cyberinfrastructure for scientists who deploy HW systems for ML •  [Figure labels: Physics, Material Science, Astrophysics; Particle Tracking, Image Reconstruction, Real-time control of microscopy, Accelerator Control, Semantic compression; Data filtering, compression, reconstruction, feedback controllers]
  • 9. Handling Big Data •  Source of the computational challenge –  Particle acceleration experiments in high-energy physics •  Billions of “events” happen every ~25 ns, creating Pb/sec data rates •  “Interesting” events (~3% of all) need to be recognized and filtered for further processing –  Sky survey data collected from observatories streams in at rates of ~Gb/s (or more) •  Limited energy (generators at the South Pole) to move data to datacenters
  • 10. Handling Big Data •  Source of the computational challenge –  CPU-GPU systems perform streaming inference on images captured by an electron microscope within ~300ms latency –  Desired goal is to perform inference on images captured at 10s of millions of frames per second under 50ms latency –  ~50ms latency could allow real-time control of the EM during material synthesis •  Think of making defects for a quantum material, deposition of nano layers, etc. 9
  • 11. Projects – Complete/Underway •  READS – Accelerator Real-time Edge AI for Distributed Systems •  Design of a reconfigurable autoencoder algorithm for detector front-end ASICs 10
  • 12. Projects – Preliminary •  CryoAI - 22nm testchip •  Adaptive ML accelerators 11
  • 13. READS – Accelerator Real-time Edge AI for Distributed Systems •  Goal: integrate ML into accelerator operations •  Challenges: Beam Loss –  High Energy Physics (HEP) experiments use proton beams –  Particles get lost through interactions with the beam vacuum pipe –  Intensity/pace of beam extraction affects radiation in the environment –  Human operators tune parameters of hundreds to thousands of devices inside the accelerator complex –  If beam loss (“leak”) reaches extreme levels, the system needs to shut down, with loss of access to the facility
  • 14. READS – Accelerator Real-time Edge AI for Distributed Systems 13 •  Muon Complex •  Two rings are on top of each other: Main Injector Ring and the Recycler Ring •  Protons are accelerated within the Booster •  Injected into the Recycler Ring and Main Injector Ring
  • 15. Kyle Hazelwood, Mattson Thieme | READS: Beam Loss De-blending for Main Injector and Recycler Project Overview 11/9/21 •  Main Injector and Recycler share an enclosure •  Both machines can, and often do, have high-intensity beam in them simultaneously •  Both machines can generate significant beam loss •  The “lost” portion of the beam escapes as dangerous radiation; if there is too much of it, the facility is forced to shut down •  The machine origin of a beam loss is often hard to distinguish •  [Figure: Main Injector tunnel, with the Recycler (top) and the Main Injector (bottom)]
  • 17. Kyle Hazelwood, Mattson Thieme | READS: Beam Loss De-blending for Main Injector and Recycler ML Model Architecture: Overview 11/9/21 Objective: Assign BLM-wise probabilities for the loss originating in MI/RR
  • 18. READS – Accelerator Real-time Edge AI for Distributed Systems •  System Design Goal: PID Controller gains need to be optimized in real-time ~ms •  The ML Processor receives inputs from sensors – beam position monitor (BPM) – beam loss monitor (BLM) 17
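
The control goal on this slide can be pictured with a minimal discrete PID loop whose gains are replaced between control steps. This is only a sketch: the `PID` class, the `set_gains` hook (standing in for whatever the ML processor would compute from BPM/BLM readings), and all numeric values are hypothetical and not taken from the READS design.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller with swappable gains (illustrative only)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def set_gains(self, kp: float, ki: float, kd: float) -> None:
        # Hypothetical hook: an ML model would call this every few milliseconds.
        self.kp, self.ki, self.kd = kp, ki, kd

    def step(self, setpoint: float, measurement: float, dt: float) -> float:
        # Standard PID law: proportional + integral + derivative terms.
        error = setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


pid = PID(kp=1.0, ki=0.0, kd=0.0)
first = pid.step(setpoint=1.0, measurement=0.0, dt=1e-3)
pid.set_gains(2.0, 0.0, 0.0)  # gains updated between control steps
second = pid.step(setpoint=1.0, measurement=0.5, dt=1e-3)
```

Keeping the controller state (integral, previous error) across gain updates is a design choice made here for simplicity; a real system might need to reset or rescale it when gains change.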
  • 20. READS – Accelerator Real-time Edge AI for Distributed Systems •  System Architecture: – Edge device continuously optimizes the online control agent – Data streamed to a cloud system for large scale training 19
  • 21. READS – Accelerator Real-time Edge AI for Distributed Systems •  Algorithm-Architecture Co-design –  A common toolchain to program FPGA devices as well as create interfaces
  • 22. READS – Accelerator Real-time Edge AI for Distributed Systems •  Algorithm-Architecture Co-design –  Create modular neural network components •  Regroup, recombine –  Establish physics/science-aware methodology •  Hardware-aware quantization and pruning techniques •  Homogeneous versus heterogeneous quantization •  Quantization-aware training •  Control resource re-use trade-offs of high level synthesis tools •  Differentiate between the relative hardware cost of storage, DSP, interconnect 21
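
To make the homogeneous versus heterogeneous quantization bullet concrete, here is a small sketch of signed fixed-point quantization with round-to-nearest and saturation. The `quantize` helper, the layer names, and the bit widths are assumptions chosen for illustration; the rounding and overflow modes of an actual HLS flow may differ (for example, truncation and wrap-around).

```python
def quantize(value: float, total_bits: int, int_bits: int) -> float:
    """Quantize to signed fixed point: total_bits wide, int_bits integer bits (sign included)."""
    frac_bits = total_bits - int_bits
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))      # most negative integer code
    hi = (1 << (total_bits - 1)) - 1   # most positive integer code
    code = max(lo, min(hi, round(value * scale)))  # round, then saturate
    return code / scale


weights = {"dense1": [0.3, -0.7, 1.9], "dense2": [0.05, -0.2, 0.6]}

# Homogeneous: one format for every layer (hypothetical 16 bits, 6 integer bits).
homogeneous = {name: [quantize(w, 16, 6) for w in ws] for name, ws in weights.items()}

# Heterogeneous: a per-layer format (hypothetical widths).
formats = {"dense1": (8, 3), "dense2": (6, 1)}
heterogeneous = {
    name: [quantize(w, *formats[name]) for w in ws] for name, ws in weights.items()
}
```

A narrower format saves storage and DSP resources but increases the quantization error, which is the trade-off quantization-aware training tries to absorb.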
  • 24. READS Project: Target - Arria10 HPS + FPGA •  IP generated for the ML module by the hls4ml tool flow, plus the hls4ml testbench •  [Block diagram labels: hls_input (on-chip RAM), hls_output (on-chip RAM), irq (PIO), hls4ml_ctrl (VHDL), mm_master, HPS, h2f_axi_master, h2f_irq1; start, done signals, etc.]
  • 25. Hardware Optimization for the ML-IP –  The current model implemented as IP consists of •  Dense Layer – ReLU – Dense Layer – Sigmoid –  259 inputs and 518 outputs •  16-bit ac_fixed values –  Representation still under investigation –  The HLS (hls4ml) tool was optimized to explore resource sharing for computations such as dense layers –  The HLS tool was not equipped with features to optimize other layers such as Sigmoid •  When dealing with a model with 100s of outputs, the situation changes –  Logic Synthesis stage (Quartus) •  Large-scale models introduce clock timing violations that were not observed in smaller models
  • 26. Project #2: Reconfigurable autoencoder for detector front-end ASICs •  Edge computing for particle collider experiments •  Data collected from a large number of photon detectors are compressed to representative information of the “shape” –  Charge measurements from the detectors are compressed to a radiation pattern –  6 million detector channels sending data at 40MHz –  The data from the original space is compressed to lower dimensionality by the edge ASIC, transmitted, then decoded on the receiving end 25
  • 27. Reconfigurable autoencoder for detector front-end ASICs •  System constraints – Low power •  Will be part of a larger system with power budget – Radiation tolerant – Reprogrammable weights through accessible registers to enable updates and customization 26
  • 28. Reconfigurable autoencoder for detector front-end ASICs •  Autoencoder: neural network with single convolutional layer followed by a dense layer 27
  • 29. Reconfigurable autoencoder for detector front-end ASICs 28 Glossary of Deep Learning: Autoencoder, by Jaron Collins
  • 30. Reconfigurable autoencoder for detector front-end ASICs •  Dataflow –  22-bit signals from 48 detector cells 29
  • 31. Reconfigurable autoencoder for detector front-end ASICs •  Network properties –  CNN layer: eight 3x3x3 kernel matrices resulting in 128 outputs –  ReLU activation after the CNN layer and after the final dense layer –  6-bit weights –  Dense layer produces 16 10-bit outputs –  Chip can be reconfigured to produce as few as 64 bits of output in total
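
The compression achieved by the encoder can be estimated from the numbers on this slide and the dataflow slide before it. The sketch below assumes the raw input is exactly 48 cells of 22 bits each, with no headers or other overhead; the variable names are illustrative.

```python
# Figures quoted in the slides.
input_bits = 48 * 22        # 22-bit signals from 48 detector cells
full_output_bits = 16 * 10  # dense layer: 16 outputs of 10 bits each
min_output_bits = 64        # smallest reconfigurable output size

# Compression ratios under the no-overhead assumption stated above.
full_ratio = input_bits / full_output_bits
max_ratio = input_bits / min_output_bits

print(f"input: {input_bits} bits")
print(f"default output: {full_output_bits} bits, ratio {full_ratio:.1f}x")
print(f"smallest output: {min_output_bits} bits, ratio {max_ratio:.1f}x")
```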
  • 34. Reconfigurable autoencoder for detector front-end ASICs •  Chip specs –  6b weight and bias parameters •  Total: 2,286 = 13,724 bits –  Parameters loaded via I2C interface –  Decoder component implemented off-detector on FPGA –  Chip latency: 25 ns –  7 nJ per inference –  280 mW –  Can withstand 200 Mrad ionizing radiation –  2.5 mm²
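
The power, latency, and energy figures above are mutually consistent if the energy per inference is taken as power multiplied by latency. The short check below makes that assumption explicit: it supposes one inference completes every 25 ns at the quoted 280 mW.

```python
latency_s = 25e-9  # chip latency quoted on the slide
power_w = 280e-3   # power quoted on the slide

# Assumption: energy per inference = power x latency.
energy_j = power_w * latency_s
print(f"{energy_j * 1e9:.1f} nJ per inference")
```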
  • 35. Reconfigurable autoencoder for detector front-end ASICs 34 •  Protection against Single Event Upsets
  • 36. Reconfigurable autoencoder for detector front-end ASICs 35 •  Protection against Single Event Upsets Partial TMR: suppresses error without correction Full TMR: both suppresses and corrects error
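
One way to read the partial versus full TMR distinction is as majority voting without and with write-back of the voted value. The sketch below models that reading in software; the `TmrRegister` class and its `correct` flag are hypothetical and do not describe the chip's actual circuitry.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant copies."""
    return (a & b) | (a & c) | (b & c)


class TmrRegister:
    def __init__(self, value: int, correct: bool):
        self.copies = [value, value, value]
        self.correct = correct  # True models write-back of the voted value

    def upset(self, index: int, bit: int) -> None:
        # Emulate a single event upset by flipping one bit in one copy.
        self.copies[index] ^= 1 << bit

    def read(self) -> int:
        voted = vote(*self.copies)
        if self.correct:
            self.copies = [voted] * 3  # repair the corrupted copy
        return voted


partial = TmrRegister(0b101010, correct=False)
full = TmrRegister(0b101010, correct=True)
for reg in (partial, full):
    reg.upset(1, 0)
    reg.read()       # both mask the first upset at the output
    reg.upset(2, 0)  # a second upset hits the same bit in another copy
```

In this model the uncorrected register accumulates upsets, so the second flip outvotes the good copy, while the register with write-back had already repaired the first flip and still reads the original value.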
  • 37. Reconfigurable autoencoder for detector front-end ASICs 36 •  Partial TMR can also be implemented in a few different hybrid modes – Applied to only registers – Include clock lines – Apply to a sub-block
  • 38. Reconfigurable autoencoder for detector front-end ASICs 37 •  Exploring trade-offs in TMR –  Area versus TMR coverage –  Leakage Power versus TMR coverage –  Fanout (routability) versus TMR coverage –  Timing (impact on slack)
  • 39. Projects – Preliminary •  Adaptive ML accelerators 38
  • 40. Adaptive ML accelerators •  Why adaptation? – Varying energy constraints – Input variability – Translation to new platform/device – Transfer Learning 39
  • 41. Adaptive ML accelerators •  Some directions –  Emulate loss through dropout/DropConnect •  Techniques in ML literature equate these phenomena to forcing weights to zero •  Loss in (because of) hardware may look very different –  Continuous diagnostics •  Evaluation of network certainty •  Concept of surprise –  Detection of temporal dominance of classes •  Reconfigure optimized version for dominant class
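
The slide does not say how certainty or surprise would be measured. As one possible illustration, the sketch below uses the Shannon entropy of the softmax output as an uncertainty score and the negative log-probability of the observed class as a surprise score; both choices and all function names are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs):
    """Shannon entropy in bits: 0 for a one-hot output, log2(n) for a uniform one."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def surprise_bits(probs, observed_class):
    """Negative log-probability of the class that actually occurred."""
    return -math.log2(probs[observed_class])


confident = softmax([8.0, 0.0, 0.0, 0.0])
uncertain = softmax([0.0, 0.0, 0.0, 0.0])
print(entropy_bits(confident), entropy_bits(uncertain))
```

A runtime monitor could track such scores over a window of inputs and trigger reconfiguration when they drift, but the thresholds and window length would have to be chosen per application.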
  • 42. Adaptive ML accelerators •  Some directions –  Critical path based hardware-aware resource management •  Class-based CP: using contributions of a neuron to a specific class –  Mean Absolute Activation, first order Taylor Approximations •  Generalized CP: relative participation of output channels at the routing of the output from a layer –  Distribute resources (e.g., total number of bits allocated for weights) according to a criticality metric 41
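
The idea of distributing a bit budget according to a criticality metric can be sketched as follows. Mean absolute activation per channel is used as the metric, as named on the slide; the proportional split with a largest-remainder rule, the `min_bits` floor, and the sample data are assumptions made for illustration.

```python
def mean_abs_activation(samples):
    """Per-channel mean absolute activation over a list of samples."""
    n = len(samples)
    channels = len(samples[0])
    return [sum(abs(s[c]) for s in samples) / n for c in range(channels)]


def allocate_bits(criticality, total_bits, min_bits=1):
    """Give each channel min_bits, then split the rest in proportion to criticality."""
    n = len(criticality)
    spare = total_bits - min_bits * n
    if spare < 0:
        raise ValueError("budget too small for the requested minimum")
    total = sum(criticality)
    shares = [spare * c / total for c in criticality] if total else [spare / n] * n
    bits = [min_bits + int(s) for s in shares]
    # Hand out bits lost to rounding down, largest fractional share first.
    leftover = total_bits - sum(bits)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        bits[i] += 1
    return bits


activations = [[1.0, -3.0], [-1.0, 3.0]]  # two samples, two channels
criticality = mean_abs_activation(activations)
widths = allocate_bits(criticality, total_bits=10)
```

The class-based variant on the slide would compute the metric separately for each output class; this sketch covers only a single, class-agnostic score.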
  • 43. Adaptive ML accelerators •  Some directions –  Characterize circuits (e.g., SRAM) to create models •  Associate voltage drop/power outage with cell decay •  Create characteristic bit masks for weights – Neural network architecture search •  Lessons learned from design space exploration in high-level synthesis •  Search for a new cell from basic building blocks –  Input: a set of convolutions and pooling of varying size –  Think of it as your module library 42
  • 44. Adaptive ML accelerators •  Some directions –  Partition the system into two parts •  Part 1: Fixed ML architecture •  Part 2: ML architecture with the capability to train within the edge device –  Training consumes resources and energy on the limited edge device •  Need to explore the trade-off carefully •  Co-design ML architecture and training hardware
  • 45. Future Directions •  Fluid implementations –  Quickly re-targetable from software to hardware •  Scientific domains offer a vast space of computational challenges –  They need interpretable systems –  Laws of nature apparent in the system’s output •  Multiple paths need to converge –  Photonic circuits – not much automated –  FPGA –  Neuromorphic – not much automated –  Quantum 44
  • 46. Collaborators •  Northwestern University –  Han Liu (CS), Kristian Hahn (Physics) –  Manuel Blanco Valentin, Rui Shi, Deniz Ulusel, Bincong Ye, Yingyi Luo (now at Google), Sid Joshi (now at Intel) •  Fermi National Accelerator Laboratory –  Nhan Tran, Farah Fahim, Christian Herwig, Kiyomi Seiya, et al. •  Columbia University –  Giuseppe di Guglielmo •  Lehigh University –  Josh Agar