The document provides an introduction to Eyeriss, an energy-efficient reconfigurable accelerator for deep convolutional neural networks (CNNs). Some key points:
- Eyeriss uses a row stationary dataflow that reduces energy costs compared to other dataflows like weight stationary and output stationary.
- It has a 4-level memory hierarchy from DRAM to register files to minimize data movement costs.
- A network-on-chip with multicast and point-to-point delivery enables single-cycle data delivery between components.
- Compression techniques like run-length compression are used to further reduce data movement costs.
1. Introduction to Eyeriss¹
Michael (Tao-Yi) Lee
tylee@mlpanda.rocks
NTU IoX Center
October 24, 2017
¹Y. H. Chen et al. "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks". In: IEEE Journal of Solid-State Circuits 52.1 (Jan. 2017), pp. 127–138.
2. Outline
1 Introduction
2 Eyeriss Highlights
Memory Hierarchy
Row Stationary Data Flow
Network-on-a-chip (NoC)
Compression and Data Gating
3 Summary
4 Appendix
Michael Lee (NTU) Introduction to Eyeriss October 24, 2017 2 / 37
3. Introduction
Contributions of Eyeriss
A novel energy-efficient CNN dataflow that has been verified in
a fabricated chip
A taxonomy of CNN dataflows that classifies previous work into
three categories (WS, OS, NLR)
Figure: Eyeriss die photo, 4000 µm × 4000 µm, showing the 168-PE array and the global buffer (35 fps @ 278 mW running AlexNet[10])
4. Introduction
Features of Eyeriss
Use a row stationary (RS) dataflow on a spatial architecture with 168 processing elements to reduce the energy cost of data movement
4-level memory hierarchy: maximally local data reuse
Network-on-a-chip (NoC)
Multicast
P2P single-cycle delivery
Compression and data gating
Run-length compression (RLC)
PE data gating
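The run-length compression listed above exploits the many zeros that ReLU produces in the activations. A minimal sketch of zero-run-length coding as (zero-run, value) pairs; the on-chip format packs these into fixed-width words, which this illustration omits:

```python
def rlc_encode(values):
    """Zero-run-length encode a list of activations.

    Emits (zero_run, value) pairs: the number of zeros preceding each
    nonzero value. Trailing zeros are encoded with a 0 sentinel value.
    """
    pairs = []
    run = 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:
        pairs.append((run, 0))  # trailing zeros only, no value follows
    return pairs

def rlc_decode(pairs):
    """Invert rlc_encode."""
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        if v != 0:
            out.append(v)
    return out
```

For sparse activation rows the pair list is much shorter than the raw row, which is what cuts the DRAM traffic.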
6. Introduction
Recap on CNN
Forward computation in CONV layers
Given ofmap O, ifmap I, bias B, weight W, stride size U
O[z][u][x][y] = ReLU( B[u] + Σ_{k=0}^{C−1} Σ_{i=0}^{R−1} Σ_{j=0}^{S−1} I[z][k][Ux+i][Uy+j] × W[u][k][i][j] )    (1)

(the triple sum over k, i, j is the partial sum accumulated by the PEs)
where 0 ≤ z < N, 0 ≤ u < M, 0 ≤ y < E, 0 ≤ x < F
E = (H − R + U)/U (2)
F = (W − S + U)/U (3)
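Equation (1) maps directly onto seven nested loops. A naive NumPy reference implementation for clarity, not the accelerated dataflow; the ifmap spatial dims are named Hx/Hy here because the slide reuses the letter W for both the weights and the ifmap width:

```python
import numpy as np

def conv_forward(I, W, B, U=1):
    """Direct CONV-layer forward pass per Eq. (1).

    I: ifmaps,  shape (N, C, Hx, Hy)  -- batch, input channels, spatial dims
    W: filters, shape (M, C, R, S)
    B: biases,  shape (M,)
    U: stride
    Returns O: ofmaps, shape (N, M, F, E), with the output sizes of Eq. (2)-(3).
    """
    N, C, Hx, Hy = I.shape
    M, _, R, S = W.shape
    F = (Hx - R + U) // U
    E = (Hy - S + U) // U
    O = np.zeros((N, M, F, E))
    for z in range(N):                      # batch
        for u in range(M):                  # output channel
            for x in range(F):
                for y in range(E):
                    psum = B[u]             # start from the bias
                    for k in range(C):      # accumulate the partial sum
                        for i in range(R):
                            for j in range(S):
                                psum += I[z, k, U * x + i, U * y + j] * W[u, k, i, j]
                    O[z, u, x, y] = max(psum, 0.0)   # ReLU
    return O
```

Every dataflow discussed later computes exactly these loops; they differ only in which operands stay resident near the MACs.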
7. Eyeriss Highlights Memory Hierarchy
PE Matrix and Memory Hierarchy
1. Spatial architecture: allows data to flow in four directions
2. Each PE operates independently on one CLKcore (i.e., not systolic)
Figure: 3×3 PE array fed from DRAM; each PE is a MAC unit with local pixel, weight (W), incoming psum (psum_i), and outgoing psum (psum_o) ports.
Challenge
How to optimize data flow in order to minimize energy consumption?
8. Eyeriss Highlights Memory Hierarchy
PE Matrix and Memory Hierarchy (Eyeriss)
4-level memory hierarchy:
DRAM → Global Buffer (GLB) → Network-on-a-Chip (NoC) → Register File (RF)
Figure: on-chip memory hierarchy with relative energy cost (EC²) per access: one 1 kB RF per PE (EC = 1X), inter-PE NoC (EC = 2X), GLB (EC = 6X), and off-chip DRAM reached through a FIFO (EC = 500X).
²Relative energy cost
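The relative energy costs on this slide support back-of-the-envelope dataflow comparisons: weight each level's access count by its EC. A hypothetical sketch using the slide's EC values; the access counts in the example are made up for illustration:

```python
# Relative energy cost per access, as annotated on the slide (RF = 1X baseline).
EC = {"RF": 1, "NoC": 2, "GLB": 6, "DRAM": 500}

def movement_energy(accesses):
    """Total data-movement energy in RF-access units.

    accesses: dict mapping hierarchy level -> number of accesses,
    e.g. counted from a dataflow simulation. A rough estimate,
    not the paper's detailed energy model.
    """
    return sum(EC[level] * n for level, n in accesses.items())

# Example: reusing a weight 100 times from the local RF
# vs. refetching it from DRAM on every use.
reuse = movement_energy({"DRAM": 1, "RF": 100})   # fill once, read locally
no_reuse = movement_energy({"DRAM": 100})         # refetch every time
```

The two-order-of-magnitude gap between the scenarios is why the dataflow, not the MACs, dominates the energy budget.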
9. Eyeriss Highlights Row Stationary Data Flow
CNN dataflows
Row Stationary (RS)
Weight Stationary (WS)
Output Stationary (OS)
No Local Reuse (NLR)
10. Eyeriss Highlights Row Stationary Data Flow
Comparison of Dataflows (I)
The next slides focus on the flows of psums, weights, and pixels
RS uses 1.4X–2.5X less energy than the other dataflows
11. Eyeriss Highlights Row Stationary Data Flow
Row Stationary in 1D Convolution
Kernel [a b c] ∗ Image [a b c d e] = PSum [a b c]
14. Eyeriss Highlights Row Stationary Data Flow
Row Stationary in 1D Convolution (PE)
Kernel (a b c) ∗ Image (a b c d e) = PSum (a b c)
Inside the PE's register file, the kernel row stays stationary while
the image pixels stream through a 3-entry window, one psum per cycle:
Cycle 1: window (a b c) → psum a
Cycle 2: window (b c d) → psum b
Cycle 3: window (c d e) → psum c
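The per-cycle behavior above can be sketched in plain Python (an illustrative model of the animation, not actual Eyeriss RTL): the kernel stays resident while a sliding pixel window shifts by one each cycle.

```python
def rs_1d_pe(kernel, image):
    """One PE running row-stationary 1D convolution: the kernel row is
    stationary in the RF, pixels stream through a K-entry window."""
    K = len(kernel)
    window = list(image[:K])      # RF: current pixel window
    psums = []
    for i in range(len(image) - K + 1):
        # One MAC pass over the stationary weights and the current window
        psums.append(sum(w * x for w, x in zip(kernel, window)))
        if i + K < len(image):    # shift the next pixel into the window
            window = window[1:] + [image[i + K]]
    return psums

# 3-tap kernel over a 5-pixel row -> 3 psums, as on the slide
print(rs_1d_pe([1, 2, 3], [1, 1, 1, 1, 1]))  # [6, 6, 6]
```

Only the window shifts between cycles; the weights are read from the same RF entries every cycle, which is exactly the reuse RS exploits.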
17. Eyeriss Highlights Row Stationary Data Flow
Row Stationary in 2D Convolution
1a 1b 1c
2a 2b 2c
3a 3b 3c
Kernel
∗
1a 1b 1c 1d 1e
2a 2b 2c 2d 2e
3a 3b 3c 3d 3e
4a 4b 4c 4d 4e
5a 5b 5c 5d 5e
Image
=
1a 1b 1c
2a 2b 2c
3a 3b 3c
PSum
21. Eyeriss Highlights Row Stationary Data Flow
Row Stationary in 2D Convolution (PE)
(Figure: a 3×3 set of PEs; each PE's register file holds one kernel
row, a window of one image row, and one psum row.)
Weights propagate horizontally: the PEs in a row share one kernel row.
Pixels propagate diagonally: the PEs on a diagonal share one image row.
Psums propagate vertically: each PE column accumulates one ofmap row.
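A minimal sketch of this row mapping (plain Python under the assumption described above, not the actual PE array): each PE performs a 1D convolution of one kernel row with one image row, and ofmap row e is formed by vertically accumulating the results for kernel rows r against image rows e + r.

```python
def conv1d(krow, irow):
    """One PE's job: 1D convolution of a kernel row with an image row."""
    K = len(krow)
    return [sum(krow[j] * irow[i + j] for j in range(K))
            for i in range(len(irow) - K + 1)]

def rs_2d(kernel, image):
    """Row-stationary 2D convolution: kernel rows map to PE rows,
    image rows feed PEs diagonally, psums accumulate vertically."""
    R = len(kernel)                 # kernel rows -> PE rows
    E = len(image) - R + 1          # ofmap rows  -> PE columns
    ofmap = []
    for e in range(E):              # one PE column per ofmap row
        col = [0] * (len(image[0]) - len(kernel[0]) + 1)
        for r in range(R):          # vertical psum accumulation
            col = [a + b for a, b in zip(col, conv1d(kernel[r], image[e + r]))]
        ofmap.append(col)
    return ofmap

kernel = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]             # 3x3 diagonal kernel
image = [[i + j for j in range(5)] for i in range(5)]  # 5x5 ramp
print(rs_2d(kernel, image))  # [[6, 9, 12], [9, 12, 15], [12, 15, 18]]
```

Note how image row e + r is touched by every PE whose row index r and column index e sum to the same value, which is the diagonal pixel reuse on the slide.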
24. Eyeriss Highlights Row Stationary Data Flow
Weight Stationary (WS)3
Minimizes weight-read energy consumption
Maximizes convolutional and filter reuse of weights
Examples:
Chakradhar et al. 2010
Gokhale et al. 2014
Park et al. 2015
Cavigelli et al. 2015
3Image adapted from Yu-Hsin Chen, Joel Emer, and Vivienne Sze. ISCA 2016 Slides of
Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks.
25. Eyeriss Highlights Row Stationary Data Flow
Output Stationary (OS)4
Minimizes partial-sum read/write energy consumption
Maximizes local accumulation
Examples:
Gupta et al. 2015
Du et al. 2015
Peemen et al. 2013
4Image adapted from Yu-Hsin Chen, Joel Emer, and Vivienne Sze. ISCA 2016 Slides of
Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks.
26. Eyeriss Highlights Row Stationary Data Flow
No Local Reuse (NLR)5
Uses a large global buffer as shared storage
Reduces DRAM-access energy consumption
Examples:
Chen et al. 2014 (DianNao)
Chen et al. 2014 (DaDianNao)
Zhang et al. 2015
5Image adapted from Yu-Hsin Chen, Joel Emer, and Vivienne Sze. ISCA 2016 Slides of
Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks.
27. Eyeriss Highlights Row Stationary Data Flow
Comparison of Dataflows (II)
RS reuses data heavily in the local register files (RFs), which saves
the energy of moving data.
28. Eyeriss Highlights Row Stationary Data Flow
Beyond 2D Convolution - Multiple Images
Processing in the PE: concatenate rows from different images
29. Eyeriss Highlights Row Stationary Data Flow
Beyond 2D Convolution - Multiple Filters
Processing in the PE: interleave rows from different filters
30. Eyeriss Highlights Row Stationary Data Flow
Beyond 2D Convolution - Multiple Channels
Processing in the PE: interleave rows from different channels
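The channel case can be sketched as follows (an illustrative model, not the hardware's interleaved scheduling): per-channel psums for the same output position accumulate into one running sum inside the PE, so no per-channel psum ever has to leave it.

```python
def conv1d(krow, irow):
    K = len(krow)
    return [sum(krow[j] * irow[i + j] for j in range(K))
            for i in range(len(irow) - K + 1)]

def multi_channel_row(kernel_rows, image_rows):
    """kernel_rows[c] / image_rows[c]: the row of input channel c.
    Channels are processed in turn and accumulated in place."""
    acc = None
    for krow, irow in zip(kernel_rows, image_rows):  # interleave channels
        psum = conv1d(krow, irow)
        acc = psum if acc is None else [a + b for a, b in zip(acc, psum)]
    return acc

# Two channels: their per-position psums add into a single output row
print(multi_channel_row([[1, 1, 1], [2, 2, 2]],
                        [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]))  # [9, 9, 9]
```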
32. Eyeriss Highlights Row Stationary Data Flow
AlexNet PE Mapping
33. Eyeriss Highlights Row Stationary Data Flow
AlexNet Inter-Pass Data Caching
34. Eyeriss Highlights Row Stationary Data Flow
AlexNet Shape Parameters
How do we map AlexNet onto Eyeriss?
Layer shapes (H: ifmap width, R: kernel width, E: ofmap width,
C: # channels, M: # kernels, U: stride):
L    H    R    E    C    M    U
1    227  11   55   3    96   4
2    31   5    27   48   256  1
3    15   3    13   256  384  1
4    15   3    13   192  384  1
5    15   3    13   192  256  1
Mapping parameters (m: # ofmap channels stored in GLB, n: # ifmaps,
e: width of the PE set, p: # filters processed, q: # channels
processed, r: # PEs processing different channels, t: # PEs processing
different filters):
L    m    n    e    p    q    r    t
1    96   1    7    16   1    1    2
2    64   1    27   16   2    1    1
3    64   4    13   16   4    1    4
4    64   4    13   16   3    2    2
5    64   4    13   16   3    2    2
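As a quick consistency check (assuming the standard convolution output-size formula and that H in the table is the already-padded ifmap width), every layer satisfies E = (H − R)/U + 1:

```python
def ofmap_width(H, R, U):
    """Standard output width of a valid convolution with stride U."""
    return (H - R) // U + 1

layers = [  # (H, R, E, U) taken from the table above
    (227, 11, 55, 4),
    (31, 5, 27, 1),
    (15, 3, 13, 1),
    (15, 3, 13, 1),
    (15, 3, 13, 1),
]
for H, R, E, U in layers:
    assert ofmap_width(H, R, U) == E
print("all layer shapes consistent")
```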
35. Eyeriss Highlights Row Stationary Data Flow
AlexNet Shape Mapping Illustrated
36. Eyeriss Highlights Network-on-a-chip (NoC)
NoC Optimized for RS
Global input/output network: a Multicast Controller (MC) broadcasts
GLB data to the assigned PEs; the data is tagged with its (row, col)
ID in the GLB:
filter GI/ON
ifmap GI/ON
psum GI/ON
Local network: a dedicated 64-bit data bus passes psums directly from
the bottom PE to the top PE.
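A sketch of the multicast idea (illustrative IDs and grid, not the real Eyeriss configuration): each data word on the bus carries a (row, col) tag, and each PE's multicast controller accepts only words whose tag matches the ID it was configured with. Several PEs may share an ID, so one bus transaction delivers the same word to all of them.

```python
class PE:
    def __init__(self, row_id, col_id):
        self.row_id, self.col_id = row_id, col_id  # configured multicast ID
        self.inbox = []

    def on_bus(self, row_tag, col_tag, value):
        # Multicast controller: latch the word only on a tag match
        if (row_tag, col_tag) == (self.row_id, self.col_id):
            self.inbox.append(value)

# Two PEs share an ID and both receive the broadcast; the third ignores it
pes = [PE(1, 2), PE(1, 2), PE(0, 0)]
for pe in pes:
    pe.on_bus(1, 2, "ifmap row")
print(sum(len(pe.inbox) for pe in pes))  # 2
```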
38. Eyeriss Highlights Network-on-a-chip (NoC)
Populate Data with Global Input / Output Network
39. Eyeriss Highlights Compression and Data Gating
Run-Length Compression (RLC)
ReLU produces many zeros in the activated ofmap; RLC exploits them to
save power on DRAM reads and writes.
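A minimal run-length coding sketch (the hardware RLC packs fixed-width run/value pairs into words; this illustrative version just pairs each nonzero value with the count of zeros preceding it):

```python
def rlc_encode(values):
    """Encode a list as (zero_run, nonzero_value) pairs; a trailing run
    of zeros is stored as (run, None)."""
    out, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            out.append((run, v))
            run = 0
    if run:
        out.append((run, None))
    return out

def rlc_decode(pairs):
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        if v is not None:
            out.append(v)
    return out

ofmap = [0, 0, 12, 0, 0, 0, 53, 0, 0, 22, 0]
enc = rlc_encode(ofmap)
print(enc)  # [(2, 12), (3, 53), (2, 22), (1, None)]
assert rlc_decode(enc) == ofmap
```

The sparser the ReLU output, the fewer pairs survive, so fewer bits cross the DRAM interface.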
40. Eyeriss Highlights Compression and Data Gating
Data Gating / Zero Skipping
Simply skip the computation when either the pixel or the weight is zero
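The gating idea can be sketched as follows (illustrative only: the chip gates the MAC datapath and skips register-file reads in hardware; here we just skip the multiply and count the surviving MACs):

```python
def gated_dot(pixels, weights):
    """Dot product with zero gating: a zero pixel suppresses the whole
    MAC, standing in for the gated switching energy."""
    acc, macs = 0, 0
    for p, w in zip(pixels, weights):
        if p == 0:          # zero pixel -> gate the MAC entirely
            continue
        acc += p * w
        macs += 1
    return acc, macs

acc, macs = gated_dot([0, 3, 0, 0, 2], [5, 5, 5, 5, 5])
print(acc, macs)  # 25 2
```

With a mostly-zero ReLU output, most MACs are gated away without changing the result.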
42. Summary
Performance Summary and Comparison
                             Eyeriss [5]    NVIDIA TK1
Technology                   65nm 1P9M      28nm
Chip Size (mm)               4.0 × 4.0      N/A
Core Area (mm)               3.5 × 3.5      N/A
Gate Count                   1176k          N/A
Word Bit-Width               16b Fixed      32b Float
Core Clock (MHz)             200            852
On-Chip Buffer Size (kB)     108            64
Total Register Size (kB)     75.3           256
# MACs                       168            192
Throughput (fps)             34.7           68
Measured Power, Idle (mW)    —              3700
Measured Power, Active (mW)  278            10002
43. Summary
Summary
RS optimizes for the best overall energy efficiency, while existing
CNN dataflows focus on only certain data types.
RS has higher energy efficiency than existing dataflows:
1.4X – 2.5X higher in CONV layers
at least 1.3X higher in FC layers (batch size ≥ 16)
44. Appendix
Bibliography I
Lukas Cavigelli et al. “Origami: A Convolutional Network Accelerator”. In: Proceedings of the 25th Edition on Great
Lakes Symposium on VLSI. GLSVLSI ’15. Pittsburgh, Pennsylvania, USA: ACM, 2015, pp. 199–204. isbn:
978-1-4503-3474-7. doi: 10.1145/2742060.2743766. url: http://doi.acm.org/10.1145/2742060.2743766.
Srimat Chakradhar et al. “A Dynamically Configurable Coprocessor for Convolutional Neural Networks”. In:
Proceedings of the 37th Annual International Symposium on Computer Architecture. ISCA ’10. Saint-Malo, France:
ACM, 2010, pp. 247–257. isbn: 978-1-4503-0053-7.
Yu-Hsin Chen, Joel Emer, and Vivienne Sze. ISCA 2016 Slides of Eyeriss: A Spatial Architecture for Energy-Efficient
Dataflow for Convolutional Neural Networks.
Tianshi Chen et al. “DianNao: A Small-footprint High-throughput Accelerator for Ubiquitous Machine-learning”. In:
Proceedings of the 19th International Conference on Architectural Support for Programming Languages and
Operating Systems. ASPLOS ’14. Salt Lake City, Utah, USA: ACM, 2014, pp. 269–284. isbn: 978-1-4503-2305-5.
doi: 10.1145/2541940.2541967. url: http://doi.acm.org/10.1145/2541940.2541967.
Y. H. Chen et al. “Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural
Networks”. In: IEEE Journal of Solid-State Circuits 52.1 (Jan. 2017), pp. 127–138.
Y. Chen et al. “DaDianNao: A Machine-Learning Supercomputer”. In: 2014 47th Annual IEEE/ACM International
Symposium on Microarchitecture. Dec. 2014, pp. 609–622. doi: 10.1109/MICRO.2014.58.
45. Appendix
Bibliography II
Z. Du et al. “ShiDianNao: Shifting vision processing closer to the sensor”. In: 2015 ACM/IEEE 42nd Annual
International Symposium on Computer Architecture (ISCA). June 2015, pp. 92–104. doi:
10.1145/2749469.2750389.
V. Gokhale et al. “A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks”. In: 2014 IEEE Conference on
Computer Vision and Pattern Recognition Workshops. June 2014, pp. 696–701. doi: 10.1109/CVPRW.2014.106.
Suyog Gupta et al. “Deep Learning with Limited Numerical Precision”. In: Proceedings of the 32Nd International
Conference on International Conference on Machine Learning - Volume 37. ICML’15. Lille, France: JMLR.org, 2015,
pp. 1737–1746. url: http://dl.acm.org/citation.cfm?id=3045118.3045303.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. “ImageNet Classification with Deep Convolutional Neural
Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. 2012, pp. 1097–1105.
S. Park et al. “4.6 A 1.93TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture
for big-data applications”. In: 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical
Papers. Feb. 2015, pp. 1–3. doi: 10.1109/ISSCC.2015.7062935.
M. Peemen et al. “Memory-centric accelerator design for Convolutional Neural Networks”. In: 2013 IEEE 31st
International Conference on Computer Design (ICCD). Oct. 2013, pp. 13–19. doi: 10.1109/ICCD.2013.6657019.
46. Appendix
Bibliography III
Chen Zhang et al. “Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks”. In:
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. FPGA ’15.
Monterey, California, USA: ACM, 2015, pp. 161–170. isbn: 978-1-4503-3315-3. doi: 10.1145/2684746.2689060.
url: http://doi.acm.org/10.1145/2684746.2689060.