Presentation slides from ISMVL (the International Symposium on Multiple-Valued Logic), May 22nd, 2017, Novi Sad, Serbia. The talk presents a machine-learning implementation that realizes high performance and low power.
A Random Forest using a Multi-valued Decision Diagram on an FPGA
1. A Random Forest using a Multi-valued Decision Diagram on an FPGA
¹Hiroki Nakahara, ¹Akira Jinguji, ¹Shimpei Sato, ²Tsutomu Sasao
¹Tokyo Institute of Technology, JP; ²Meiji University, JP
May 22nd, 2017 @ ISMVL 2017
3. Machine Learning
Machine learning demands much computation power and big data.
(Left figure): “Single-Threaded Integer Performance,” 2016
(Right figure): Nakahara, “Trend of Search Engine on modern Internet,” 2014
5. Introduction
• Random Forest (RF)
• Ensemble learning method
• Consists of multiple decision trees (DTs)
• Applications: segmentation, human pose detection
• It is based on binary DTs (BDTs)
• A node is evaluated by an if-then-else statement
• The same variable may appear several times on a path
• Multiple-valued decision diagram (MDD)
• Each variable appears only once on a path
6. Introduction (Cont'd)
• Target platform
• CPU: too slow
• GPU: not suitable for the RF → slow, and consumes much power
• FPGA: faster, low power, but long TAT (turnaround time) with traditional design flows
• High-level synthesis (HLS) for the RF using MDDs on an FPGA
• Low power, high performance, short design time
8. Classification by a Binary Decision Tree (BDT)
• Partition of the feature map over X1 and X2
[Figure: a BDT with root X2<0.53?, internal nodes X2<0.29?, X1<0.09?, X1<0.63?, X1<0.71?, and leaves labeled C1/C2, shown beside the axis-aligned partition of the (X1, X2) feature map it induces]
9. Training of a BDT
• It is built from randomized samples
• Recursively partition the dataset to maximize the information gain (entropy reduction) → the same variables may appear on a path
[Figure: the same BDT and feature-map partition as on the previous slide]
10. Random Forest (RF)
• Ensemble learning
• Classification and regression
• Consists of multiple BDTs
[Figure: a single BDT (Tree 1) beside a Random Forest of trees 1..n; each tree outputs a class (C1, C2, ..., C1) and a voter emits the majority class C1]
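The voting scheme above can be sketched in a few lines of Python (a toy illustration, not the paper's FPGA implementation; the trees and thresholds below are invented):

```python
# Toy sketch of a Random Forest: each binary decision tree (BDT) votes,
# and the ensemble returns the majority class. Trees are nested tuples
# (feature_index, threshold, yes_child, no_child); leaves are class labels.
# All trees and thresholds here are invented for illustration.

def classify_bdt(node, x):
    """Walk one BDT: take the 'yes' branch when x[var] < threshold."""
    while not isinstance(node, str):
        var, thr, yes, no = node
        node = yes if x[var] < thr else no
    return node

def classify_rf(trees, x):
    """Ensemble vote: return the class predicted by the most trees."""
    votes = [classify_bdt(t, x) for t in trees]
    return max(set(votes), key=votes.count)

tree1 = (1, 0.53, (1, 0.29, "C1", "C2"), (0, 0.09, "C1", "C2"))
tree2 = (0, 0.63, "C1", "C2")
tree3 = (1, 0.71, "C1", "C2")
print(classify_rf([tree1, tree2, tree3], [0.2, 0.4]))  # C1 (2 votes vs 1)
```

Note that each tree evaluates one if-then-else per node, which is exactly the structure the later slides map onto hardware.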
11. Applications
• Key point matching [Lepetit et al., 2006]
• Object detector [Shotton et al., 2008][Gall et al., 2011]
• Handwritten character recognition [Amit & Geman, 1997]
• Visual word clustering [Moosmann et al., 2006]
• Pose recognition [Yamashita et al., 2010]
• Human detector [Mitsui et al., 2011][Dahang et al., 2012]
• Human pose estimation [Shotton 2011]
12. Known Problem
• Build BDTs from randomized samples
• The same variable may appear on a path
• Tend to be slow, even if we use GPUs
[Figure: a BDT in which X2 is tested three times on one path (X2<0.53?, X2<0.29?, X2<0.09?)]
Each node is evaluated by an if-then-else statement:
if X2 < 0.09 then
    output C1;
else
    goto Child_node;
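The repeated-variable problem can be made concrete with a toy walker that counts how often each variable is compared on a path (an invented tree using the slide's thresholds, not the paper's code):

```python
# Sketch: walking a BDT path where the same variable X2 is compared
# three times (X2<0.53, X2<0.29, X2<0.09) before a class is emitted.
# The tree below is invented for illustration.
from collections import Counter

# (variable_name, threshold, yes_child, no_child); leaves are class labels
path_tree = ("X2", 0.53,
             ("X2", 0.29,
              ("X2", 0.09, "C1", "C2"),
              "C1"),
             "C1")

def classify_counting(node, x):
    """Classify x and count how many times each variable is tested."""
    tests = Counter()
    while not isinstance(node, str):
        var, thr, yes, no = node
        tests[var] += 1
        node = yes if x[var] < thr else no
    return node, tests

label, tests = classify_counting(path_tree, {"X2": 0.05})
print(label, tests["X2"])  # C1 3 -> X2 is evaluated 3 times on one path
```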
14. Binary Decision Diagram (BDD)
• Recursively apply the Shannon expansion to a given logic function
• Non-terminal node: if-then-else statement
• Terminal node: set functional value
[Figure: a BDD over variables x1..x6 with non-terminal nodes and terminal nodes 0/1]
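The recursive Shannon expansion can be sketched over a truth table (a minimal toy; only the low = high elimination rule is applied, and the sharing of isomorphic subgraphs that a real BDD package performs is omitted):

```python
# Sketch: build a decision diagram by recursive Shannon expansion
# f = (not x)*f|x=0 + x*f|x=1 over a truth table. Only the low == high
# elimination rule is applied; a full BDD package would also share
# isomorphic subgraphs. Toy code, not the authors' tool.

def build(table, var=0):
    """table: outputs for all assignments of the remaining variables."""
    if len(table) == 1:
        return table[0]                     # terminal node: 0 or 1
    half = len(table) // 2
    low = build(table[:half], var + 1)      # cofactor with x_var = 0
    high = build(table[half:], var + 1)     # cofactor with x_var = 1
    if low == high:                         # node elimination rule
        return low
    return (var, low, high)                 # non-terminal: if-then-else

def evaluate(node, x):
    """Follow one path: each non-terminal is an if-then-else on x[var]."""
    while not isinstance(node, int):
        var, low, high = node
        node = high if x[var] else low
    return node

# f(x1, x2) = x1 XOR x2; truth table indexed by (x1, x2)
bdd = build([0, 1, 1, 0])
print(evaluate(bdd, [1, 0]))  # 1
```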
15. Measurement of BDD
• Memory size: (# of nodes) × (size of a node)
• Worst-case performance: LPL (Longest Path Length) → dedicated fully pipelined hardware
[Figure: the same six-variable BDD as on the previous slide]
16. Multi-Valued Decision Diagram (MDD)
• MDD(k): 2^k outgoing edges per non-terminal node
• Evaluates k variables at a time
[Figure: a BDD over x1..x6 beside the equivalent MDD(2), whose three nodes X1={x1,x2}, X2={x3,x4}, X3={x5,x6} each consume two binary variables]
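Evaluating k variables at a time can be sketched as follows (a hand-built toy; the function and node tables are invented, not taken from the paper):

```python
# Sketch: an MDD(2) node consumes 2 binary variables at once and branches
# 2**2 = 4 ways, halving the number of levels compared with a BDD.
# Nodes are dicts {branch_value: child}; terminals are 0/1. Toy example
# for f(x1, x2, x3, x4) = x1 AND x3 (x2 and x4 are don't-cares).

def evaluate_mdd(node, x, k=2):
    i = 0
    while not isinstance(node, int):
        v = 0
        for b in x[i:i + k]:        # pack the next k bits into one
            v = (v << 1) | b        # multi-valued "super variable"
        node = node[v]
        i += k
    return node

level2 = {0: 0, 1: 0, 2: 1, 3: 1}          # tests x3 (high bit of (x3, x4))
mdd = {0: 0, 1: 0, 2: level2, 3: level2}   # tests x1 (high bit of (x1, x2))
print(evaluate_mdd(mdd, [1, 0, 1, 1]))     # 1, reached in 2 steps, not 4
```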
17. Comparison of the BDT with the MDD
[Figure: the BDT from slide 8 beside the equivalent MDD; in the MDD, X2 is tested once against its interval edges (<0.29, <0.53, <1.00), which feed two X1 nodes tested once against their own interval edges (e.g. <0.63, <0.71, <1.00), terminating in classes C1 and C2; each variable appears only once on a path]
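One way to read the MDD side of this comparison: the thresholds a variable carries in the BDT split its range into intervals, so each variable needs only one multi-valued test. A sketch under that assumption (the edge tables below are invented, not the paper's exact diagram):

```python
# Sketch: threshold tests on one variable become a single multi-valued
# test. bisect maps a value to the interval index induced by the sorted
# thresholds, which directly selects the outgoing MDD edge.
# Nodes: (feature_name, sorted_thresholds, children); leaves: class labels.
# The diagram below is an invented example, not the paper's exact MDD.
import bisect

def interval(thresholds, value):
    """Index of the interval that `value` falls into."""
    return bisect.bisect_right(thresholds, value)

def classify_mdd(node, x):
    while not isinstance(node, str):
        name, edges, children = node
        node = children[interval(edges, x[name])]   # one test per variable
    return node

x2_edges = [0.29, 0.53]                 # from BDT nodes X2<0.29?, X2<0.53?
x1_edges = [0.09, 0.63, 0.71]           # from the BDT's X1 thresholds
mdd = ("X2", x2_edges, [
    ("X1", x1_edges, ["C1", "C2", "C2", "C1"]),   # X2 < 0.29
    ("X1", x1_edges, ["C2", "C1", "C1", "C1"]),   # 0.29 <= X2 < 0.53
    ("X1", x1_edges, ["C1", "C1", "C2", "C1"]),   # X2 >= 0.53
])
print(classify_mdd(mdd, {"X1": 0.40, "X2": 0.40}))  # C1, with X2 and X1 each tested once
```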
19. Complexities of the BDT and the MDD

        # Nodes      LPL
BDT     O(Σ|Xi|)     O(Σ|Xi|)
MDD     O(|Xi|^k)    O(n)

• The RF prefers shallow decision trees to avoid overfitting
21. FPGA (Field Programmable Gate Array)
• Reconfigurable architecture
• Look-up Table (LUT)
• Configurable routing channel
• Advantages
• Faster than a CPU
• Dissipates lower power than a GPU
• Shorter design time than an ASIC
24. System Design Tool
[Figure: tool-flow diagram with stages marked ①-④]
1. Behavior design + pragmas
2. Profile analysis
3. IP core generation by HLS
4. Bitstream generation by the FPGA CAD tool
5. Middleware generation
→ done automatically
28. Comparison of Platforms
• Implemented the RF on the following devices:
• CPU: Intel Core i7 650
• GPU: NVIDIA GeForce GTX Titan
• FPGA: Terasic DE5-NET
• Measured dynamic power including the host PC
• Test bench: 10,000 random vectors
• Execution time includes communication time between the host PC and the devices
[Figure: photos of the GPU and the FPGA board]
30. Conclusion
• Proposed the RF using MDDs
• Reduced the path length
• Increased the column multiplicity: # of nodes O(|X|^k)
• A shallow decision diagram is recommended to avoid overfitting
• Developed the high-level synthesis design flow toward the FPGA realization
• 10.7x faster than the GPU
• 14.0x faster than the CPU