International Conference on AI and Mobile Services
Services Conference Federation (SCF)
San Diego, CA, USA
June 2019
Artificial Intelligence on the edge is a matter of great importance towards the enhancement of smart devices that rely on operations with real-time constraints. Despite the rapid growth of computational power in embedded systems, such as smartphones, wearable devices, drones and FPGAs, the deployment of highly complex and considerably large models remains challenging. Optimized execution requires managing memory allocation efficiently, to avoid overloading, and exploiting the available hardware resources for acceleration, which is not trivial given the non-standardized access to such resources. We present PolimiDL, an open source framework for the acceleration of Deep Learning inference on mobile and embedded systems with limited resources and heterogeneous architectures. Experimental results show competitive performance w.r.t. TensorFlow Lite for the execution of small models.
Applying Deep Learning with Weak and Noisy Labels (Darian Frajberg)
Scientific seminar at Politecnico di Milano
Como, Italy
September 2018
In recent years, Deep Learning has achieved outstanding results, outperforming previous techniques and even humans, thus becoming the state of the art in a wide range of tasks, with Computer Vision among the areas that have benefited most.
Nonetheless, most of this success is tightly coupled to strongly supervised learning tasks, which require accurately defined ground-truth labels that are expensive and labor-intensive to produce.
In this presentation, we introduce several alternatives for dealing with this problem and supporting the training of Deep Learning models for Computer Vision tasks, either by simplifying the process of data labelling or by exploiting the virtually unlimited supply of publicly available data on the Internet (such as user-tagged images from Flickr). These alternatives rely on data with noisy and weak labels, which are much easier to collect but require special care to be used.
For the full video of this presentation, please visit:
https://www.edge-ai-vision.com/2021/01/imaging-systems-for-applied-reinforcement-learning-control-a-presentation-from-nanotronics/
Damas Limoge, Senior R&D Engineer at Nanotronics, presents the “Imaging Systems for Applied Reinforcement Learning Control” tutorial at the September 2020 Embedded Vision Summit.
Reinforcement learning has generated human-level decision-making strategies in highly complex game scenarios. But most industries, such as manufacturing, have not seen impressive results from the application of these algorithms, belying the utility hoped for by their creators. The limitations of reinforcement learning in real use cases intuitively manifest from the number of exploration examples needed to train the underlying models, but also from incomplete state representations for an artificial agent to act on.
In an effort to improve automated inspection for factory control through reinforcement learning, Nanotronics’ research is focused on improving the state representation of a manufacturing process using optical inspection as a basis for agent optimization. In this presentation, Limoge focuses on the imaging system: its design, implementation and utilization, in the context of a reinforcement agent.
Deep Learning for Image Super Resolution (Prudhvi Raj)
Using deep convolutional networks, the machine can learn an end-to-end mapping between low- and high-resolution images. Unlike traditional methods, this approach jointly optimizes all layers. A lightweight CNN structure is used, which is simple to implement and provides a favorable trade-off compared with existing methods.
Deep Learning for Computer Vision: A comparison between Convolutional Neural... (Vincenzo Lomonaco)
In recent years, Deep Learning techniques have been shown to perform well on a large variety of problems in both Computer Vision and Natural Language Processing, reaching and often surpassing the state of the art on many tasks. The rise of deep learning is also revolutionizing the entire field of Machine Learning and Pattern Recognition, pushing forward the concepts of automatic feature extraction and unsupervised learning in general.
However, despite its strong success in both science and business, deep learning has its own limitations. It is often questioned whether such techniques are merely brute-force statistical approaches and whether they can only work in the context of High Performance Computing with enormous amounts of data. Another important question is whether they are really biologically inspired, as claimed in certain cases, and whether they can scale well in terms of “intelligence”.
The dissertation focuses on answering these key questions in the context of Computer Vision and, in particular, Object Recognition, a task that has been heavily revolutionized by recent advances in the field. Practically speaking, these answers are based on an exhaustive comparison between two very different deep learning techniques on the aforementioned task: Convolutional Neural Networks (CNN) and Hierarchical Temporal Memory (HTM). They represent two different approaches and points of view under the broad umbrella of deep learning and are well suited to understanding and pointing out the strengths and weaknesses of each.
CNN is considered one of the most classic and powerful supervised methods used today in machine learning and pattern recognition, especially in object recognition. CNNs are well received and accepted by the scientific community and are already deployed in large corporations such as Google and Facebook for solving face recognition and image auto-tagging problems.
HTM, on the other hand, is an emerging paradigm and a mainly unsupervised method that is more biologically inspired. It tries to gain insights from the computational neuroscience community in order to incorporate concepts such as time, context and attention during the learning process, which are typical of the human brain.
In the end, the thesis aims to show that in certain cases, with a lower quantity of data, HTM can outperform CNN.
A system was developed that is able to retrieve specific documents from a document collection. In this system the query is given as text by the user and then transformed into an image. Appropriate features were extracted in order to capture the general shape of the query and ignore details due to noise or different fonts. In order to demonstrate the effectiveness of our system, we used a collection of noisy documents and compared our results with those of a commercial OCR package.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2021/09/explainability-in-computer-vision-a-machine-learning-engineers-overview-a-presentation-from-altaml/
Navaneeth Kamballur Kottayil, Lead Machine Learning Developer at AltaML, presents the “Explainability in Computer Vision: A Machine Learning Engineer’s Overview” tutorial at the May 2021 Embedded Vision Summit.
With the increasing use of deep neural networks in computer vision applications, it has become more difficult for developers to explain how their algorithms work. This can make it difficult to establish trust and confidence among customers and other stakeholders, such as regulators. Lack of explainability also makes it more difficult for developers to improve their solutions.
In this talk, Kottayil introduces methods for enabling explainability in deep-learning-based computer vision solutions. He also illustrates some of these techniques via real-world examples, and shows how they can be used to improve customer trust in computer vision models, to debug computer vision models, to obtain additional insights about data and to detect bias in models.
Framework for Contextual Outlier Identification using Multivariate Analysis a... (IJECEIAES)
The majority of existing commercial video surveillance applications capture only event frames, and the accuracy of these captures is poor. Our review of existing systems found that, at present, no research technique offers contextual, scene-based identification of outliers. We therefore present a framework that uses an unsupervised learning approach to perform precise identification of outliers in given video frames with respect to the contextual information of the scene. The proposed system uses a matrix decomposition method with multivariate analysis to maintain an equilibrium between faster response time and higher accuracy of abnormal event/object detection as an outlier. Using an analytical methodology, the proposed system applies a blocking operation followed by sparsity to perform detection. The study outcome shows that the proposed system offers a higher level of accuracy than existing systems, with faster response time.
TRANSFER LEARNING BASED IMAGE VISUALIZATION USING CNN (ijaia)
Image classification is a popular machine-learning application of deep learning. Deep learning techniques are very popular because they can be used effectively to perform operations on image data at large scale. In this paper, a CNN model was designed to better classify images. We use the feature extraction part of the Inception v3 model to compute feature vectors and retrain the classification layer with these feature vectors. Using the transfer learning mechanism, the classification layer of the CNN model was trained with 20 classes of the Caltech101 image dataset and 17 classes of the Oxford 17 Flower image dataset. After training, the network was evaluated with testing images from the Oxford 17 Flower dataset and the Caltech101 image dataset. The mean testing precision of the neural network architecture was 98% with the Caltech101 dataset and 92.27% with the Oxford 17 Flower image dataset.
Deep Convolutional Neural Network based Intrusion Detection System (Sri Ram)
In the present era, cyberspace is growing tremendously, and Intrusion Detection Systems (IDS) play a key role in ensuring information security. An IDS, which works at the network and host level, should be capable of identifying various malicious attacks. The job of a network-based IDS is to differentiate between normal and malicious traffic data and raise an alert in case of an attack. Apart from the traditional signature- and anomaly-based approaches, many researchers have employed various Deep Learning (DL) techniques for detecting intrusions, as DL models are capable of extracting salient features automatically from the input data. The application of Deep Convolutional Neural Networks (DCNN), which are used quite often for solving research problems in image processing and vision, has not been explored much for IDS. In this paper, a DCNN architecture for IDS trained on the KDDCUP 99 dataset is proposed. This work also shows that the DCNN-IDS model performs better than other existing works.
Deep Learning: Chapter 11 Practical Methodology (Jason Tsai)
Lecture for Deep Learning 101 study group to be held on June 9th, 2017.
Reference book: https://www.deeplearningbook.org/
Past video archives: https://goo.gl/hxermB
Initiated by Taiwan AI Group (https://www.facebook.com/groups/Taiwan.AI.Group/)
Comparing Incremental Learning Strategies for Convolutional Neural Networks (Vincenzo Lomonaco)
In the last decade, Convolutional Neural Networks (CNNs) have been shown to perform incredibly well in many computer vision tasks, such as object recognition and object detection, being able to extract meaningful high-level invariant features. However, partly because of their complex training and tricky hyper-parameter tuning, CNNs have been scarcely studied in the context of incremental learning, where data are available in consecutive batches and retraining the model from scratch is unfeasible. In this work we compare different incremental learning strategies for CNN-based architectures, targeting real-world applications.
If you are interested in this work please cite:
Lomonaco, V., & Maltoni, D. (2016, September). Comparing Incremental Learning Strategies for Convolutional Neural Networks. In IAPR Workshop on Artificial Neural Networks in Pattern Recognition (pp. 175-184). Springer International Publishing.
For further information visit my website: http://www.vincenzolomonaco.com/
In this deck from the GPU Technology Conference, Thorsten Kurth from Lawrence Berkeley National Laboratory and Josh Romero from NVIDIA present: Exascale Deep Learning for Climate Analytics.
"We'll discuss how we scaled the training of a single deep learning model to 27,360 V100 GPUs (4,560 nodes) on the OLCF Summit HPC System using the high-productivity TensorFlow framework. We discuss how the neural network was tweaked to achieve good performance on the NVIDIA Volta GPUs with Tensor Cores and what further optimizations were necessary to provide excellent scalability, including data input pipeline and communication optimizations, as well as gradient boosting for SGD-type solvers. Scalable deep learning becomes more and more important as datasets and deep learning models grow and become more complicated. This talk is targeted at deep learning practitioners who are interested in learning what optimizations are necessary for training their models efficiently at massive scale."
Watch the video: https://wp.me/p3RLHQ-kgT
Learn more: https://ml4sci.lbl.gov/home
and
https://www.nvidia.com/en-us/gtc/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
One-shot learning is an object categorization problem in computer vision. Whereas most machine learning based object categorization algorithms require training on hundreds or thousands of images and very large datasets, one-shot learning aims to learn information about object categories from one, or only a few, training images.
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be... (Jinwon Lee)
This is the 258th paper review of the TensorFlow Korea paper-reading group PR12.
The paper is From ImageNet to Image Classification: Contextualizing Progress on Benchmarks, from MIT.
Anyone working in Deep Learning knows ImageNet. This paper discusses the limitations and problems of ImageNet's labeling method and points out that evaluation based on top-1 accuracy can also be problematic.
More than 20% of ImageNet images contain multiple objects, but only one of them is accepted as the correct answer, and, due to limitations of the annotation method, there are many cases where a class different from what a person would actually consider correct is labeled as the ground truth. There is also the problem that many labels are hard to judge without expert knowledge, for example more than 20 kinds of terrier alone. Beyond this, through various experiments, the paper combines quantitative analysis with human-in-the-loop evaluation to discuss how far the performance of current models has come and what challenges remain on the data-labeling side to achieve higher performance. The paper is rather long, but it contains little technical material and is easy to read; if you are curious about the details, please see the video!
Paper link: https://arxiv.org/abs/2005.11295
Presentation video link: https://youtu.be/CPMgX5ikL_8
Scene classification using Convolutional Neural Networks - Jayani Withanawasam (WithTheBest)
Scene classification is performed using Convolutional Neural Networks (CNNs). We seek to redefine computer vision as an AI problem, understand the importance of scene classification as well as its challenges, and the difference between traditional machine learning and deep learning. Additionally, we discuss CNNs, using Caffe for implementing CNNs, and important resources for improvement.
MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola... (Heechul Yun)
Memory bandwidth in modern multi-core platforms is highly variable for many reasons and is a big challenge in designing real-time systems as applications are increasingly becoming more memory intensive. In this work, we proposed, designed, and implemented an efficient memory bandwidth reservation system, that we call MemGuard. MemGuard distinguishes memory bandwidth as two parts: guaranteed and best effort. It provides bandwidth reservation for the guaranteed bandwidth for temporal isolation, with efficient reclaiming to maximally utilize the reserved bandwidth. It further improves performance by exploiting the best effort bandwidth after satisfying each core’s reserved bandwidth. MemGuard is evaluated with SPEC2006 benchmarks on a real hardware platform, and the results demonstrate that it is able to provide memory performance isolation with minimal impact on overall throughput.
There are many challenges on FPGA design such as: FPGA Selection, System Design Challenges, Power and Resource optimization, Verification of Design etc.
Every FPGA engineer faces these challenges, so preparing for them makes it possible to complete and optimize an FPGA-based project or design on time and within budget.
For more details and consultation: www.digitronixnepal.com, email: digitronixnepali@gmail.com
Evaluating UCIe based multi-die SoC to meet timing and power (Deepak Shankar)
Multi-die designs allow systems engineers to pack more functionality, with different timing and power constraints, into a single package. Older-generation multi-die designs split the dies into high-speed and low-speed parts. Newer, high-performance multi-die System-on-Chip (SoC) designs require interaction between memories across the die-to-die interfaces. Connections between dies must be power efficient, have low latency, provide high bandwidth to transfer massive amounts of data, and deliver error-free operation. The distribution of cores, deep neural networks and AI engines across these dies makes it extremely hard to predict the expected end-to-end latency, power spikes and effective bandwidth. Moreover, multi-die architectures have evolved from proprietary interfaces to the industry-standard UCIe.
This Webinar looks at the system-wide view of performance and power in a multi-die SoC. We will be showcasing a few use cases that combine various types of processing engines across PCIe and interconnected UCIe. This modeling effort will present the user with different system performance and system architecture models and a guide on how to best bring different aspects of their design together in a holistic way that is optimized for power, timing and functionality.
During the Webinar, users can follow along using VisualSim Cloud. To get started with VisualSim Cloud, users can register and receive a login at https://www.mirabilisdesign.com/visualsim-cloud-login/. Once you receive the login, follow the instructions, and open the models provided in the Template pull-down. More instructions will be provided at the start of the Webinar.
Trends in Systems and How to Get Efficient Performance (inside-BigData.com)
In this video from Switzerland HPC Conference, Martin Hilgeman from Dell presents: HPC Workload Efficiency and the Challenges for System Builders.
"With all the advances in massively parallel and multi-core computing with CPUs and accelerators it is often overlooked whether the computational work is being done in an efficient manner. This efficiency is largely being determined at the application level and therefore puts the responsibility of sustaining a certain performance trajectory into the hands of the user. It is observed that the adoption rate of new hardware capabilities is decreasing and lead to a feeling of diminishing returns. This presentation shows the well-known laws of parallel performance from the perspective of a system builder. It also covers through the use of real case studies, examples of how to program for energy efficient parallel application performance."
Watch the video: http://wp.me/p3RLHQ-gIS
Learn more: http://dell.com
and
http://www.hpcadvisorycouncil.com/events/2017/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
FPGA Conference 2021: Breaking the TOPS ceiling with sparse neural networks -... (Numenta)
Nick Ni (Xilinx) and Lawrence Spracklen (Numenta) presented a talk at the FPGA Conference Europe on July 8th, 2021. In this talk, they presented a neuroscience approach to optimize state-of-the-art deep learning networks into sparse topology and how it can unlock significant performance gains on FPGAs without major loss of accuracy. They then walked through the FPGA implementation where they exploited the advantage of sparse networks with a unique Domain Specific Architecture (DSA).
A System on Chip is an IC that integrates all the components of an electronic system. This presentation is based on current trends and challenges in IP-based SoC design.
Performance of State-of-the-Art Cryptography on ARM-based Microprocessors (Hannes Tschofenig)
Position paper for the NIST Lightweight Cryptography Workshop, 20th and 21st July 2015, Gaithersburg, US.
The link to the workshop is available at: http://www.nist.gov/itl/csd/ct/lwc_workshop2015.cfm
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL) (MdTanvirMahtab2)
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL), a government-owned company of Bangladesh Chemical Industries Corporation under the Ministry of Industries.
Cosmetic shop management system project report.pdf (Kamal Acharya)
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it is tough to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. The system includes various function programs to perform the tasks mentioned above.
Data file handling has been used effectively in the program.
The automated cosmetic shop management system should deal with the automation of the general workflow and administration process of the shop. The main processes of the system focus on the customer's request, where the system is able to search for the most appropriate products and deliver them to the customer. It should help employees quickly identify the list of cosmetic products that have reached the minimum quantity and also keep track of the expiry date for each cosmetic product. It should help employees find the rack number in which a product is placed. It is also a faster and more efficient way of working.
Water scarcity is the lack of fresh water resources to meet standard water demand. There are two types of water scarcity: physical water scarcity and economic water scarcity.
Accelerating Deep Learning Inference on Mobile Systems
1. Accelerating Deep Learning Inference
on Mobile Systems
Darian Frajberg
Carlo Bernaschina
Christian Marone
Piero Fraternali
June 27, 2019
2. 2
Typical implementations of Deep Learning (DL) models focus on
the maximization of accuracy for a given task.
Architectures to achieve such an objective have become
significantly deeper and more complex over time.
[Chart: top-5 error (%) over time]
Introduction
3. 3
Artificial Intelligence (AI) on the edge is a
matter of great importance towards the
enhancement of smart devices that rely on
operations with real-time constraints.
Despite the rapid growth of computational
power in embedded systems, such as
smartphones, wearable devices, drones and
FPGAs, the deployment of highly complex and
considerably big DL models remains
challenging.
Introduction
5. 5
Related work
• Compression techniques.
– Quantization
– Pruning
– Knowledge distillation
– Tensor decomposition
• Optimized model architectures.
– SqueezeNet
– MobileNet v1
– MobileNet v2
– MnasNet
• Hardware acceleration.
– Neural Networks API
– OpenGL
– Vulkan
– Metal
6. 6
Related work
• Heterogeneous computing scheduling.
– Mobile GPU
– Custom implementations with access to hardware
primitives
• Mobile Deep Learning frameworks.
– TensorFlow Lite
– Caffe2
– CoreML
7. 7
Limitations
1. Hardware Acceleration primitives are still not
completely standardized and stable, but are
tightly dependent on SoC vendors.
2. Retraining or modifying the architecture of ready-
to-use models can be extremely time-consuming.
3. Post-training compression of already small
models can degrade accuracy.
8. 8
Use case
PeakLens is a real world mobile app that combines Augmented
Reality and Computer Vision (CV) for the identification of mountain
peaks.
It processes sensor readings and camera frames in real-time by
using an efficient on-board Deep Learning-powered CV module.
+400k installs
in Android
9. 9
Requirements
1. Focus on execution. It should be possible to train a model using tools already known
to the developer. The framework should focus just on execution concerns, without the
need of re-training.
2. Minimum dependencies. It should be possible to execute an optimized model
independently of the Operating System, hardware platform or model storage format.
3. Easy embedding. It should be possible to embed the framework and optimized models
into existing applications easily, without the need of ad-hoc integration procedures.
4. End-to-end optimization. Optimization should be applied as early as possible and
span the model life-cycle (generation, compilation, initialization, configuration,
execution).
5. Offline support. Computation should occur only on-board the embedded system,
without the need of a network connection for work off-loading.
6. No accuracy loss. The acceleration for constrained devices should not reduce
accuracy w.r.t. the execution on a high-performance infrastructure.
10. 10
The PolimiDL Framework
PolimiDL is an open source framework for
accelerating DL inference on mobile and embedded
systems, which was started when no efficient off-
the-shelf edge solutions were available.
Implementation is generic and aims at supporting
devices with limited power and heterogeneous
architectures.
12. 12
The PolimiDL Framework
• Generation-time optimizations.
– Layers fusion.
Consecutive in-place layers with identical filter size
can be fused into one single layer, thus reducing the
number of iterations over the cells of an input matrix.
Examples:
• Bias + ReLU = Bias_ReLU
• Batch_Normalization + ReLU6 =
BatchNormalization_ReLU6
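As an illustration only (the helper names below are hypothetical, not the actual PolimiDL API, and a channels-last layout is assumed), fusing the two in-place operations means the activation buffer is traversed once instead of twice:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Two separate in-place layers: each pass iterates over the whole activation
// buffer (channels-last layout assumed, so i % channels selects the channel).
void bias(std::vector<float>& x, const std::vector<float>& b, std::size_t channels) {
    for (std::size_t i = 0; i < x.size(); ++i) x[i] += b[i % channels];
}
void relu(std::vector<float>& x) {
    for (float& v : x) v = std::max(v, 0.0f);
}

// Fused Bias_ReLU layer: a single pass applies both operations to each cell.
void bias_relu(std::vector<float>& x, const std::vector<float>& b, std::size_t channels) {
    for (std::size_t i = 0; i < x.size(); ++i)
        x[i] = std::max(x[i] + b[i % channels], 0.0f);
}
```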
13. 13
The PolimiDL Framework
• Generation-time optimizations.
– Weights fusion.
Layers applying functions with constant terms comprising multiple
weights can be pre-computed and encoded as unique constant weights,
thus reducing operations at run-time and potential temporary memory
allocation.
Example:
• Batch Normalization (BN)
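For Batch Normalization the constant terms can be folded offline into a per-channel scale and shift, so that at run-time the layer reduces to y = scale * x + shift. A minimal sketch, assuming the standard BN formulation (names are illustrative):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct FusedBN { std::vector<float> scale, shift; };

// Pre-compute per-channel constants at generation time:
//   y = gamma * (x - mean) / sqrt(var + eps) + beta  ==>  y = scale * x + shift
FusedBN fuse_batch_norm(const std::vector<float>& gamma, const std::vector<float>& beta,
                        const std::vector<float>& mean, const std::vector<float>& var,
                        float eps = 1e-5f) {
    FusedBN f;
    f.scale.resize(gamma.size());
    f.shift.resize(gamma.size());
    for (std::size_t c = 0; c < gamma.size(); ++c) {
        f.scale[c] = gamma[c] / std::sqrt(var[c] + eps);
        f.shift[c] = beta[c] - mean[c] * f.scale[c];
    }
    return f;  // only scale and shift are stored in the generated model
}
```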
14. 14
The PolimiDL Framework
• Generation-time optimizations.
– Weights rearrangement.
Weights associated to predefined Convolutional layer types are
stored in an order such that Eigen’s GEMM matrix operations
do not require any memory reshaping at run-time.
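A minimal sketch of the idea, assuming an im2col-based convolution and that Eigen is available (function and parameter names are illustrative): the kernel weights are packed offline as a (k*k*in_channels) x out_channels matrix, so at run-time they can be mapped directly and multiplied without reshaping.

```cpp
#include <Eigen/Dense>
#include <vector>

using Matrix = Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;

// Weights are stored offline already laid out as a (k*k*in_channels) x out_channels
// matrix, so the im2col patches can be multiplied by them with one GEMM call and
// no run-time rearrangement of the weight tensor.
Matrix convolution_as_gemm(const std::vector<float>& packed_weights,
                           const Matrix& im2col_patches,  // rows: output pixels, cols: k*k*in_channels
                           int k, int in_channels, int out_channels) {
    Eigen::Map<const Matrix> weights(packed_weights.data(),
                                     k * k * in_channels, out_channels);
    return im2col_patches * weights;  // one GEMM computes all output channels
}
```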
16. 16
The PolimiDL Framework
• Compile-time optimizations.
– Fixed network architecture.
The architecture of a model is fixed at compile-time,
which enables the compiler to perform per-layer
optimizations.
[Diagram: the fixed model is compiled into a .so shared library]
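One way to picture this (a sketch, not PolimiDL's actual code): the sequence of layers is expressed as a compile-time type, so the compiler sees every layer's run() method, can inline and optimize each one, and the result is shipped as a shared object.

```cpp
#include <tuple>
#include <utility>
#include <vector>

// The network topology is a template parameter pack: the full chain of layer
// types is known at compile time, enabling per-layer inlining and optimization.
template <typename... Layers>
class Network {
public:
    explicit Network(Layers... layers) : layers_(std::move(layers)...) {}

    void run(std::vector<float>& buffer) {
        std::apply([&](auto&... layer) { (layer.run(buffer), ...); }, layers_);
    }

private:
    std::tuple<Layers...> layers_;
};

// Example layer type; a real model would list its actual layers as template arguments.
struct ReLU {
    void run(std::vector<float>& x) const {
        for (float& v : x) { if (v < 0.0f) v = 0.0f; }
    }
};

// Usage: Network<ReLU, ReLU> net{ReLU{}, ReLU{}}; net.run(activations);
```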
19. 19
The PolimiDL Framework
• Initialization-time optimizations.
– Memory pre-allocation.
Memory requirements can be reduced by fusing the 3
buffers into a single one. During initialization, each
layer is queried about its memory size requirements.
[Diagram: the three buffers (layer input, layer output, temporary data) fused into a single buffer]
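A minimal sketch of this initialization step (the interface below is hypothetical): each layer reports its input, output and temporary-memory needs, and a single buffer sized for the worst case is allocated once and reused by every layer.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-layer report of scratch-space needs (not the actual PolimiDL API).
struct LayerRequirements {
    std::size_t input_size;   // elements needed for the layer's input
    std::size_t output_size;  // elements needed for the layer's output
    std::size_t temp_size;    // elements of temporary data used while running
};

// At initialization time every layer is queried for its requirements and a single
// buffer, large enough for the most demanding layer, is allocated once and reused.
std::vector<float> preallocate(const std::vector<LayerRequirements>& layers) {
    std::size_t needed = 0;
    for (const auto& r : layers)
        needed = std::max(needed, r.input_size + r.output_size + r.temp_size);
    return std::vector<float>(needed);
}
```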
20. 20
The PolimiDL Framework
• Initialization-time optimizations.
– Small tasks for low memory consumption.
The operation of certain layers is divided into smaller
tasks that can be executed independently, thus not
performing a complete input unroll, but maintaining a
fixed required size for the temporary memory.
[Diagram: layer input split into a 5x5 grid of independent tasks T0 to T24]
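Conceptually (an illustrative helper, not the framework's real interface), the layer's work is split into fixed-size chunks that can be executed independently, so the temporary memory required by any single task stays bounded regardless of the input size:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>

// Splits a layer's work (expressed here as a range of output rows) into tasks of a
// fixed size; each task can run independently, like T0..T24 in the diagram above.
void for_each_task(std::size_t total_rows, std::size_t task_rows,
                   const std::function<void(std::size_t begin, std::size_t end)>& body) {
    for (std::size_t begin = 0; begin < total_rows; begin += task_rows)
        body(begin, std::min(begin + task_rows, total_rows));
}
```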
22. 22
The PolimiDL Framework
• Configuration-time optimizations.
– Scheduling optimization.
The optimal size for a scheduled task may vary
depending on the specific layer, the underlying
architecture, or even on the input size for Fully
Convolutional Neural Networks.
The size can be:
• Set to a default value.
• Inferred by executing a profiling routine.
• Loaded from previous profiling routine executions.
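A sketch of such a profiling routine (names and structure are illustrative): each candidate task size is timed on the actual device, the fastest one is kept, and the chosen value can be stored and reloaded instead of re-profiling on every start-up.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Runs the layer once per candidate task size, measures the latency of each,
// and returns the fastest candidate.
std::size_t profile_task_size(const std::vector<std::size_t>& candidates,
                              const std::function<void(std::size_t)>& run_layer) {
    std::size_t best = candidates.front();
    auto best_time = std::chrono::steady_clock::duration::max();
    for (std::size_t size : candidates) {
        const auto start = std::chrono::steady_clock::now();
        run_layer(size);                 // execute the layer with this task size
        const auto elapsed = std::chrono::steady_clock::now() - start;
        if (elapsed < best_time) { best_time = elapsed; best = size; }
    }
    return best;  // persist and reload this value to skip profiling on later runs
}
```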
24. 24
The PolimiDL Framework
• Run-time optimizations.
– Dynamic workload scheduling.
Dynamic multithreaded scheduling of tasks can adapt
well to different contexts such as ARM big.LITTLE
architecture and allows cores to be better exploited.
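A minimal sketch of dynamic scheduling under these assumptions (a shared atomic task counter; thread and task counts chosen by the caller): each worker repeatedly claims the next task index, so faster cores on a big.LITTLE SoC simply end up executing more tasks than slower ones, without any static partitioning.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Worker threads pull task indices from a shared atomic counter; work balances
// itself across heterogeneous cores because faster cores claim more tasks.
void run_tasks_dynamically(std::size_t num_tasks, unsigned num_threads,
                           const std::function<void(std::size_t)>& task) {
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t)
        workers.emplace_back([&] {
            for (std::size_t i = next.fetch_add(1); i < num_tasks; i = next.fetch_add(1))
                task(i);
        });
    for (auto& w : workers) w.join();
}
```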
25. 25
The PolimiDL Framework
Layers coverage
Layer name | In place | Temp. memory | Schedulable
Convolution | X | √ | √
Depthwise convolution | X | √ | √
Pointwise convolution (out_channels <= in_channels) | √ | √ | √
Pointwise convolution (out_channels > in_channels) | X | X | √
Max Pooling | X | √ | X
Average Pooling | X | √ | √
Batch normalization | √ | X | √
Bias | √ | X | X
ReLU/ReLU6 | √ | X | X
29. 29
Experimental results
Device | TensorFlow Lite (ms) | PolimiDL (ms)
Asus Zenfone 2 | 1672.67 | 1138.00 (-31.96%)
Google Pixel | 255.33 | 171.00 (-33.03%)
LG G5 SE | 290.00 | 209.00 (-27.93%)
LG Nexus 5X | 370.33 | 342.33 (-7.56%)
Motorola Nexus 6 | 505.33 | 215.67 (-57.32%)
One Plus 6T | 144.33 | 91.00 (-36.95%)
Average | | (-32.46%)
PeakLens original
30. 30
Experimental results
Device | TensorFlow Lite (ms) | PolimiDL (ms)
Asus Zenfone 2 | 807.67 | 179.33 (-77.80%)
Google Pixel | 95.00 | 35.33 (-62.81%)
LG G5 SE | 138.33 | 68.00 (-50.84%)
LG Nexus 5X | 193.00 | 80.33 (-58.38%)
Motorola Nexus 6 | 225.67 | 66.00 (-70.75%)
One Plus 6T | 68.67 | 22.67 (-66.99%)
Average | | (-64.59%)
PeakLens optimized
31. 31
Experimental results
Device | TensorFlow Lite (ms) | PolimiDL (ms)
Asus Zenfone 2 | 775.33 | 377.33 (-51.33%)
Google Pixel | 82.33 | 82.67 (+0.40%)
LG G5 SE | 274.67 | 259.00 (-5.70%)
LG Nexus 5X | 225.00 | 234.33 (+4.15%)
Motorola Nexus 6 | 298.33 | 176.00 (-41.01%)
One Plus 6T | 56.67 | 51.67 (-8.82%)
Average | | (-17.05%)
MobileNet v1
32. Concept
– Open source framework for accelerating Deep Learning
inference on mobile and embedded systems, which has
proved competitive w.r.t. TensorFlow Lite.
Future work
– Extended support for more layers, quantization and
conversion from more DL frameworks.
– Extended evaluation with more configurations, metrics
and devices.
32
Conclusions
33. 33
Thanks For Your
Attention!
Accelerating Deep Learning
Inference on Mobile Systems
Darian Frajberg
Carlo Bernaschina
Christian Marone
Piero Fraternali
https://github.com/darianfrajberg/polimidl
darian.frajberg@polimi.it
Editor's Notes
Compression techniques target large scale architectures and aim at reducing the number of parameters and floating point operations (FLOPs), possibly tolerating small accuracy drops in favor of execution acceleration and optimization of computational resources, storage, memory occupation and energy consumption.
Lightweight architectures with compact layers pursue the design of an optimized network topology, yielding small, fast and accurate models, suitable for resource-constrained devices.
HA is the use of dedicated hardware to complement general-purpose CPUs and perform computationally intensive work more efficiently, e.g. by favoring specific operations and data-parallel computation.
Heterogeneous computing scheduling comprises the design of strategies to efficiently coordinate and distribute the workload among processors of different types.
Frameworks for the execution of DL models on mobile and embedded systems pursue optimized deployment on devices with limited resources, by managing memory allocation efficiently and exploiting the available hardware resources at best.
Optimized execution requires managing memory allocation efficiently, to avoid overloading, and exploiting the available hardware resources for acceleration, which is not trivial given the non standardized access to such resources.
Evaluation exploits hardware with limited resources and models with a small-size architecture achieving a good trade-off between accuracy and latency. Three models with diverse characteristics, listed in Table 2, are evaluated.