For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/sep-2019-alliance-vitf-facebook
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Raghuraman Krishnamoorthi, Software Engineer at Facebook, delivers the presentation "Quantizing Deep Networks for Efficient Inference at the Edge" at the Embedded Vision Alliance's September 2019 Vision Industry and Technology Forum. Krishnamoorthi gives an overview of practical deep neural network quantization techniques and tools.
In real-world applications, a convolutional neural network (CNN) model can occupy more than 100 MB and can be computationally expensive, so the state of the art offers multiple methods for reducing this complexity. Ristretto is a plug-in for the Caffe framework that implements several model approximation methods. In this project, a CNN model is first trained on the CIFAR-10 dataset with Caffe; Ristretto is then used to generate multiple approximated versions of the trained model using different schemes. The goal of the project is to compare the models in terms of execution performance, model size and cache utilization during the test (inference) phase. The same steps are carried out with TensorFlow and its quantization tool, and the quantization schemes of TensorFlow and Ristretto are then compared.
For the full video of this presentation, please visit:
https://www.edge-ai-vision.com/2021/01/practical-dnn-quantization-techniques-and-tools-a-presentation-from-facebook/
Raghuraman Krishnamoorthi, Software Engineer at Facebook, presents the “Practical DNN Quantization Techniques and Tools” tutorial at the September 2020 Embedded Vision Summit.
Quantization is a key technique to enable the efficient deployment of deep neural networks. In this talk, Krishnamoorthi presents an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations.
Krishnamoorthi explores simple and advanced quantization approaches and examines their effects on latency and accuracy on various target processors. He also presents best practices for quantization-aware training to obtain high accuracy with quantized weights and activations.
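As background for the integer inference the talk describes, here is a minimal NumPy sketch of uniform affine quantization to 8-bit integers. The function names and the simple min/max calibration are illustrative assumptions, not the speaker's exact method:

```python
import numpy as np

def quantize(x, num_bits=8):
    """Uniform affine quantization: map a float tensor to unsigned integers."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)   # min/max calibration
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map the integers back to approximate float values."""
    return scale * (q.astype(np.float32) - zero_point)

weights = np.random.randn(64).astype(np.float32)
q, scale, zp = quantize(weights)
recovered = dequantize(q, scale, zp)
max_err = np.abs(weights - recovered).max()  # bounded by roughly one scale step
```

Integer kernels then operate on `q` directly, carrying `scale` and `zero_point` through the computation.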
Part 01: Linux Kernel Compilation (Ubuntu), by Tushar B Kute
Presentation on "Linux Kernel Compilation" (Ubuntu based).
Presented at Army Institute of Technology, Pune, for the FDP on "Basics of Linux Kernel Programming", by Tushar B Kute (http://tusharkute.com).
K Nearest Neighbor V1.0 Supervised Machine Learning Algorithm, by DataMites
Are you planning to learn machine learning algorithms?
Go through the slides for information on the K Nearest Neighbor V1.0 supervised machine learning algorithm.
DataMites provides a data science course covering machine learning algorithms. Join classroom or online training and get certified as a data scientist at the end of the course.
For more details visit: https://datamites.com/data-science-course-training-bangalore/
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2021/09/introduction-to-dnn-model-compression-techniques-a-presentation-from-xailient/
Sabina Pokhrel, Customer Success AI Engineer at Xailient, presents the “Introduction to DNN Model Compression Techniques” tutorial at the May 2021 Embedded Vision Summit.
Embedding real-time, large-scale deep learning vision applications at the edge is challenging due to their huge computational, memory, and bandwidth requirements. System architects can mitigate these demands by applying various model compression approaches, modifying deep neural networks to make them more energy efficient and less demanding of processing resources.
In this talk, Pokhrel provides an introduction to four established techniques for model compression. She discusses network pruning, quantization, knowledge distillation and low-rank factorization compression approaches.
BeagleBone Black is one of the most popular open hardware platforms for learning embedded Linux. This versatile platform lets you explore a variety of peripherals and load a custom embedded distribution. This presentation briefly introduces BeagleBone Black.
Artificial Intelligence (AI), specifically deep learning, is revolutionizing industries, products, and core capabilities by delivering dramatically enhanced experiences. However, the deep neural networks of today use too much memory, compute, and energy. Plus, to make AI truly ubiquitous, networks need to run on the end device within a tight power and thermal budget. One approach to help address these issues is quantization, which attempts to reduce the number of bits used for weight parameters and activation calculations without sacrificing model accuracy. This presentation covers: why quantization is important, existing quantization challenges, Qualcomm AI Research's existing quantization research, and how developers and researchers can take advantage of quantization on Qualcomm Snapdragon.
Small introduction to FPGA acceleration and the impact of the new High Level Synthesis toolchains to their programmability
Video here: https://www.linkedin.com/posts/marcobarbone_can-my-application-benefit-from-fpga-acceleration-activity-6848674747375460352-0fua
In this presentation, we approach a two-class classification problem. We try to find a plane that separates the classes in the feature space, also called a hyperplane. If we can't find a hyperplane, then we can be creative in two ways: 1) we soften what we mean by "separate," and 2) we enrich and enlarge the feature space so that separation is possible.
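The second idea can be shown in a tiny NumPy sketch (the data and the squared feature are illustrative assumptions): one-dimensional points that no single threshold can separate become linearly separable once the feature space is enlarged with x².

```python
import numpy as np

# One class sits in the middle, the other on both sides: no single
# threshold (a hyperplane in 1-D) can separate them.
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([0, 0, 1, 1, 1, 0, 0])

# Enrich the feature space: phi(x) = (x, x^2).
phi = np.column_stack([x, x ** 2])

# In the enlarged space the hyperplane x^2 = 1.5 separates the classes.
pred = (phi[:, 1] < 1.5).astype(int)
separable_after_mapping = bool((pred == y).all())  # True
```

This is exactly the intuition behind the kernel trick: separation that is impossible in the original space can be trivial in an enriched one.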
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses..., by Intel® Software
Explore how to build a unified framework based on FFmpeg and GStreamer to enable video analytics on all Intel® hardware, including CPUs, GPUs, VPUs, FPGAs, and in-circuit emulators.
Deep learning (also known as deep structured learning or hierarchical learning) is the application of artificial neural networks (ANNs) with more than one hidden layer to learning tasks. Deep learning is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, partially supervised or unsupervised.
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, by Jinwon Lee
This is a review of paper #169 from the TensorFlow-KR paper reading group PR12.
The paper reviewed here is EfficientNet, published by Google. Efficient neural networks have usually been studied as small networks for edge devices with limited computing power, such as mobile devices. In contrast, networks are commonly scaled up to improve accuracy, and this paper studies how to scale up a network in a more efficient way. Please see the video for details.
Paper link: https://arxiv.org/abs/1905.11946
Video link: https://youtu.be/Vhz0quyvR7I
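The paper's central idea, compound scaling, is simple to state in code. A sketch using the constants reported in the EfficientNet paper (α = 1.2, β = 1.1, γ = 1.15, found by grid search under the constraint α · β² · γ² ≈ 2):

```python
# Compound scaling from the EfficientNet paper: for a compound coefficient
# phi, scale network depth by alpha**phi, channel width by beta**phi and
# input resolution by gamma**phi. The constraint alpha * beta**2 * gamma**2
# ~ 2 makes total FLOPs grow by roughly 2**phi.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # grid-searched constants from the paper

def compound_scale(phi, base_resolution=224):
    depth_mult = ALPHA ** phi
    width_mult = BETA ** phi
    resolution = round(base_resolution * GAMMA ** phi)
    return depth_mult, width_mult, resolution

# FLOPs scale roughly with depth * width^2 * resolution^2, i.e. ~2**phi.
flops_factor = lambda phi: ALPHA ** phi * (BETA ** phi) ** 2 * (GAMMA ** phi) ** 2
```

Each EfficientNet variant (B0 through B7) corresponds to a larger value of the compound coefficient phi applied to the same baseline network.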
https://github.com/telecombcn-dl/dlmm-2017-dcu
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/10/practical-approaches-to-dnn-quantization-a-presentation-from-magic-leap/
Dwith Chenna, Senior Embedded DSP Engineer for Computer Vision at Magic Leap, presents the “Practical Approaches to DNN Quantization” tutorial at the May 2023 Embedded Vision Summit.
Convolutional neural networks, widely used in computer vision tasks, require substantial computation and memory resources, making it challenging to run these models on resource-constrained devices. Quantization involves modifying CNNs to use smaller data types (e.g., switching from 32-bit floating-point values to 8-bit integer values).
Quantization is an effective way to reduce the computation and memory bandwidth requirements of these models, and their memory footprints, making it easier to run them on edge devices. However, quantization does degrade the accuracy of CNNs. In this talk, Chenna surveys practical techniques for CNN quantization and shares best practices, tools and recipes to enable you to get the best results from quantization, including ways to minimize accuracy loss.
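One such recipe, using per-channel rather than per-tensor scales for weights, can be illustrated with a small NumPy sketch. The data and the symmetric max-abs calibration here are illustrative assumptions, not the speaker's exact recipe:

```python
import numpy as np

def qdq(x, num_bits=8):
    """Symmetric quantize-dequantize using the tensor's own max-abs range."""
    scale = np.abs(x).max() / (2 ** (num_bits - 1) - 1)
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
# Two output channels of a conv layer with very different weight ranges.
w = np.stack([rng.normal(0.0, 1.0, 1000),    # wide channel
              rng.normal(0.0, 0.01, 1000)])  # narrow channel

per_tensor = qdq(w)                            # one scale for the whole tensor
per_channel = np.stack([qdq(c) for c in w])    # one scale per channel

mse_tensor = np.mean((w - per_tensor) ** 2)
mse_channel = np.mean((w - per_channel) ** 2)
# The narrow channel keeps its resolution only with per-channel scales,
# so mse_channel comes out much smaller than mse_tensor.
```

With a single tensor-wide scale, the narrow channel's weights collapse toward zero; giving each channel its own scale is one of the standard ways to minimize accuracy loss.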
Experimental Study of Data Clustering Using k-Means and Modified Algorithms, by IJDKP
The k-means clustering algorithm is an old algorithm that has been intensely researched owing to its ease and simplicity of implementation. Clustering algorithms have broad appeal and usefulness in exploratory data analysis. This paper presents the results of an experimental study of different approaches to k-means clustering, comparing results on different datasets using the original k-means and other modified algorithms implemented in MATLAB R2009b. The results are evaluated on performance measures such as the number of iterations, the number of points misclassified, accuracy, the Silhouette validity index and execution time.
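For reference, the original (Lloyd's) k-means that the paper compares against fits in a few lines; the toy data, the fixed initial centroids and the convergence test below are illustrative assumptions:

```python
import numpy as np

def kmeans(X, centroids, max_iters=100):
    """Lloyd's k-means; returns labels, centroids and the iteration count
    (one of the performance measures the paper reports)."""
    centroids = centroids.astype(float).copy()
    k = len(centroids)
    for it in range(1, max_iters + 1):
        # Assign each point to its nearest centroid.
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):   # converged: centroids stopped moving
            return labels, new, it
        centroids = new
    return labels, centroids, max_iters

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),   # cluster around (0, 0)
               rng.normal(3.0, 0.3, (50, 2))])  # cluster around (3, 3)
labels, centers, iterations = kmeans(X, centroids=X[[0, 99]])
```

The modified algorithms the paper studies typically differ only in how the initial centroids are chosen or how the assignment step is accelerated.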
A Framework for Scene Recognition Using Convolutional Neural Network as Featu..., by Tahmid Abtahi
Scene recognition is one of the hallmark tasks of computer vision, allowing definition of a context for object recognition. Availability of large data sets like ImageNet and VGG has provided scopes of applying machine learning classifiers to train models. However high data dimensionality is an issue while training classifiers such as Support Vector Machine (SVM) and perceptron. To reduce data dimensionality and take advantage of parallel and distributed processing, we propose a framework with Convolutional Neural Network (CNN) as Feature extractor and SVM and perceptron as the classifier. MPI (Message passing interface) was used for programming clusters of CPUs. SVM showed 1.05x times improvement over perceptron in terms of run time and CNN reduced data dimensionality by 10x times.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/10/deep-neural-network-training-diagnosing-problems-and-implementing-solutions-a-presentation-from-sensor-cortek/
Fahed Hassanat, Chief Operating Officer and Head of Engineering at Sensor Cortek, presents the “Deep Neural Network Training: Diagnosing Problems and Implementing Solutions” tutorial at the May 2023 Embedded Vision Summit.
In this presentation, Hassanat delves into some of the most common problems that arise when training deep neural networks. He provides a brief overview of essential training metrics, including accuracy, precision, false positives, false negatives and F1 score.
Hassanat then explores training challenges that arise from problems with hyperparameters, inappropriately sized models, inadequate models, poor-quality datasets, imbalances within training datasets and mismatches between training and testing datasets. To help detect and diagnose training problems, he also covers techniques such as understanding performance curves, recognizing overfitting and underfitting, analyzing confusion matrices and identifying class interaction issues.
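The metrics in the overview all reduce to four confusion-matrix counts; a small self-contained sketch (the example counts are invented for illustration):

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard training metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0  # penalized by false positives
    recall = tp / (tp + fn) if tp + fn else 0.0     # penalized by false negatives
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean of the two
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# A detector with 80 true positives, 10 false positives, 20 false negatives:
m = classification_metrics(tp=80, fp=10, fn=20, tn=890)
# accuracy = 0.97, precision ~ 0.889, recall = 0.8, f1 ~ 0.842
```

Note how accuracy alone hides the class imbalance: the 890 true negatives dominate it, which is why precision, recall and F1 are tracked separately.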
The peer-reviewed International Journal of Engineering Inventions (IJEI) was started with a mission to encourage contributions to research in science and technology, and to encourage and motivate researchers in challenging areas of the sciences and technology.
MXNet image segmentation to predict and diagnose the cardiac diseases karp..., by KannanRamasamy25
A powerful open-source deep learning framework.
MXNet supports multiple languages such as C++, Python, R, Julia and Perl.
MXNet is supported by Intel, Dato, Baidu, Microsoft, Wolfram Research, and research institutions such as Carnegie Mellon, MIT, the University of Washington, and the Hong Kong University of Science and Technology.
Symbolic execution: a static symbolic graph executor, which provides efficient symbolic graph execution and optimization.
Supports efficient deployment of a trained model to low-end devices for inference, such as mobile devices, IoT devices (using AWS Greengrass), serverless (using AWS Lambda) or containers.
The buffer allocation problem is an important research issue in manufacturing system design. The objective of this paper is to find the optimum buffer allocation for a closed queuing network with multiple servers at each node. The sum of buffers in a closed queuing network is constant. An attempt is made to find the optimum number of pallets required to maximize the throughput of a manufacturing system that has pre-specified space for allocating pallets. Expanded Mean Value Analysis is used to evaluate the performance of the closed queuing network, and Particle Swarm Optimization is used as a generative technique to optimize the buffer allocation. Numerical experiments are shown to demonstrate the effectiveness of the procedure.
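The optimization loop can be sketched in a few lines of NumPy. This is only an illustration of PSO's role: the objective below is a stand-in with diminishing returns per buffer slot, not the paper's Expanded Mean Value Analysis throughput model, and all function names and parameters are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def throughput(alloc):
    """Stand-in objective with diminishing returns per buffer slot; the
    paper evaluates real throughput with Expanded Mean Value Analysis."""
    return np.log1p(alloc).sum(axis=-1)

def pso_buffers(n_nodes=4, total=20.0, n_particles=30, iters=200,
                w=0.7, c1=1.5, c2=1.5):
    """Particle swarm search over buffer allocations with a fixed total."""
    x = rng.random((n_particles, n_nodes)) + 0.1   # positive fractions
    v = np.zeros_like(x)
    alloc = lambda p: total * p / p.sum(axis=-1, keepdims=True)
    pbest, pbest_fit = x.copy(), throughput(alloc(x))
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, 1e-6, None)             # keep fractions positive
        fit = throughput(alloc(x))
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = x[better], fit[better]
        gbest = pbest[pbest_fit.argmax()].copy()
    best = alloc(gbest)
    return best, throughput(best)

allocation, best_fit = pso_buffers()
# With diminishing returns the swarm drifts toward a near-even split.
```

Normalizing each particle to the fixed total enforces the constant-sum constraint on pallets; in the paper, the fitness call would be replaced by the Expanded MVA evaluation.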
Why React Native as a Strategic Advantage for Startup Innovation.pdf, by ayushiqss
Do you know that React Native is being increasingly adopted by startups as well as big companies in the mobile app development industry? Big names like Facebook, Instagram, and Pinterest have already integrated this robust open-source framework.
In fact, according to a report by Statista, the number of React Native developers has been steadily increasing over the years, reaching an estimated 1.9 million by the end of 2024. This means that the demand for this framework in the job market has been growing making it a valuable skill.
But what makes React Native so popular for mobile application development? Among other benefits, it offers excellent cross-platform capabilities: developers can write code once and run it on both iOS and Android devices, saving time and resources, shortening development cycles and speeding your app's time-to-market.
Let’s take the example of a startup, which wanted to release their app on both iOS and Android at once. Through the use of React Native they managed to create an app and bring it into the market within a very short period. This helped them gain an advantage over their competitors because they had access to a large user base who were able to generate revenue quickly for them.
Strategies for Successful Data Migration Tools.pptx, by varshanayak241
Data migration is a complex but essential task for organizations aiming to modernize their IT infrastructure and leverage new technologies. By understanding common challenges and implementing these strategies, businesses can achieve a successful migration with minimal disruption. Data migration tools like Ask On Data play a pivotal role in this journey, offering features that streamline the process, ensure data integrity, and maintain security. With the right approach and tools, organizations can turn the challenge of data migration into an opportunity for growth and innovation.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus..., by Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
First Steps with Globus Compute Multi-User Endpoints, by Globus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart..., by Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G..., by Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
How Recreation Management Software Can Streamline Your Operations.pptx, by wottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Globus Connect Server Deep Dive - GlobusWorld 2024, by Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge to organize and improve your code review process.
SOCRadar Research Team: Latest Activities of IntelBroker, by SOCRadar
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntelBroker. We have compiled what happened in the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar's Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
Providing Globus Services to Users of JASMIN for Environmental Data Analysis, by Globus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR, by Tier1 app
Even though at the surface level 'java.lang.OutOfMemoryError' appears as one single error, there are actually nine types of OutOfMemoryError underneath. Each type has different causes, diagnosis approaches and solutions. This session equips you with the knowledge, tools, and techniques needed to troubleshoot and conquer OutOfMemoryError in all its forms, ensuring smoother, more efficient Java applications.
Accelerate Enterprise Software Engineering with PlatformlessWSO2
Key takeaways:
Challenges of building platforms and the benefits of platformless.
Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
How Choreo enables the platformless experience.
How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
Demo of an end-to-end app built and deployed on Choreo.
Quarkus Hidden and Forbidden Extensions, by Max Andersen
Quarkus has a vast extension ecosystem and is known for its supersonic, subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting; quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Anthony Dahanne
Les Buildpacks existent depuis plus de 10 ans ! D’abord, ils étaient utilisés pour détecter et construire une application avant de la déployer sur certains PaaS. Ensuite, nous avons pu créer des images Docker (OCI) avec leur dernière génération, les Cloud Native Buildpacks (CNCF en incubation). Sont-ils une bonne alternative au Dockerfile ? Que sont les buildpacks Paketo ? Quelles communautés les soutiennent et comment ?
Venez le découvrir lors de cette session ignite
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Integer quantization for deep learning inference: principles and empirical evaluation
1. INTEGER QUANTIZATION FOR DEEP
LEARNING INFERENCE: PRINCIPLES
AND EMPIRICAL EVALUATION
NVIDIA, Georgia Tech.
Arxiv 2020
Presenter: Jemin Lee
https://leejaymin.github.io/index.html
7 Oct. 2020
Neural Acceleration Study Season #3
2. Contents
• Introduction and Background
• Quantization Fundamentals
• Post-Training Quantization
• Techniques to Recover Accuracy
• Recommended Workflow
• Conclusion
3. Introduction
• Low-precision formats offer several performance benefits:
• First, many processors provide higher-throughput math pipelines for low-bit
formats, which can speed up math-intensive operations such as convolutions and
matrix multiplications.
• Second, smaller word sizes reduce memory bandwidth pressure, improving
performance for bandwidth-limited computations.
• Third, smaller word sizes lower memory size requirements, which can
improve cache utilization as well as other aspects of memory-system operation.
4. Introduction
• Review the mathematical fundamentals underlying various
integer quantization choices (Section 3) as well as techniques
for recovering accuracy lost due to quantization (Section 5).
Section 6 combines this information into a recommended
workflow.
5. Related Work
• Uniform [7,59] quantization enables the use of integer or fixed-
point math pipelines, allowing computation to be performed in the
quantized domain.
• Non-uniform[13,60,34,2] quantization requires dequantization, e.g.
a codebook lookup, before doing computation in higher precision,
limiting its benefits to model compression and bandwidth reduction.
• This paper focuses on leveraging quantization to accelerate
computation, so we will restrict our focus to uniform quantization
schemes.
• In this paper, we present an evaluation of int8 quantization on all of
the major network architectures with both PTQ and QAT.
6. Quantization Fundamentals
• Uniform quantization steps
• 1) Choose the range of the real numbers to be quantized, clamping the
values outside this range.
• 2) Map the real values to integers representable by the bit-width of the
quantized representation (round each mapped real value to the closest
integer value).
• Quantize: convert a real number to a quantized integer
representation (e.g. from fp32 to int8).
• Dequantize: convert a number from quantized integer
representation to a real number (e.g. from int32 to fp16).
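The two steps above can be sketched in a few lines of numpy. This is a minimal illustration of symmetric scale quantization, not the paper's code; the helper names are mine:

```python
import numpy as np

def quantize(x, scale, num_bits=8):
    """Quantize: scale the real values, round to nearest, clamp to the int8 range."""
    qmin, qmax = -2 ** (num_bits - 1), 2 ** (num_bits - 1) - 1
    return np.clip(np.round(x * scale), qmin, qmax).astype(np.int8)

def dequantize(q, scale):
    """Dequantize: recover an approximate real value from the integers."""
    return q.astype(np.float32) / scale

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
scale = 127.0 / np.abs(x).max()          # map [-1, 1] onto [-127, 127]
q = quantize(x, scale)                   # int8 values [-127, 0, 64, 127]
x_hat = dequantize(q, scale)             # close to x, up to rounding error
```

The round trip loses at most half a quantization step per element, which is the rounding error the rest of the deck is concerned with.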
7. Quantization Fundamentals: Affine Quantization
(Asymmetric)
• f(x) = s·x + z
• s is the scale factor and z is the zero-point
• For int8, with real range [β, α]:
• s = 255 / (α − β)
• z = −round(β·s) − 128
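A small sketch consistent with the formulas above, assuming β is the minimum and α the maximum of the chosen real range (the function name is illustrative):

```python
def affine_params(beta, alpha, num_bits=8):
    """Scale and zero-point mapping the real range [beta, alpha] onto int8,
    so that beta lands on -128 and alpha lands on 127."""
    s = (2 ** num_bits - 1) / (alpha - beta)       # 255 / (alpha - beta)
    z = -round(beta * s) - 2 ** (num_bits - 1)     # -round(beta * s) - 128
    return s, z

s, z = affine_params(0.0, 2.55)   # s = 100.0, z = -128
```

With these parameters, f(0.0) = −128 and f(2.55) = round(255) − 128 = 127, so the full int8 range is used.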
9. Quantization Fundamentals: Tensor Quantization
Granularity
• At the coarsest, per-tensor granularity, the same quantization
parameters are shared by all elements in the tensor.
• The finest granularity would have individual quantization
parameters per element.
• Intermediate granularities: per-row or per-column for 2D matrices,
• per-channel for 3D (image-like) tensors.
10. Quantization Fundamentals: Tensor Quantization
Granularity
• Integer matrix multiplication
• per-row or per-tensor for activations
• per-column or per-tensor for weights
• The equation is as below:
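The identity behind that equation (shown as an image in the deck) can be sketched in numpy: quantize with a per-tensor activation scale and per-column weight scales, multiply in integers, then dequantize by dividing out both scales. Names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(4, 8)).astype(np.float32)   # activations
W = rng.uniform(-1, 1, size=(8, 3)).astype(np.float32)   # weights

s_x = 127.0 / np.abs(X).max()            # per-tensor scale for activations
s_w = 127.0 / np.abs(W).max(axis=0)      # one scale per weight column

Xq = np.round(X * s_x).astype(np.int32)
Wq = np.round(W * s_w).astype(np.int32)

Y_int = Xq @ Wq                          # pure integer matrix multiply
Y_hat = Y_int / (s_x * s_w)              # dequantize; s_w broadcasts per column
```

Because the scales factor out of the dot product, the expensive inner loop stays entirely in integer arithmetic, and the result matches the fp32 product up to rounding error.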
11. Quantization Fundamentals: Tensor Quantization
Granularity
• For activations, only per-tensor quantization is practical for performance reasons.
• The number of rows depends on the batch size and can vary at inference time.
• This prevents per-row scaling factors from being computed offline (which would not be meaningful for different instances in a mini-batch), whereas determining them online imposes a compute overhead and in some cases results in poor accuracy (see the Dynamic Quantization discussion in [58]).
12. Quantization Fundamentals: Tensor Quantization
Granularity
• The corresponding granularity to per-column in convolutions is
per-kernel, or equivalently per-output-channel since each
kernel produces a separate output channel [27, 29].
• This is commonly referred to as “per-channel” weight
quantization in literature and we follow that convention [21,
25, 26, 38, 46]. We examine the granularity impact on accuracy
in Section 4.1.
13. Quantization Fundamentals: Computational Cost of
Affine Quantization
• The affine-quantized matrix multiply breaks down into three terms:
• The first term is the integer dot product, just as in scale quantization (Equation 10).
• The second term involves only integer weights and zero-points, so it can be computed offline, adding only an element-wise addition at inference time; if the layer has a bias, this term can be folded in without increasing inference cost.
• The third term involves the quantized input matrix Xq, and thus cannot be computed offline; this overhead can reduce or even eliminate the throughput advantage that integer math pipelines have over reduced-precision floating-point.
• The extra computation is incurred only if affine quantization is used for the weight matrix.
• Recommendation: use scale (symmetric) quantization for weights;
• affine quantization could be used for activations.
15. Quantization Fundamentals: Calibration
• Max: Use the maximum absolute value seen during calibration [52].
• Entropy: Use KL divergence to minimize information loss between the original floating-point
values and values that could be represented by the quantized format. This is the default method
used by TensorRT [36].
• Percentile: Set the range to a percentile of the distribution of absolute values seen during
calibration [33]. For example, 99% calibration would clip 1% of the largest magnitude values.
• Clipping the distribution trades off a large clipping error on a few outlier values for smaller
rounding errors on a majority of the values.
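The max and percentile methods above are straightforward to sketch; the helper below is an illustration of the idea, not the paper's or TensorRT's code:

```python
import numpy as np

def calibrate_range(x, method="max", percentile=99.99):
    """Choose the clipping threshold for symmetric quantization
    from calibration data."""
    absx = np.abs(np.asarray(x)).ravel()
    if method == "max":
        return float(absx.max())                       # no clipping error, but
                                                       # outliers inflate the range
    if method == "percentile":
        return float(np.percentile(absx, percentile))  # clips the top (100 - p)%
    raise ValueError(f"unknown method: {method}")

data = np.concatenate([np.linspace(-1.0, 1.0, 10_000), [50.0]])  # one outlier
t_max = calibrate_range(data, "max")               # 50.0: range blown up by outlier
t_pct = calibrate_range(data, "percentile", 99.9)  # near 1.0: outlier clipped
```

With max calibration, the single outlier forces a 50x wider range, so every quantization step becomes 50x coarser; percentile calibration trades a large clipping error on that one value for much finer steps on the rest.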
16. Post Training Quantization
• Quantization parameters are calibrated offline by processing the trained model weights and
activations generated by running inference on a sample dataset, no further training is involved.
17. Post Training Quantization
• Results refer to the relative accuracy change, computed as (acc_int8 − acc_fp32) / acc_fp32.
18. Post Training Quantization: Weight Quantization
• Int8 weights: max calibration
• However, as BN parameters are learned per channel, their folding can result in significantly
different weight value distributions across channels.
• Table 4 reports per-channel (per-column for linear layers) granularity and indicates that max
calibration is sufficient to maintain accuracy when quantizing weights to int8. The rest of the
experiments in this paper use per-channel max calibration for weights.
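Per-channel max calibration for weights can be sketched as follows, assuming a conv weight laid out as (out_channels, in_channels, kh, kw); the helper name is mine:

```python
import numpy as np

def quantize_weights_per_channel(w):
    """Symmetric int8 weight quantization with per-output-channel max
    calibration: each output channel gets its own scale."""
    amax = np.abs(w).reshape(w.shape[0], -1).max(axis=1)   # per-channel max |w|
    scale = 127.0 / amax                                   # one scale per channel
    wq = np.round(w * scale[:, None, None, None]).astype(np.int8)
    return wq, scale

w = np.zeros((2, 1, 1, 1), dtype=np.float32)
w[0] = 1.0   # channel 0 spans a small range
w[1] = 4.0   # channel 1 spans a 4x larger range (e.g. after BN folding)
wq, scale = quantize_weights_per_channel(w)
```

Each channel maps its own maximum to 127, so a channel with a wide post-BN-folding distribution no longer forces a coarse step size on the narrow channels.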
19. Post Training Quantization: Activation Quantization
• For most networks, at least one activation calibration method achieves acceptable accuracy;
• the exceptions are MobileNets, EfficientNets, Transformer and BERT, where the accuracy drop is larger than 1%.
• No single calibration method is best for all networks.
20. Techniques to Recover Accuracy
• Partial quantization: leaves the most sensitive layers unquantized
• One also has an option to train networks with quantization
• there are also approaches that jointly learn the model weights and quantization parameters.
21. Techniques to Recover Accuracy: Partial Quantization
• Partial quantization: leaves the most sensitive layers unquantized
22. Techniques to Recover Accuracy: Quantization Aware
Training
• Intuition for QAT
• A model quantized after training may sit in a “narrow” minimum, where a small change in the weights leads to a large change in loss.
• By training with quantization, we may avoid these narrow minima by computing gradients with respect to the quantized weights, as shown in Figure 6b.
• In narrow minima the quantized weights produce larger gradients, potentially allowing the model to explore the loss landscape for “wide” [22] or “flat” [16, 30] minima, where quantization points have lower loss and thus higher accuracy.
23. Techniques to Recover Accuracy: Quantization Aware
Training
• Fake Quantization
• FakeQuant(x) = DQ(Q(x)) = INT8(s*x) / s
• Problem: The quantization function is non-differentiable
• Solution: Straight Through Estimation (STE)
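A numpy sketch of the fake-quantization forward pass and its straight-through-estimator backward pass (helper names are mine; real frameworks implement this inside autograd):

```python
import numpy as np

def fake_quant(x, scale, num_bits=8):
    """Forward: quantize then dequantize, so the network trains against
    the rounding and clamping error it will see at inference."""
    qmax = 2 ** (num_bits - 1) - 1
    q = np.clip(np.round(x * scale), -qmax - 1, qmax)
    return q / scale

def fake_quant_grad(x, grad_out, scale, num_bits=8):
    """Backward with the straight-through estimator: round() has zero
    gradient almost everywhere, so pass the upstream gradient through
    unchanged inside the clamping range and zero it outside."""
    qmax = 2 ** (num_bits - 1) - 1
    inside = (x * scale >= -qmax - 1) & (x * scale <= qmax)
    return grad_out * inside

x = np.array([0.5, 10.0])
y = fake_quant(x, 127.0)                        # [64/127, 1.0]: 10.0 is clamped
g = fake_quant_grad(x, np.ones_like(x), 127.0)  # [1.0, 0.0]: no gradient past clamp
```

Treating the non-differentiable round as the identity in the backward pass is exactly the STE trick the slide names.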
24. Techniques to Recover Accuracy: Quantization Aware
Training
• Result
25. Techniques to Recover Accuracy: Learning Quantization
Parameters
• Apply the same fine-tuning schedule as before,
• but allow the range of each quantized activation tensor to be learned along with the weights, rather than keeping it fixed throughout fine-tuning.
• Use PACT [1].
• “Fixed best” denotes the best PTQ accuracy, as described in Table 5.
• Use max calibration for activation quantization.
• Learning the range yields higher accuracy than keeping it fixed for most networks.
• When starting from the best calibration, learning the ranges gives very similar results,
• so learned ranges offer no extra benefit for int8 QAT if activation ranges are already carefully calibrated.
• PACT could yield better results with longer fine-tuning or a separate fine-tuning process.
[1] Pact: Parameterized clipping activation for quantized neural networks, ICLR18
26. Techniques to Recover Accuracy: Learning Quantization
Parameters
• Complex fine-tuning procedure
27. Recommended Workflow (Recap)
• Weights:
• Per-column (FC), per-channel (conv.)
• Symmetric (high performance)
• Max calibration (static)
• Activations:
• Per tensor, symmetric
• PTQ
• Select the activation calibration method among max, entropy, 99.99th percentile, and 99.999th percentile.
• If none maintains acceptable accuracy, proceed to partial quantization or QAT.
• Partial Quantization:
• Perform sensitivity analysis to identify the most sensitive layers
and leave them in floating-point.
• If none, do QAT.
• QAT:
• Start from the best calibrated quantized model.
• Use QAT to fine-tune for around 10% of the
original training schedule
with an annealing learning rate schedule starting at
1% of the initial training learning rate.
• For details, refer to Appendix A.2.
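The recap above is essentially an escalation loop: try the cheapest technique first and stop as soon as accuracy is acceptable. A small sketch of that control flow, where `evaluate` and the candidate list are hypothetical stand-ins for real model evaluation:

```python
def choose_quantized_model(candidates, evaluate, fp32_acc, budget=0.01):
    """Walk candidates ordered by effort (PTQ calibrations first, then
    partial quantization, then QAT) and return the first whose relative
    accuracy drop is within budget."""
    for name, model in candidates:
        drop = (fp32_acc - evaluate(model)) / fp32_acc
        if drop <= budget:
            return name
    return None   # nothing met the budget; consider lower-level techniques

# Toy example: the "models" stand in for their accuracy numbers.
candidates = [("ptq-max", 0.70), ("ptq-entropy", 0.755), ("qat", 0.759)]
best = choose_quantized_model(candidates, evaluate=lambda m: m, fp32_acc=0.76)
```

With a 1% relative-drop budget, `ptq-max` is rejected (about a 7.9% drop) and `ptq-entropy` is accepted (about 0.7%), so the loop never pays the cost of QAT.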
28. Conclusion
• Review the mathematical background for integer quantization of neural networks, as
well as some performance-related reasons for choosing quantization parameters.
• Empirically evaluate various choices for int8 quantization of a variety of models, leading
to a quantization workflow proposal.
• MobileNets and BERT are challenging to quantize.
• The proposed workflow combines post-training quantization, partial quantization, and quantization-aware fine-tuning techniques.
• Some more complex techniques, such as ADMM and distillation, were not required for
int8 quantization of these models.
• However, these techniques should be evaluated when quantizing to even lower-bit
integer representations, which we leave to future work.