In this talk, an overview of current trends in machine learning will be presented, with an emphasis on the challenges and opportunities facing the field. It will focus on deep learning methods and applications. Deep learning has emerged as one of the most promising research areas in artificial intelligence. The significant advances that deep learning methods have brought to large-scale image classification have generated a surge of excitement in applying these techniques to other problems in computer vision and, more broadly, to other disciplines of computer science. The impact of machine learning on education, research, and the economy will also be briefly presented. The rapid growth of machine learning is positioned to affect our lives in ways we have not yet fully imagined, and it behooves government leaders to take the lead in developing the resources needed to realize the projected benefits of machine learning.
The field of artificial intelligence (AI) has witnessed tremendous growth in recent years with the advent of Deep Neural Networks (DNNs) that surpass humans in a variety of cognitive tasks.
This document summarizes a human brain-inspired circuit system developed by Stanford engineers. The system uses digital neurons and neurocore chips similar to human brain neurons to simulate up to 1 million neurons and billions of connections. While initially costly, the technology is now faster and more energy efficient than typical PCs. It offers potential applications in robotics and brain research.
Smart Data Slides: Emerging Hardware Choices for Modern AI Data Management (DATAVERSITY)
Leading-edge AI applications have always been resource-intensive and known for stretching the limits of conventional (von Neumann architecture) computer performance. Specialized hardware, purpose-built to optimize AI applications, is not new. In fact, it should be no surprise that the very first .com internet domain was registered to Symbolics - a company that built the Lisp Machine, a dedicated AI workstation - in 1985. In the last three decades, of course, the performance of conventional computers has improved dramatically with advances in chip density (Moore’s Law) leading to faster processor speeds, memory speeds, and massively parallel architectures. And yet, some applications - like machine vision for real-time video analysis and deep machine learning - always need more power.
Participants in this webinar will learn the fundamentals of the three hardware approaches that are receiving significant investments and demonstrating significant promise for AI applications.
- neuromorphic/neurosynaptic architectures (brain-inspired hardware)
- GPUs (graphics processing units, optimized for AI algorithms), and
- quantum computers (based on the principles and properties of quantum mechanics rather than binary logic).
Note - This webinar requires no previous knowledge of hardware or computer architectures.
1) Blue Brain is a project to build a synthetic brain by simulating the human brain on a supercomputer. It involves scanning a person's brain with nanobots to upload their intelligence and memories.
2) The goal is to understand how the brain works by simulating its biological functions and neural connections on a highly powerful system. This could help cure brain diseases and allow people to live on digitally after death.
3) While Blue Brain may provide benefits like enhanced memory and intelligence, it also has potential drawbacks, such as risks related to human cloning and dependency on computers to store knowledge.
FPGA Hardware Accelerator for Machine Learning
Machine learning publications and models are growing exponentially, outpacing Moore's law. Hardware acceleration using FPGAs, GPUs, and ASICs can provide performance gains over CPU-only implementations for machine learning workloads. FPGAs allow for reprogramming after manufacturing and can accelerate parts of machine learning algorithms through customized hardware while sharing computations between the FPGA and CPU. Vitis AI is a software stack that optimizes machine learning models for deployment on Xilinx FPGAs, providing pre-optimized models, tools for optimization and quantization, and high-level APIs.
Hardware Acceleration for Machine Learning (CastLab, KAIST)
This document provides an overview of a lecture on hardware acceleration for machine learning. The lecture will cover deep neural network models like convolutional neural networks and recurrent neural networks. It will also discuss various hardware accelerators developed for machine learning, including those designed for mobile/edge and cloud computing environments. The instructor's background and the agenda topics are also outlined.
The document discusses the emergence of computation for interdisciplinary large data analysis. It notes that exponential increases in computational power and data are driving changes in science and engineering. Computational modeling is becoming a third pillar of science alongside theory and experimentation. However, continued increases in clock speeds are no longer feasible due to power constraints, necessitating the use of multi-core processors and parallelism. This is driving changes in software design to expose parallelism.
In this talk, after a brief overview of AI concepts, in particular Machine Learning (ML) techniques, some well-known computer design concepts for high performance and power efficiency are presented. Subsequently, the techniques that have had a promising impact on computing ML algorithms are discussed. Deep learning has emerged as a game changer for many applications in various fields of engineering and the medical sciences. Although the primary computational kernel is matrix-vector multiplication, many competing efficient implementations of this kernel have been proposed and put into practice. This talk will review and compare some of the techniques used in ML computer design.
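To make the matrix-vector point concrete, here is a minimal NumPy sketch (an editor's illustration, not taken from the talk; the layer sizes, weights, and input are made up) showing that a fully connected layer's forward pass boils down to a matrix-vector product plus a nonlinearity, which is exactly the operation accelerators target.

```python
import numpy as np

# Hypothetical layer dimensions: 4 inputs -> 3 outputs.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))   # weight matrix (out_features x in_features)
b = rng.standard_normal(3)        # bias vector
x = rng.standard_normal(4)        # one input sample

def dense_forward(W, b, x):
    """Forward pass of a fully connected layer: y = ReLU(W @ x + b).

    The dominant cost is the matrix-vector product W @ x, the operation
    that ML accelerators (GPUs, FPGAs, ASICs) are built to optimize.
    """
    return np.maximum(W @ x + b, 0.0)

print(dense_forward(W, b, x))
```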
Vertex Perspectives | AI Optimized Chipsets | Part II (Vertex Holdings)
Deep learning is both computationally and memory intensive, necessitating enhancements in processor performance. In this issue, we explore how this has led to the rise of startups adopting alternative, innovative approaches and how it is expected to pave the way for different types of AI-optimized chipsets.
The primary reasons for using parallel computing:
Save time - wall clock time
Solve larger problems
Provide concurrency (do multiple things at the same time)
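As a rough illustration of the wall-clock and concurrency points in the list above, the following sketch (not from the summarized slides; the toy workload is invented) times the same CPU-bound task run serially and in parallel with a process pool.

```python
import time
from concurrent.futures import ProcessPoolExecutor

def busy_sum(n):
    """A deliberately CPU-bound toy task."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    tasks = [5_000_000] * 8

    t0 = time.perf_counter()
    serial = [busy_sum(n) for n in tasks]
    t1 = time.perf_counter()

    # Processes (not threads) so the work truly runs in parallel on multiple cores.
    with ProcessPoolExecutor() as pool:
        parallel = list(pool.map(busy_sum, tasks))
    t2 = time.perf_counter()

    assert serial == parallel
    print(f"serial:   {t1 - t0:.2f} s")
    print(f"parallel: {t2 - t1:.2f} s")
```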
Cloud computing is a type of computing that allows for ubiquitous and on-demand access to shared computing resources like networks, servers, storage, applications and services. These resources can be rapidly provisioned and released with minimal management effort. Some key types of cloud computing include high-performance computing (HPC), parallel computing, distributed computing, cluster computing and grid computing. Distributed computing uses multiple computers connected through a network to work together as a single system. Examples include computer networks like the internet, intranets and applications like databases and online banking systems. Mobile computing allows transmission of data, voice and video via wireless devices without needing a fixed physical link.
Computational science and engineering (CSE) utilizes high performance computing, large-scale simulations, and scientific applications to enable data-driven discovery. The speaker discusses CSE initiatives at UC Berkeley and Lawrence Berkeley National Lab focused on areas like health, freshwater, food security, ecosystems, and urban metabolism using exascale computing and big data analytics.
Hard computing relies on precise analytical models and computations to solve problems, while soft computing uses approximate techniques inspired by the human brain to provide usable solutions to complex problems. The document then discusses hard computing techniques like algorithms and mathematical formulas. It notes soft computing's advantages like robustness, low cost, and ability to solve complex problems. Soft computing techniques discussed include fuzzy logic, genetic algorithms, and neural networks. Applications mentioned include internet search, robotics, and speech recognition.
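For readers new to the soft-computing idea, here is a tiny, self-contained sketch (an illustration added here, not from the summarized document) of fuzzy membership functions: a temperature reading belongs to "cold", "warm", and "hot" to varying degrees rather than to exactly one crisp category.

```python
def triangular(x, a, b, c):
    """Triangular fuzzy membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x == b:
        return 1.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Hypothetical fuzzy sets for room temperature in degrees Celsius.
def cold(t):  return triangular(t, -5, 5, 15)
def warm(t):  return triangular(t, 10, 20, 30)
def hot(t):   return triangular(t, 25, 35, 45)

for t in (12, 18, 27):
    print(t, {"cold": cold(t), "warm": warm(t), "hot": hot(t)})
```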
The document discusses high performance computing and the path towards exascale systems. It covers key application requirements in areas like cancer research, climate modeling, and materials science. Technological challenges for exascale include power and resilience issues. The US Department of Energy is funding several exascale development programs through 2020, including the CANDLE project applying deep learning to precision cancer medicine. Reaching exascale will enable new capabilities in big data analytics, machine learning, and commercial applications.
Implementing AI: Hardware Challenges, hosted by KTN and eFutures, is the first event of the Implementing AI webinar series, addressing the challenges and opportunities that realising AI in hardware presents.
There will be presentations from hardware organisations and from solution providers in the morning, followed by Q&A. The afternoon session will consist of virtual breakout rooms, where challenges raised in the morning session can be workshopped.
Artificial Intelligence now impacts every aspect of modern life and is key to the generation of valuable business insights.
Implementing AI webinar series is designed for people involved in the management and implementation of AI based solutions – from developers to CTOs.
Find out more: https://ktn-uk.co.uk/news/just-launched-implementing-ai-webinar-series
The document discusses the future of high performance computing (HPC). It covers several topics:
- Next generation HPC applications will involve larger problems in fields like disaster simulation, urban science, and data-intensive science. Projects like the Square Kilometer Array will generate exabytes of data daily.
- Hardware trends include using many-core processors, accelerators like GPUs, and heterogeneous computing with CPUs and GPUs. Future exascale systems may use conventional CPUs with GPUs or innovative architectures like Japan's Post-K system.
- The top supercomputers in the world currently include Summit, an IBM system at Oak Ridge combining POWER9 CPUs and NVIDIA Volta GPUs, and China's Sunway TaihuLight.
Modern Computing: Cloud, Distributed, & High Performance (inside-BigData.com)
In this video, Dr. Umit Catalyurek from Georgia Institute of Technology presents: Modern Computing: Cloud, Distributed, & High Performance.
Ümit V. Çatalyürek is a Professor in the School of Computational Science and Engineering in the College of Computing at the Georgia Institute of Technology. He received his Ph.D. in 2000 from Bilkent University. He is a recipient of an NSF CAREER award and is the principal investigator of several awards from the Department of Energy, the National Institutes of Health, and the National Science Foundation. He currently serves as an Associate Editor for Parallel Computing, and as an editorial board member for IEEE Transactions on Parallel and Distributed Systems and the Journal of Parallel and Distributed Computing.
Learn more: http://www.bigdatau.org/data-science-seminars
Watch the video presentation: http://wp.me/p3RLHQ-ghU
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
The document provides information about processors and CPU terminology. It defines terms like data bus, address bus, registers, instruction set, and cache. It describes how CPUs work using transistors and how manufacturers like Intel and AMD make CPUs. It outlines the components of CPUs like execution cores, arithmetic logic units, and memory controllers. The document provides a timeline of CPUs from the 1970s to recent years to show advances in processing power and core counts.
State-Of-The Art Machine Learning Algorithms and How They Are Affected By Nea... (inside-BigData.com)
Machine learning algorithms are increasingly being used across many domains and are affected by technology trends. Deep learning techniques have achieved human-level performance in tasks like speech recognition and face recognition. Training machine learning models requires massive parallelism and computational resources that are well-suited to GPU and multi-core architectures. Reduced precision computation can accelerate training but may impact convergence. Specialized hardware continues to evolve for both training and inference.
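To illustrate the reduced-precision remark above, here is a small, generic sketch (not tied to any particular accelerator or framework) of symmetric int8 quantization of a weight matrix and the approximation error it introduces.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ~= scale * q, with q in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.standard_normal((256, 256)).astype(np.float32)  # hypothetical weight matrix

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# The mean absolute quantization error is small but nonzero - the reason reduced
# precision can speed up training and inference while still affecting convergence.
print("mean |w - w_hat|:", float(np.mean(np.abs(w - w_hat))))
```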
The document summarizes the major research areas in computer science, dividing them into theoretical and applied branches. Theoretical areas include theory of computation, algorithms and data structures, programming language theory, and formal methods. Applied areas include artificial intelligence, computer architecture, computer graphics, computer security, and software engineering. The document traces the history of computer science as a field and provides examples to illustrate key concepts within each research area.
This document provides an overview of parallel and distributed systems. It discusses that a parallel computer contains multiple processing elements that communicate and cooperate to solve problems quickly, while a distributed system contains independent computers that appear as a single system. It notes that parallel computers are implicitly distributed systems. It then discusses reasons for using parallel and distributed computing like Moore's law and limitations of sequential processing due to power and latency walls. Finally, it outlines some topics that will be covered in the course like different parallel computing platforms, programming paradigms, and challenges in parallel and distributed systems.
SBMT 2021: Can Neuroscience Insights Transform AI? - Lawrence Spracklen (Numenta)
Numenta's Director of ML Architecture Lawrence Spracklen presented a talk at the SBMT Annual Congress on July 10th, 2021. He talked about how neuroscience principles can inspire better machine learning algorithms.
This document provides an overview of computer hardware and software topics, covering:
1. The background and generations of computers from the abacus to modern devices.
2. The types of computers based on size from supercomputers to smartphones.
3. Key components inside a computer like the CPU, memory, ports, and input/output devices.
4. How computers are connected in networks and different network topologies.
5. The different types of software including system software that manages hardware and application software for specific tasks.
"El álgebra lineal es una herramienta fundamental en muchos campos de la ciencia y la tecnología. Es particularmente importante en la física, la ingeniería, la informática y la estadística. La capacidad de manipular eficientemente grandes cantidades de datos y matrices complejas es esencial en estas áreas para la resolución de problemas y la toma de decisiones.
A priori, puede dar la sensación de que estamos muy lejos del uso del álgebra lineal en nuestro día a día. Sin embargo, algunas técnicas como la descomposición en valores singulares y la regresión lineal para entrenar modelos y hacer predicciones precisas están detrás de la inteligencia artificial y el aprendizaje automático. ¿Te suena ChatGPT? Puede no parecerlo, pero el álgebra lineal también está detrás en algunos de sus procesos. Por este motivo, debemos seguir trabajando en este campo, ya que su importancia seguirá creciendo a medida que se generen y analicen grandes cantidades de datos en el mundo actual.
"
The COVID-19 pandemic has brought a proliferation of maps and counter-maps. Civil society organizations and social movements have produced their own interpretations and representations of data about the crisis. These have also helped make visible aspects, subjects, and topics that were neglected or under-represented in the hegemonic, dominant visualizations. In this context, this talk focuses on the analysis of the social imaginaries involved in map-making during the pandemic; that is, it explores the importance of maps for digital activism, the potential offered by this technology, and the values associated with the visualizations created with it. The ultimate goal is to reflect on the emerging avenue of data activism, as well as on the intersection between social imaginaries and digital geography.
More Related Content
Similar to Machine Learning Challenges and Opportunities in Education, Industry, and Research
Designing RISC-V-based Accelerators for next generation Computers (DRAC) is a 3-year project (2019-2022) funded by the ERDF Operational Program of Catalonia 2014-2020. DRAC will design, verify, implement and fabricate a high performance general purpose processor that will incorporate different accelerators based on the RISC-V technology, with specific applications in the field of post-quantum security, genomics and autonomous navigation. In this talk, we will provide an overview of the main achievements in the DRAC project, including the fabrication of Lagarto, the first RISC-V processor developed in Spain.
This talk will begin by introducing the uElectronics section of ESA at ESTEC and the general activities the group is responsible for. It will then go through some of the ongoing R&D activities the group is involved in, hand in hand with universities and/or companies. One of the major ones is related to the European rad-hard FPGAs that have been partially funded by ESA for several years and that will play a major role in the sector in the upcoming years. It is also worth mentioning the RTL soft IPs currently under development, which will allow us to keep providing the European ecosystem with some key capabilities. The talk will close with an overview of ongoing RISC-V space-hardening activities that might replace the current SPARC-based processors available for our missions.
The aim of this talk is to present the latest additions to the ARM architecture and to describe trends in the microarchitecture of processors based on the ARM architecture. ARM is a relatively small company compared with other giants of the technology sector. However, the broad adoption of its architecture, clearly dominant in some sectors, and of its microarchitectures places ARM technology at the center of today's technological development. ARM technology is present across practically the entire technology spectrum, from the simplest devices to HPC and cloud computing, including smartphones, automotive, and consumer electronics.
"Formal verification has been used by computer scientists for decades to prevent
software bugs. However, with a few exceptions, it has not been used by researchers
working in most areas of mathematics (geometry, algebra, analysis, etc.). In this
talk, we will discuss how this has changed in the past few years, and the possible
implications to the future of mathematical research, teaching and communication.
We will focus on the theorem prover Lean and its mathematical library
mathlib, since this is currently the system most widely used by mathematicians.
Lean is a functional programming language and interactive theorem prover based
on dependent type theory, with proof irrelevance and non-cumulative universes.
The mathlib library, open-source and designed as a basis for research level
mathematics, is one of the largest collections of formalized mathematics. It allows
classical reasoning, uses large- and small-scale automation, and is characterized
by its decentralized nature with over 200 contributors, including both computer
scientists and mathematicians."
"Part of the research community thinks that it is still early to tackle the development of quantum software engineering techniques. The reason is that how the quantum computers of the future will look like is still unknown. However, there are some facts that we can affirm today: 1) quantum and classical computers will coexist, each dedicated to the tasks at which they are most efficient. 2) quantum computers will be part of the cloud infrastructure and will be accessible through the Internet. 3) complex software systems will be made up of smaller pieces that will collaborate with each other. 4) some of those pieces will be quantum, therefore the systems of the future will be hybrid. 5) the coexistence and interaction between the components of said hybrid systems will be supported by service composition: quantum services.
This talk analyzes the challenges that the integration of quantum services poses to Service Oriented Computing."
After a brief introduction to medical informatics and a few practical notes on Artificial Intelligence (a possible consensus definition, strong vs. weak AI, and commonly used techniques and methods), the central block of the talk presents practical examples (in the form of success stories) of developments carried out by the Next Generation Computer Systems group (SING: http//sing-group.org/) in the areas of (i) clinical informatics (InNoCBR, PolyDeep), (ii) informatics for clinical research (PathJam, WhichGenes), (iii) translational bioinformatics (genomics: ALTER; proteomics: DPD, BI, BS, Mlibrary, Mass-Up; and OMICS data integration: PunDrugs), and (iv) public health informatics (CURMIS4th). Finally, the importance that interpretable AI (XAI, Explainable Artificial Intelligence) and human participation (HITL, Human-In-The-Loop) are expected to have in the immediate future is briefly discussed. The talk ends with a short reflection on the lessons learned by the speaker after more than 16 years developing intelligent systems in medical informatics.
Many emerging applications require methods tailored towards high-speed data acquisition and filtering of streaming data followed by offline event reconstruction and analysis. In this case, the main objective is to relieve the immense pressure on the storage and communication resources within the experimental infrastructure. In other applications, ultra low latency real time analysis is required for autonomous experimental systems and anomaly detection in acquired scientific data in the absence of any prior data model for unknown events. At these data rates, traditional computing approaches cannot carry out even cursory analyses in a time frame necessary to guide experimentation. In this talk, Prof. Ogrenci will present some examples of AI hardware architectures. She will discuss the concept of co-design, which makes the unique needs of an application domain transparent to the hardware design process and present examples from three applications: (1) An in-pixel AI chip built using the HLS methodology; (2) A radiation hardened ASIC chip for quantum systems; (3) An FPGA-based edge computing controller for real-time control of a High Energy Physics experiment.
This talk will present a review of the concept of autonomy for mobile field robots, identify the challenges involved in achieving a truly autonomous system, and suggest possible research directions. Intelligent robotic systems generally acquire knowledge of their functions and of the working environment at design and development time. This approach is not always efficient, especially in semi-structured, complex environments such as crop fields. A truly autonomous robotic system should develop skills that allow it to succeed in such environments without requiring a priori ontological knowledge of the work area or a predefined set of tasks and behaviors. The talk will therefore present possible strategies, based on Artificial Intelligence, for improving the navigation capabilities of mobile robots so that they offer a level of autonomy high enough to execute all the tasks of a home-to-home mission.
Quantum computing has become a noteworthy topic in academia and industry. Multinational companies around the world have made impressive advances in all areas of quantum technology during the last two decades. These companies are trying to build real quantum computers in order to exploit their theoretical advantages over today's classical computers in practical applications. However, building a full-scale quantum computer is challenging because of its increased susceptibility to errors due to decoherence and other quantum noise. Therefore, quantum error correction (QEC) and fault-tolerance protocols will be essential for running quantum algorithms on large-scale quantum computers.
The overall effect of noise is modeled in terms of a set of Pauli operators and the identity acting on the physical qubits (bit flip, phase flip, and a combination of bit and phase flips). In addition to Pauli errors, there are leakage errors, which occur when a qubit leaves the defined computational subspace. Because the location of leakage errors is unknown, they can damage quantum computations even more. This talk will briefly review these quantum error models.
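For readers unfamiliar with this error model, the following NumPy snippet (an added illustration, not from the talk) writes down the single-qubit Pauli operators and shows their action on basis states: X flips the bit, Z flips the phase, and Y does both (up to a phase factor).

```python
import numpy as np

# Single-qubit Pauli operators and the identity.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)   # bit flip
Z = np.array([[1, 0], [0, -1]], dtype=complex)  # phase flip
Y = 1j * X @ Z                                  # combined bit and phase flip: [[0, -i], [i, 0]]

ket0 = np.array([1, 0], dtype=complex)          # |0>
ket1 = np.array([0, 1], dtype=complex)          # |1>

print("X|0> =", X @ ket0)   # -> |1>   (bit flip)
print("Z|1> =", Z @ ket1)   # -> -|1>  (phase flip)
print("Y|0> =", Y @ ket0)   # -> i|1>  (bit and phase flip)
```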
Chatbots are a key element in the digital transformation of our society. They are everywhere: eCommerce, digital health, customer support, tourism, ... But if you have used one, it has probably disappointed you. I confess: most existing chatbots are very bad. Building a chatbot that is truly useful and intelligent is not easy at all. A chatbot combines all the complexity of software engineering with that of natural language processing. Keep in mind that many chatbots have to be deployed on several channels (web, Telegram, Slack, ...) and often have to use external APIs and services, access internal databases, or integrate pretrained language models (e.g., toxicity detectors). And the problem is not only building the bot, but also testing it and evolving it. In this talk we will look at the biggest challenges we face when a development project includes a chatbot, and the techniques and strategies we can apply, depending on the needs of the project, to finally get a chatbot that knows what it is talking about.
Many HPC applications are massively parallel and can benefit from the spatial parallelism offered by reconfigurable logic. While modern memory technologies can offer high bandwidth, designers must craft advanced communication and memory architectures for efficient data movement and on-chip storage. Addressing these challenges requires combining compiler optimizations, high-level synthesis, and hardware design.
In this talk, I will present challenges, solutions, and trends for generating massively parallel accelerators on FPGA for high-performance computing. These architectures can provide performance comparable to software implementations on high-end processors, and much higher energy efficiency thanks to logic customization.
The main challenge of concurrent software verification has always been in achieving modularity, i.e., the ability to divide and conquer the correctness proofs with the goal of scaling the verification effort. Types are a formal method well-known for its ability to modularize programs, and in the case of dependent types, the ability to modularize and scale complex mathematical proofs.
In this talk I will present our recent work towards reconciling dependent types with shared-memory concurrency, with the goal of achieving modular proofs for the latter. Applying the type-theoretic paradigm to concurrency has led us to view separation logic as a type theory of state, and has motivated novel abstractions for expressing concurrency proofs based on the algebraic structure of a resource and on structure-preserving functions (i.e., morphisms) between resources.
Microarchitectural attacks, such as Spectre and Meltdown, are a class of security threats that affect almost all modern processors. These attacks exploit the side-effects resulting from processor optimizations to leak sensitive information and compromise a system’s security.
Over the years, a large number of hardware and software mechanisms for preventing microarchitectural leaks have been proposed. Intuitively, more defensive mechanisms are less efficient, while more permissive mechanisms may offer more performance but require more defensive programming. Unfortunately, there are no hardware-software contracts that would turn this intuition into a basis for principled co-design.
In this talk, we present a framework for specifying hardware/software security contracts, an abstraction that captures a processor’s security guarantees in a simple, mechanism-independent manner by specifying which program executions a microarchitectural attacker can distinguish.
The appearance of vulnerabilities due to the lack of security controls is one of the reasons new frameworks that produce secure software by default are in demand. The talk will address how to transform the software development process by giving security the importance it deserves from the start of the life cycle. To this end, a new development model, the Viewnext-UEx model, is proposed, which incorporates security practices preventively and systematically in every phase of the software life-cycle process. The purpose of this new model is to anticipate the detection of vulnerabilities by applying security from the earliest phases, while also optimizing software construction processes. The results of a preventive scenario, after applying the Viewnext-UEx model, are compared with the traditional reactive scenario of applying security from the testing phase onwards.
This document discusses trusting artificial intelligence systems. It begins with an overview of trust in social and computing contexts. It then discusses artificial intelligence, including machine learning, deep learning, and natural language processing. It details how AI systems can be attacked, including adversarial inputs, data poisoning, and model stealing. It raises important discussions around using AI in contexts like cybersecurity, medicine, transportation, and sentiment analysis, and the challenges of ensuring systems can be trusted.
The use of renewable energies is key to meeting the sustainable development goals of the 2030 Agenda. Among these energies, wind is the second most widely used due to its high efficiency. Some studies suggest that wind power will be the main source of generation in 2050. It is therefore worth continuing to research the application of advanced control techniques to these systems.
Among these advanced techniques, neural networks and reinforcement learning combined with classical control strategies stand out. These techniques have already been used successfully in the modeling and control of complex systems.
This talk will present the application of neural networks and reinforcement learning to wind turbine control, focusing especially on pitch control. Different configurations using neural networks and other techniques applied to pitch control will be described. Finally, some hybrid techniques combining fuzzy logic, lookup tables, and neural networks will be proposed, with results that demonstrate their usefulness for improving the efficiency of wind turbines.
As the world's energy demand rises, so does the share of renewable energy, particularly wind energy, in the supply. The life cycle of wind farms, from manufacturing the components to the decommissioning stage, involves significant cost, and applications of AI and data analytics aimed at reducing these costs are still limited. With this conference talk, the audience is expected to learn about some interesting applications of AI and data analytics in offshore wind, as well as the future challenges and opportunities. The talk could be useful for students, academics, and researchers who want to make their next career move into offshore wind but do not yet know where to start.
This document discusses the evolution of edge AI systems and architectures for the Internet of Things (IoT) era. It describes how IoT has transitioned from simple wireless sensor networks to complex systems that converge digitized enterprise data with edge AI sensors and deep learning analytics. Edge AI moves intelligence closer to IoT devices by enabling real-time data processing and filtering at the network edge. This reduces data transmission costs and latency. The document outlines several examples of edge AI applications in healthcare, smart homes, and industry that analyze sensor data in real-time to provide personalized and energy efficient services. It also discusses how new edge AI hardware platforms and open-source systems are enabling more customized and affordable IoT solutions.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024 (Sinan KOZAK)
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Null Bangalore | Pentesters Approach to AWS IAM (Divyanshu)
# Abstract:
- Learn more about the real-world methods for auditing AWS IAM (Identity and Access Management) as a pentester. So let us proceed with a brief discussion of IAM as well as some typical misconfigurations and their potential exploits in order to reinforce the understanding of IAM security best practices.
- Gain actionable insights into AWS IAM policies and roles, using hands on approach.
# Prerequisites:
- Basic understanding of AWS services and architecture
- Familiarity with cloud security concepts
- Experience using the AWS Management Console or AWS CLI.
- For hands on lab create account on [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
# Scenario Covered:
- Basics of IAM in AWS
- Implementing IAM Policies with Least Privilege to Manage S3 Bucket
- Objective: Create an S3 bucket with least privilege IAM policy and validate access.
- Steps:
- Create S3 bucket.
- Attach least privilege policy to IAM user.
- Validate access.
- Exploiting IAM PassRole Misconfiguration
- Allows a user to pass a specific IAM role to an AWS service (EC2), typically used for service access delegation. Then exploit the PassRole misconfiguration, granting unauthorized access to sensitive resources.
- Objective: Demonstrate how a PassRole misconfiguration can grant unauthorized access.
- Steps:
- Allow user to pass IAM role to EC2.
- Exploit misconfiguration for unauthorized access.
- Access sensitive resources.
- Exploiting IAM AssumeRole Misconfiguration with Overly Permissive Role
- An overly permissive IAM role configuration can lead to privilege escalation by creating a role with administrative privileges and allow a user to assume this role.
- Objective: Show how overly permissive IAM roles can lead to privilege escalation.
- Steps:
- Create role with administrative privileges.
- Allow user to assume the role.
- Perform administrative actions.
- Differentiation between PassRole vs AssumeRole
Try at [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
Software Engineering and Project Management - Introduction, Modeling Concepts... (Prakhyath Rai)
Introduction, Modeling Concepts and Class Modeling: What is Object orientation? What is OO development? OO Themes; Evidence for usefulness of OO development; OO modeling history. Modeling as Design technique: Modeling, abstraction, The Three models. Class Modeling: Object and Class Concept, Link and associations concepts, Generalization and Inheritance, A sample class model, Navigation of class models, and UML diagrams
Building the Analysis Models: Requirement Analysis, Analysis Model Approaches, Data modeling Concepts, Object Oriented Analysis, Scenario-Based Modeling, Flow-Oriented Modeling, class Based Modeling, Creating a Behavioral Model.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT (jpsjournal1)
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been referred to as the "New Great Game." This research centres on that power struggle, considering geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil politics, and conventional and nontraditional security are explored and explained. Using Mackinder's Heartland theory, Spykman's Rimland theory, and Hegemonic Stability theory, the study examines China's role in Central Asia. It adheres to an empirical epistemological method, takes care to remain objective, and critically analyzes primary and secondary research documents to elaborate the role of China's geo-economic outreach in Central Asian countries and its future prospects. According to this study, China is seeing significant success in trade, pipeline politics, and gaining influence over other governments, a success attributable to the effective use of key instruments such as the Shanghai Cooperation Organisation and the Belt and Road Economic Initiative.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions (Victor Morales)
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw... (IJECEIAES)
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to precisely delineate tumor boundaries from magnetic resonance imaging (MRI) scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The model is rigorously trained and evaluated, exhibiting remarkable performance metrics, including an impressive global accuracy of 99.286%, a high class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical image analysis and enhance healthcare outcomes. This research paves the way for future exploration and optimization of advanced CNN models in medical imaging, emphasizing addressing false positives and resource efficiency.
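For readers unfamiliar with the segmentation metrics quoted above, here is a small self-contained sketch (not the paper's code) that computes intersection over union (IoU) for two toy binary masks.

```python
import numpy as np

def iou(pred, target):
    """Intersection over union for two boolean segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0

# Two toy 4x4 masks: 3 pixels overlap out of 5 pixels in the union.
pred = np.array([[0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
target = np.array([[0, 0, 1, 0],
                   [0, 1, 1, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 0]])

print(iou(pred, target))   # 3 / 5 = 0.6
```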
The CBC machine is a common diagnostic tool used by doctors to measure a patient's red blood cell count, white blood cell count and platelet count. The machine uses a small sample of the patient's blood, which is then placed into special tubes and analyzed. The results of the analysis are then displayed on a screen for the doctor to review. The CBC machine is an important tool for diagnosing various conditions, such as anemia, infection and leukemia. It can also help to monitor a patient's response to treatment.
Use PyCharm for remote debugging of WSL on a Windows machine (shadow0702a)
This document serves as a comprehensive step-by-step guide on how to effectively use PyCharm for remote debugging of the Windows Subsystem for Linux (WSL) on a local Windows machine. It meticulously outlines several critical steps in the process, starting with the crucial task of enabling permissions, followed by the installation and configuration of WSL.
The guide then proceeds to explain how to set up the SSH service within the WSL environment, an integral part of the process. Alongside this, it also provides detailed instructions on how to modify the inbound rules of the Windows firewall to facilitate the process, ensuring that there are no connectivity issues that could potentially hinder the debugging process.
The document further emphasizes on the importance of checking the connection between the Windows and WSL environments, providing instructions on how to ensure that the connection is optimal and ready for remote debugging.
It also offers an in-depth guide on how to configure the WSL interpreter and files within the PyCharm environment. This is essential for ensuring that the debugging process is set up correctly and that the program can be run effectively within the WSL terminal.
Additionally, the document provides guidance on how to set up breakpoints for debugging, a fundamental aspect of the debugging process which allows the developer to stop the execution of their code at certain points and inspect their program at those stages.
Finally, the document concludes by providing a link to a reference blog. This blog offers additional information and guidance on configuring the remote Python interpreter in PyCharm, providing the reader with a well-rounded understanding of the process.
Discover the latest insights on Data Driven Maintenance with our comprehensive webinar presentation. Learn about traditional maintenance challenges, the right approach to utilizing data, and the benefits of adopting a Data Driven Maintenance strategy. Explore real-world examples, industry best practices, and innovative solutions like FMECA and the D3M model. This presentation, led by expert Jules Oudmans, is essential for asset owners looking to optimize their maintenance processes and leverage digital technologies for improved efficiency and performance. Download now to stay ahead in the evolving maintenance landscape.
Mechanical Engineering on AAI Summer Training Report-003.pdf
Machine Learning Challenges and Opportunities in Education, Industry, and Research
1. Machine Learning Challenges and
Opportunities
in Education, Industry, and Research
Nader Bagherzadeh
University of California, Irvine
EECS Dept.
2. AI, ML, and Brain-Like
• Artificial Intelligence (AI): The science and engineering of creating
intelligent machines. (John McCarthy, 1956)
• Machine Learning (ML): Field of study that gives computers the
ability to learn without being explicitly programmed. (Arthur
Samuel, 1959); requires large data sets
• Brain-Like: A machine whose operation and design are strongly inspired by how the human brain functions.
• Neural Networks
• Deep Learning: many layers used for data processing
• Spiking
3. Major Technologies Impacted
by Machine Learning
• Data Centers
• Heterogeneous
• Power
• Thermal
• Machine Learning
• Autonomous Vehicles
• Reliability
• Cost
• Safety
• Machine Learning
4. Global Market Impact of AI
• The global market for memory and processing semiconductors used in
artificial intelligence (AI) applications will soar to $128.9 billion in
2025, three times the $42.8 billion total in 2019, according to IHS.
• The AI hardware market will expand at a comparable rate, hitting
$68.5 billion by the mid-2020s, IHS said.
5. Intuition vs. Computation
•“A self-driving car powered by one of the more popular
artificial intelligence techniques may need to crash into
a tree 50,000 times in virtual simulations before
learning that it’s a bad idea. But baby wild goats
scrambling around on incredibly steep mountainsides do
not have the luxury of living and dying millions of
times before learning how to climb with sure footing
without falling to their deaths.”
• “Will the Future of AI Learning Depend More on Nature or Nurture?” IEEE Spectrum, October 2017
6. Data Quality Impacts ML Performance
•Garbage-in Garbage-out
•Data must be correct and properly labeled
•Data must be the “right” one; unbiased over the input
dynamic range
• The data is used to train a predictive model and must meet certain requirements
7. Fun Facts about Your Brain
• 1.3 kg of neural tissue that consumes 20% of your body's metabolism
• Equivalent to a supercomputer running at 20 W, instead of the 20 MW needed for exascale
• Computation and storage are done together, locally
• A network of 100 billion neurons and 100 trillion synapses (connections)
• Neurons accumulate charge like a capacitor (analog), but the brain also uses spikes for communication (digital); the brain is a mixed-signal computer
• There is no centralized clock for processing synchronization
• Simulating the brain is very time consuming and energy inefficient
• Direct implementation in electronics is more plausible:
• ~10 femtojoules per operation in the brain; a CMOS gate ~0.5 femtojoules; a synaptic transmission is equivalent to ~20 transistors
• PIM (processing in memory) is closer to the neuron: coexistence of data and computing
9. Deep Learning Limitations and Advantages
• Better than K-means, linear regression, and other methods, because it does not require data scientists to identify the features in the data they want to model.
• Related features are identified by the deep learning model itself.
• Deep learning is excellent for language translation, but not good at grasping the meaning of the translation.
10. Training and Inference
• Learning step: weights are produced by training, starting from random values, using successive approximation that includes backpropagation with gradient descent. Mostly floating-point operations. Time consuming.
• Inference step: recognition and classification. The more frequently invoked step. Mostly fixed-point operations.
• Both steps consist of many dense matrix-vector operations.
11. ML Computation is Mostly Matrix Multiplications
• An M by N matrix of weights is multiplied by an N by 1 vector of inputs
• An activation function is needed after this matrix operation: e.g., rectifier (ReLU), sigmoid
• The matrices are dense
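As a concrete illustration of this step, here is a minimal sketch (not from the slides; the sizes and weight values are made up) of an M×N weight matrix multiplied by an N×1 input vector, followed by a ReLU activation:

#include <stdio.h>

#define M 3   /* number of outputs (rows of the weight matrix)    */
#define N 4   /* number of inputs  (columns of the weight matrix) */

/* ReLU activation: max(0, x) */
static double relu(double x) { return x > 0.0 ? x : 0.0; }

int main(void) {
    /* Hypothetical trained weights (M x N) and one input vector (N x 1). */
    double W[M][N] = {{ 0.2, -0.5, 0.1,  0.8},
                      {-0.3,  0.7, 0.0, -0.1},
                      { 0.5,  0.5, 0.5,  0.5}};
    double x[N] = {1.0, 2.0, -1.0, 0.5};
    double y[M];

    /* Dense matrix-vector multiply followed by the activation function. */
    for (int i = 0; i < M; ++i) {
        double acc = 0.0;
        for (int j = 0; j < N; ++j)
            acc += W[i][j] * x[j];          /* one MAC operation */
        y[i] = relu(acc);
    }

    for (int i = 0; i < M; ++i)
        printf("y[%d] = %f\n", i, y[i]);
    return 0;
}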
12. Biological Neurons Are Not Multiplying and Adding
• We can model them by performing multiply and add operations
• Efficient direct implementation is not impossible, but it is still far away
• Computer-architecture solutions, such as accelerators and approximate computation, are some of the current approaches
• We are far below what neurons can do in terms of connectivity and the number of active neurons
• VLSI scaling is a limiting factor
• Computing with accelerators is making a comeback
13. VLSI Laws Are Not Scaling Anymore
• Moore's law:
• The doubling of the number of transistors roughly every 18 months is slowing down.
• Dennard scaling:
• Transistors shrink => L and W are reduced.
• Delay is reduced (T = RC), so frequency (≈ 1/T) increases.
• I and V are reduced, since they are proportional to L and W.
• The conclusion that power consumption (P = C · V² · f) stays the same is no longer valid.
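A brief worked example of why the law held and then broke: under classical Dennard scaling, each generation shrinks L, W, V, and C by about 0.7× and raises f by about 1/0.7 ≈ 1.4×, so power per transistor P = C · V² · f scales by roughly 0.7 × 0.49 × 1.4 ≈ 0.5 while twice as many transistors fit in the same area, and power density stays constant. Once supply voltage can no longer be reduced, the same density doubling roughly doubles power density instead, which is why the scaling no longer works.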
14. Computer Architecture Is Making a Comeback
• Past success stories:
• Clock speed
• Instruction-level parallelism (ILP); spatial and temporal; branch prediction
• Memory hierarchy; cache optimizations
• Multicores
• Reconfigurable computing
• Current efforts: TPU-1
• Accelerators; domain-specific ASICs
• Exotic memories: STT-RAM, RRAM
• In-memory processing
• Systolic arrays resurrected; non-von Neumann memory access
• ML is not just algorithms; HW/SW innovations
15. Computation Variety
• Graphics
• Vertex processing; floating point
• Pixel processing and rasterization; integer
• ML
• Training; floating point
• Inference; integer
21. • A common machine learning algorithm or deep neural network (DNN) has two phases: a training phase (weight updating) and an inference phase.
• The training phase in neural networks is a one-time process based on two main functions: feedforward and back-propagation. The network goes through all instances of the training set and iteratively updates the weights. At the end, this procedure yields a trained model.
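To make the two functions concrete, here is a minimal, hypothetical sketch (not from the slides) of feed-forward and back-propagation with gradient descent for a single linear neuron trained on one instance; the data, initial weights, and learning rate are made up:

#include <stdio.h>

/* One neuron y = w*x + b with squared-error loss 0.5*(y - target)^2. */
int main(void) {
    double x = 2.0, target = 3.0;      /* one hypothetical training instance */
    double w = 0.5, b = 0.0;           /* weights start from arbitrary values */
    double lr = 0.1;                   /* learning rate */

    for (int epoch = 0; epoch < 20; ++epoch) {
        double y   = w * x + b;               /* feed-forward                 */
        double err = y - target;              /* derivative of the loss wrt y */
        double dw  = err * x, db = err;       /* back-propagated gradients    */
        w -= lr * dw;                         /* gradient-descent update      */
        b -= lr * db;
        printf("epoch %2d  loss = %.6f\n", epoch, 0.5 * err * err);
    }
    return 0;
}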
22. • The inference phase uses the learned model to classify new data samples. In this phase, only the feed-forward path is performed on the input data.
[Figure: CNN inference pipeline — convolution, max-pooling, and fully connected layers map an input image to class probabilities P1…P4 (e.g., bird, dog, butterfly, cat).]
23. The accuracy of CNN models has increased at the price of high computational cost, because these networks contain a huge number of parameters and computational operations (MACs).
In AlexNet:
MACs in convolutional layers / (MACs in convolutional layers + MACs in fully connected layers) = 666M / (666M + 58.6M) × 100 ≈ 91%
24. • CPUs have a few, but complex, cores. They are fast at sequential processing.
• CPUs are good at fetching small amounts of data quickly, but they are not suitable for big chunks of data.
Deep learning involves lots of matrix multiplications and convolutions, so a sequential computational approach would take a long time.
We need to utilize architectures with high data bandwidth that can take advantage of the parallelism in DNNs.
Thus the trend is toward the other three devices (GPUs, FPGAs, and ASICs) rather than CPUs to accelerate training and inference.
CPU
25. GPUs are designed for highly parallel computation. They contain hundreds of cores that can handle many threads simultaneously. These threads execute in SIMD fashion.
They have high memory bandwidth, and they can fetch a large amount of data.
They can fetch the high-dimensional matrices in DNNs and perform the calculations in parallel.
CPU — designed for low-latency operation:
• Large caches
• Sophisticated control
• Powerful ALUs
GPU — designed for high throughput:
• Small caches
• Simple control
• Energy-efficient ALUs
• Latencies compensated by a large number of threads
26. Field Programmable Gate Arrays (FPGAs) are semiconductor devices consisting of configurable logic blocks connected via programmable interconnects. Because of their high energy efficiency, computing capabilities, and reconfigurability, they are becoming a platform of choice for DNN accelerators.
GPU vs. FPGA:
• FPGAs are more power-efficient than GPUs. GPUs' computing resources are more complicated than FPGAs' in order to facilitate software programming (programming a GPU is usually easier than developing an FPGA accelerator).
• Owing to their flexibility, FPGAs can support various data types, such as binary or ternary.
• The datapath in GPUs is SIMD, while in an FPGA the user can configure a specific datapath.
27. • GPUs and FPGAs perform better than CPUs for DNN applications, but more efficiency can still be gained via an Application-Specific Integrated Circuit (ASIC).
• ASICs are the least flexible but the highest-performance option. They can be designed for either training or inference.
• They are the most efficient in terms of performance/dollar and performance/watt, but they require huge investment costs that make them cost-effective only in very high-volume designs.
• The first generation of Google's Tensor Processing Unit (TPU) is a machine learning device that focuses on 8-bit integers for inference workloads. Tensor computations in the TPU take advantage of a systolic array.
[Figures: Tensor Processing Unit (TPU) block diagram; CPU, GPU, and TPU performance on six reference workloads.]
28. The principle of a systolic array:
• Idea: data flows from the computer memory, passing through many processing elements before it returns to memory.
29. Assume we want to perform the following matrix multiplication with a systolic array:
Matrix A:
a11 a12 a13
a21 a22 a23
a31 a32 a33
×
Matrix B:
b11 b12 b13
b21 b22 b23
b31 b32 b33
38. We need to feed these floating-point units from memory, and we have four choices for the memory architecture:
• DDR4 modules: 320 pJ/B, 256 GB @ 64 GB/s, 20 W
• High-bandwidth memory on interposer: 64 pJ/B, 16 GB @ 900 GB/s, 60 W
• (unlabeled option): 10 pJ/B, 256 MB @ 6 TB/s, 60 W
• SRAM on chip: 1 pJ/B, 1000 × 256 kB @ 60 TB/s, 60 W
Memory power density is ~25% of logic power density.
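To put these numbers in perspective: streaming 1 GB of weights or activations from DDR4 at 320 pJ/B costs about 0.32 J, whereas the same traffic served from on-chip SRAM at 1 pJ/B costs roughly 1 mJ, more than two orders of magnitude less energy for the data movement alone.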
40. • If we unroll a loop in a convolutional layer, we can accelerate the execution time at the expense of resource utilization (PEs).

Without loop unrolling:
for (int i = 0; i < N; i++) {
    a[i] = b[i] + c[i];
}
Latency = N cycles (N iterations)

With loop unrolling (unroll factor = 2):
for (int i = 0; i < N; i += 2) {
    a[i]   = b[i]   + c[i];
    a[i+1] = b[i+1] + c[i+1];
}
Latency = N/2 cycles (N/2 iterations)

• Loop unrolling exploits parallelism between loop iterations by utilizing FPGA resources (multiple iterations can be executed simultaneously).
• Loop tiling is used to divide the input data into multiple blocks that can be accommodated in the on-chip buffers. It exploits data locality, which reduces DRAM accesses, latency, and power consumption.
41. [Figure: original matrix multiplication of input matrices A and B producing output matrix C, contrasted with a tiled matrix multiplication in which the matrices are partitioned into blocks.]
42. C = A · B

Without tiling:
int i, j, k;
for (i = 0; i < N; ++i) {
    for (j = 0; j < N; ++j) {
        C[i][j] = 0;
        for (k = 0; k < N; ++k)
            C[i][j] += A[i][k] * B[k][j];
    }
}

With tiling (2×2 register blocking):
for (i = 0; i < N; i += 2) {
    for (j = 0; j < N; j += 2) {
        acc00 = acc01 = acc10 = acc11 = 0;
        for (k = 0; k < N; k++) {
            acc00 += B[k][j + 0] * A[i + 0][k];
            acc01 += B[k][j + 1] * A[i + 0][k];
            acc10 += B[k][j + 0] * A[i + 1][k];
            acc11 += B[k][j + 1] * A[i + 1][k];
        }
        C[i + 0][j + 0] = acc00;
        C[i + 0][j + 1] = acc01;
        C[i + 1][j + 0] = acc10;
        C[i + 1][j + 1] = acc11;
    }
}
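A brief note on why the tiled version helps: in the untiled loop every multiply-accumulate reads one element of A and one of B, about 2·N³ array reads in total, while the 2×2 tiled inner loop reads four elements (two from A, two from B) but feeds four accumulators with them, cutting the reads to roughly N³. Each loaded element is reused twice before it leaves the registers.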
43. • How can we build an energy-efficient device?
• DNNs have lots of parameters and MAC operations. The parameters have to be stored in external memory (DRAM).
• Each MAC operation requires three memory accesses.
• DRAM accesses require up to several orders of magnitude more energy than the MAC computation itself.
• Thus, if all accesses go to DRAM, the latency and energy consumption of the data transfer may exceed those of the computation itself.
[Figure: energy cost of data movement at different levels of the memory hierarchy.]
44. • To reduce the energy consumption of data movement, every time a piece of data is moved from an expensive memory level to a lower-cost one, the system should reuse that data as much as possible.
• In a convolutional neural network, we can consider three forms of data reuse:
1. Convolutional reuse: the same input feature map activations and filter weights are used within a given channel, just in different combinations for different weighted sums. (Reuse: activations and filter weights)
45. 2. Feature map reuse: when multiple filters are applied to the same feature map, the input feature map activations are used multiple times across filters. (Reuse: activations)
3. Filter reuse: when the same filter weights are used multiple times across input feature maps; multiple input frames/images can be processed simultaneously. (Reuse: filter weights)
46. • There are various related works in the literature that take advantage of different data-reuse and dataflow approaches.
Weight Stationary dataflow (WS):
• The main idea is to minimize the energy consumption of reading weights.
• The weights are stored in the register file, while input pixels and partial sums move through the network.
Output Stationary dataflow (OS):
• It keeps the partial sums locally in the PE register file and accesses input pixels and weights through the global buffer.
• For every partial sum, two memory accesses (read/write) are needed.
No Local Reuse (NLR):
• Instead of register files it uses a large global buffer, so it does not keep data locally in the RF and accesses them through the global buffer.
47. Row Stationary dataflow (RS):
• Row stationary dataflow maximizes reuse and accumulation at the RF level for all types of data, for overall energy efficiency.
• It keeps a row of filter weights stationary inside the RF of a PE and then streams the input activations into the PE. Since there are overlaps of input activations between different sliding windows, the input activations can be kept in the RF and reused.
49. • Compression methods try to reduce the number of weights or the number of bits used for each activation or weight. This lowers the computation and storage requirements.
Quantization
• Quantization, or the reduced-precision approach, allocates a smaller number of bits for representing weights and activations.
• Uniform quantization: uses a mapping function with uniform distance between quantization levels (see the sketch after this slide).
• Non-uniform quantization: the distribution of the weights and activations is not uniform, so non-uniform quantization, where the spacing between the quantization levels varies, can improve accuracy compared with uniform quantization.
1. Log-domain quantization: quantization levels are assigned based on a logarithmic distribution.
2. Learned quantization, or weight sharing.
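As an illustration of uniform quantization, here is a minimal sketch (not from the slides; the symmetric scale scheme and weight values are assumptions) that maps 32-bit float weights to signed 8-bit levels and back:

#include <stdio.h>
#include <math.h>

/* Symmetric uniform quantization of float weights to signed 8-bit integers:
 * scale = max|w| / 127, q = round(w / scale), dequantized w' = q * scale. */
int main(void) {
    float w[] = {2.09f, 0.05f, -0.91f, 1.87f, -0.98f, 1.92f};
    int n = (int)(sizeof(w) / sizeof(w[0]));

    float maxabs = 0.0f;
    for (int i = 0; i < n; ++i)
        if (fabsf(w[i]) > maxabs) maxabs = fabsf(w[i]);
    float scale = maxabs / 127.0f;

    for (int i = 0; i < n; ++i) {
        int q = (int)roundf(w[i] / scale);   /* 8-bit level in [-127, 127] */
        float deq = q * scale;               /* value used at inference    */
        printf("w = %+.3f  ->  q = %+4d  ->  w' = %+.3f\n", w[i], q, deq);
    }
    return 0;
}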
50. • Learned quantization or weight sharing: weight sharing forces several weights to share a single value, so it reduces the number of unique weights that need to be stored. Weights can be grouped together using a k-means algorithm, with one value assigned to each group.
[Figure: weight-sharing example. A 4×4 matrix of 32-bit float weights (2.09, 0.05, -0.91, 1.87, -0.98, -0.14, 1.92, 0, 1.48, -1.08, 0, 1.53, 0.09, 2.12, -1.03, 1.49) is clustered into four shared values (-1.00, 0.00, 1.50, 2.00), and each weight is replaced by a 2-bit cluster index (0-3) pointing to its shared value.]
Bitwidth of the group index = log2(number of unique weights)
Note: the quantization can be fixed or variable.
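A minimal sketch of the weight-sharing idea (not from the slides; the weights match the figure above but the k-means initialization is assumed): cluster the weights into four shared values with a few k-means iterations and store only 2-bit indices plus the four centroids.

#include <stdio.h>
#include <math.h>

#define NW 16   /* number of weights              */
#define K   4   /* shared values => 2-bit indices */

int main(void) {
    float w[NW] = {2.09f, 0.05f, -0.91f, 1.87f, -0.98f, -0.14f, 1.92f, 0.0f,
                   1.48f, -1.08f, 0.0f, 1.53f, 0.09f, 2.12f, -1.03f, 1.49f};
    float centroid[K] = {-1.0f, 0.0f, 1.5f, 2.0f};  /* rough initial guesses */
    int   idx[NW];

    /* A few k-means iterations: assign each weight to the nearest centroid,
     * then move each centroid to the mean of its assigned weights. */
    for (int it = 0; it < 10; ++it) {
        for (int i = 0; i < NW; ++i) {
            int best = 0;
            for (int k = 1; k < K; ++k)
                if (fabsf(w[i] - centroid[k]) < fabsf(w[i] - centroid[best]))
                    best = k;
            idx[i] = best;
        }
        for (int k = 0; k < K; ++k) {
            float sum = 0.0f; int cnt = 0;
            for (int i = 0; i < NW; ++i)
                if (idx[i] == k) { sum += w[i]; ++cnt; }
            if (cnt > 0) centroid[k] = sum / cnt;
        }
    }

    /* The compressed model stores only idx[] (2 bits each) and centroid[]. */
    for (int i = 0; i < NW; ++i)
        printf("w = %+.2f -> index %d -> shared value %+.2f\n",
               w[i], idx[i], centroid[idx[i]]);
    return 0;
}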
51.
52. What is neural network pruning?
• Pruning algorithms reduce the size of neural networks by removing unnecessary weights and activations.
• Pruning can make a neural network more power- and memory-efficient, and faster at inference, with minimal loss in accuracy: it gives a smaller model without losing accuracy.
Procedure of pruning (repeated iteratively):
1. Train the network.
2. Prune connections.
3. Retrain the remaining weights.
[Figure: synapses and neurons before and after pruning.]
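A minimal sketch of the pruning step itself (the threshold and weights are made up, and this magnitude criterion is only one common choice): remove connections whose weights have small magnitude and count the survivors; in practice the remaining weights would then be retrained.

#include <stdio.h>
#include <math.h>

#define NW 8

int main(void) {
    float w[NW]  = {0.62f, -0.03f, 0.01f, -0.88f, 0.04f, 1.21f, -0.02f, 0.35f};
    float thresh = 0.05f;          /* hypothetical magnitude threshold */
    int   kept   = 0;

    /* Magnitude-based pruning: zero out (remove) small-magnitude weights. */
    for (int i = 0; i < NW; ++i) {
        if (fabsf(w[i]) < thresh)
            w[i] = 0.0f;           /* pruned connection */
        else
            ++kept;
    }

    printf("kept %d of %d weights (%.0f%% pruned)\n",
           kept, NW, 100.0 * (NW - kept) / NW);
    return 0;
}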
53. Comparing L1 and L2 regularization, with and without retraining: network pruning can reduce parameters without a drop in performance.
55. PET Images
• Positron Emission Tomography (PET) is an emerging imaging technology that can reveal the metabolic activities of a tissue or an organ. Unlike imaging technologies such as CT and MRI, which capture anatomical changes, PET scans detect biochemical and physiological changes.
• PET has a wide range of clinical applications, such as cancer diagnosis, tumor detection, and early diagnosis of neurological diseases.
[Figure: 3D PET images from the ADNI dataset, (a) standard dose, (b) low dose.]
56. PET Denoising
• The noise in PET images is caused by the low coincident photon counts detected during a given scan time and by various physical degradation factors.
• In order to acquire a high-quality PET image for diagnostic purposes, a standard dose of radioactive tracer must be injected into the subject, which leads to a higher risk of radiation-exposure damage. To address this problem, many DL algorithms and networks have been proposed to improve image quality.
• Some conventional denoising methods, such as Gaussian filtering, smooth out important image structures during the denoising process.
57. PET Denoising Methods
• Supervised: machine learning methods use paired low-dose and standard-dose images to train models that can predict standard-dose images from low-dose inputs.
• Unsupervised: DNNs can learn intrinsic structures from corrupted images without pre-training. No prior training pairs are needed, and random noise can be employed as the network input to generate clean images.
• Deep Image Prior (DIP): an unsupervised learning approach with no requirement for large data sets or high-quality label images. The original DIP approach learns from a single pair of random-noise input and noisy image.
58. Denoising Autoencoder
§ One important denoising architecture is the autoencoder. Autoencoders are neural networks commonly used for feature selection and extraction. They have two parts: an encoder that extracts the most important image features and a decoder that reconstructs the denoised image from those features.
60. PET Classification
PET imaging together with Convolutional Neural Networks helps in the early detection and
automated classification of Alzheimer’s disease.
PET scans from two representative subjects:
a) normal subject, and
b) AD subject.
61. Academia Plays a Key Role in Addressing AI
• Revisit degree-required courses, rather than just offering AI-related electives.
• Introduce specializations in AI for undergraduates: a sequence of courses that covers all subject areas — relevant mathematics, hardware techniques, tools and modeling environments, and capstone projects in collaboration with local industry.
• At the research and graduate-studies level, establish centers focused on specific topics of AI research:
• Medical imaging
• Tools and modeling of low-power, high-performance AI platforms
• Broad-domain AI project development (social sciences, humanities, arts, etc.)
62. Conclusions
• New applications related to machine learning are having a major impact on the design of future computer systems.
• Industry is heavily invested in ALL aspects of ML, prominently in medical applications; Microsoft acquired Nuance (~$20B).
• Universities have integrated ML into the curriculum and continue to do so, including specializations in ML.
• It is a very crowded field with many players: startups, major corporations, government agencies, and academia.
• Government agencies are actively seeking ideas beyond deep neural networks, such as DARPA's AI Next initiative.
• Quantum computing (QC) is following ML but faces far more challenges and has yet to make it into the mainstream; it is a potential next horizon for novel and exotic computing that is totally different from classical computers based on binary switches.