The document summarizes key topics from a lecture on computational linguistics, including encoder-decoder networks, attention mechanisms, and transformers. It discusses how encoder-decoder networks can be used for machine translation by generating a target sequence conditioned on the encoded source sequence. It also explains how attention allows the decoder to attend to different parts of the encoded source at each time step. Finally, it provides a high-level overview of transformers, noting that they replace recurrence with self-attention and allow for much more parallelization.
3. Encoder-Decoder
• RNN: the input sequence is transformed into the output sequence in a one-to-one fashion.
• Goal: develop an architecture capable of generating contextually appropriate, arbitrary-length output sequences.
• Applications: machine translation, summarization, question answering, dialogue modeling.
4. Simple recurrent neural network illustrated as a feed-forward network
Most significant change: a new set of weights, U
• connect the hidden layer from the previous time step to the current hidden layer.
• determine how the network should make use of past context in calculating the output for the current input.
h_t = g(U h_{t-1} + W x_t)
y_t = f(V h_t), where f is typically a softmax: y_t = softmax(V h_t)
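A minimal numpy sketch of these two equations (the dimensions, the tanh nonlinearity, and the random weights are illustrative assumptions, not from the slides):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())      # subtract max for numerical stability
    return e / e.sum()

# Illustrative sizes: hidden dimension d, vocabulary size, input dimension.
d, vocab, x_dim = 8, 100, 8
rng = np.random.default_rng(0)
U = rng.normal(size=(d, d))      # hidden-to-hidden weights
W = rng.normal(size=(d, x_dim))  # input-to-hidden weights
V = rng.normal(size=(vocab, d))  # hidden-to-output weights

def rnn_step(h_prev, x):
    h = np.tanh(U @ h_prev + W @ x)  # h_t = g(U h_{t-1} + W x_t), with g = tanh
    y = softmax(V @ h)               # y_t = softmax(V h_t)
    return h, y
```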
7. Sentence Completion using an RNN
• A trained neural language model can be used to generate novel sequences
• or to complete a given sequence (until the end-of-sentence token <s> is generated)
h_t = g(U h_{t-1} + W x_t)
y_t = softmax(V h_t)
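A hedged sketch of greedy sentence completion, reusing rnn_step from the sketch above; embed is an assumed embedding lookup table, eos_id the assumed index of the <s> token, and the prefix is assumed non-empty:

```python
def complete(prefix_ids, embed, h0, eos_id, max_len=20):
    """Greedily extend a non-empty sentence prefix until <s> or max_len."""
    h, y = h0, None
    for i in prefix_ids:              # condition on the given prefix
        h, y = rnn_step(h, embed[i])
    out = []
    while len(out) < max_len:
        i = int(np.argmax(y))         # most likely next word (greedy choice)
        out.append(i)
        if i == eos_id:               # stop at the end-of-sentence token
            break
        h, y = rnn_step(h, embed[i])
    return out
```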
8. Extending (autoregressive) generation to Machine Translation
• Build an RNN language model on the concatenation of source and target
• Training data are parallel text, e.g., English / French:
    there lived a hobbit / vivait un hobbit
    ...
    there lived a hobbit <s> vivait un hobbit <s>
    ...
• The word generated at each time step is conditioned on the word from the previous step.
10. (simple) Encoder-Decoder Networks
• Encoder generates a contextualized representation of the input (its last state).
• Decoder takes that state and autoregressively generates a sequence of outputs.
Limiting design choices:
• E and D are assumed to have the same internal structure (here RNNs)
• the final state of E is the only context available to D
• and this context is only available to D as its initial hidden state.
11. General Encoder-Decoder Networks
(figure: encoder states h1..hn; decoder states h1..hm)
Abstracting away from these choices:
1. Encoder: accepts an input sequence x1:n and generates a corresponding sequence of contextualized representations h1:n.
2. Context vector c: a function of h1:n; conveys the essence of the input to the decoder.
3. Decoder: accepts c as input and generates an arbitrary-length sequence of hidden states h1:m, from which a corresponding sequence of output states y1:m can be obtained.
12. Popular architectural choices: Encoder
Widely used encoder design: stacked Bi-LSTMs
• Contextualized representations for each time step: the hidden states from the top layer of the forward and backward passes
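As a concrete illustration, a stacked Bi-LSTM encoder in PyTorch; the layer count, sizes, and toy input are assumptions for the sketch, not prescribed by the slides:

```python
import torch
import torch.nn as nn

# Two stacked bidirectional LSTM layers; the contextualized representation
# of each time step is the concatenation of the top-layer forward and
# backward hidden states.
encoder = nn.LSTM(input_size=64, hidden_size=128, num_layers=2,
                  bidirectional=True, batch_first=True)

x = torch.randn(1, 10, 64)        # (batch, time steps, embedding size)
h_all, _ = encoder(x)             # h_all: (1, 10, 256), forward + backward
```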
13. Decoder Basic Design
(figure: the last hidden state of the encoder becomes the first hidden state of the decoder; decoder states z1, z2, ...)
• Produce an output sequence one element at a time, each output computed from the decoder hidden state (V h^d_t).
15. Decoder: How output y is chosen
(figure: decoder states z1, z2, ...)
• Sample from the softmax distribution (OK for generating novel output, not OK for e.g. MT or summarization)
• Take the most likely output (doesn't guarantee that the individual choices being made make sense together)
• For sequence labeling we used Viterbi; here that is not possible
16. Beam search (here with beam width 4):
• Take the 4 most likely "words" decoded from the initial state
• Feed each of those into the decoder and keep the 4 most likely two-word sequences
• Feed the most recent word into the decoder and keep the 4 most likely three-word sequences, and so on
• When EOS is generated, stop that sequence and reduce the beam by 1
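A minimal sketch of this beam decoding, assuming a decoder exposed as step(h, word_id) -> (new_state, log_probs over the vocabulary); the interface, the use of log-probabilities, and the bos_id start token are illustrative assumptions:

```python
import numpy as np

def beam_search(step, h0, bos_id, eos_id, beam=4, max_len=20):
    """Sketch of the slide's procedure under the assumed `step` interface."""
    live = [(0.0, [bos_id], h0)]          # (score, word ids, decoder state)
    done = []
    while live and len(done) < beam and max_len > 0:
        max_len -= 1
        cand = []
        for score, ids, h in live:
            h2, logp = step(h, ids[-1])
            for w in np.argsort(logp)[-beam:]:      # top-k one-word extensions
                cand.append((score + float(logp[w]), ids + [int(w)], h2))
        cand.sort(key=lambda c: c[0], reverse=True)
        live = []
        for c in cand[:beam - len(done)]:           # keep the best hypotheses
            # a sequence that generated EOS is retired: the beam shrinks by 1
            (done if c[1][-1] == eos_id else live).append(c)
    return max(done + live, key=lambda c: c[0])[1]
```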
18. Flexible context: Attention
Context vector c: a function of h1:n; conveys the essence of the input to the decoder.
(figure: encoder states h1..hn; decoder states h1..hm)
Flexible?
• Different for each hi
• Flexibly combining the hj
19. Attention (1): dynamically derived context
• Replace the static context vector with a dynamic ci
• derived from the encoder hidden states at each point i during decoding
Ideas:
• it should be a linear combination of those states
• it should depend on ?
20. Attention (2): computing ci
• Compute a vector of scores that capture the relevance of each encoder hidden state to the decoder state.
• The simplest choice is just the similarity (dot product): score(h^d_{i-1}, h^e_j) = h^d_{i-1} · h^e_j
• Or give the network the ability to learn which aspects of similarity between the decoder and encoder states are important to the current application, e.g. score(h^d_{i-1}, h^e_j) = h^d_{i-1} Ws h^e_j with learned weights Ws.
21. Attention (3): computing ci
From scores to weights:
• Create a vector of weights by normalizing the scores, e.g. with a softmax: αij = softmax(score(h^d_{i-1}, h^e_j))
• Goal achieved: compute a fixed-length context vector for the current decoder state by taking a weighted average over all the encoder hidden states: ci = Σj αij h^e_j
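Slides 19-21 combined, as a numpy sketch using dot-product scores (the shapes are illustrative assumptions):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_context(h_dec, H_enc):
    """Dynamic context c_i for one decoder step.
    h_dec: (d,)   previous decoder hidden state h^d_{i-1}
    H_enc: (n, d) all encoder hidden states h^e_1..h^e_n"""
    scores = H_enc @ h_dec    # dot-product relevance scores, one per h^e_j
    alphas = softmax(scores)  # normalize scores into weights
    return alphas @ H_enc     # c_i: weighted average of encoder states
```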
26. Transformers ("Attention is All You Need", 2017)
• Just an introduction: these are two valuable resources to learn more details on the architecture and implementation (Assignment 4 will also help you learn more about Transformers):
• http://nlp.seas.harvard.edu/2018/04/03/attention.html
• https://jalammar.github.io/illustrated-transformer/ (slides come from this source)
27. High-level architecture
• We will only look at the ENCODER(s) part in detail
28. The encoders are all identical in structure (yet they do not share weights). Each one is broken down into two sub-layers:
• self-attention, which helps the encoder look at other words in the input sentence as it encodes a specific word;
• a feed-forward neural network, to which the outputs of the self-attention are fed. The exact same network is independently applied to each position.
29. Key property of the Transformer: the word in each position flows through its own path in the encoder.
• There are dependencies between these paths in the self-attention layer.
• The feed-forward layer does not have those dependencies => the various paths can be executed in parallel!
(figure: word embeddings entering the encoder)
30. Visually clearer on two words
(figure: word embeddings for two words)
• dependencies in the self-attention layer
• no dependencies in the feed-forward layer
31. Self-Attention
Step 1: create three vectors from each of the encoder's input vectors: a Query, a Key, and a Value (typically of smaller dimension), by multiplying the embedding by three matrices that were trained during the training process.
While processing each word, this allows the model to look at other positions in the input sequence for clues that help build a better encoding for this word. (A combined code sketch of steps 1-6 follows slide 34.)
32. Self-Attention
Step 2: calculate a score (like we have seen for regular attention!) that determines how much focus to place on other parts of the input sentence as we encode a word at a certain position.
Take the dot product of the query vector with the key vector of the respective word we're scoring.
E.g., processing the self-attention for the word "Thinking" in position #1, the first score would be the dot product of q1 and k1. The second score would be the dot product of q1 and k2.
33. Self-Attention
• Step 3: divide the scores by the square root of the dimension of the key vectors (more stable gradients).
• Step 4: pass the result through a softmax operation (all scores positive and summing to 1).
Intuition: the softmax score determines how much each word will be expressed at this position.
34. Self-Attention
• Step 6: sum up the weighted value vectors. This produces the output of the self-attention layer at this position.
More details:
• What we have seen for one word is done for all words (using matrices)
• We also need to encode the position of words
• And the mechanism is improved using "multi-headed" attention (kind of like multiple filters for a CNN); see https://jalammar.github.io/illustrated-transformer/
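Putting steps 1 through 6 together, a minimal numpy sketch of single-head scaled dot-product self-attention over a whole sentence at once; the matrix sizes and random inputs are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (n, d_model) embeddings; Wq/Wk/Wv: (d_model, d_k) trained matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv    # step 1: queries, keys, values
    scores = Q @ K.T                    # step 2: dot-product scores
    scores /= np.sqrt(K.shape[-1])      # step 3: scale by sqrt(d_k)
    weights = softmax(scores)           # step 4: normalize per position
    return weights @ V                  # steps 5-6: weighted sum of values

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))            # 5 words, d_model = 16
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)     # (5, 8): one output per position
```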
35. The Decoder Side
• Relies on most of the concepts from the encoder side
• See the animation at https://jalammar.github.io/illustrated-transformer/
36. Next class: Mon Nov. 9
• Project proposal (submit your write-up and a copy of your slides on Canvas; write-up: 1-2 pages for a single project, 3-4 pages for a group project)
• Project proposal presentation
• Approx. 3.5 min presentation + 1.5 min for questions (8 min total if you are in a group)
• For content, follow the instructions on the course project web page
• Please have your presentation ready on your laptop to minimize transition delays
• We will start in the usual Zoom room @noon (sharp)
Assignment 4 out tonight; due Nov 13. Read carefully!
Will send out two G-Forms:
• on the title and possibly the group composition of the project (fill out by Friday)
• on your preferences for which paper to present
38. Pragmatics: Example
(i) A: So can you please come over here again right now?
(ii) B: Well, I have to go to Edinburgh today sir.
(iii) A: Hmm. How about this Thursday?
What information can we infer about the context in which this (short and insignificant) exchange occurred? We can make a great number of detailed (pragmatic) inferences about the nature of the context in which it occurred.
39. Pragmatics: Conversational Structure
(i) A: So can you please come over here again right now?
(ii) B: Well, I have to go to Edinburgh today sir.
(iii) A: Hmm. How about this Thursday?
• This is not the end of a conversation (nor the beginning).
Pragmatic knowledge: strong expectations about the structure of conversations
• pairs, e.g., request <-> response
• closing/opening forms
40. Pragmatics: Dialog Acts
(i) A: So can you please come over here again right now?
(ii) B: Well, I have to go to Edinburgh today sir.
(iii) A: Hmm. How about this Thursday?
• A is requesting B to come at the time of speaking,
• B implies he can't (or would rather not),
• A repeats the request for some other time.
Pragmatic assumptions relying on:
• mutual knowledge (B knows that A knows that...)
• co-operation (there must be a response... triggers inference)
• topical coherence (who should do what on Thursday?)
Note that (i) is not a Y/N information-seeking question like "can you run for 1h?"; it is a request for an action.
41. Pragmatics: Specific Act (Request)
(i) A: So can you please come over here again right now?
(ii) B: Well, I have to go to Edinburgh today sir.
(iii) A: Hmm. How about this Thursday?
Pragmatic knowledge: speaker beliefs and intentions underlying the act of requesting:
• A wants B to come over
• A believes it is possible for B to come over
• A believes B is not already there
• A believes he is not in a position to order B to...
Assumption: A is behaving rationally and sincerely.
42. Pragmatics: Deixis
(i) A: So can you please come over here again right now?
(ii) B: Well, I have to go to Edinburgh today sir.
(iii) A: Hmm. How about this Thursday?
Pragmatic knowledge: references to space and time are interpreted with respect to the space and time of speaking:
• A assumes B knows where A is
• neither A nor B is in Edinburgh
• the day on which the exchange is taking place is not Thursday, nor Wednesday (or at least, so A believes)
44. From Yoav Artzi (these are links)
Contextualized word representations: Annotated Transformer, Illustrated Transformer, ELMo, BERT, "The Illustrated BERT, ELMo, and co."
45. Transformers
• WIKIPEDIA: "However, unlike RNNs, Transformers do not require that the sequence be processed in order. So, if the data in question is natural language, the Transformer does not need to process the beginning of a sentence before it processes the end. Due to this feature, the Transformer allows for much more parallelization than RNNs during training." [1]
46. The Transformer uses multi-head attention in three different ways:
• In "encoder-decoder attention" layers, the queries come from the previous decoder layer, and the memory keys and values come from the output of the encoder. This allows every position in the decoder to attend over all positions in the input sequence. This mimics the typical encoder-decoder attention mechanisms in sequence-to-sequence models such as [38, 2, 9].
• The encoder contains self-attention layers. In a self-attention layer all of the keys, values and queries come from the same place, in this case, the output of the previous layer in the encoder. Each position in the encoder can attend to all positions in the previous layer of the encoder.
• Similarly, self-attention layers in the decoder allow each position in the decoder to attend to all positions in the decoder up to and including that position. We need to prevent leftward information flow in the decoder to preserve the auto-regressive property. We implement this inside of scaled dot-product attention by masking out (setting to -∞) all values in the input of the softmax which correspond to illegal connections. See Figure 2.
At each step the model is auto-regressive [10], consuming the previously generated symbols as additional input when generating the next.
(In classical statistics, an autoregressive model specifies that the output variable depends linearly on its own previous values and on a stochastic term, i.e. an imperfectly predictable term.)
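A small numpy sketch of that masking step, applied to a score matrix like the one computed in the self-attention sketch above (sizes assumed for illustration):

```python
import numpy as np

n = 5                                                  # sequence length
scores = np.random.default_rng(0).normal(size=(n, n))  # Q @ K.T / sqrt(d_k)
future = np.triu(np.ones((n, n), dtype=bool), k=1)     # positions j > i
scores[future] = -np.inf     # illegal (leftward-flow) connections -> -inf
# after the softmax, position i attends only to positions <= i
```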
47. An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is computed as a weighted sum of the values, where the weight assigned to each value is computed by a compatibility function of the query with the corresponding key.
48. Positional Encodings
(From Wikipedia, on affine maps:) In geometry, an affine transformation, affine map [1] or an affinity (from the Latin affinis, "connected with") is a function between affine spaces which preserves points, straight lines and planes. Also, sets of parallel lines remain parallel after an affine transformation. An affine transformation does not necessarily preserve angles between lines or distances between points, though it does preserve ratios of distances between points lying on a straight line.
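Although the slide pivots to affine maps, the positional encodings themselves are worth writing down; here is the sinusoidal scheme from "Attention is All You Need" as a numpy sketch (d_model is an illustrative choice):

```python
import numpy as np

def positional_encoding(n_positions, d_model=16):
    """PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
       PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))  (Vaswani et al., 2017)"""
    pos = np.arange(n_positions)[:, None]      # (n, 1)
    i = np.arange(0, d_model, 2)[None, :]      # (1, d_model/2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe                                  # added to the input embeddings
```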
49. Additional resources!
References/resources explaining transformers for CPSC503, and possible questions for the assignment:
• http://jalammar.github.io/illustrated-transformer/ combined with http://nlp.seas.harvard.edu/2018/04/03/attention.html
• A Medium article that I read a while back and thought was a nice intro to the transformer: https://medium.com/@adityathiruvengadam/transformer-architecture-attention-is-all-you-need-aeccd9f50d09
They first motivate attention in general and show the problems of RNN/CNN architectures, then lead up to the transformer. I especially liked some of the visualizations they have. But it is a relatively long read and unfortunately, at some points, it's not super consistent. I thought it might still be useful.