This document presents an introduction to Kneser-Ney smoothing for next word prediction using generalized language models. It discusses language models, generalized language models, and different smoothing techniques, including backoff smoothing, interpolation smoothing, and Kneser-Ney smoothing. It also gives an overview of the progress made in applying these techniques to next word prediction, including building generalized language models, implementing Kneser-Ney and modified Kneser-Ney smoothing, and indexing the data with MySQL. The document is intended as an introduction for a seminar on applying smoothing methods to generalized language models for next word prediction tasks.
Introduction to Kneser-Ney Smoothing on Top of Generalized Language Models for Next Word Prediction
1. Web Science & Technologies
University of Koblenz ▪ Landau, Germany
Introduction to Kneser-Ney Smoothing on Top of Generalized Language Models for Next Word Prediction
Martin Körner
Oberseminar
25.07.2013
4. Introduction: Motivation
Next word prediction: What is the next word a user will
type?
Use cases for next word prediction:
Augmentative and Alternative
Communication (AAC)
Small keyboards (Smartphones)
5. Introduction to next word prediction
How do we predict words?
1. Rationalist approach
• Manually encoding information about language
• “Toy” problems only
2. Empiricist approach
• Statistical, pattern recognition, and machine learning
methods applied on corpora
• Result: Language models
7. Language models in general
Language model: How likely is a sentence 𝑠?
Probability distribution: 𝑃 𝑠
Calculate 𝑃 𝑠 by multiplying conditional probabilities
Example:
𝑃 If you′
re going to San Francisco , be sure …
=
𝑃 you′
re | If ∗ 𝑃 going | If you′
re ∗
𝑃 to | If you′
re going ∗ 𝑃 San | If you′
re going to ∗
𝑃 Francisco | If you′
re going to San ∗ ⋯
Empirical approach would fail
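To make the chain-rule product concrete, here is a minimal Python sketch; the cond_prob(word, history) lookup is a hypothetical placeholder, since estimating it is exactly the hard part:

```python
# A minimal sketch of the chain rule P(s) = Π_i P(w_i | w_1 … w_{i-1}),
# assuming a hypothetical cond_prob(word, history) lookup is available.

def sentence_probability(sentence, cond_prob):
    """Multiply one conditional probability per word, each conditioned
    on the full history of preceding words."""
    words = sentence.split()
    p = 1.0
    for i, word in enumerate(words):
        history = tuple(words[:i])       # w_1 … w_{i-1}
        p *= cond_prob(word, history)    # P(w_i | w_1 … w_{i-1})
    return p
```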
8. Conditional probabilities simplified
Markov assumption [JM80]:
Only the last n-1 words are relevant for a prediction
Example with n=5:
P(sure | If you're going to San Francisco , be) ≈ P(sure | San Francisco , be)
(The comma counts as a word.)
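A tiny sketch of this truncation step, assuming pre-tokenized input and n ≥ 2:

```python
def markov_history(words, n):
    """Markov assumption: only the last n-1 words of the history matter (n >= 2)."""
    return tuple(words[-(n - 1):])

# markov_history("If you're going to San Francisco , be".split(), 5)
# returns ('San', 'Francisco', ',', 'be')
```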
9. Definitions and Markov assumption
n-gram: a sequence of length n together with its count
E.g., the 5-gram "If you're going to San" with count 4
Sequence naming: w_1^{i−1} := w_1 w_2 … w_{i−1}
Markov assumption formalized:
P(w_i | w_1^{i−1}) ≈ P(w_i | w_{i−n+1}^{i−1})
(the right-hand condition keeps only the last n−1 words)
10. Formalizing next word prediction
Instead of P(s): only one conditional probability P(w_i | w_{i−n+1}^{i−1})
(the conditional probability with Markov assumption)
• Simplify P(w_i | w_{i−n+1}^{i−1}) to P(w_n | w_1^{n−1})   (both histories are n−1 words)
NWP(w_1^{n−1}) = arg max_{w_n ∈ W} P(w_n | w_1^{n−1})
(W: the set of all words in the corpus)
How to calculate the probability P(w_n | w_1^{n−1})?
11. How to calculate P(w_n | w_1^{n−1})
The easiest way: maximum likelihood:
P_ML(w_n | w_1^{n−1}) = c(w_1^n) / c(w_1^{n−1})
Example:
P(San | If you're going to) = c(If you're going to San) / c(If you're going to)
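A minimal sketch of maximum likelihood counting and the NWP arg max, assuming plain whitespace tokenization (function names are illustrative, not from the talk):

```python
from collections import Counter

def train_counts(sentences, n):
    """Count n-grams c(w_1^n) and their histories c(w_1^{n-1})."""
    ngrams, histories = Counter(), Counter()
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            ngrams[gram] += 1
            histories[gram[:-1]] += 1
    return ngrams, histories

def p_ml(word, history, ngrams, histories):
    """P_ML(w_n | w_1^{n-1}) = c(w_1^n) / c(w_1^{n-1})."""
    if histories[history] == 0:
        return 0.0
    return ngrams[history + (word,)] / histories[history]

def predict_next(history, vocab, ngrams, histories):
    """NWP(w_1^{n-1}) = arg max over the vocabulary W."""
    return max(vocab, key=lambda w: p_ml(w, history, ngrams, histories))
```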
13. Intro: Generalized Language Models (GLMs)
Main idea: insert wildcard words (∗) into sequences
Example: instead of P(San | If you're going to):
• P(San | If ∗ ∗ ∗)
• P(San | If ∗ ∗ to)
• P(San | If ∗ going ∗)
• P(San | If ∗ going to)
• P(San | If you're ∗ ∗)
• …
(each of these has sequence length 5; e.g. P(San | If ∗ ∗ to) contains 2 wildcard words)
Separate different types of GLMs based on:
1. Sequence length
2. Number of wildcard words
Aggregate the results
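A sketch of enumerating such generalized histories; following the slide's examples it keeps the first word fixed and replaces any subset of the remaining positions with wildcards (whether the actual GLM implementation enumerates patterns exactly this way is an assumption):

```python
from itertools import combinations

def glm_histories(history):
    """Enumerate generalized histories: the first word stays fixed,
    every subset of the remaining positions may become a wildcard '*'."""
    rest = range(1, len(history))
    result = []
    for k in range(len(rest) + 1):            # k = number of wildcard words
        for positions in combinations(rest, k):
            result.append(tuple('*' if i in positions else w
                                for i, w in enumerate(history)))
    return result

# glm_histories(("If", "you're", "going", "to")) contains, among others,
# ("If", "*", "*", "*"), ("If", "*", "going", "*") and ("If", "you're", "*", "*").
```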
14. Why Generalized Language Models?
Data sparsity of n-grams:
"If you're going to San" is seen less often than, for example, "If ∗ ∗ to San"
Question: Does that really improve the prediction?
Result of evaluation: Yes
… but we should use smoothing for language models
16. Smoothing
Problem: Unseen sequences
Try to estimate probabilities of unseen sequences
Probabilities of seen sequences need to be reduced
Two approaches:
1. Backoff smoothing
2. Interpolation smoothing
17. Backoff smoothing
If sequence unseen: use shorter sequence
E.g.: if P(San | going to) = 0, use P(San | to)
P_back(w_n | w_i^{n−1}) =
  τ(w_n | w_i^{n−1})                 if c(w_i^n) > 0   (higher order probability)
  γ · P_back(w_n | w_{i+1}^{n−1})    if c(w_i^n) = 0   (weight γ times the lower order probability, recursive)
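A sketch of the backoff recursion; τ (the discounted higher order estimate) and γ (the backoff weight) are assumed to be precomputed so that the resulting distribution still sums to one:

```python
def p_backoff(word, history, tau, gamma, count):
    """Backoff: use the higher order estimate if the sequence was seen,
    otherwise back off to a shorter history scaled by gamma."""
    if not history:
        return tau(word, ())                       # lowest order: unigram estimate
    if count(history + (word,)) > 0:
        return tau(word, history)                  # seen: higher order probability
    return gamma(history) * p_backoff(word, history[1:], tau, gamma, count)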
18. Interpolated Smoothing
Always use the shorter sequence in the calculation:
P_inter(w_n | w_i^{n−1}) = τ(w_n | w_i^{n−1}) + γ · P_inter(w_n | w_{i+1}^{n−1})
(higher order probability plus weight γ times the lower order probability, recursive)
Seems to work better than backoff smoothing
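The interpolated variant differs only in always adding the weighted lower order term, not just when the higher order count is zero (same assumptions on τ and γ as in the backoff sketch above):

```python
def p_interpolated(word, history, tau, gamma):
    """Interpolation: always combine the discounted higher order estimate
    with the weighted lower order estimate."""
    if not history:
        return tau(word, ())                       # recursion ends at the unigram
    return (tau(word, history)
            + gamma(history) * p_interpolated(word, history[1:], tau, gamma))
```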
19. Kneser-Ney smoothing [KN95]: intro
Interpolated smoothing
Idea: improve the lower order calculation
Example: the word "visiting" is unseen in the corpus
P(Francisco | visiting) = 0, so normal interpolation gives 0 + γ · P(Francisco)
P(San | visiting) = 0, so normal interpolation gives 0 + γ · P(San)
Result: "Francisco" is as likely as "San" at that position
Is that correct? What is the difference between "Francisco" and "San"?
Answer: the number of different contexts they appear in
20. Kneser-Ney smoothing idea
For the lower order calculation, don't use the count c(w_n).
Instead, use the number of different bigrams the word completes:
N_{1+}(• w_n) := |{w_{n−1} : c(w_{n−1}^n) > 0}|
Or in general:
N_{1+}(• w_{i+1}^n) = |{w_i : c(w_i^n) > 0}|
In addition:
N_{1+}(• w_{i+1}^{n−1} •) = Σ_{w_n} N_{1+}(• w_{i+1}^n)
N_{1+}(w_i^{n−1} •) = |{w_n : c(w_i^n) > 0}|
21. Kneser-Ney smoothing equation (highest)
Highest order calculation:
P_KN(w_n | w_i^{n−1}) = max{c(w_i^n) − D, 0} / c(w_i^{n−1})
                      + D / c(w_i^{n−1}) · N_{1+}(w_i^{n−1} •) · P_KN(w_n | w_{i+1}^{n−1})
First term: the count, discounted by the discount value D (0 ≤ D ≤ 1; the max assures a positive value), over the total counts.
Second term: the lower order weight times the lower order probability (recursion).
22. Kneser-Ney smoothing equation
Lower order calculation:
P_KN(w_n | w_i^{n−1}) = max{N_{1+}(• w_i^n) − D, 0} / N_{1+}(• w_i^{n−1} •)
                      + D / N_{1+}(• w_i^{n−1} •) · N_{1+}(w_i^{n−1} •) · P_KN(w_n | w_{i+1}^{n−1})
Same structure as the highest order case, with continuation counts instead of absolute counts: the discounted continuation count (the max assures a positive value) over the total continuation counts, plus the lower order weight times the lower order probability (recursion).
Lowest order calculation: P_KN(w_n) = N_{1+}(• w_n) / N_{1+}(• •)
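A compact sketch of these equations for the bigram case with a fixed discount D; this is a simplification for illustration, not the talk's implementation, which covers higher orders and GLMs:

```python
from collections import Counter, defaultdict

# Interpolated Kneser-Ney for bigrams, mirroring the slide notation:
# len(left[w]) is N_{1+}(• w), len(right[v]) is N_{1+}(v •),
# and len(bigram_c) is N_{1+}(• •).

def train_kn(tokens, D=0.75):
    bigram_c = Counter(zip(tokens, tokens[1:]))
    history_c = Counter(tokens[:-1])    # c(w_{n-1}): every token but the last starts a bigram
    left, right = defaultdict(set), defaultdict(set)
    for v, w in bigram_c:
        left[w].add(v)
        right[v].add(w)
    n_types = len(bigram_c)             # N_{1+}(• •)

    def p_cont(w):                      # lowest order: continuation probability
        return len(left[w]) / n_types

    def p_kn(w, prev):                  # highest order with interpolation
        if history_c[prev] == 0:
            return p_cont(w)            # unseen history: fall through to lowest order
        high = max(bigram_c[(prev, w)] - D, 0) / history_c[prev]
        weight = (D / history_c[prev]) * len(right[prev])
        return high + weight * p_cont(w)

    return p_kn

# Usage: p = train_kn("if you are going to san francisco".split()); p("francisco", "san")
```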
23. Modified Kneser-Ney smoothing [CG98]
Different discount values for different absolute counts.
Lower order calculation:
P_KN(w_n | w_i^{n−1}) = max{N_{1+}(• w_i^n) − D(c(w_i^n)), 0} / N_{1+}(• w_i^{n−1} •)
                      + (D_1 · N_1(w_i^{n−1} •) + D_2 · N_2(w_i^{n−1} •) + D_{3+} · N_{3+}(w_i^{n−1} •)) / N_{1+}(• w_i^{n−1} •) · P_KN(w_n | w_{i+1}^{n−1})
State of the art (for 15 years now!)
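The discounts D_1, D_2 and D_{3+} can be estimated from count-of-counts, as proposed in [CG98]; a sketch, assuming the corpus is large enough that n_1 … n_4 are all nonzero:

```python
from collections import Counter

def modified_kn_discounts(ngram_counts):
    """Discount estimates from [CG98], derived from the count-of-counts
    n_k = number of n-grams occurring exactly k times."""
    n = Counter(ngram_counts.values())   # n[k] = |{g : c(g) = k}|
    Y = n[1] / (n[1] + 2 * n[2])
    D1 = 1 - 2 * Y * n[2] / n[1]
    D2 = 2 - 3 * Y * n[3] / n[2]
    D3plus = 3 - 4 * Y * n[4] / n[3]
    return D1, D2, D3plus
```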
24. Smoothing of GLMs
We can use all smoothing techniques on GLMs as well!
Small modification needed. E.g., for P(San | If ∗ going ∗), the lower order sequence is:
– Normally: P(San | ∗ going ∗)
– Instead, use: P(San | going ∗)
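A sketch of this modified lower order step for generalized sequences, directly mirroring the slide's example:

```python
def glm_lower_order(history):
    """Drop the first word as usual, then also drop any wildcards
    that end up at the front of the history."""
    shorter = list(history[1:])
    while shorter and shorter[0] == '*':
        shorter.pop(0)
    return tuple(shorter)

# glm_lower_order(("If", "*", "going", "*")) returns ("going", "*"),
# i.e. P(San | going ∗) instead of P(San | ∗ going ∗).
```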
26. Progress
Done so far:
• Extract text from XML files
• Building GLMs
• Kneser-Ney and modified Kneser-Ney smoothing
• Indexing with MySQL
To do:
• Finish evaluation program
• Run evaluation
• Analyze results
30. Sources
Images:
Wheelchair Joystick (Slide 4):
http://i01.i.aliimg.com/img/pb/741/422/527/527422741_355.jpg
Smartphone Keyboard (Slide 4):
https://activecaptain.com/articles/mobilePhones/iPhone/iPhone_Keyboard.jpg
References:
[CG98]: Stanley Chen and Joshua Goodman. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Harvard University, August 1998.
[JM80]: F. Jelinek and R. L. Mercer. Interpolated estimation of Markov source parameters from sparse data. In Proceedings of the Workshop on Pattern Recognition in Practice, pages 381–397, 1980.
[KN95]: Reinhard Kneser and Hermann Ney. Improved backing-off for m-gram language modeling. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP-95), volume 1, pages 181–184. IEEE, 1995.