Hand-outs of a lecture on the Aho-Corasick string matching algorithm from the Biosequence Algorithms course at the University of Eastern Finland, Kuopio, Spring 2012.
The document summarizes and provides code examples for four pattern matching algorithms:
1. The brute force algorithm checks each character position in the text to see if the pattern starts there, running in O(mn) time in worst case.
2. The Boyer-Moore algorithm uses a "bad character" shift and "good suffix" shift to skip over non-matching characters in the text, running faster than brute force.
3. The Knuth-Morris-Pratt algorithm uses a failure function to determine the maximum shift of the pattern on a mismatch, avoiding wasteful comparisons.
4. The failure function allows KMP to skip portions of the text like Boyer-Moore, running in O(n+m) time overall.
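As a baseline for the faster algorithms above, the brute force approach can be sketched in a few lines of Python (the function name is my own, not from the slides):

```python
def brute_force_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1.
    Worst case O(m*n): every alignment may compare up to m characters."""
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):        # try every starting position in the text
        if text[i:i + m] == pattern:  # compare the pattern at this alignment
            return i
    return -1

print(brute_force_search("abracadabra", "cad"))  # 4
```

The smarter algorithms below avoid re-examining text characters that this loop compares over and over.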
The Python programming language provides the following loop constructs to handle looping requirements:
1. While loop
2. For loop
Note that Python has no built-in do-while loop; a `while True` loop with a `break` is the usual emulation. While both constructs provide similar basic functionality, they differ in their syntax and in when the loop condition is checked.
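A short sketch contrasting the two native loop forms, plus the idiomatic do-while emulation (the variable names are illustrative):

```python
# Summing 1..5 with each of Python's two native loop constructs.
total_while = 0
i = 1
while i <= 5:              # condition checked before each iteration
    total_while += i
    i += 1

total_for = 0
for j in range(1, 6):      # iterates over the sequence 1..5
    total_for += j

# A do-while (body runs at least once) must be emulated:
count = 0
while True:
    count += 1
    if count >= 3:         # post-condition check, as in a do-while
        break

print(total_while, total_for, count)  # 15 15 3
```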
This document provides an overview of the Knuth-Morris-Pratt substring search algorithm. It defines the algorithm, describes its history and key components including the prefix function and KMP matcher. An example showing the step-by-step workings of the algorithm on a text and pattern is provided. The algorithm's linear runtime complexity of O(n+m) is compared to other string matching algorithms. Real-world applications including DNA sequence analysis and search engines are discussed.
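The prefix function and matcher described above can be sketched as follows; this is a standard KMP formulation, not code taken from the document:

```python
def prefix_function(pattern):
    """pi[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it (the KMP failure function)."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[i]:
            k = pi[k - 1]              # fall back to a shorter border
        if pattern[k] == pattern[i]:
            k += 1
        pi[i] = k
    return pi

def kmp_search(text, pattern):
    """Return all start indices of pattern in text in O(n + m) time."""
    pi = prefix_function(pattern)
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and pattern[k] != ch:
            k = pi[k - 1]              # shift pattern using the failure function
        if pattern[k] == ch:
            k += 1
        if k == len(pattern):          # full match ending at position i
            hits.append(i - k + 1)
            k = pi[k - 1]
    return hits

print(prefix_function("ababaca"))       # [0, 0, 1, 2, 3, 0, 1]
print(kmp_search("ababcababa", "aba"))  # [0, 5, 7]
```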
The document discusses hash tables and hashing techniques. It describes how hash tables use a hash function to map keys to positions in a table. Collisions can occur if multiple keys map to the same position. There are two main approaches to handling collisions - open hashing which stores collided items in linked lists, and closed hashing which uses techniques like linear probing to find alternate positions in the table. The document also discusses various hash functions and their properties, like minimizing collisions.
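The open-hashing (separate chaining) approach can be sketched as a small class; this is an illustrative implementation, not the one in the document:

```python
class ChainedHashTable:
    """Open hashing: each bucket holds a list of (key, value) pairs,
    so colliding keys simply share a bucket's chain."""
    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key (possibly a collision)

    def get(self, key):
        for k, v in self._bucket(key):   # linear scan of the chain
            if k == key:
                return v
        raise KeyError(key)

t = ChainedHashTable(size=2)             # tiny table to force collisions
t.put("apple", 1); t.put("plum", 2); t.put("apple", 3)
print(t.get("apple"), t.get("plum"))     # 3 2
```

Closed hashing (e.g. linear probing) would instead store entries directly in the table array and step to the next free slot on a collision.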
The document discusses string operations and storage in programming languages. It describes the basic character sets used which include alphabets, digits, and special characters. It then discusses three methods of storing strings: fixed-length storage, variable-length storage with a fixed maximum, and linked storage. The document proceeds to define common string operations like length, substring, indexing, concatenation, insertion, deletion, replacement, and pattern matching. Algorithms for implementing some of these operations are provided.
This document discusses tuples in Python. Some key points:
- Tuples are ordered sequences of elements that can contain different data types. They are defined using parentheses.
- Elements can be accessed using indexes like lists and strings. Tuples are immutable - elements cannot be changed.
- Common tuple methods include count, index, sorting, finding min, max and sum.
- Nested tuples can store related data like student records with roll number, name and marks.
- Examples demonstrate swapping numbers without a temporary variable, returning multiple values from a function, and finding max/min from a user-input tuple.
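The examples listed above can be sketched briefly (the record layout is a plausible illustration of the student-record idea, not copied from the document):

```python
# Swapping without a temporary variable via tuple packing/unpacking
a, b = 3, 7
a, b = b, a

def min_max(values):
    """Return multiple values from a function as a tuple."""
    return min(values), max(values)

record = (42, "Alice", (88, 91, 79))   # nested tuple: roll no, name, marks
lo, hi = min_max(record[2])

print(a, b)                            # 7 3
print(lo, hi)                          # 79 91
print(record.count(42), record.index("Alice"))  # 1 1
```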
The document discusses strings in C programming. It defines strings as sequences of characters stored as character arrays that are terminated with a null character. It covers string literals, declaring and initializing string variables, reading and writing strings, and common string manipulation functions like strlen(), strcpy(), strcmp(), and strcat(). These functions allow operations on strings like getting the length, copying strings, comparing strings, and concatenating strings.
This document contains information about a mentoring program run by Baabtra-Mentoring Partner. It includes:
- A disclaimer that this is not an official Baabtra document
- A table showing a mentee's typing speed progress over 5 weeks
- An empty table to track jobs applied to by the mentee
- An explanation of sets in Python, including how to construct, manipulate, perform operations on, and iterate over sets
- Contact information for Baabtra
PPT On Sorting And Searching Concepts In Data Structure | In Programming Lang..., by Umesh Kumar
The document discusses various sorting and searching algorithms:
- Bubble sort, selection sort, merge sort, quicksort are sorting algorithms that arrange data in a particular order like numerical or alphabetical.
- Linear search and binary search are searching algorithms where linear search sequentially checks each item while binary search divides the data set in half with each comparison.
- Examples are provided to illustrate how each algorithm works step-by-step on sample data sets.
An array allows you to store multiple values of the same type. It is like a spreadsheet with columns. You specify the data type of the array and the number of elements (size). Each element has an index position starting from 0. You can assign values to each position using arrayName[index] = value. Arrays make it easy to work with multiple values of the same type.
A tuple is an ordered collection of elements that is immutable. Tuples are created using parentheses, with elements separated by commas. Elements within a tuple can be accessed using indexing with integers or slicing to access ranges. Tuples can be concatenated using the + operator or repeated using the * operator. The len(), count(), index(), min(), max() and sorted() functions can be used to operate on tuples. Tuples can be traversed using for loops or while loops to access each element.
What is Dictionary In Python? Python Dictionary Tutorial | Edureka, by Edureka!
YouTube Link: https://youtu.be/rZjhId0VkuY
** Python Certification Training: https://www.edureka.co/python **
This Edureka PPT on 'Dictionary In Python' will help you understand the concept of dictionary, why and how we can use dictionary in python and various operations that we can perform on a dictionary. Below are the topics covered in this PPT:
What Is A Dictionary In Python?
Why Use A Python Dictionary?
Lists vs Dictionary
How To Implement A Dictionary In Python?
Operations In Python Dictionary
Use Case - Nested Dictionary
Python Tutorial Playlist: https://goo.gl/WsBpKe
Blog Series: http://bit.ly/2sqmP4s
This presentation covers all the important topics related to strings in Python: storing, slicing, formatting, concatenation, modification, escape characters, and string methods.
The attached file also includes examples related to the slides shown.
Beyond generic containers like list, Python can also handle containers restricted to a specific data type. Arrays are provided by the module named "array"; they can be useful when we have to manipulate values of only a specific data type.
The Boyer-Moore string matching algorithm was developed in 1977 and is considered one of the most efficient string matching algorithms. It works by scanning the pattern from right to left and shifting the pattern by multiple characters if a mismatch is found, using preprocessing tables. The algorithm constructs a bad character shift table during preprocessing that stores the maximum number of positions a mismatched character can shift the pattern. It then aligns the pattern with the text and checks for matches, shifting the pattern right by the value in the table if a mismatch occurs.
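The bad-character table and right-to-left scan described above can be sketched as follows; this version uses only the bad-character heuristic (the full algorithm also uses a good-suffix table), and the names are my own:

```python
def bad_char_table(pattern):
    """Map each character to its last index in the pattern.
    On a mismatch this tells us how far right the pattern may safely jump."""
    return {ch: i for i, ch in enumerate(pattern)}

def bm_search(text, pattern):
    """Boyer-Moore search using only the bad-character heuristic."""
    last = bad_char_table(pattern)
    m, n = len(pattern), len(text)
    s = 0                                  # current shift of pattern over text
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1                         # scan the pattern right to left
        if j < 0:
            return s                       # full match at shift s
        # align the last occurrence of the mismatched character under text[s+j]
        s += max(1, j - last.get(text[s + j], -1))
    return -1

print(bm_search("HERE IS A SIMPLE EXAMPLE", "EXAMPLE"))  # 17
```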
Trie is an efficient data structure for storing and retrieving strings. It stores strings in a tree structure, with each node representing a character. Common operations on a trie like insertion, deletion and searching of strings can be performed in O(M) time where M is the length of the string. The document then provides details on the node structure used to implement a trie, along with pseudocode for inserting strings like "Apple" and "Army" into an empty trie.
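The node structure and O(M) insert/search described above can be sketched like this (an illustrative implementation, not the document's pseudocode):

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> child node
        self.is_end = False   # True if a stored string ends at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """O(M) insert: walk or create one node per character."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def search(self, word):
        """O(M) lookup: follow the path; succeed only at a marked end node."""
        node = self.root
        for ch in word:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.is_end

t = Trie()
t.insert("Apple")
t.insert("Army")
print(t.search("Apple"), t.search("Arm"))  # True False
```

Note that "Apple" and "Army" share the node for their common prefix "A", which is where a trie's space savings come from.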
Arrays are a data structure that allow the storage of multiple elements of the same type. They can have one or more dimensions. One-dimensional arrays use a single subscript, while two-dimensional arrays use two subscripts to reference rows and columns. Arrays can be passed as arguments to functions in several ways, including as a pointer, sized array, or unsized array. Arrays are useful for storing and sorting data, performing matrix operations, and storing temporary values in recursive functions. However, arrays have limitations such as a static size, requiring elements to be of the same type, and potential memory issues if not sized correctly.
NumPy is a Python library that provides multidimensional array and matrix objects to perform scientific computing. It contains efficient functions for operations on arrays like arithmetic, aggregation, copying, indexing, slicing, and reshaping. NumPy arrays have advantages over native Python sequences like fixed size and efficient mathematical operations. Common NumPy operations include elementwise arithmetic, aggregation functions, copying and transposing arrays, changing array shapes, and indexing/slicing arrays.
This document discusses string matching algorithms. It defines string matching as finding a pattern within a larger text or string. It then summarizes two common string matching algorithms: the naive algorithm and Rabin-Karp algorithm. The naive algorithm loops through all possible shifts of the pattern and directly compares characters. Rabin-Karp also shifts the pattern but compares hash values of substrings first before checking individual characters to reduce comparisons. The document provides examples of how each algorithm works on sample strings.
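The Rabin-Karp idea of comparing rolling hashes before characters can be sketched as follows (base and modulus are common illustrative choices, not values from the document):

```python
def rabin_karp(text, pattern, base=256, mod=101):
    """Compare rolling hashes first; verify characters only on a hash hit."""
    m, n = len(pattern), len(text)
    if m > n:
        return -1
    h = pow(base, m - 1, mod)              # weight of the window's leading char
    ph = th = 0
    for i in range(m):                     # initial hashes of pattern and window
        ph = (ph * base + ord(pattern[i])) % mod
        th = (th * base + ord(text[i])) % mod
    for s in range(n - m + 1):
        if ph == th and text[s:s + m] == pattern:  # verify on hash match
            return s
        if s < n - m:                      # roll the window one character right
            th = ((th - ord(text[s]) * h) * base + ord(text[s + m])) % mod
    return -1

print(rabin_karp("GEEKS FOR GEEKS", "FOR"))  # 6
```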
The document discusses heap sort, which is a sorting algorithm that uses a heap data structure. It works in two phases: first, it transforms the input array into a max heap using the insert heap procedure; second, it repeatedly extracts the maximum element from the heap and places it at the end of the sorted array, reheapifying the remaining elements. The key steps are building the heap, processing the heap by removing the root element and allowing the heap to reorder, and doing this repeatedly until the array is fully sorted.
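The two phases described above, building the max heap and repeatedly extracting the root, can be sketched as follows (an in-place array formulation, with names of my own choosing):

```python
def sift_down(a, start, end):
    """Restore the max-heap property for the subtree rooted at start."""
    root = start
    while 2 * root + 1 <= end:
        child = 2 * root + 1
        if child + 1 <= end and a[child] < a[child + 1]:
            child += 1                     # pick the larger of the two children
        if a[root] < a[child]:
            a[root], a[child] = a[child], a[root]
            root = child
        else:
            return

def heap_sort(a):
    n = len(a)
    for start in range(n // 2 - 1, -1, -1):  # phase 1: build the max heap
        sift_down(a, start, n - 1)
    for end in range(n - 1, 0, -1):          # phase 2: extract max repeatedly
        a[0], a[end] = a[end], a[0]          # move current max to its final place
        sift_down(a, 0, end - 1)             # reheapify the remaining prefix
    return a

print(heap_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```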
Design and Analysis of Algorithm help to design the algorithms for solving different types of problems in Computer Science. It also helps to design and analyze the logic of how the program will work before developing the actual code for a program.
The document discusses sorting algorithms. It begins by defining sorting as arranging data in logical order based on a key. It then discusses internal and external sorting methods. For internal sorting, all data fits in memory, while external sorting handles data too large for memory. The document covers stability, efficiency, and time complexity of various sorting algorithms like bubble sort, selection sort, insertion sort, and merge sort. Merge sort uses a divide-and-conquer approach to sort arrays with a time complexity of O(n log n).
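The divide-and-conquer merge sort mentioned above can be sketched briefly (an illustrative version, not the document's code):

```python
def merge_sort(a):
    """Divide-and-conquer sort, O(n log n); stable because the merge
    takes from the left half on ties."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:        # <= keeps equal keys in order (stability)
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]   # append the leftover run

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```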
The document discusses linear and binary search algorithms. Linear search is a sequential search where each element of a collection is checked sequentially to find a target value. Binary search improves on this by checking the middle element first and narrowing the search space in half each time based on the comparison. It allows searching sorted data more efficiently in logarithmic time as opposed to linear time for sequential search. The document provides pseudocode to implement binary search and an example to search an integer in a sorted array.
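The halving strategy described above can be sketched as follows (an illustrative implementation, not the document's pseudocode):

```python
def binary_search(a, target):
    """Return an index of target in sorted list a, or -1.
    Halves the search space on each comparison: O(log n)."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid + 1     # target can only lie in the right half
        else:
            hi = mid - 1     # target can only lie in the left half
    return -1

data = [2, 5, 8, 12, 16, 23, 38]
print(binary_search(data, 23))  # 5
print(binary_search(data, 7))   # -1
```

The precondition is that the input is sorted; linear search has no such requirement but costs O(n).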
The document discusses various sorting algorithms. It begins by defining a sorting algorithm as arranging elements of a list in a certain order, such as numerical or alphabetical order. It then discusses popular sorting algorithms like insertion sort, bubble sort, merge sort, quicksort, selection sort, and heap sort. For each algorithm, it provides examples to illustrate how the algorithm works step-by-step to sort a list of numbers. Code snippets are also included for insertion sort and bubble sort.
Arrays in Python can hold multiple values and each element has a numeric index. Arrays can be one-dimensional (1D), two-dimensional (2D), or multi-dimensional. Common operations on arrays include accessing elements, adding/removing elements, concatenating arrays, slicing arrays, looping through elements, and sorting arrays. The NumPy library provides powerful capabilities to work with n-dimensional arrays and matrices.
This document provides an overview of lists in Python. It defines lists as ordered and changeable collections that allow duplicate elements. It describes how to create, access, modify, loop through, add and remove elements from lists. Built-in list methods like append(), pop(), sort(), count() and their usage are explained. The document also shows examples of sorting lists, removing failed grades, shuffling words to create anagrams, and reading/writing lists from user input.
This document discusses various techniques for text normalization and pattern matching using regular expressions. It covers tokenization, lemmatization, edit distance, and basic regular expression patterns including character classes, quantifiers, anchors, precedence, and disjunction. The key goals of text normalization are to standardize text into a convenient form for processing, while regular expressions provide a language for specifying text search patterns.
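The regular-expression building blocks listed above (character classes, quantifiers, anchors, disjunction) can be illustrated with Python's `re` module; the sample sentence and patterns are my own:

```python
import re

text = "The cats sat on 2 mats, didn't they?"

# Character classes + quantifiers: a crude tokenizer splitting words,
# digit runs, and single punctuation marks
tokens = re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", text)

# Anchor: does the string start with a capitalized word?
starts_cap = bool(re.match(r"^[A-Z][a-z]*\b", text))

# Disjunction with a non-capturing group to control precedence:
# whole words 'cat' or 'mat', optionally pluralized
cm = re.findall(r"\b(?:cat|mat)s?\b", text)

print(tokens[:4])   # ['The', 'cats', 'sat', 'on']
print(starts_cap)   # True
print(cm)           # ['cats', 'mats']
```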
This document provides an overview of mathematical preliminaries including sets, functions, relations, graphs, and proof techniques. It defines sets, set operations, and set representations. It also covers functions, relations, Cartesian products, graphs including walks, paths, cycles, and trees. Finally, it discusses proof techniques like induction and contradiction and provides examples of proofs using these techniques.
This document discusses formal languages and grammars. It begins by defining key concepts related to languages, including alphabets, strings, length, and empty strings. It then discusses how to represent languages using grammars. A grammar is defined as G = (VN, VT, P, S) where VN is a set of nonterminal symbols, VT is a set of terminal symbols, P is a set of production rules, and S is the start symbol. Derivations using these production rules generate the language from the start symbol. The document also discusses the Chomsky hierarchy of formal languages based on the types of production rules allowed in their grammars. These include unrestricted, context-sensitive, context-free, and regular languages
This document discusses nondeterministic finite automata (NFAs) and how they can be converted into equivalent deterministic finite automata (DFAs) through a process called subset construction. It provides examples of NFAs, including one modeling moves on a chessboard, and walks through the subset construction process to convert the chessboard NFA into an equivalent DFA. The document also discusses NFAs with epsilon transitions and how they are equivalent in power to ordinary NFAs without epsilon transitions.
Computing the volume of a convex body is a fundamental problem in computational geometry and optimization. In this talk we discuss the computational complexity of this problem from a theoretical as well as practical point of view. We show examples of how volume computation appear in applications ranging from combinatorics to algebraic geometry.
Next, we design the first practical algorithm for polytope volume approximation in high dimensions (few hundreds).
The algorithm utilizes uniform sampling from a convex region and efficient boundary polytope oracles.
Interestingly, our software provides a framework for exploring theoretical advances since it is believed, and our experiments provide evidence for this belief, that the current asymptotic bounds are unrealistically high.
I am Britney P. I love exploring new topics, and academic writing seemed an interesting option for me. After working for many years with progamminghomeworkhelp.com, I have assisted many students with their Design and Analysis of Algorithms assignments. I can proudly say that each student I have served is happy with the quality of the solutions I have provided. I earned my bachelor's degree from Sunway University, Malaysia.
The document discusses compilers and their components. It begins by defining a compiler as a program that translates programs written in one language into another language. It then describes the main components of a compiler: lexical analysis, syntax analysis, semantic analysis, code generation, and code optimization. The document provides details on each component and how they work together in the compilation process from a source language to a target language.
Abstract:
This paper deals with the vital role of primitive polynomials in designing PN sequence generators. The standard LFSR (linear feedback shift register) used for pattern generation may give repetitive patterns, which in certain cases are not sufficient for complete test coverage. A programmable LFSR based on a primitive polynomial generates a maximum-length PRPG with maximum fault coverage.
Characterizing the Distortion of Some Simple Euclidean EmbeddingsDon Sheehy
This talk addresses some upper and lower bounds techniques for bounding the distortion between mappings between Euclidean metric spaces including circles, spheres, pairs of lines, triples of planes, and the union of a hyperplane and a point.
The document provides information about a course on the theory of automata. It includes details such as the course title, prerequisites, duration, lectures, laboratories, and topics to be covered. The topics include finite automata, deterministic finite automata, non-deterministic finite automata, regular expressions, properties of regular languages, context-free grammars, pushdown automata, and Turing machines. It also lists reference books and textbooks, and the marking scheme for the course.
This document provides information about a course on the theory of automata, including:
- The course title is Theory of Computer Science [Automata] and is intended for students pursuing a BCS degree in their fifth semester.
- Key topics that will be covered include finite automata, regular expressions, context-free grammars, pushdown automata, and Turing machines.
- Reference textbooks and materials are listed to support student learning in the course over 18 weeks of lectures and labs.
This document contains lecture notes on finite automata from the Theory of Computation unit 1 course. Some key points include:
1. Finite automata are defined as 5-tuples (Q, Σ, δ, q0, F) where Q is a set of states, Σ is an input alphabet, δ is a transition function, q0 is the initial state, and F is a set of final states.
2. Deterministic finite automata (DFAs) have a single transition between states for each input symbol, while non-deterministic finite automata (NFAs) can have multiple transitions for a single input.
3. Regular expressions are used to describe the languages
This document summarizes two algorithms for computing properties of high-dimensional polytopes given access to certain oracle functions:
1. An algorithm for computing the edge-skeleton of a polytope in oracle polynomial-time using an oracle that returns the vertex maximizing a linear function.
2. A randomized algorithm for approximating the volume of a polytope by generating random points within it using a hit-and-run process, and estimating the volume from these points. The algorithm runs in oracle polynomial-time and provides an approximation with high probability.
Experimental results show the volume algorithm can approximate volumes of polytopes up to 100 dimensions within 1% error in under 2 hours, outperforming exact
This document provides an introduction to automata theory and finite automata. It defines an automaton as an abstract computing device that follows a predetermined sequence of operations automatically. A finite automaton has a finite number of states and can be deterministic or non-deterministic. The document outlines the formal definitions and representations of finite automata. It also discusses related concepts like alphabets, strings, languages, and the conversions between non-deterministic and deterministic finite automata. Methods for minimizing deterministic finite automata using Myhill-Nerode theorem and equivalence theorem are also introduced.
Practical volume estimation of polytopes by billiard trajectories and a new a...Apostolos Chalkis
A new randomized method to approximate the volume of a convex polytope based on simulated annealing for cooling convex bodies and MCMC sampling with geometric random walks
This talk is going to be centered on two papers that are going to appear in the following months:
Neerja Mhaskar and Michael Soltys, Non-repetitive strings over alphabet lists
to appear in WALCOM, February 2015.
Neerja Mhaskar and Michael Soltys, String Shuffle: Circuits and Graphs
to appear in the Journal of Discrete Algorithms, January 2015.
Visit http://soltys.cs.csuci.edu for more details (these two papers are number 3 and 19 on the page), as well as Python programs that can be used to illustrate the ideas in the papers. We are going to introduce some basic concepts related to computations on string, present some recent results, and propose two open problems.
The document discusses regular expressions and regular languages. It provides examples of regular expressions over alphabets, describes how to build regular expressions using operators like concatenation and Kleene star, and how regular expressions correspond to finite automata. It also discusses properties of regular languages and how to use the pumping lemma to prove that a language is not regular.
The document discusses parallel graph algorithms. It describes Dijkstra's algorithm for finding single-source shortest paths and its parallel formulations. It also describes Floyd's algorithm for finding all-pairs shortest paths and its parallel formulation using a 2D block mapping. Additionally, it discusses Johnson's algorithm, a modification of Dijkstra's algorithm to efficiently handle sparse graphs, and its parallel formulation.
The document describes finite state automata and their applications. It introduces deterministic finite state automata (DFA) and nondeterministic finite state automata (NFA). Key points include:
- DFAs have a single transition function, while NFAs can have multiple possible transitions.
- Any language accepted by an NFA can also be accepted by an equivalent DFA. The construction involves creating states that represent sets of NFA states.
- -transitions allow NFAs to move between states without reading an input symbol. However, any NFA with -transitions can be converted to an equivalent NFA without -transitions.
- Examples of state diagrams and tables are provided to illustrate binary addition, strings
Consistency proof of a feasible arithmetic inside a bounded arithmeticYamagata Yoriyuki
1) The author proves that the consistency of the bounded arithmetic theory S12 can be proved in the weaker theory PV-, which is Cook and Urquhart's equational theory PV minus induction. This is a strengthening of previous results that placed the lower bound at weaker but related theories.
2) The proof works by defining a system C for feasibly computing terms in PV- and proving it is sound with respect to PV-. Since C cannot prove false statements, this implies the consistency of PV-.
3) Establishing this lower bound of PV- is significant for attempts to separate the theories S2 and S12 by relative consistency statements.
Graphs are useful tools for modeling problems and consist of vertices and edges. Breadth-first search (BFS) is an algorithm that visits the vertices of a graph starting from a source vertex and proceeding to visit neighboring vertices first, before moving to neighbors that are further away. BFS uses a queue to efficiently traverse the graph and discover all possible paths from the source to other vertices, identifying the shortest paths in an unweighted graph. The time complexity of BFS on an adjacency list representation is O(n+m) where n is the number of vertices and m is the number of edges.
A Visual Guide to 1 Samuel | A Tale of Two HeartsSteve Thomason
These slides walk through the story of 1 Samuel. Samuel is the last judge of Israel. The people reject God and want a king. Saul is anointed as the first king, but he is not a good king. David, the shepherd boy is anointed and Saul is envious of him. David shows honor while Saul continues to self destruct.
Gender and Mental Health - Counselling and Family Therapy Applications and In...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.pptHenry Hollis
The History of NZ 1870-1900.
Making of a Nation.
From the NZ Wars to Liberals,
Richard Seddon, George Grey,
Social Laboratory, New Zealand,
Confiscations, Kotahitanga, Kingitanga, Parliament, Suffrage, Repudiation, Economic Change, Agriculture, Gold Mining, Timber, Flax, Sheep, Dairying,
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...EduSkills OECD
Andreas Schleicher, Director of Education and Skills at the OECD presents at the launch of PISA 2022 Volume III - Creative Minds, Creative Schools on 18 June 2024.
Chapter wise All Notes of First year Basic Civil Engineering.pptxDenish Jangid
Chapter wise All Notes of First year Basic Civil Engineering
Syllabus
Chapter-1
Introduction to objective, scope and outcome the subject
Chapter 2
Introduction: Scope and Specialization of Civil Engineering, Role of civil Engineer in Society, Impact of infrastructural development on economy of country.
Chapter 3
Surveying: Object Principles & Types of Surveying; Site Plans, Plans & Maps; Scales & Unit of different Measurements.
Linear Measurements: Instruments used. Linear Measurement by Tape, Ranging out Survey Lines and overcoming Obstructions; Measurements on sloping ground; Tape corrections, conventional symbols. Angular Measurements: Instruments used; Introduction to Compass Surveying, Bearings and Longitude & Latitude of a Line, Introduction to total station.
Levelling: Instrument used Object of levelling, Methods of levelling in brief, and Contour maps.
Chapter 4
Buildings: Selection of site for Buildings, Layout of Building Plan, Types of buildings, Plinth area, carpet area, floor space index, Introduction to building byelaws, concept of sun light & ventilation. Components of Buildings & their functions, Basic concept of R.C.C., Introduction to types of foundation
Chapter 5
Transportation: Introduction to Transportation Engineering; Traffic and Road Safety: Types and Characteristics of Various Modes of Transportation; Various Road Traffic Signs, Causes of Accidents and Road Safety Measures.
Chapter 6
Environmental Engineering: Environmental Pollution, Environmental Acts and Regulations, Functional Concepts of Ecology, Basics of Species, Biodiversity, Ecosystem, Hydrological Cycle; Chemical Cycles: Carbon, Nitrogen & Phosphorus; Energy Flow in Ecosystems.
Water Pollution: Water Quality standards, Introduction to Treatment & Disposal of Waste Water. Reuse and Saving of Water, Rain Water Harvesting. Solid Waste Management: Classification of Solid Waste, Collection, Transportation and Disposal of Solid. Recycling of Solid Waste: Energy Recovery, Sanitary Landfill, On-Site Sanitation. Air & Noise Pollution: Primary and Secondary air pollutants, Harmful effects of Air Pollution, Control of Air Pollution. . Noise Pollution Harmful Effects of noise pollution, control of noise pollution, Global warming & Climate Change, Ozone depletion, Greenhouse effect
Text Books:
1. Palancharmy, Basic Civil Engineering, McGraw Hill publishers.
2. Satheesh Gopi, Basic Civil Engineering, Pearson Publishers.
3. Ketki Rangwala Dalal, Essentials of Civil Engineering, Charotar Publishing House.
4. BCP, Surveying volume 1
This presentation was provided by Rebecca Benner, Ph.D., of the American Society of Anesthesiologists, for the second session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session Two: 'Expanding Pathways to Publishing Careers,' was held June 13, 2024.
This presentation was provided by Racquel Jemison, Ph.D., Christina MacLaughlin, Ph.D., and Paulomi Majumder. Ph.D., all of the American Chemical Society, for the second session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session Two: 'Expanding Pathways to Publishing Careers,' was held June 13, 2024.
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptxEduSkills OECD
Iván Bornacelly, Policy Analyst at the OECD Centre for Skills, OECD, presents at the webinar 'Tackling job market gaps with a skills-first approach' on 12 June 2024
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Biosequence Algorithms, Spring 2012
Lecture 4: Set Matching and
Aho-Corasick Algorithm
Pekka Kilpeläinen
University of Eastern Finland
School of Computing
BSA Lecture 4: Aho-Corasick matching – p.1/27
Exact Set Matching Problem
In the exact set matching problem we locate occurrences
of any pattern of a set P = {P1, . . . , Pk}, in target T[1 . . . m]
Let n = |P1| + · · · + |Pk|. Exact set matching can be solved
in time O((|P1| + m) + · · · + (|Pk| + m)) = O(n + km)
by applying any linear-time exact matching k times
Aho-Corasick algorithm (AC) is a classic solution to
exact set matching. It works in time O(n + m + z), where z
is the number of pattern occurrences in T
(Main reference here [Aho and Corasick, 1975])
AC is based on a refinement of a keyword tree
Keyword Trees
A keyword tree (or a trie) for a set of patterns P is a
rooted tree K such that
1. each edge of K is labeled by a character
2. any two edges out of a node have different labels
Define the label of a node v as the concatenation of edge
labels on the path from the root to v, and denote it by L(v)
3. for each P ∈ P there’s a node v with L(v) = P, and
4. the label L(v) of any leaf v equals some P ∈ P
Example of a Keyword Tree
A keyword tree for P = {he, she, his, hers}:
root
├─ h ── e            (1: he)
│  │    └─ r ── s    (4: hers)
│  └─ i ── s         (3: his)
└─ s ── h ── e       (2: she)
A keyword tree is an efficient implementation of a
dictionary of strings
Keyword Tree: Construction
Construction for P = {P1, . . . , Pk}:
Begin with a root node only;
Insert each pattern Pi, one after the other, as follows:
Starting at the root, follow the path labeled by chars of Pi;
If the path ends before Pi, continue it by adding new
edges and nodes for the remaining characters of Pi
Store identifier i of Pi at the terminal node of the path
This clearly takes O(|P1| + · · · + |Pk|) = O(n) time
Keyword Tree: Lookup
Lookup of a string P: Starting at root, follow the path
labeled by characters of P as long as possible;
If the path leads to a node with an identifier, P is a
keyword in the dictionary
If the path terminates before P, the string is not in the
dictionary
This clearly takes O(|P|) time — an efficient look-up method!
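The construction and lookup just described can be sketched in Python; the class and function names below are illustrative choices, not code from the lecture:

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # char -> TrieNode
        self.pattern_id = None  # identifier i if L(v) = P_i

def build_keyword_tree(patterns):
    """Insert each pattern, extending the path where it ends: O(n) total."""
    root = TrieNode()
    for i, pattern in enumerate(patterns, start=1):
        node = root
        for ch in pattern:
            node = node.children.setdefault(ch, TrieNode())
        node.pattern_id = i     # store identifier at the terminal node
    return root

def lookup(root, word):
    """Follow the path labeled by the chars of word: O(|word|) time."""
    node = root
    for ch in word:
        if ch not in node.children:
            return None         # path terminates early: not in dictionary
        node = node.children[ch]
    return node.pattern_id      # None if word is only a prefix of a pattern

root = build_keyword_tree(["he", "she", "his", "hers"])
```

Looking up "she" returns identifier 2, while "her" returns None (it is only a prefix of "hers").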
Naive application to pattern matching would lead to Θ(nm)
time
Next we extend a keyword tree into an automaton,
to support linear-time matching
Aho-Corasick Automaton (1)
States: nodes of the keyword tree
initial state: 0 = the root
Actions are determined by three functions:
1. the goto function g(q, a) gives the state entered from
current state q by matching target char a
if edge (q, v) is labeled by a, then g(q, a) = v;
g(0, a) = 0 for each a that does not label an edge
out of the root
the automaton stays at the initial state while
scanning non-matching characters
Otherwise g(q, a) = ∅
Aho-Corasick Automaton (2)
2. the failure function f(q) for q ≠ 0 gives the state
entered at a mismatch
f(q) is the node labeled by the longest proper suffix
w of L(q) s.t. w is a prefix of some pattern
→ a fail transition does not miss any potential
occurrences
NB: f(q) is always defined, since L(0) = ε is a
prefix of any pattern
3. the output function out(q) gives the set of patterns
recognized when entering state q
Example of an AC Automaton
The AC automaton for P = {he, she, his, hers}; in the original
figure, dashed arrows are the fail transitions:

goto: 0 ─h→ 1 ─e→ 2 ─r→ 8 ─s→ 9
      0 ─s→ 3 ─h→ 4 ─e→ 5
      1 ─i→ 6 ─s→ 7
      g(0, a) = 0 for each a ∉ {h, s}

fail: f(4) = 1, f(5) = 2, f(7) = 3, f(9) = 3,
      f(q) = 0 for the other states

out:  out(2) = {he}, out(5) = {he, she},
      out(7) = {his}, out(9) = {hers}
AC Search of Target T[1 . . . m]
q := 0; // initial state (root)
for i := 1 to m do
    while g(q, T[i]) = ∅ do
        q := f(q); // follow a fail transition
    q := g(q, T[i]); // follow a goto transition
    if out(q) ≠ ∅ then print i, out(q);
endfor;
Example:
Search text “ushers” with the preceding automaton
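As an illustration, the search loop can be rendered in Python with the goto, fail, and output tables of the example automaton hard-coded; the table layout and names are my own, with state numbers as in the figure:

```python
# Tables of the example automaton for {he, she, his, hers} (a sketch).
GOTO = {0: {'h': 1, 's': 3}, 1: {'e': 2, 'i': 6}, 2: {'r': 8},
        3: {'h': 4}, 4: {'e': 5}, 6: {'s': 7}, 8: {'s': 9}}
FAIL = {1: 0, 2: 0, 3: 0, 4: 1, 5: 2, 6: 0, 7: 3, 8: 0, 9: 3}
OUT = {2: {'he'}, 5: {'he', 'she'}, 7: {'his'}, 9: {'hers'}}

def goto(q, a):
    """g(q, a): target of the a-edge; 0 at the root on mismatch, None (= ∅) elsewhere."""
    if a in GOTO.get(q, {}):
        return GOTO[q][a]
    return 0 if q == 0 else None

def ac_search(text):
    """The AC search loop of the slide: fail transitions, then one goto."""
    q, hits = 0, []
    for i, ch in enumerate(text, start=1):  # i is 1-based as on the slide
        while goto(q, ch) is None:          # follow fail transitions
            q = FAIL[q]
        q = goto(q, ch)                     # follow a goto transition
        if OUT.get(q):                      # out(q) ≠ ∅: report
            hits.append((i, sorted(OUT[q])))
    return hits
```

Searching "ushers" reports {he, she} ending at position 4 and {hers} ending at position 6.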
AC Search Example
Searching T = “ushers” with the automaton for {he, she, his, hers};
the state q after reading each character:

i:  1  2  3  4  5  6
T:  u  s  h  e  r  s
q:  0  3  4  5  8  9

At i = 4 the automaton reports out(5) = {he, she},
and at i = 6 it reports out(9) = {hers}
Complexity of AC Search
Theorem Searching target T[1 . . . m] with an AC
automaton takes time O(m + z), where z is the number of
pattern occurrences
Proof. For each target character, the automaton performs
0 or more fail transitions, followed by a goto.
Each goto either stays at the root, or increases the depth
of q by 1 ⇒ the depth of q is increased ≤ m times
Each fail moves q closer to the root
⇒ the total number of fail transitions is ≤ m
The z occurrences can be reported in z × O(1) = O(z) time
(say, as pattern identifiers and start positions of
occurrences)
Constructing an AC Automaton
The AC automaton can be constructed in two phases
Phase I:
1. Construct the keyword tree for P
for each P ∈ P added to the tree, set
out(v) := {P} for the node v s.t. L(v) = P
2. complete the goto function for the root by setting
g(0, a) := 0
for each a ∈ Σ that doesn’t label an edge out of the root
If the alphabet Σ is fixed, Phase I takes time O(n)
Result of Phase I
(Figure: the keyword tree for P = {he, she, his, hers} with states
numbered 0–9, the root gotos completed (g(0, a) = 0 for a ∉ {h, s}),
and the initial output sets out(2) = {he}, out(5) = {she},
out(7) = {his}, out(9) = {hers}; fail links are not yet present.)
Result of Phase I
After Phase I
goto is complete
out() may be incomplete (e.g., ”he” is missing from state 5)
fail links are missing
The automaton is completed in Phase II
Phase II of the AC Construction
Q := emptyQueue();
for a ∈ Σ do
    if g(0, a) = q ≠ 0 then
        f(q) := 0; enqueue(q, Q);
while not isEmpty(Q) do
    r := dequeue(Q);
    for a ∈ Σ do
        if g(r, a) = u ≠ ∅ then
            enqueue(u, Q); v := f(r);
            while g(v, a) = ∅ do v := f(v); // (*)
            f(u) := g(v, a);
            out(u) := out(u) ∪ out(f(u));
What does this do?
Explanation of Phase II
Functions fail and output are computed for the nodes of
the trie in a breadth-first order
nodes closer to the root have already been processed
Consider nodes r and u = g(r, a), that is, r is the parent of
u and L(u) = L(r)a
Now what should f(u) be?
A: The deepest node labeled by a proper suffix of L(u).
The executions of line (*) find this, by locating the deepest
node v s.t. L(v) is a proper suffix of L(r) and g(v, a)
(= f(u)) is defined.
(Notice that v and g(v, a) may both be the root.)
BSA Lecture 4: Aho-Corasick matching – p.17/27
Completing the Output Functions
What about
out(u) := out(u) ∪ out(f(u)); ?
This is done because the patterns recognized at f(u) (if
any), and only those, are proper suffixes of L(u), and shall
thus be recognized at state u also.
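Putting Phases I and II together, the whole construction can be sketched in Python; the integer-state and list-of-dicts layout is my own, not the lecture's:

```python
from collections import deque

def build_ac(patterns):
    """Phase I: keyword tree; Phase II: fail and out in breadth-first order."""
    goto, out = [{}], [set()]            # state 0 is the root
    for p in patterns:                   # Phase I: insert each pattern
        q = 0
        for ch in p:
            if ch not in goto[q]:
                goto.append({}); out.append(set())
                goto[q][ch] = len(goto) - 1
            q = goto[q][ch]
        out[q].add(p)
    fail = [0] * len(goto)
    queue = deque(goto[0].values())      # depth-1 states: f(q) = 0
    while queue:                         # breadth-first order
        r = queue.popleft()
        for ch, u in goto[r].items():
            queue.append(u)
            v = fail[r]
            while ch not in goto[v] and v != 0:  # line (*): follow fails
                v = fail[v]
            fail[u] = goto[v].get(ch, 0)         # g(v, ch), or the root
            out[u] |= out[fail[u]]               # complete the output set
    return goto, fail, out
```

Here the missing root gotos are handled implicitly: `goto[v].get(ch, 0)` acts as g(0, a) = 0, so the goto function need not be completed explicitly over Σ.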
Complexity of the AC Construction
Phase II can be implemented to run in time O(n), too:
The breadth-first traversal alone takes time proportional to
the size of the tree, which is O(n)
But is there also an O(n) bound for the number of times that
the fail transitions are followed (on line (*) of Phase II)?
A: Yes! See next
AC Construction: Number of fail
transitions
Consider the nodes u1, . . . , ul created by entering a pattern
a1 . . . al, and let df(ui) denote the depth of the node f(ui)
Then df(u1) = 0, and each step along the pattern can increase
the depth of the fail node by at most one: df(ui+1) ≤ df(ui) + 1
When computing f(ui+1), each repeat of line (*)
decrements the value of df(ui+1)
Number of fail transitions (cont)
So, for a pattern of length l,
line (*) cannot be repeated more than l times
Thus, in total over all patterns,
line (*) cannot be repeated more than n times
AC Construction: Unions of output
functions
Is it costly to perform
out(u) := out(u) ∪ out(f(u)); ?
No: Before the assignment, out(u) = ∅ or out(u) = {L(u)}.
Any patterns in out(f(u)) are shorter than L(u)
⇒ the sets are disjoint
→ Output sets can be implemented as linked lists, and
united in constant time
Biological Applications
1. Matching against a library of known patterns
A Sequence-tagged-site (STS) is, roughly, a DNA string
of 200–300 bases whose left and right ends occur only
once in the entire genome
ESTs (expressed sequence tags) are STSs that participate
in gene expression, and thus belong to genes
Hundreds of thousands of STSs and tens of thousands of
ESTs were stored in databases by the mid-1990s, and are used
for comparison against new DNA sequences
Ability to search for occurrences of patterns in
time that is independent of their number is very useful
2. Matching with Wild Cards
Let φ be a wild card that matches any single character
For example, abφφcφ occurs at positions 2 and 7 of
1234567890123
xabvccababcax
A transcription factor is a protein that binds to specific
locations of DNA and regulates its transcription to RNA
Many families of transcription factors are characterized by
substrings with wild cards
Example: Transcription factor Zinc Finger has signature
CφφCφφφφφφφφφφφφφHφφH
(C and H: amino acids; cysteine and histidine)
Matching with Wild Cards (2)
If the number of wild cards is bounded by a constant,
patterns with wild-cards can be matched in linear time, by
counting occurrences of non-wild-card substrings of P:
Let P = {P1, . . . , Pk} be the substrings of P separated by
wild-cards, and let l1, . . . , lk be their end positions in P
Preprocess: Build an AC automaton for P;
Initiate occurrence counts: for i := 1 to |T| do C[i] := 0;
Search target T with the AC automaton
When pattern Pj is found to end at position i ≥ lj of T,
increment C[i − lj + 1] by one;
Then P occurs at T[i] iff C[i] = k
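This counting scheme can be sketched in Python as follows; naive `str.find` scanning stands in for the AC automaton to keep the sketch self-contained, and positions are 1-based as on the slide:

```python
def wildcard_match(P, T, wild="φ"):
    """Occurrence positions of a wild-card pattern P in text T.
    Split P into wild-card-free pieces P_j with end positions l_j,
    count piece occurrences in C, and report s where C[s] = k."""
    pieces, ends = [], []
    start = 0
    for idx, ch in enumerate(P + wild):  # sentinel flushes the last piece
        if ch == wild:
            if idx > start:
                pieces.append(P[start:idx])
                ends.append(idx)         # 1-based end position l_j in P
            start = idx + 1
    C = [0] * (len(T) + 1)               # C[1..m], 1-based
    for piece, l in zip(pieces, ends):
        pos = T.find(piece)
        while pos != -1:                 # all (overlapping) occurrences
            i = pos + len(piece)         # 1-based end position in T
            if i >= l:
                C[i - l + 1] += 1        # candidate start of P
            pos = T.find(piece, pos + 1)
    k = len(pieces)
    return [s for s in range(1, len(T) + 1)
            if C[s] == k and s + len(P) - 1 <= len(T)]
```

On the earlier example, P = abφφcφ in T = xabvccababcax yields occurrences at positions 2 and 7; the final length check guards against a match overhanging the end of T when P ends in wild cards.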
Example
Let P = φATCφφTCφATC
Then P = {ATC, TC, ATC} with l1 = 4, l2 = 8 and l3 = 12
The AC automaton for P = {ATC, TC}; the output sets hold the
end positions lj in P:

goto: 0 ─A→ 1 ─T→ 2 ─C→ 3,  0 ─T→ 4 ─C→ 5,
      g(0, C) = g(0, G) = 0
fail: f(2) = 4, f(3) = 5, f(q) = 0 for the other states
out:  out(3) = {4, 12, 8}, out(5) = {8}
Search on
i: 12345678901234...
T: ACGATCTCTCGATC...
C[1] = C[7] = C[11] = 1 and C[3] = 3 = k (⇒ occurrence at position 3)
Complexity of AC Wild-Card
Matching
Let |P| = n and |T| = m
Preprocessing: O(n + m) (since |P1| + · · · + |Pk| ≤ n)
Search: O(m + z), where z is the number of occurrences of the substrings Pj in T
Each of them increments a cell of C by one, and each of
C[1], . . . , C[m] is incremented at most k times ⇒ z ≤ km
We have derived the following result:
Theorem Matching of a pattern P with k wild-cards can be
performed in time O(|P| + k|T|)