Trie (aka radix tree or prefix tree), is an ordered tree data structure where the keys are usually strings. Tries have tremendous applications from all sorts of things like dictionary to
Trie is an efficient data structure for storing and retrieving strings. It stores strings in a tree structure, with each node representing a character. Common operations on a trie like insertion, deletion and searching of strings can be performed in O(M) time where M is the length of the string. The document then provides details on the node structure used to implement a trie, along with pseudocode for inserting strings like "Apple" and "Army" into an empty trie.
The document outlines a presentation on trees and binary trees. It discusses different tree types including binary trees, complete binary trees, and binary search trees. It covers tree traversal methods like preorder, inorder and postorder traversal. It also discusses representations of binary trees using arrays and linked lists and algorithms for insertion and deletion in binary search trees.
The document discusses hashing techniques for storing and retrieving data from memory. It covers hash functions, hash tables, open addressing techniques like linear probing and quadratic probing, and closed hashing using separate chaining. Hashing maps keys to memory addresses using a hash function to store and find data independently of the number of items. Collisions may occur and different collision resolution methods are used like open addressing that resolves collisions by probing in the table or closed hashing that uses separate chaining with linked lists. The efficiency of hashing depends on factors like load factor and average number of probes.
The document discusses different types of binary trees and tree traversal methods. It defines binary trees and outlines their key properties. It also describes different types of binary trees such as strictly binary trees, complete binary trees, and almost complete binary trees. Finally, it discusses two methods for traversing trees - depth-first traversal and breadth-first traversal, and covers preorder, inorder and postorder traversal techniques for binary trees.
BTrees - Great alternative to Red Black, AVL and other BSTsAmrinder Arora
BTrees - designed by Rudolf Bayer and Ed McCreight - fundamental data structure in computer science. Great alternative to BSTs. Very appropriate for disk based access.
Binary search is an algorithm that finds the position of a target value within a sorted array. It works by recursively dividing the array range in half and searching only within the appropriate half. The time complexity is O(log n) in the average and worst cases and O(1) in the best case, making it very efficient for searching sorted data. However, it requires the list to be sorted for it to work.
Trie is an efficient data structure for storing and retrieving strings. It stores strings in a tree structure, with each node representing a character. Common operations on a trie like insertion, deletion and searching of strings can be performed in O(M) time where M is the length of the string. The document then provides details on the node structure used to implement a trie, along with pseudocode for inserting strings like "Apple" and "Army" into an empty trie.
The document outlines a presentation on trees and binary trees. It discusses different tree types including binary trees, complete binary trees, and binary search trees. It covers tree traversal methods like preorder, inorder and postorder traversal. It also discusses representations of binary trees using arrays and linked lists and algorithms for insertion and deletion in binary search trees.
The document discusses hashing techniques for storing and retrieving data from memory. It covers hash functions, hash tables, open addressing techniques like linear probing and quadratic probing, and closed hashing using separate chaining. Hashing maps keys to memory addresses using a hash function to store and find data independently of the number of items. Collisions may occur and different collision resolution methods are used like open addressing that resolves collisions by probing in the table or closed hashing that uses separate chaining with linked lists. The efficiency of hashing depends on factors like load factor and average number of probes.
The document discusses different types of binary trees and tree traversal methods. It defines binary trees and outlines their key properties. It also describes different types of binary trees such as strictly binary trees, complete binary trees, and almost complete binary trees. Finally, it discusses two methods for traversing trees - depth-first traversal and breadth-first traversal, and covers preorder, inorder and postorder traversal techniques for binary trees.
BTrees - Great alternative to Red Black, AVL and other BSTsAmrinder Arora
BTrees - designed by Rudolf Bayer and Ed McCreight - fundamental data structure in computer science. Great alternative to BSTs. Very appropriate for disk based access.
Binary search is an algorithm that finds the position of a target value within a sorted array. It works by recursively dividing the array range in half and searching only within the appropriate half. The time complexity is O(log n) in the average and worst cases and O(1) in the best case, making it very efficient for searching sorted data. However, it requires the list to be sorted for it to work.
The document discusses binary search trees and their implementation. It begins by defining binary trees and their properties. It then describes how binary search trees work, with the key property that for every node, all keys in its left subtree are smaller than the node's key and all keys in its right subtree are larger. It provides pseudocode for basic binary search tree operations like search, insert, delete, find minimum and maximum. Finally, it introduces AVL trees, which are self-balancing binary search trees that ensure fast lookup by keeping the height of left and right subtrees close.
This document discusses tries, a data structure used to store strings. It begins by outlining the advantages of tries over hash tables, including faster insertion and lookup. It then defines a trie as a tree that stores strings where all descendants of a node have a common prefix. The document describes standard tries and compressed tries, noting compressed tries save space by merging nodes with only one child. It provides examples and discusses time/space complexity. Applications mentioned include word matching, prefix matching, and use in web search engines to index words and associated URLs.
1) Trees are hierarchical data structures that store data in nodes connected by edges. They are useful for representing hierarchical relationships.
2) Binary search trees allow quick search, insertion, and deletion of nodes due to the organizational property that the value of each node is greater than all nodes in its left subtree and less than all nodes in its right subtree.
3) Common tree traversal algorithms include preorder, inorder, and postorder traversals, which visit nodes in different orders depending on whether the left/right children or root is visited first.
The document discusses minimum spanning trees (MST) and two algorithms for finding them: Prim's algorithm and Kruskal's algorithm. Prim's algorithm operates by building the MST one vertex at a time, starting from an arbitrary root vertex and at each step adding the cheapest connection to another vertex not yet included. Kruskal's algorithm finds the MST by sorting the edges by weight and sequentially adding edges that connect different components without creating cycles.
The document discusses regular expressions and text processing in Python. It covers various components of regular expressions like literals, escape sequences, character classes, and metacharacters. It also discusses different regular expression methods in Python like match, search, split, sub, findall, finditer, compile and groupdict. The document provides examples of using these regular expression methods to search, find, replace and extract patterns from text.
1. A multi-way search tree allows nodes to have up to m children, where keys in each node are ordered and divide the search space.
2. B-trees are a generalization of binary search trees where all leaves are at the same depth and internal nodes have at least m/2 children.
3. Searching and inserting keys in a B-tree starts at the root and proceeds by comparing keys to guide traversal to the appropriate child node. Insertion may require splitting full nodes to balance the tree.
Trie is a data structure based on the concept of the tree where the retrieval of strings from a set of string is a major concern. the average cost of basic operation lies around O(kd).
Trie Data Structure
LINK: https://leetcode.com/tag/trie/
Easy:
1. Longest Word in Dictionary
Medium:
1. Count Substrings That Differ by One Character
2. Replace Words
3. Top K Frequent Words
4. Maximum XOR of Two Numbers in an Array
5. Map Sum Pairs
Hard:
1. Concatenated Words
2. Word Search II
This document discusses substitution and transposition encryption techniques. Substitution techniques replace plaintext characters with other characters, numbers, or symbols, changing character identity but not position. Transposition techniques rearrange the character positions in the plaintext, changing character position but not identity. Examples of substitution techniques include the Caesar cipher, monoalphabetic ciphers, Playfair cipher, and Vigenère cipher. Transposition techniques examples are the rail fence cipher and row transposition cipher.
Scientific Computing with Python - NumPy | WeiYuanWei-Yuan Chang
This document provides an overview of NumPy, the fundamental package for scientific computing in Python. It discusses NumPy's powerful N-dimensional array object and sophisticated broadcasting functions. The document outlines topics including Ndarray, creating and manipulating arrays, array properties, basic operations, matrices, and advanced usages like boolean indexing and masking. NumPy allows efficient storage and manipulation of multi-dimensional data, and integration with languages like C/C++ and Fortran.
This document provides an overview and comparison of insertion sort and shellsort sorting algorithms. It describes how insertion sort works by repeatedly inserting elements into a sorted left portion of the array. Shellsort improves on insertion sort by making passes with larger increments to shift values into approximate positions before final sorting. The document discusses the time complexities of both algorithms and provides examples to illustrate how they work.
Merge sort is a sorting algorithm that works by dividing an array into two halves, recursively sorting the halves, and then merging the sorted halves into a single sorted array. The document provides details on how merge sort works, including pseudocode for the main, merge sort, and merging functions. It analyzes the time complexity of merge sort as O(n log n), making it more efficient than other basic sorts with O(n^2) time complexity like bubble, selection, and insertion sort.
Binary search provides an efficient O(log n) solution for searching a sorted list. It works by repeatedly dividing the search space in half and focusing on only one subdivision, based on comparing the search key to the middle element. This recursively narrows down possible locations until the key is found or the entire list has been searched. Binary search mimics traversing a binary search tree built from the sorted list, with divide-and-conquer reducing the search space at each step.
It consists of important different concepts related to PL/SQL that are covered like the Basic structure of PL/SQL block, Variables and Constants in PL/SQL, Control Structures i.e., Conditional, Iterative, and Sequential Control, Procedure and Function, Cursors and its Types, Applications of implicit and explicit cursors, Triggers and its Types and Exception Handling.
This document discusses different sorting algorithms including bubble sort, insertion sort, and selection sort. It provides details on each algorithm, including time complexity, code examples, and graphical examples. Bubble sort is an O(n2) algorithm that works by repeatedly comparing and swapping adjacent elements. Insertion sort also has O(n2) time complexity but is more efficient than bubble sort for small or partially sorted lists. Selection sort finds the minimum value and swaps it into place at each step.
This document discusses various strategies for optimizing MySQL queries and indexes, including:
- Using the slow query log and EXPLAIN statement to analyze slow queries.
- Avoiding correlated subqueries and issues in older MySQL versions.
- Choosing indexes based on selectivity and covering common queries.
- Identifying and addressing full table scans and duplicate indexes.
- Understanding the different join types and selecting optimal indexes.
Tries are tree data structures that are useful for storing and retrieving data with associative keys like strings, especially variable-length strings. Tries have applications in auto-complete suggestions, longest prefix matching for IP routing, spell checking, phone book contact searching, and predictive text entry systems like T9. The keys are stored based on prefix matching in a trie, allowing fast lookup and retrieval of all keys that start with a given prefix in near-constant time.
Tries are tree data structures that store a dynamic set of strings, supporting efficient search, insertion, and deletion of strings in time proportional to the length of the strings. Standard tries use more space (O(n) where n is total string size) while compressed tries reduce space to O(m) where m is number of strings by merging chains of single-child nodes. Common applications include autocomplete dictionaries and compressing lookup tables for sparse data.
The document discusses binary search trees and their implementation. It begins by defining binary trees and their properties. It then describes how binary search trees work, with the key property that for every node, all keys in its left subtree are smaller than the node's key and all keys in its right subtree are larger. It provides pseudocode for basic binary search tree operations like search, insert, delete, find minimum and maximum. Finally, it introduces AVL trees, which are self-balancing binary search trees that ensure fast lookup by keeping the height of left and right subtrees close.
This document discusses tries, a data structure used to store strings. It begins by outlining the advantages of tries over hash tables, including faster insertion and lookup. It then defines a trie as a tree that stores strings where all descendants of a node have a common prefix. The document describes standard tries and compressed tries, noting compressed tries save space by merging nodes with only one child. It provides examples and discusses time/space complexity. Applications mentioned include word matching, prefix matching, and use in web search engines to index words and associated URLs.
1) Trees are hierarchical data structures that store data in nodes connected by edges. They are useful for representing hierarchical relationships.
2) Binary search trees allow quick search, insertion, and deletion of nodes due to the organizational property that the value of each node is greater than all nodes in its left subtree and less than all nodes in its right subtree.
3) Common tree traversal algorithms include preorder, inorder, and postorder traversals, which visit nodes in different orders depending on whether the left/right children or root is visited first.
The document discusses minimum spanning trees (MST) and two algorithms for finding them: Prim's algorithm and Kruskal's algorithm. Prim's algorithm operates by building the MST one vertex at a time, starting from an arbitrary root vertex and at each step adding the cheapest connection to another vertex not yet included. Kruskal's algorithm finds the MST by sorting the edges by weight and sequentially adding edges that connect different components without creating cycles.
The document discusses regular expressions and text processing in Python. It covers various components of regular expressions like literals, escape sequences, character classes, and metacharacters. It also discusses different regular expression methods in Python like match, search, split, sub, findall, finditer, compile and groupdict. The document provides examples of using these regular expression methods to search, find, replace and extract patterns from text.
1. A multi-way search tree allows nodes to have up to m children, where keys in each node are ordered and divide the search space.
2. B-trees are a generalization of binary search trees where all leaves are at the same depth and internal nodes have at least m/2 children.
3. Searching and inserting keys in a B-tree starts at the root and proceeds by comparing keys to guide traversal to the appropriate child node. Insertion may require splitting full nodes to balance the tree.
Trie is a data structure based on the concept of the tree where the retrieval of strings from a set of string is a major concern. the average cost of basic operation lies around O(kd).
Trie Data Structure
LINK: https://leetcode.com/tag/trie/
Easy:
1. Longest Word in Dictionary
Medium:
1. Count Substrings That Differ by One Character
2. Replace Words
3. Top K Frequent Words
4. Maximum XOR of Two Numbers in an Array
5. Map Sum Pairs
Hard:
1. Concatenated Words
2. Word Search II
This document discusses substitution and transposition encryption techniques. Substitution techniques replace plaintext characters with other characters, numbers, or symbols, changing character identity but not position. Transposition techniques rearrange the character positions in the plaintext, changing character position but not identity. Examples of substitution techniques include the Caesar cipher, monoalphabetic ciphers, Playfair cipher, and Vigenère cipher. Transposition techniques examples are the rail fence cipher and row transposition cipher.
Scientific Computing with Python - NumPy | WeiYuanWei-Yuan Chang
This document provides an overview of NumPy, the fundamental package for scientific computing in Python. It discusses NumPy's powerful N-dimensional array object and sophisticated broadcasting functions. The document outlines topics including Ndarray, creating and manipulating arrays, array properties, basic operations, matrices, and advanced usages like boolean indexing and masking. NumPy allows efficient storage and manipulation of multi-dimensional data, and integration with languages like C/C++ and Fortran.
This document provides an overview and comparison of insertion sort and shellsort sorting algorithms. It describes how insertion sort works by repeatedly inserting elements into a sorted left portion of the array. Shellsort improves on insertion sort by making passes with larger increments to shift values into approximate positions before final sorting. The document discusses the time complexities of both algorithms and provides examples to illustrate how they work.
Merge sort is a sorting algorithm that works by dividing an array into two halves, recursively sorting the halves, and then merging the sorted halves into a single sorted array. The document provides details on how merge sort works, including pseudocode for the main, merge sort, and merging functions. It analyzes the time complexity of merge sort as O(n log n), making it more efficient than other basic sorts with O(n^2) time complexity like bubble, selection, and insertion sort.
Binary search provides an efficient O(log n) solution for searching a sorted list. It works by repeatedly dividing the search space in half and focusing on only one subdivision, based on comparing the search key to the middle element. This recursively narrows down possible locations until the key is found or the entire list has been searched. Binary search mimics traversing a binary search tree built from the sorted list, with divide-and-conquer reducing the search space at each step.
It consists of important different concepts related to PL/SQL that are covered like the Basic structure of PL/SQL block, Variables and Constants in PL/SQL, Control Structures i.e., Conditional, Iterative, and Sequential Control, Procedure and Function, Cursors and its Types, Applications of implicit and explicit cursors, Triggers and its Types and Exception Handling.
This document discusses different sorting algorithms including bubble sort, insertion sort, and selection sort. It provides details on each algorithm, including time complexity, code examples, and graphical examples. Bubble sort is an O(n2) algorithm that works by repeatedly comparing and swapping adjacent elements. Insertion sort also has O(n2) time complexity but is more efficient than bubble sort for small or partially sorted lists. Selection sort finds the minimum value and swaps it into place at each step.
This document discusses various strategies for optimizing MySQL queries and indexes, including:
- Using the slow query log and EXPLAIN statement to analyze slow queries.
- Avoiding correlated subqueries and issues in older MySQL versions.
- Choosing indexes based on selectivity and covering common queries.
- Identifying and addressing full table scans and duplicate indexes.
- Understanding the different join types and selecting optimal indexes.
Tries are tree data structures that are useful for storing and retrieving data with associative keys like strings, especially variable-length strings. Tries have applications in auto-complete suggestions, longest prefix matching for IP routing, spell checking, phone book contact searching, and predictive text entry systems like T9. The keys are stored based on prefix matching in a trie, allowing fast lookup and retrieval of all keys that start with a given prefix in near-constant time.
Tries are tree data structures that store a dynamic set of strings, supporting efficient search, insertion, and deletion of strings in time proportional to the length of the strings. Standard tries use more space (O(n) where n is total string size) while compressed tries reduce space to O(m) where m is number of strings by merging chains of single-child nodes. Common applications include autocomplete dictionaries and compressing lookup tables for sparse data.
Tries are a data structure for storing strings that allow for fast pattern matching. A trie is a tree where each edge represents a character and each path from the root node to a leaf spells out a key. Standard tries insert strings by adding nodes for each character. Compressed tries reduce redundant nodes by compressing chains. Suffix tries store all suffixes of a text in a compressed trie to enable quick string queries. Tries support faster insertion and lookup compared to hash tables, with no collisions between keys.
A trie, or prefix tree, is an ordered tree data structure used to store associative arrays where keys are usually strings. It shows a trie with words like "tree", "trie", "algo", and "assoc". Functions of a trie include counting prefixes and matching words exactly. It is used to check consistency of phone numbers by ensuring no number is a prefix of another in the data set.
This presentation defines online algorithms, and discusses how we can analyze them using competitive analysis. Using ski rental and ice cream machine as example problems, it covers the applications of online algorithms in load balancing and other verticals.
The document discusses hashing and its components. [1] It describes how hashing works by mapping keys to locations in a hash table using a hash function. [2] Common collision resolution techniques are chaining, linear probing, quadratic probing, and double hashing. [3] Hashing provides fast average-case performance of O(1) by storing data in blocks based on the hash value and load factor.
The document discusses various priority queue data structures like binary heaps, binomial heaps, and Fibonacci heaps. It begins with an overview of binary heaps and their implementation using arrays. It then covers operations like insertion and removal on heaps. Next, it describes binomial heaps and their properties and operations like union and deletion. Finally, it discusses Fibonacci heaps and how they allow decreasing a key in amortized constant time, improving algorithms like Dijkstra's.
The document discusses different types of tries data structures and their applications. It describes standard tries, compressed tries, and suffix tries. Standard tries support operations like finding, inserting, and removing strings in time proportional to the string length and alphabet size. Compressed tries reduce space by compressing chains of redundant nodes. Suffix tries store all suffixes of a text in linear space and support fast pattern matching queries in time proportional to the pattern length plus the number of matches. The document provides examples of using tries for text processing, web search indexing, internet routing, and other applications.
Splay Trees and Self Organizing Data StructuresAmrinder Arora
This document discusses splay trees, a self-organizing binary search tree data structure. Splay trees perform rotations to move recently accessed elements closer to the root of the tree after operations like insertion, deletion and search. While splay trees do not guarantee an upper bound on tree height like AVL or red-black trees, the amortized time of all operations is O(log n). Splaying restructures the tree in a way that frequently accessed elements are faster to access in the future.
This document presents algorithmic puzzles and their solutions. It discusses puzzles involving counterfeit coins, uneven water pitchers, strong eggs on tiny floors, and people arranged in a circle. For each puzzle, it provides the problem description, an analysis or solution approach, and sometimes additional discussion. The document is a presentation on algorithmic puzzles given by Amrinder Arora, including their contact information.
Euclid's Algorithm for Greatest Common Divisor - Time Complexity AnalysisAmrinder Arora
Euclid's algorithm for finding greatest common divisor is an elegant algorithm that can be written iteratively as well as recursively. The time complexity of this algorithm is O(log^2 n) where n is the larger of the two inputs.
Convex Hull - Chan's Algorithm O(n log h) - Presentation by Yitian Huang and ...Amrinder Arora
Chan's Algorithm for Convex Hull Problem. Output Sensitive Algorithm. Takes O(n log h) time. Presentation for the final project in CS 6212/Spring/Arora.
1) Physical files exist on storage while logical files are how programs view files without knowing the actual physical file.
2) Opening files creates a new file or accesses an existing one, while closing files frees up the file descriptor for another file and ensures all output is written.
3) Core file processing operations include reading, writing, and seeking within a file.
This document defines and describes trees and graphs as non-linear data structures. It explains that a tree is similar to a linked list but allows nodes to have multiple children rather than just one. The document defines key tree terms like height, ancestors, size, and different types of binary trees including strict, full, and complete. It provides properties of binary trees such as the number of nodes in full and complete binary trees based on height.
KMP - Social Media Marketing Seminar Blogging For BusinessMicrosoft
The document discusses blogging as a business tool and social media marketing. It provides tips on how companies can use blogging to engage customers, demonstrate thought leadership, manage crises, and more. It also discusses how to measure conversations and find influential stakeholders online.
This document discusses bitvector operations and how they are used in the Marisa trie data structure. It begins by explaining bitvector get, rank, and select operations. It then provides an overview of Marisa tries and how they use bitvectors for LOUDS, terminal flags, and link flags. The document discusses different implementations of rank and select operations using dictionaries and binary search. It proposes improvements like removing branches from the final round of select using SSE2 instructions and tricks after comparison. The goal is to make select operations faster by removing conditional branches.
This document discusses storage in database systems, including disks, buffer management, files of records, and access methods. It describes how data is stored on disks in pages and the implications this has for database design, such as the need to carefully plan read and write operations. The buffer manager caches pages in main memory to reduce disk I/O. Files are the abstraction used to organize records across multiple pages on disk. Records can be stored in unordered heap files or other structures like sorted files and indexes to enable different access patterns.
Poptrie: A Compressed Trie with Population Count for Fast and Scalable Softwa...Hirochika Asai
Este documento describe Poptrie, una estructura de datos trie comprimido con conteo de población para búsqueda rápida y escalable de tablas de rutas IP en software. Poptrie reduce el tamaño de memoria y mejora el rendimiento mediante compresión de punteros con conteo de población y agregación de rutas. Evaluaciones exhaustivas muestran que Poptrie es 4-578% más rápido que otras tecnologías de vanguardia para tablas de rutas IPv4 y IPv6 públicas y privadas, incluidas tablas futuras
This document discusses approaches for analyzing microarray gene expression data with few observations. It suggests assuming a canonical distribution based on prior knowledge, like the log-normal distribution often seen in this type of data. Parameters of this distribution can then be estimated from the available observations using techniques like estimating the mean and variance. A t-statistic can be used to determine if two gene conditions have the same distribution parameters and thus identify differentially expressed genes. Quality control plots of the data are recommended to validate distribution assumptions.
This document discusses suffix trees, including their definition, important contributors, construction, implementation, and applications. Key points:
- A suffix tree is a compressed trie representing all suffixes of a string. It allows efficient pattern matching in linear time.
- Important contributors include Weiner (1973), McCreight (1976), Ukkonen (1995), and Farach (1997) who developed faster construction algorithms.
- There are multiple ways to implement a suffix tree, including using sibling lists, hash maps, balanced search trees, or sorted arrays. External memory may be needed for very large trees.
- Applications include fast substring search, longest common substring problems, and data compression. References provide more details on
A trie is a tree-like data structure used to store strings that allows for efficient retrieval of strings based on prefixes. It works by splitting strings into individual characters and storing them as nodes in a tree. Each node represents a unique character and branches to other nodes until the full string is represented by a path from the root node. Tries allow for fast insertion, searching, and retrieval of strings in O(m) time where m is the length of the string. They are useful for applications involving large dictionaries of strings like auto-complete features, IP routing tables, and storing browser histories.
R-Trees are an excellent data structure for managing geo-spatial data. Commonly used by mapping applications and any other applications that use the location to customize content. Minimum Bounding Rectangle (MBR) is a commonly used concept in R-trees, which are a modified form of B-trees.
The document discusses Cassandra's data model and how it replaces HDFS services. It describes:
1) Two column families - "inode" and "sblocks" - that replace the HDFS NameNode and DataNode services respectively, with "inode" storing metadata and "sblocks" storing file blocks.
2) CFS reads involve reading the "inode" info to find the block and subblock, then directly accessing the data from the Cassandra SSTable file on the node where it is stored.
3) Keyspaces are containers for column families in Cassandra, and the NetworkTopologyStrategy places replicas across data centers to enable local reads and survive failures.
Apache Cassandra, part 1 – principles, data modelAndrey Lomakin
Aim of this presentation to provide enough information for enterprise architect to choose whether Cassandra will be project data store. Presentation describes each nuance of Cassandra architecture and ways to design data and work with them.
This document provides an introduction and overview of Cassandra and NoSQL databases. It discusses the challenges faced by modern web applications that led to the development of NoSQL databases. It then describes Cassandra's data model, API, consistency model, and architecture including write path, read path, compactions, and more. Key features of Cassandra like tunable consistency levels and high availability are also highlighted.
The document discusses tries, which are compact data structures for storing and searching sets of strings. It covers standard tries, compressed tries, suffix tries, and encoding tries like Huffman coding tries. Standard tries support searches, insertions and deletions in time proportional to the string size and alphabet size. Compressed tries reduce the space usage by compressing chains of redundant nodes. Suffix tries represent the suffixes of a string to enable fast pattern matching queries. Encoding tries like Huffman tries map characters to binary codewords to compress strings.
Encryption and decryption are both methods used to ensure the secure passing of messages and other sensitive documents and information. The encryption process plays a major factor in our technology advanced lives. Encryption basically means to convert the message into code or scrambled form. Advanced Encryption Standard (AES) is a specification for the encryption of electronic data. It has been adopted by the U.S. government and is now used worldwide. AES is a symmetric-key algorithm, meaning the same key is used for both encrypting and decrypting the data. This paper defines the method to enhance the block and key length of the conventional AES.
Data Warehousing and Business Intelligence is one of the hottest skills today, and is the cornerstone for reporting, data science, and analytics. This course teaches the fundamentals with examples plus a project to fully illustrate the concepts.
This document discusses various indexing techniques used to improve the performance of data retrieval from large databases. It begins by explaining the need for indexing to enable fast searching of large amounts of data. Then it describes several conventional indexing techniques including dense indexing, sparse indexing, and B-tree indexing. It also covers special indexing structures like inverted indexes, bitmap indexes, cluster indexes, and join indexes. The goal of indexing is to reduce the number of disk accesses needed to find relevant records by creating data structures that map attribute values to locations in storage.
Data Warehousing and Business Intelligence is one of the hottest skills today, and is the cornerstone for reporting, data science, and analytics. This course teaches the fundamentals with examples plus a project to fully illustrate the concepts.
AIN102S Access string function sample queriesDan D'Urso
This document provides an overview and examples of string functions in Microsoft Access queries. It discusses functions like Left, Right, Mid, Len, Trim, Replace, Asc, Chr and more. Examples are provided to demonstrate how to use each function in an Access query and return the results. The document is a training material for a class on Access query design and string functions.
This document provides an overview of query processing costs, selection operations, join operations, and concurrency control in database systems. It discusses how the costs of queries are estimated based on factors like disk accesses and seeks. It then describes algorithms for common operations like selection, join, and concurrency control protocols. Selection algorithms include file scan, binary search, and using indexes. Join algorithms include nested loops, block nested loops, indexed nested loops, merge join, and hash join. Concurrency control protocols help manage concurrent transaction executions and maintain consistency.
Table of contents
Download R script from my Google drive:
20190828_R-user-group_string-manipulation.R
What is it like to manipulate string?
What are special characters?
How to specify a pattern?
Scenarios that you will handle string
● Manipulating output from a R object
● Subsetting files through their names or paths
● Subsetting groups
Summary
This document summarizes research on optimizing the LZ77 data compression algorithm. The researchers present a method to improve LZ77 compression by using variable triplet sizes instead of fixed sizes. Specifically, when the offset is equal to the match length, they represent it with a "doublet" of just length and next symbol, saving bits compared to the standard triplet representation. They show this optimization achieves higher compression ratios than the conventional LZ77 algorithm in experiments. The decoding process is made slightly more complex to identify doublets versus triplets but still allows decompressing the encoded data back to the original.
Data compression refers to reducing the amount of space needed to store data or reducing the
amount of time needed to transmit data. Many data compression techniques allow encoding the
compressed form of data with different compression ratio. In particular, in the case of LZ77
technique, it reduces the data concurrency of an input file. In the output of this technique it conveys
more information that is actually not needed in practical. Removing the extra information from the
encoded file that makes this algorithm more optimal. Our task is to identify how much extra
information it conveys and how can we minimize it so that there is no trouble at the time of
decoding. Basically the encoded output of LZ77 is the sequence of triplets (a structure of encoded
output) that is in binary and having fix size. For making the triplets of fix size, sometimes we are
creating unnecessary information. We present the method of variable triplet size as a way to improve
LZ77 compression and demonstrate it through many experiments. In our optimization algorithm we
are getting more compression ratio compare to the conventional LZ77 data compression algorithm.
To Get any Project for CSE, IT ECE, EEE Contact Me @ 09849539085, 09966235788 or mail us - ieeefinalsemprojects@gmail.com-Visit Our Website: www.finalyearprojects.org
To Get any Project for CSE, IT ECE, EEE Contact Me @ 09849539085, 09966235788 or mail us - ieeefinalsemprojects@gmail.co¬m-Visit Our Website: www.finalyearprojects.org
ICDE2015 Research 3: Distributed Storage and ProcessingTakuma Wakamori
The document summarizes a research presentation on distributed storage and processing. It discusses two papers: 1) PABIRS, a data access middleware for distributed file systems that efficiently processes mixed workloads of queries. It proposes an integrated data access middleware to address this. 2) Scalable distributed transactions across heterogeneous stores.
It then provides details on PABIRS, which uses a hybrid index with a bitmap index and LSM (log-structured merge) tree index. The bitmap index is used for low selectivity keys, while the LSM index is built for "hot" values with selectivity above a threshold. The system aims to efficiently support data retrieval and insertion for various query workloads on distributed file systems.
Similar to Tries - Tree Based Structures for Strings (20)
This document discusses algorithms for NP-complete problems. It introduces the maximum independent set problem and shows that while it is NP-complete for general graphs, it can be solved efficiently for trees using a recursive formulation. It also discusses the traveling salesperson problem and presents a dynamic programming algorithm that provides a better running time than brute force. Finally, it discusses approximation algorithms for the TSP and shows a 2-approximation algorithm that finds a tour with cost at most twice the optimal using minimum spanning trees.
Graph Traversal Algorithms - Breadth First SearchAmrinder Arora
The document discusses branch and bound algorithms. It begins with an overview of breadth first search (BFS) and how it can be used to solve problems on infinite mazes or graphs. It then provides pseudocode for implementing BFS using a queue data structure. Finally, it discusses branch and bound as a general technique for solving optimization problems that applies when greedy methods and dynamic programming fail. Branch and bound performs a BFS-like search, but prunes parts of the search tree using lower and upper bounds to avoid exploring all possible solutions.
Graph Traversal Algorithms - Depth First Search TraversalAmrinder Arora
This document discusses graph traversal techniques, specifically depth-first search (DFS) and breadth-first search (BFS). It provides pseudocode for DFS and explains key properties like edge classification, time complexity of O(V+E), and applications such as finding connected components and articulation points.
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet MahanaAmrinder Arora
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana. Presentation for CS 6212 final project in GWU during Fall 2015 (Prof. Arora's class)
Stopping Rule for Secretory Problem - Presentation by Haoyang Tian, Wesam Als...Amrinder Arora
Stopping Rule for Secretory Problem - Presentation by Haoyang Tian, Wesam Alshami and Dong Wang. Final Presentation for P4, in CS 6212, Fall 2015 taught by Prof. Arora.
Proof of O(log *n) time complexity of Union find (Presentation by Wei Li, Zeh...Amrinder Arora
The document discusses the union find algorithm and its time complexity. It defines the union find problem and three operations: MAKE-SET, FIND, and UNION. It describes optimizations like union by rank and path compression that achieve near-linear time complexity of O(m log* n) for m operations on n elements. It proves several lemmas about ranks and buckets to establish this time complexity through an analysis of the costs of find operations.
How multiple experts can be leveraged in a machine learning application without knowing apriori who are "good" experts and who are "bad" experts. See how we can quantify the bounds on the overall results.
NP completeness. Classes P and NP are two frequently studied classes of problems in computer science. Class P is the set of all problems that can be solved by a deterministic Turing machine in polynomial time.
Dynamic Programming design technique is one of the fundamental algorithm design techniques, and possibly one of the ones that are hardest to master for those who did not study it formally. In these slides (which are continuation of part 1 slides), we cover two problems: maximum value contiguous subarray, and maximum increasing subsequence.
This document discusses dynamic programming techniques. It covers matrix chain multiplication and all pairs shortest paths problems. Dynamic programming involves breaking down problems into overlapping subproblems and storing the results of already solved subproblems to avoid recomputing them. It has four main steps - defining a mathematical notation for subproblems, proving optimal substructure, deriving a recurrence relation, and developing an algorithm using the relation.
Divide and Conquer - Part II - Quickselect and Closest Pair of PointsAmrinder Arora
This document discusses divide and conquer algorithms. It covers the closest pair of points problem, which can be solved in O(n log n) time using a divide and conquer approach. It also discusses selection algorithms like quickselect that can find the median or kth element of an unsorted array in linear time O(n) on average. The document provides pseudocode for these algorithms and analyzes their time complexity using recurrence relations. It also provides an overview of topics like mergesort, quicksort, and solving recurrence relations that were covered in previous lectures.
Divide and Conquer Algorithms - D&C forms a distinct algorithm design technique in computer science, wherein a problem is solved by repeatedly invoking the algorithm on smaller occurrences of the same problem. Binary search, merge sort, Euclid's algorithm can all be formulated as examples of divide and conquer algorithms. Strassen's algorithm and Nearest Neighbor algorithm are two other examples.
This is the second lecture in the CS 6212 class. Covers asymptotic notation and data structures. Also outlines the coming lectures wherein we will study the various algorithm design techniques.
Introduction to Algorithms and Asymptotic NotationAmrinder Arora
Asymptotic Notation is a notation used to represent and compare the efficiency of algorithms. It is a concise notation that deliberately omits details, such as constant time improvements, etc. Asymptotic notation consists of 5 commonly used symbols: big oh, small oh, big omega, small omega, and theta.
Set Operations - Union Find and Bloom FiltersAmrinder Arora
Set Operations - make set, union, find and contains are standard operations that appear in many scenarios. Union Find is a marvelous data structure to solve problems involving union and find operations.
Different use arises when we merely want to answer queries on whether a set contains an element x without keeping the entire set in the memory. Bloom Filters play an interesting role there.
The document describes a course on advanced data structures. It provides information on the instructor, teaching assistant, topics to be covered including AVL and Red Black Trees, objectives of learning deletions, insertions and searches in logarithmic time. It also lists credits to other professors and researchers. The document then goes into details about balanced binary search trees, describing properties of AVL Trees and Red Black Trees to ensure the tree remains balanced during operations.
Stacks, Queues, Binary Search Trees - Lecture 1 - Advanced Data StructuresAmrinder Arora
This document introduces the course CS 6213 - Advanced Data Structures. It discusses what data structures are, how they are designed to efficiently support specific operations, and provides examples. Common data structures like stacks, queues, linked lists, trees, and graphs are introduced along with their basic operations and implementations. Real-world applications of these data structures are also mentioned.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
3. Michael T. Goodrich and Roberto Tamassia
Data Structures and Algorithms in Java (4th edition)
John Wiley & Sons, Inc.
ISBN: 0-471-73884-0
Haim Kaplan, Tel Aviv University
Jörg Liebeherr, University of Toronto
L6 - Tries CS 6213 - Advanced Data Structures - Arora 3
CREDITS
4. Naïve, brute force for searching a text of size n and a
pattern of size m requires O(nm) time.
Preprocessing the pattern speeds up pattern
matching queries. E.g., KMP algorithm performs
pattern matching in time proportional to the text
size: O(n)
If the text is large, immutable and searched often
(e.g., Shakespeare), we may want to preprocess the
text itself. Want to perform the searching in O(m)
time.
L6 - Tries CS 6213 - Advanced Data Structures - Arora 4
MOTIVATION
5. A trie is a compact data structure for representing a
set of strings, such as all the words in a text. A trie
supports pattern matching queries in time
proportional to the pattern size: O(m)
L6 - Tries CS 6213 - Advanced Data Structures - Arora 5
MOTIVATION (CONT.)
7. The standard trie for a set of strings S is an ordered tree
such that:
Each node but the root is labeled with a character
The children of a node are alphabetically ordered
The paths from the root to the leaves yield the strings of S
Example: set of strings S = { bear, bell, bid, bull, buy, sell, stock,
stop }
L6 - Tries CS 6213 - Advanced Data Structures - Arora 7
STANDARD TRIES
a
e
b
r
l
l
s
u
l
l
y
e t
l
l
o
c
k
p
i
d
8. A standard trie uses O(n) space and supports
searches, insertions and deletions in time
O(dm), where:
n total size of the strings in S
m size of the string parameter of the operation
d size of the alphabet
L6 - Tries CS 6213 - Advanced Data Structures - Arora 8
ANALYSIS OF STANDARD TRIES
a
e
b
r
l
l
s
u
l
l
y
e t
l
l
o
c
k
p
i
d
9. We insert
the words of
the text into
a trie
Each leaf
stores the
occurrences
of the
associated
word in the
text
L6 - Tries CS 6213 - Advanced Data Structures - Arora 9
WORD MATCHING WITH A TRIE
s e e b e a r ? s e l l s t o c k !
s e e b u l l ? b u y s t o c k !
b i d s t o c k !
a
a
h e t h e b e l l ? s t o p !
b i d s t o c k !
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86
a r
87 88
a
e
b
l
s
u
l
e t
e
0, 24
o
c
i
l
r
6
l
78
d
47, 58
l
30
y
36
l
12
k
17, 40,
51, 62
p
84
h
e
r
69
a
10. A compressed trie has
internal nodes of
degree at least two
It is obtained from
standard trie by
compressing chains of
“redundant” nodes
L6 - Tries CS 6213 - Advanced Data Structures - Arora 10
COMPRESSED TRIES
e
b
ar ll
s
u
ll y
ell to
ck p
id
a
e
b
r
l
l
s
u
l
l
y
e t
l
l
o
c
k
p
i
d
11. Compact representation of a compressed trie for an array of
strings:
Stores at the nodes ranges of indices instead of substrings
Uses O(s) space, where s is the number of strings in the array
Serves as an auxiliary index structure
L6 - Tries CS 6213 - Advanced Data Structures - Arora 11
COMPACT REPRESENTATION
s e e
b e a r
s e l l
s t o c k
b u l l
b u y
b i d
h e
b e l l
s t o p
0 1 2 3 4
a rS[0] =
S[1] =
S[2] =
S[3] =
S[4] =
S[5] =
S[6] =
S[7] =
S[8] =
S[9] =
0 1 2 3 0 1 2 3
1, 1, 1
1, 0, 0 0, 0, 0
4, 1, 1
0, 2, 2
3, 1, 2
1, 2, 3 8, 2, 3
6, 1, 2
4, 2, 3 5, 2, 2 2, 2, 3 3, 3, 4 9, 3, 3
7, 0, 3
0, 1, 1
12. Begins with: where name like ‘x%’
Ends with: where name like ‘%x’
Substring: where name like ‘%x%’
L6 - Tries CS 6213 - Advanced Data Structures - Arora 12
STRING SEARCHES
13. The suffix trie of a string X is the compressed trie of all
the suffixes of X
L6 - Tries CS 6213 - Advanced Data Structures - Arora 13
SUFFIX TRIE
e nimize
nimize ze
zei mi
mize nimize ze
m i n i z em i
0 1 2 3 4 5 6 7
14. Compact representation of the suffix trie for a
string X of size n from an alphabet of size d
Uses O(n) space
Supports arbitrary pattern matching queries in X in O(dm)
time, where m is the size of the pattern
Can be constructed in O(n) time
L6 - Tries CS 6213 - Advanced Data Structures - Arora 14
ANALYSIS OF SUFFIX TRIES
7, 7 2, 7
2, 7 6, 7
6, 7
4, 7 2, 7 6, 7
1, 1 0, 1
m i n i z em i
0 1 2 3 4 5 6 7
15. Auto complete: User types “Rob” and you can type
with all words that begin with Rob, or all contacts
that begin with Rob, etc.
Sequence Assembly in Genetics Sequences
Sorting of Large Sets of Strings: BurstSort
Big Data: See “TeraSort.java” source code
L6 - Tries CS 6213 - Advanced Data Structures - Arora 15
APPLICATIONS OF TRIES
16. L6 - Tries CS 6213 - Advanced Data Structures - Arora 16
SAMPLE APPLICATION – IP ROUTING
Packets of Fun
18. A standardized exterior gateway protocol designed to
exchange routing and reachability information
between autonomous systems (AS) on the Internet.
Makes routing decisions based on paths, network
policies and/or rule-sets configured by a network
administrator.
Plays a key role in the overall operation of the
Internet and is involved in making core routing
decisions.
[Itself uses TCP to exchange its own data.]
L6 - Tries CS 6213 - Advanced Data Structures - Arora 18
BORDER GATEWAY PROTOCOL (BGP)
20. Destination address Next hop
10.0.0.0/8 R1
128.143.0.0/16 R2
128.143.64.0/20
R3
128.143.192.0/20 R3
128.143.71.0/24 R4
128.143.71.55/32 R3
Default R5
With CIDR, there can be multiple
matches for a destination address in the
routing table
Longest Prefix Match: Search for the
routing table entry that has the longest
match with the prefix of the destination
IP address (Most Specific Router):
1. Search for a match on all 32 bits
2. Search for a match for 31 bits
…..
32. Search for a match on 0 bits
Needed: Data structure that supports a FAST
longest prefix match lookup!
L6 - Tries CS 6213 - Advanced Data Structures - Arora 20
ROUTING TABLE LOOKUP: LONGEST
PREFIX MATCH
128.143.71.21
The longest prefix match for
128.143.71.21 is with
128.143.71.0/24
Datagram will be sent to R4
21. The following algorithms are suitable for Longest
Prefix Match routing table lookups
Tries
Path-Compressed Tries
Disjoint-prefix binary Tries
Multibit Tries
Binary Search on Prefix
Prefix Range Search
L6 - Tries CS 6213 - Advanced Data Structures - Arora 21
IP ADDRESS LOOKUP ALGORITHMS
22. t p
te to po
t p
e o
ten tea
n a
top
o
pot
o
t
A trie is a tree-based
data structure for
storing strings:
There is one node for every
common prefix
The strings are stored in
extra leaf nodes
Prefixes are not only stored
at leaf nodes but also at
internal nodes
L6 - Tries CS 6213 - Advanced Data Structures - Arora 22
SLIGHTLY DIFFERENT VERSION OF TRIE
23. Structure
Each leaf contains a
possible address
Prefixes in the table are
marked (dark)
Search
Traverse the tree
according to destination
address
Most recent marked node
is the current longest
prefix
Search ends when a leaf
node is reached
L6 - Tries CS 6213 - Advanced Data Structures - Arora 23
BINARY TRIE
24. Update
Search for the
new entry
Search ends
when a leaf node
is reached
If there is no
branch to take,
insert new
node(s)
L6 - Tries CS 6213 - Advanced Data Structures - Arora 24
BINARY TRIE
z 1010*
1
z
0
25. Path Compression:
Requires to store additional information with nodes Bit number
field is added to node
Bit string of prefixes must be explicitly stored at nodes
Need to make comparison when searching the tree
Goal: Eliminate long
sequences of 1-child
nodes
Path compression
collapses 1-child
branches
L6 - Tries CS 6213 - Advanced Data Structures - Arora 25
COMPRESSED BINARY TRIE
d
26. Search: “010110”
Root node: Inspect 1st bit and move left
“a” node:
Check with prefix of a (“0*”) and find a match
Inspect 3rd bit and move left
“b” node:
Check with prefix of b (“01000*”) and determine that there is no match
Search stops. Longest prefix match is with a
L6 - Tries CS 6213 - Advanced Data Structures - Arora 26
COMPRESSED BINARY TRIE
d
27. Disjoint prefix:
Nodes are split so that there is only one match for each prefix (“Leaf pushing”)
Consequence: Internal nodes do not match with prefixes
Results:
a (0*) is split into: a1 (00*), a3 (010*), a2 (01001*)
d (1*) is represented as d1 (101*)
Multiple matches in
longest prefix rule
require backtracking
of search
Goal: Transform tree
as to avoid multiple
matches
L6 - Tries CS 6213 - Advanced Data Structures - Arora 27
DISJOINT-PREFIX BINARY TRIE
28. 2-bit stride:
1-bit prefix for a (0*) is split into 00* and 01*
1-bit prefix for d (1*) is split into 10* and 11*
3-bit prefix for c has been expanded to two nodes
Why are the prefixes for b and e not expanded?
Goal: Accelerate lookup
by inspecting more than
one bit at a time
“Stride”: number of bits
inspected at one time
With k-bit stride, node
has up to 2k child nodes
L6 - Tries CS 6213 - Advanced Data Structures - Arora 28
VARIABLE-STRIDE MULTIBIT TRIE
29. Scheme Lookup Update Memory
Binary trie O(W) O(W) O(NW)
Path-compressed trie O(W) O(W) O(NW)
k-stride multibit trie O(W/k) O(W/k+2k) O(2kNW/k)
L6 - Tries CS 6213 - Advanced Data Structures - Arora 29
COMPLEXITY OF THE LOOKUP
Bounds are expressed for
Look-up time: What is the longest lookup time?
Update time: How long does it take to change an entry?
Memory: How much memory is required to store the data structure?
W: length of the address (32 bits)
N: number of prefix in the routing table
30. Excellent data structure for managing Strings
Supports prefix and suffix kind of lookups
Extremely fast – After the Trie has been built, the
search time is O(m) where m is the size of the
pattern.
Can be used to build indexes
Various applications in areas that use Strings
(Literature/Dictionary/Content, as well as Networks
and Bioinformatics)
L6 - Tries CS 6213 - Advanced Data Structures - Arora 30
CONCLUSIONS: TRIES