B-Trees, designed by Rudolf Bayer and Ed McCreight, are a fundamental data structure in computer science. They are a strong alternative to binary search trees (BSTs) and are well suited to disk-based access.
A trie (also known as a prefix tree; a radix tree is its compressed form) is an ordered tree data structure where the keys are usually strings. Tries have a wide range of applications, from dictionaries to
Analyzing Blockchain Transactions in Apache Spark with Jiri Kremser | Databricks
Blockchain has become a buzzword: people are excited about distributed ledgers and cryptocurrencies, but these technologies are shrouded in myths and misunderstanding. This talk will shed some light on how this awesome technology is actually used in practice by using Apache Spark to analyze blockchain transactions.
We’ll start with a brief introduction to blockchain transactions and how we can ETL transaction graph data obtained from the public binary format. Then we will look at how to model graph data in Spark, briefly comparing GraphFrames and GraphX. The majority of the presentation will be a live demo, running on Spark in the cloud, showing how we can run various queries on the transaction graph data, run graph algorithms such as PageRank to identify significant BTC addresses, observe network evolution, and more.
All of the work described in this talk is published as open source code, and all of the data and containers are publicly available for community experimentation. You will leave this talk with a better understanding of blockchain technology and graph processing in Spark, and you will have the concrete tools to reproduce my research or start answering your own questions.
Pruning convolutional neural networks for resource efficient inference | Kaushalya Madhawa
The document discusses a method for pruning convolutional neural networks to make them more efficient for resource-constrained inference. The method uses a Taylor expansion to calculate the saliency of parameters, allowing it to prune those with the least effect on the network's loss. Experiments on networks like VGG-16 and AlexNet show the method can significantly reduce operations with little loss in accuracy. Layer-wise analysis provides insight into each layer's importance to the overall network.
Accelerating Local Search with PostgreSQL (KNN-Search) | Jonathan Katz
KNN-GiST indexes were added in PostgreSQL 9.1 and greatly accelerate some common queries in the geospatial and textual search realms. This presentation will demonstrate the power of KNN-GiST indexes on geospatial and text searching queries, but also their present limitations, based on some of my experiments. I will also discuss some of the theory behind KNN (k-nearest neighbor) as well as some of the applications this feature can be applied to.
To see a version of the talk given at PostgresOpen 2011, please visit http://www.youtube.com/watch?v=N-MD08QqGEM
The document discusses how the PostgreSQL query planner works. It explains that a query goes through several stages including parsing, rewriting, planning/optimizing, and execution. The optimizer or planner has to estimate things like the number of rows and cost to determine the most efficient query plan. Statistics collected by ANALYZE are used for these estimates but can sometimes be inaccurate, especially for n_distinct values. Increasing the default_statistics_target or overriding statistics on columns can help address underestimation issues. The document also discusses different plan types like joins, scans, and aggregates that the planner may choose between.
The document discusses heap sort, which is a sorting algorithm that uses a heap data structure. It works in two phases: first, it transforms the input array into a max heap using the insert heap procedure; second, it repeatedly extracts the maximum element from the heap and places it at the end of the sorted array, reheapifying the remaining elements. The key steps are building the heap, processing the heap by removing the root element and allowing the heap to reorder, and doing this repeatedly until the array is fully sorted.
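The two phases described above can be sketched as follows. This is a minimal illustration, not the slides' own code: it builds the heap bottom-up with a sift-down helper rather than by repeated insertion, and the names `sift_down` and `heap_sort` are chosen here for illustration.

```python
def sift_down(a, start, end):
    """Restore the max-heap property for the subtree rooted at `start`,
    looking only at indices below `end`."""
    root = start
    while True:
        child = 2 * root + 1              # index of the left child
        if child >= end:
            return                        # root is a leaf
        # Pick the larger of the two children.
        if child + 1 < end and a[child + 1] > a[child]:
            child += 1
        if a[root] >= a[child]:
            return                        # heap property already holds
        a[root], a[child] = a[child], a[root]
        root = child


def heap_sort(a):
    """Sort the list `a` in place in ascending order and return it."""
    n = len(a)
    # Phase 1: turn the array into a max heap (bottom-up construction).
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
    # Phase 2: move the maximum to the end, shrink the heap, re-heapify.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)
    return a
```

Calling `heap_sort([5, 3, 8, 1, 9, 2])` sorts the list in place; the heap occupies the front of the array and the sorted region grows from the back.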
These slides are part of a course about interactive objects in games. The lectures cover some of the most widely used methodologies that allow smart objects and non-player characters (NPCs) to exhibit autonomy and flexible behavior through various forms of decision making, including techniques for pathfinding, reactive behavior through automata and processes, and goal-oriented action planning. More information can be found here: http://tinyurl.com/sv-intobj-2013
Trees. Defining, Creating and Traversing Trees. Traversing the File System
Binary Search Trees. Balanced Trees
Graphs and Graphs Traversal Algorithms
Exercises: Working with Trees and Graphs
The document discusses binary tree traversal methods. It defines key binary tree terminology like nodes, edges, root, and provides examples of different types of binary trees like strictly binary, complete, and almost complete binary trees. It also explains the three common traversal techniques for binary search trees - in-order, pre-order and post-order traversals - and provides pseudocode algorithms and examples for each traversal method.
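As a rough illustration of the three traversal orders, the sketch below uses a hypothetical `Node` class and returns the visited values as lists. The example tree is made up for this sketch and does not come from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def in_order(node):
    """Left subtree, then the node, then the right subtree."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)


def pre_order(node):
    """The node first, then the left and right subtrees."""
    if node is None:
        return []
    return [node.value] + pre_order(node.left) + pre_order(node.right)


def post_order(node):
    """Both subtrees first, then the node."""
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.value]


# Hypothetical example tree:
#         4
#       /   \
#      2     6
#     / \   / \
#    1   3 5   7
example = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
```

On a binary search tree such as `example`, the in-order traversal yields the values in ascending order.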
Binary search trees (BSTs) are data structures that allow for efficient searching, insertion, and deletion. Nodes in a BST are organized so that all left descendants of a node are less than the node's value and all right descendants are greater. This property allows values to be found, inserted, or deleted in O(log n) time on average. Searching involves recursively checking if the target value is less than or greater than the current node's value. Insertion follows the search process and adds the new node in the appropriate place. Deletion handles three cases: removing a leaf, node with one child, or node with two children.
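The search, insertion, and three-case deletion described above might look like the following. This is a minimal sketch with hypothetical names (`BSTNode`, `search`, `insert`, `delete`); it ignores duplicate keys and, for the two-children case, copies the in-order successor, which is one common choice among several.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def search(node, key):
    """Return the node holding `key`, or None if it is absent."""
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node


def insert(node, key):
    """Insert `key` into the subtree rooted at `node`; return the new root."""
    if node is None:
        return BSTNode(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node                      # equal keys are ignored in this sketch


def delete(node, key):
    """Delete `key` from the subtree rooted at `node`; return the new root."""
    if node is None:
        return None
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    else:
        # Cases 1 and 2: a leaf or a node with a single child.
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        # Case 3: two children. Copy the in-order successor, then remove it.
        successor = node.right
        while successor.left is not None:
            successor = successor.left
        node.key = successor.key
        node.right = delete(node.right, successor.key)
    return node


def keys_in_order(node):
    """Helper for inspection: the keys in ascending order."""
    if node is None:
        return []
    return keys_in_order(node.left) + [node.key] + keys_in_order(node.right)
```

The O(log n) average cost holds when the tree stays reasonably balanced; inserting keys in sorted order degrades this plain BST into a linked list.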
Robert Haas
Why does my query need a plan? Sequential scan vs. index scan. Join strategies. Join reordering. Joins you can't reorder. Join removal. Aggregates and DISTINCT. Using EXPLAIN. Row count and cost estimation. Things the query planner doesn't understand. Other ways the planner can fail. Parameters you can tune. Things that are nearly always slow. Redesigning your schema. Upcoming features and future work.
B+ trees are a data structure used to store sorted data such as files on disk. Each node contains key values and pointers to other nodes. Leaf nodes contain file data while internal nodes contain keys to guide searching. Insertion may cause a full node to split, which pushes a separator key up into the parent. Deletion is handled through redistribution or merging of neighboring nodes to maintain a minimum number of keys per node. B+ trees provide efficient storage and retrieval of sorted data through a balanced tree structure and localized rebalancing during updates.
The document discusses different sorting algorithms including merge sort and quicksort. Merge sort has a divide and conquer approach where an array is divided into halves and the halves are merged back together in sorted order. This results in a runtime of O(n log n). Quicksort uses a partitioning approach, choosing a pivot element and partitioning the array into subarrays of elements less than or greater than the pivot. In the best case, this partitions the array in half at each step, resulting in a runtime of O(n log n). In the average case, the runtime is also O(n log n). In the worst case, for example an already sorted array with the first or last element chosen as pivot, the partitions are unbalanced and the runtime is quadratic, O(n^2).
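A small sketch can make both claims concrete. `merge_sort` follows the divide-and-merge description; `quicksort_comparisons` is a hypothetical helper written for this illustration that always picks the first element as pivot and counts one comparison per element partitioned, so the quadratic behavior on sorted input becomes visible.

```python
def merge_sort(items):
    """Return a new sorted list using top-down merge sort."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # Merge the two sorted halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def quicksort_comparisons(items):
    """Return (sorted list, comparison count) with the first element as pivot.

    Each partition step is counted as one comparison per non-pivot element.
    """
    if len(items) <= 1:
        return list(items), 0
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    left, left_count = quicksort_comparisons(smaller)
    right, right_count = quicksort_comparisons(larger)
    return left + [pivot] + right, len(rest) + left_count + right_count
```

On the already sorted input `list(range(10))`, every partition is maximally unbalanced and the count is 9 + 8 + ... + 1 = 45, that is n(n-1)/2.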
This document discusses hashing techniques for implementing symbol tables. It begins by reviewing the motivation for symbol tables in compilers and describing the basic operations of search, insertion and deletion that a hash table aims to support efficiently. It then discusses direct addressing and its limitations when key ranges are large. The concept of a hash function is introduced to map keys to a smaller range to enable direct addressing. Collision resolution techniques of chaining and open addressing are covered. Analysis of expected costs for different operations on chaining hash tables is provided. Various hash functions are described including division and multiplication methods, and the importance of choosing a hash function to distribute keys uniformly is discussed. The document concludes by mentioning universal hashing as a technique to randomize the hash function
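To make chaining and the division method concrete, here is a minimal sketch. The class name `ChainedHashTable`, the default of 11 slots, and the restriction to integer keys are assumptions made for this illustration, not details from the document.

```python
class ChainedHashTable:
    """Hash table with collision resolution by chaining (integer keys only)."""

    def __init__(self, num_slots=11):
        # One Python list (the "chain") per slot.
        self.slots = [[] for _ in range(num_slots)]

    def _index(self, key):
        # Division method: h(k) = k mod m, where m is the number of slots.
        return key % len(self.slots)

    def insert(self, key, value):
        chain = self.slots[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)   # overwrite an existing entry
                return
        chain.append((key, value))

    def search(self, key):
        for existing_key, value in self.slots[self._index(key)]:
            if existing_key == key:
                return value
        return None

    def delete(self, key):
        index = self._index(key)
        self.slots[index] = [(k, v) for k, v in self.slots[index] if k != key]
```

With 11 slots, the keys 5 and 16 land in the same slot and share a chain, so each operation costs time proportional to the chain length. That is why a hash function that spreads keys uniformly matters.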
The document discusses trees and their applications in computer science. It begins by describing the early history of trees in mathematics. Then it defines the basic concepts of trees including nodes, branches, leaves, internal nodes, parents, children, ancestors, descendants, subtrees, and tree traversals. It also covers binary trees specifically, including their structure, traversals, and applications for expression trees. Finally, it discusses general trees and their conversion to binary trees, as well as insertion methods for general trees.
1) The document discusses sharding time series sensor data from 16,000 traffic sensors across the US to support a nationwide traffic monitoring application.
2) It models the read, write and storage patterns and determines that a sharded cluster is needed to store over 500GB of yearly data that will grow significantly over time.
3) It recommends using a compound shard key of {linkID, date} to distribute writes evenly while enabling targeted queries, and storing summary data in a separate replica set for performance.
A binary tree is composed of nodes, where each node contains a value and references (pointers) to a left and right child node. It may be empty or have a root node from which all other nodes are reachable through unique paths. Nodes without child nodes are leaves. The size is the number of nodes, and the depth is the length of the longest path from the root node to a leaf. Binary trees can be balanced or unbalanced. Common traversals that visit each node once include preorder, inorder, and postorder, which differ in when the root node is visited relative to its child subtrees.
Four main types of probabilistic data structures are described: membership, cardinality, frequency, and similarity. Bloom filters and cuckoo filters are discussed as membership data structures that can tell if an element is definitely not or may be in a set. Cardinality structures like HyperLogLog are able to estimate large cardinalities with small error rates. Count-Min Sketch is presented as a frequency data structure. MinHash and locality sensitive hashing are covered as similarity data structures that can efficiently find similar documents in large datasets.
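The "definitely not or may be in a set" behavior of a Bloom filter can be sketched in a few lines. The sizes, the use of SHA-256 with a seed prefix to derive several hash positions, and the names below are assumptions for this illustration; production filters choose the bit count and hash count from the expected number of items and the target false-positive rate.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: no false negatives, possible false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)   # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive `num_hashes` positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for position in self._positions(item):
            self.bits[position] = 1

    def might_contain(self, item):
        # False means "definitely not present"; True means "possibly present".
        return all(self.bits[position] for position in self._positions(item))
```

An item that was added always reports `True`; an item that was never added usually reports `False`, but may report `True` when other items happen to have set all of its bit positions.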
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its author | Vivian S. Zhang
This document provides an overview of XGBoost, an open-source gradient boosting framework. It begins with introductions to machine learning algorithms and XGBoost specifically. The document then walks through using XGBoost with R, including loading data, running models, cross-validation, and prediction. It discusses XGBoost's use in winning the Higgs Boson machine learning competition and provides code to replicate its solution. Finally, it briefly covers XGBoost's model specification and training objectives.
Divide and Conquer Algorithms - D&C forms a distinct algorithm design technique in computer science, wherein a problem is solved by repeatedly invoking the algorithm on smaller instances of the same problem. Binary search, merge sort, and Euclid's algorithm can all be formulated as examples of divide and conquer algorithms. Strassen's algorithm and Nearest Neighbor algorithm are two other examples.
This document discusses binary search trees, including:
- Binary search trees allow for fast addition and removal of data by organizing nodes in a way that satisfies ordering properties.
- New nodes are inserted by recursively searching the tree and placing the node in the proper position to maintain ordering - left subtrees must be less than the root and right subtrees greater than or equal.
- The insert function recursively moves down the tree until an unused leaf node is found in the correct position based on comparing its data to the data being inserted.
How to Build a Fraud Detection Solution with Neo4j | Neo4j
This document discusses how to build a fraud detection solution using Neo4j graph database. It covers typical fraudsters and types of fraud, challenges with traditional fraud detection methods, and how graph databases can provide a more holistic view of relationships to better detect fraud rings and organized crime. The document also outlines a typical fraud detection architecture with Neo4j at the core to power a 360-degree view of transactions in real-time and help detect patterns. It concludes with a demo and Q&A section.
1. The document discusses AVL trees, which are self-balancing binary search trees. It provides examples of inserting values into an initially empty AVL tree, showing the tree after each insertion and any necessary rotations to maintain balance.
2. Deletion from an AVL tree is more complex than insertion, as it may require rotations at each level on the path back to the root to restore balance, for a worst case of O(log N) rotations. The document outlines the deletion procedure and provides an example requiring multiple rotations.
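The insertion-with-rotations process from the first point can be sketched as below. This is an illustrative implementation with hypothetical names (`AVLNode`, `avl_insert`, `rebalance`), not the document's own code, and it covers insertion only.

```python
class AVLNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1          # height of a single node is 1


def height(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def balance(node):
    """Balance factor: left height minus right height."""
    return height(node.left) - height(node.right)


def rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x                     # x is the new subtree root


def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y                     # y is the new subtree root


def rebalance(node):
    update(node)
    factor = balance(node)
    if factor > 1:               # left-heavy
        if balance(node.left) < 0:
            node.left = rotate_left(node.left)     # left-right case
        return rotate_right(node)
    if factor < -1:              # right-heavy
        if balance(node.right) > 0:
            node.right = rotate_right(node.right)  # right-left case
        return rotate_left(node)
    return node


def avl_insert(node, key):
    """Insert `key` and return the (possibly new) root of the subtree."""
    if node is None:
        return AVLNode(key)
    if key < node.key:
        node.left = avl_insert(node.left, key)
    elif key > node.key:
        node.right = avl_insert(node.right, key)
    return rebalance(node)
```

Inserting the keys 1 through 7 in increasing order, which would turn a plain BST into a chain, produces a perfectly balanced tree of height 3 with 4 at the root.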
The document describes Fibonacci heaps, a data structure used to implement priority queues. A Fibonacci heap is a collection of trees with heap-ordered structure. It supports insert, find minimum, and decrease key in amortized O(1) time by lazily consolidating trees, while extract minimum and delete take amortized O(log n) time. The extract minimum operation does the consolidation work to ensure no two roots have the same degree. Fibonacci heaps improve the running time of Dijkstra's shortest path algorithm compared to binomial heaps.
This document presents algorithmic puzzles and their solutions. It discusses puzzles involving counterfeit coins, uneven water pitchers, strong eggs on tiny floors, and people arranged in a circle. For each puzzle, it provides the problem description, an analysis or solution approach, and sometimes additional discussion. The document is a presentation on algorithmic puzzles given by Amrinder Arora, including their contact information.
Euclid's Algorithm for Greatest Common Divisor - Time Complexity Analysis | Amrinder Arora
Euclid's algorithm for finding the greatest common divisor is an elegant algorithm that can be written iteratively as well as recursively. It needs O(log n) division steps, which gives a time complexity of O(log^2 n) bit operations, where n is the larger of the two inputs.
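Both formulations fit in a few lines. The sketch below assumes non-negative integer inputs; `gcd_steps` is a hypothetical helper added here only to count the division steps.

```python
def gcd_iterative(a, b):
    """Greatest common divisor of two non-negative integers, iteratively."""
    while b != 0:
        a, b = b, a % b
    return a


def gcd_recursive(a, b):
    """The same algorithm written recursively."""
    return a if b == 0 else gcd_recursive(b, a % b)


def gcd_steps(a, b):
    """Number of division (modulo) steps the algorithm performs."""
    steps = 0
    while b != 0:
        a, b = b, a % b
        steps += 1
    return steps
```

Consecutive Fibonacci numbers are the slowest inputs for their size: `gcd_steps(13, 8)` takes 5 steps, passing through (8, 5), (5, 3), (3, 2), (2, 1), and (1, 0).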
Convex Hull - Chan's Algorithm O(n log h) - Presentation by Yitian Huang and ... | Amrinder Arora
Chan's Algorithm for Convex Hull Problem. Output Sensitive Algorithm. Takes O(n log h) time. Presentation for the final project in CS 6212/Spring/Arora.
This presentation defines online algorithms, and discusses how we can analyze them using competitive analysis. Using ski rental and ice cream machine as example problems, it covers the applications of online algorithms in load balancing and other verticals.
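For the ski rental problem, competitive analysis compares an online strategy, which does not know how many days you will ski, with the optimal offline cost. The sketch below uses the well-known break-even strategy and assumes a rental price of 1 per day; the presentation's own formulation may differ, and the function names are made up for this illustration.

```python
def online_cost(days_skied, buy_price):
    """Break-even strategy: rent for the first buy_price - 1 days,
    then buy on day buy_price (rent costs 1 per day)."""
    if days_skied < buy_price:
        return days_skied                  # never reached the buying day
    return (buy_price - 1) + buy_price     # rented, then bought


def offline_cost(days_skied, buy_price):
    """Optimal cost when the number of ski days is known in advance."""
    return min(days_skied, buy_price)


def competitive_ratio(days_skied, buy_price):
    return online_cost(days_skied, buy_price) / offline_cost(days_skied, buy_price)
```

With a purchase price of 10, the worst case occurs once the skis have just been bought: the online cost is 19 against an optimal 10, so the ratio stays below 2.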
The document discusses various priority queue data structures like binary heaps, binomial heaps, and Fibonacci heaps. It begins with an overview of binary heaps and their implementation using arrays. It then covers operations like insertion and removal on heaps. Next, it describes binomial heaps and their properties and operations like union and deletion. Finally, it discusses Fibonacci heaps and how they allow decreasing a key in amortized constant time, improving algorithms like Dijkstra's.
Splay Trees and Self Organizing Data Structures | Amrinder Arora
This document discusses splay trees, a self-organizing binary search tree data structure. Splay trees perform rotations to move recently accessed elements closer to the root of the tree after operations like insertion, deletion and search. While splay trees do not guarantee an upper bound on tree height like AVL or red-black trees, the amortized time of all operations is O(log n). Splaying restructures the tree in a way that frequently accessed elements are faster to access in the future.
The document discusses B-trees, which are tree data structures used to store large data sets that cannot fit into main memory. B-trees allow for fast retrieval of data by balancing search trees across multiple disk blocks. They work by having internal nodes with up to m children, with leaf nodes containing keys in sorted order. This balancing allows B-trees to provide fast access and search times even for very large datasets spanning multiple disk drives.
Graph Traversal Algorithms - Depth First Search Traversal | Amrinder Arora
This document discusses graph traversal techniques, specifically depth-first search (DFS) and breadth-first search (BFS). It provides pseudocode for DFS and explains key properties like edge classification, time complexity of O(V+E), and applications such as finding connected components and articulation points.
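A recursive DFS and its use for finding connected components can be sketched as follows. The adjacency-list dictionary and the example graph are assumptions made for this illustration; each vertex and edge is examined a constant number of times, which is where the O(V+E) bound comes from.

```python
def dfs(graph, start, visited):
    """Visit every vertex reachable from `start`; return them in discovery order."""
    visited.add(start)
    order = [start]
    for neighbor in graph[start]:
        if neighbor not in visited:
            order.extend(dfs(graph, neighbor, visited))
    return order


def connected_components(graph):
    """Run DFS from every unvisited vertex of an undirected graph."""
    visited = set()
    components = []
    for vertex in graph:
        if vertex not in visited:
            components.append(dfs(graph, vertex, visited))
    return components


# Hypothetical undirected graph given as adjacency lists.
example_graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b"],
    "e": ["f"],
    "f": ["e"],
}
```

For very deep graphs the recursion can hit Python's recursion limit; an explicit stack avoids that at the cost of slightly more bookkeeping.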
Divide and Conquer - Part II - Quickselect and Closest Pair of Points | Amrinder Arora
This document discusses divide and conquer algorithms. It covers the closest pair of points problem, which can be solved in O(n log n) time using a divide and conquer approach. It also discusses selection algorithms like quickselect that can find the median or kth element of an unsorted array in linear time O(n) on average. The document provides pseudocode for these algorithms and analyzes their time complexity using recurrence relations. It also provides an overview of topics like mergesort, quicksort, and solving recurrence relations that were covered in previous lectures.
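A minimal quickselect sketch is shown below. It uses a random pivot and list partitioning for clarity rather than the in-place partitioning that lecture pseudocode typically uses, and the zero-based convention for `k` is an assumption of this illustration.

```python
import random


def quickselect(items, k):
    """Return the k-th smallest element of `items` (k = 0 is the minimum).

    Expected running time is O(n); the worst case is O(n^2).
    """
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    items = list(items)
    while True:
        pivot = random.choice(items)
        smaller = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        larger = [x for x in items if x > pivot]
        if k < len(smaller):
            items = smaller                      # answer lies among the smaller ones
        elif k < len(smaller) + len(equal):
            return pivot                         # the pivot is the answer
        else:
            k -= len(smaller) + len(equal)       # skip past smaller and equal
            items = larger
```

Unlike quicksort, only one side of each partition is explored, which is why the expected work sums to a linear total instead of O(n log n).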
The document presents information on B-trees. It begins by defining B-trees as balanced search trees designed for storage systems that read and write large blocks of data like disks. B-trees generalize binary search trees by allowing nodes to have more than two children. They were invented to index large datasets that could not fit entirely in main memory. The document then discusses properties of B-trees like their minimum and maximum number of keys per node, search, insertion and deletion algorithms, and provides examples of constructing a B-tree.
The document discusses the motivation for using B-trees to store large datasets that do not fit into main memory. It notes that while binary search trees provide logarithmic-time performance, disk access times are significantly slower than memory. B-trees are designed to group related data together to minimize disk I/O and improve performance. The document defines B-trees as m-way search trees where nodes can have up to m children, leaves are on the same level, and operations like insertion and deletion involve splitting or merging nodes to balance the tree.
The document discusses 2-3 trees, which are balanced search trees invented by John Hopcroft in 1970. 2-3 trees have the following properties: each node contains 1 or 2 keys, each internal node has 2 children if it has 1 key or 3 children if it has 2 keys, and all leaves are on the same level. The document covers insertion, search, and removal algorithms for 2-3 trees. Insertion may require splitting nodes and promoting keys up the tree to maintain the balanced structure. Search works by recursively traversing the tree to the appropriate child node based on key comparison. Removal is the reverse of insertion, removing nodes by restructuring child pointers or replacing keys.
16807097.ppt b tree are a good data structureSadiaSharmin40
The document discusses B-trees, which are tree data structures used to store large datasets on disk. B-trees allow for faster retrieval of data compared to binary search trees when data exceeds main memory. B-trees differ from binary search trees in that internal nodes can have more than two child nodes and store multiple key-value pairs. The document outlines the rules for B-tree structure and provides examples of inserting and removing elements from a B-tree while maintaining its balanced structure.
1. Disk access times are much slower than CPU instruction times, so minimizing disk accesses is important for performance.
2. B-trees address this issue by allowing nodes to have multiple children, reducing the height of the tree and thus the number of required disk accesses to retrieve data.
3. Keys are inserted into B-trees by adding them to leaves and splitting nodes as they become full, and deleted by removing from leaves or borrowing/promoting keys from siblings if needed to maintain minimum occupancy.
B-trees are multiway search trees used to store large datasets on disk. They reduce the height of the tree compared to binary trees, lowering the number of disk accesses needed for operations like search. A B-tree of order m has internal nodes with up to m children, keeps the leaves at the same level, and remains balanced during insertions and deletions which may involve splitting and merging nodes as well as promoting keys. B-trees are efficient for disk-based data structures due to their ability to group adjacent records into each node transfer.
This document provides details about a Data Structures course being taught in the Department of Computer Science and Engineering. It includes information such as the course code, category, unit number, topic of trees, faculty name, prerequisites, related courses and course outcomes. It then outlines the agenda for the topic of trees, which will cover introduction to trees, binary trees, tree traversals, binary search trees and AVL trees.
B-trees are multi-way tree data structures used to store large datasets, such as an index, that are too large to fit in memory. B-trees reduce the number of disk accesses needed compared to binary trees by allowing nodes to have more than two child nodes. Keys in non-leaf nodes partition the keys in child nodes, all leaves are at the same level, and nodes are kept at least half full. Insertion and deletion may cause nodes to split or merge to maintain these properties.
Indexed sequential files provide both indexed and sequential access to records in a file. Records are organized into blocks, and a B+ tree index structure is used to index the blocks. This allows both efficient indexed access via the B+ tree as well as sequential access by scanning blocks. B+ trees support insertion and deletion of records through localized splitting, merging, and redistribution of blocks and index nodes to maintain balance and efficiency.
This document defines and provides examples of B-trees, which are multi-way search trees used to store data in databases. B-trees allow for fast searches, insertions, and deletions. The key points are:
1. B-trees are m-way trees where each node can have up to m children. They maintain sorted keys and partition data to keep leaves at the same level.
2. An example demonstrates a 5-way B-tree containing 26 items. The process of constructing a B-tree by inserting keys one by one is also shown step-by-step.
3. Deletion from a B-tree can involve removing a key from a leaf, borrowing/prom
The document discusses different methods for storing data in a database management system, including sequential storage, indexing, and hashing. It describes:
1. Sequential storage stores each row in a predefined order, allowing for fixed data retrieval but slowing down finding arbitrary rows. Indexing solves this by creating a separate index table with pointers to data.
2. Indexing involves sorting key values from the original table with pointers to full row data, allowing for both fast random and sequential access. Linked lists can be used for indexing to facilitate efficient inserts and deletes.
3. Hashing converts key values into storage locations using a hash function and prime number, providing fast random access but no sequential retrieval and more reorganization during inserts
The document discusses different methods for storing data in a database management system, including sequential storage, indexing, and hashing. It describes the benefits and drawbacks of each method. Sequential storage allows for fast sequential retrieval but slow random access. Indexing improves random access time through the use of pointers but can be slow for inserts. Hashing converts key values directly to storage locations, allowing very fast random access, but has no sequential retrieval and is susceptible to collisions.
It stand for multi-way search tree
Multi-way search trees are a generalized version of binary trees
That allow for efficient data searching and sorting
In contrast to binary search trees, which can only have two children per node
Multi-way search trees can have multiple children per node
2-3 Trees
2-3 trees are the data structure same as trees, but it has some different properties like any node can have either single value or double value. So, there are two types of nodes in 2-3 trees:
Search:
If T is empty, return False (key cannot be found in the tree).
If current node contains data value which is equal to K, return True.
If we reach the leaf-node and it doesn’t contain the required key value K, return False.
Recursive Calls:
If K < currentNode.leftVal, we explore the left subtree of the current node.
Else if currentNode.leftVal < K < currentNode.rightVal, we explore the middle subtree of the current node.
Else if K > currentNode.rightVal, we explore the right subtree of the current node.
Insertion:
There are 3 possible cases in insertion which have been discussed below:
Case 1: Insert in a node with only one data element
Case 2: Insert in a node with two data elements whose parent contains only one data element.
Case 3: Insert in a node with two data elements whose parent also contains two data elements.
Deletion
To delete a value, it is replaced by its in-order successor and then removed.
If a node is left with less than one data value then two nodes must be merged together.
If a node becomes empty after deleting a value, it is then merged with another node.
1. The document discusses three primary methods for storing data in a database: sequential storage, index tables, and direct/hashed key storage.
2. Sequential storage stores each row in a predefined order, allowing for fixed data retrieval but slowing down finding arbitrary rows. Index tables use pointers to link key values to data, allowing for faster random access but requiring reorganizing indexes on data changes.
3. Direct/hashed key storage converts key values into storage locations using a hashing function, allowing very fast random access but no sequential retrieval and potential collisions from multiple keys hashing to the same location.
R-Trees are an excellent data structure for managing geo-spatial data. Commonly used by mapping applications and any other applications that use the location to customize content. Minimum Bounding Rectangle (MBR) is a commonly used concept in R-trees, which are a modified form of B-trees.
The document discusses 2-3 trees, which are trees where each internal node has either 2 or 3 children and all leaves are at the same depth. Items are inserted by adding them to leaves, which may cause leaves to split into two. Items are deleted by removing them from leaves, which may cause leaves to merge. This allows 2-3 trees to maintain balance more easily than binary search trees during insertions and deletions.
Stacks, Queues, Binary Search Trees - Lecture 1 - Advanced Data StructuresAmrinder Arora
This document introduces the course CS 6213 - Advanced Data Structures. It discusses what data structures are, how they are designed to efficiently support specific operations, and provides examples. Common data structures like stacks, queues, linked lists, trees, and graphs are introduced along with their basic operations and implementations. Real-world applications of these data structures are also mentioned.
This document discusses B+ trees, which are a type of self-balancing tree used to store data in a block-oriented storage context like file systems. It provides examples of how B+ trees are structured, with internal nodes, leaves, and keys to direct searching. The document explains that B+ trees enable efficient retrieval of data through operations like searching, insertion, and deletion in logarithmic time. It demonstrates examples of inserting, deleting, and rebalancing nodes in a B+ tree through splitting and merging as the tree is modified.
This document discusses algorithms for NP-complete problems. It introduces the maximum independent set problem and shows that while it is NP-complete for general graphs, it can be solved efficiently for trees using a recursive formulation. It also discusses the traveling salesperson problem and presents a dynamic programming algorithm that provides a better running time than brute force. Finally, it discusses approximation algorithms for the TSP and shows a 2-approximation algorithm that finds a tour with cost at most twice the optimal using minimum spanning trees.
Graph Traversal Algorithms - Breadth First SearchAmrinder Arora
The document discusses branch and bound algorithms. It begins with an overview of breadth first search (BFS) and how it can be used to solve problems on infinite mazes or graphs. It then provides pseudocode for implementing BFS using a queue data structure. Finally, it discusses branch and bound as a general technique for solving optimization problems that applies when greedy methods and dynamic programming fail. Branch and bound performs a BFS-like search, but prunes parts of the search tree using lower and upper bounds to avoid exploring all possible solutions.
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet MahanaAmrinder Arora
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana. Presentation for CS 6212 final project in GWU during Fall 2015 (Prof. Arora's class)
Stopping Rule for Secretory Problem - Presentation by Haoyang Tian, Wesam Als...Amrinder Arora
Stopping Rule for Secretory Problem - Presentation by Haoyang Tian, Wesam Alshami and Dong Wang. Final Presentation for P4, in CS 6212, Fall 2015 taught by Prof. Arora.
Proof of O(log *n) time complexity of Union find (Presentation by Wei Li, Zeh...Amrinder Arora
The document discusses the union find algorithm and its time complexity. It defines the union find problem and three operations: MAKE-SET, FIND, and UNION. It describes optimizations like union by rank and path compression that achieve near-linear time complexity of O(m log* n) for m operations on n elements. It proves several lemmas about ranks and buckets to establish this time complexity through an analysis of the costs of find operations.
How multiple experts can be leveraged in a machine learning application without knowing apriori who are "good" experts and who are "bad" experts. See how we can quantify the bounds on the overall results.
NP completeness. Classes P and NP are two frequently studied classes of problems in computer science. Class P is the set of all problems that can be solved by a deterministic Turing machine in polynomial time.
Dynamic Programming design technique is one of the fundamental algorithm design techniques, and possibly one of the ones that are hardest to master for those who did not study it formally. In these slides (which are continuation of part 1 slides), we cover two problems: maximum value contiguous subarray, and maximum increasing subsequence.
This document discusses dynamic programming techniques. It covers matrix chain multiplication and all pairs shortest paths problems. Dynamic programming involves breaking down problems into overlapping subproblems and storing the results of already solved subproblems to avoid recomputing them. It has four main steps - defining a mathematical notation for subproblems, proving optimal substructure, deriving a recurrence relation, and developing an algorithm using the relation.
This is the second lecture in the CS 6212 class. Covers asymptotic notation and data structures. Also outlines the coming lectures wherein we will study the various algorithm design techniques.
Introduction to Algorithms and Asymptotic NotationAmrinder Arora
Asymptotic Notation is a notation used to represent and compare the efficiency of algorithms. It is a concise notation that deliberately omits details, such as constant time improvements, etc. Asymptotic notation consists of 5 commonly used symbols: big oh, small oh, big omega, small omega, and theta.
Set Operations - Union Find and Bloom FiltersAmrinder Arora
Set Operations - make set, union, find and contains are standard operations that appear in many scenarios. Union Find is a marvelous data structure to solve problems involving union and find operations.
Different use arises when we merely want to answer queries on whether a set contains an element x without keeping the entire set in the memory. Bloom Filters play an interesting role there.
The document describes a course on advanced data structures. It provides information on the instructor, teaching assistant, topics to be covered including AVL and Red Black Trees, objectives of learning deletions, insertions and searches in logarithmic time. It also lists credits to other professors and researchers. The document then goes into details about balanced binary search trees, describing properties of AVL Trees and Red Black Trees to ensure the tree remains balanced during operations.
I teach Computer Science, usually algorithms at George Washington University. But once during the semester, I cover something about the learning process itself.
2. LOGISTICS
Instructor: Prof. Amrinder Arora
amrinder@gwu.edu
Please copy TA on emails
Please feel free to call as well
TA: Iswarya Parupudi
iswarya2291@gwmail.gwu.edu
L4 - BTrees CS 6213 - Advanced Data Structures - Arora
3. CS 6213
Basics: Record / Struct; Arrays / Linked Lists / Stacks / Queues; Graphs / Trees / BSTs
Advanced: Trie, B-Tree; Splay Trees; R-Trees; Heaps and PQs; Union Find
4. CREDITS
T.K.Prasad @ Purdue University
Prof. Sin-Min Lee @ San Jose State University
Rada Mihalcea @ University of North Texas
5. MOTIVATION FOR B-TREES
Eventually you run out of RAM.
Plus, you need persistent storage.
Storing information on disk requires a different approach to efficiency.
Access time includes seek time and rotational delay.
Assuming that a disk spins at 3600 RPM, one revolution takes 1/60 of a second, or 16.7 ms.
In other words, one disk access takes about the same time as 200,000 instructions.
6. MOTIVATION FOR B-TREES (CONT.)
Assume that we use an AVL tree to store about 20 million records.
log2 20,000,000 is about 24.
24 in-memory operations take very little time (4 GHz CPU, etc.); a normal data operation should take a few nanoseconds.
However, a large binary tree stored in a file will cause many separate disk accesses, up to one per level:
24 * 16.7 ms = 0.4 seconds
Suddenly, database query response times measured in seconds start making sense.
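The arithmetic on the last two slides can be checked with a few lines of Python. This is only a sketch of the slide's back-of-the-envelope estimate: it assumes one full revolution per tree level and ignores seek time and caching. The variable names are illustrative.

```python
import math

RPM = 3600                      # disk speed assumed on the slide
records = 20_000_000            # number of records in the AVL tree

revolution_ms = 60_000 / RPM    # one full revolution, in milliseconds
levels = math.log2(records)     # approximate height of a balanced binary tree
worst_case_ms = round(levels) * revolution_ms   # one disk access per level

print(f"one revolution: {revolution_ms:.1f} ms")
print(f"tree levels:    {levels:.1f}")
print(f"disk time:      {worst_case_ms / 1000:.2f} s")
```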
7. MOTIVATION FOR B-TREES (CONT.)
We can’t improve the theoretical log n lower bound on search for a binary tree.
But the solution is to use more branches and thus reduce the height of the tree!
As branching increases, depth decreases.
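To see how quickly depth falls as branching grows, the sketch below counts the levels needed to hold 20 million keys. It assumes the ideal case of a completely full tree in which every node holds branching - 1 keys; `min_levels` is a name chosen for this example only.

```python
def min_levels(n_keys: int, branching: int) -> int:
    """Smallest number of levels whose completely full tree holds n_keys.

    A full tree with `branching` children per node and branching - 1 keys
    per node holds branching**levels - 1 keys in total.
    """
    levels, capacity = 0, 0
    while capacity < n_keys:
        levels += 1
        capacity = branching ** levels - 1
    return levels

for b in (2, 10, 100, 1000):
    print(f"branching {b:>4}: {min_levels(20_000_000, b)} levels")
```

With a branching factor in the hundreds, the same 20 million keys need only a handful of levels, and so only a handful of disk accesses.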
8. B-TREES
Invented by Bayer and McCreight in 1972.
(Bayer also invented the symmetric binary B-tree, from which Red Black Trees are derived.)
The definition is in terms of “order”, which is not always clear: different researchers mean different things by it, but the concepts remain the same.
We will use Knuth’s terminology, where order represents the maximum number of children.
9. B-TREE: DEFINITION
A B-tree of order m (where m is an odd number) is an m-way search tree, in which the keys of a node partition the keys in its children in the fashion of a search tree, with the following additional constraints:
1. [max] A node contains up to m – 1 keys and up to m children. (In a non-leaf node, the number of keys is one less than the number of children.)
2. [min] All non-root nodes contain at least (m-1)/2 keys.
3. [leaf level] All leaves are on the same level.
4. [root] The root is either a leaf node, or it has at least two children.
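The four constraints can be written down as a checker. This is a minimal sketch, not part of the lecture: the `Node` class and the `check_btree` function are hypothetical names, and keys are assumed to be comparable numbers.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty list means leaf

def check_btree(root: Node, m: int) -> bool:
    """Return True if the tree satisfies constraints 1-4 for odd order m."""
    leaf_depths = set()

    def visit(node, depth, lo, hi, is_root):
        n = len(node.keys)
        if n > m - 1:                                   # 1. [max]
            return False
        if not is_root and n < (m - 1) // 2:            # 2. [min]
            return False
        if node.keys != sorted(node.keys):              # keys kept in order
            return False
        if any(not (lo < k < hi) for k in node.keys):   # search-tree partition
            return False
        if not node.children:                           # leaf: record its depth
            leaf_depths.add(depth)
            return True
        if is_root and len(node.children) < 2:          # 4. [root]
            return False
        if len(node.children) != n + 1:                 # children = keys + 1
            return False
        bounds = [lo] + node.keys + [hi]
        return all(visit(c, depth + 1, bounds[i], bounds[i + 1], False)
                   for i, c in enumerate(node.children))

    ok = visit(root, 0, float("-inf"), float("inf"), True)
    return ok and len(leaf_depths) == 1                 # 3. [leaf level]

good = Node([8], [Node([1, 2]), Node([12, 25])])
bad = Node([8], [Node([1]), Node([12, 25])])     # left leaf has too few keys
print(check_btree(good, 5), check_btree(bad, 5))
```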
10. B-TREE: ALTERNATE DEFINITION
While, per Knuth’s definition, a B-tree of order 5 is a tree where a node has a maximum of 5 children, the same tree may be described as a [2,4] tree, in the sense that the number of keys in any non-root node is between 2 and 4, both inclusive.
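Converting between the two conventions is a one-line calculation. The helper below is a hypothetical illustration that follows the [min] and [max] constraints for an odd order m.

```python
def key_bounds(m: int) -> tuple:
    """Return (min_keys, max_keys) for a non-root node of a B-tree of odd order m."""
    if m < 3 or m % 2 == 0:
        raise ValueError("this definition assumes an odd order m >= 3")
    return (m - 1) // 2, m - 1

print(key_bounds(5))   # the [2,4] tree from the slide
print(key_bounds(3))   # 1 or 2 keys per node, i.e. a 2-3 tree
```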
11. AN EXAMPLE B-TREE
Root:    [35]
Level 2: [6 12] [42 51 62]
Leaves:  [1 2 4] [7 8] [13 15 18 32] [38 40] [45 46 48] [53 55 60 61] [64 75 92]
A B-tree of order 5:
• Root has at least 2 children
• Every other non-leaf node has at least 2 keys and 3 children
• Each leaf has at least two keys
• All leaves are at the same level.
12. KEEPING THE HEIGHT SAME
A different approach compared to AVL trees.
Don’t add a new leaf below the existing ones; rather, split the root and add a new level above it. This automatically increases the depth of ALL the leaves by one, so they all stay on the same level.
13. CONSTRUCTING A B-TREE
We want to construct a B-tree of order 5.
Suppose we start with an empty B-tree and keys arrive in the following order: 1 12 8 2 25 6 14 28 17 7 52 16 48 68 3 26 29 53 55 45
The first four items go into the root:
Root: [1 2 8 12]
14. CONSTRUCTING A B-TREE (CONT.)
To put the fifth item in the root would violate constraint 1 (max).
Therefore, when 25 arrives, we pick the middle key to make a new root:
Root:   [8]
Leaves: [1 2] [12 25]
15. CONSTRUCTING A B-TREE (CONT.)
6, 14, 28 get added to the leaf nodes:
Root:   [8]
Leaves: [1 2 6] [12 14 25 28]
16. CONSTRUCTING A B-TREE (CONT.)
Adding 17 to the right leaf node would violate constraint 1 (max), so we promote the middle key (17) to the root and split the leaf:
Root:   [8 17]
Leaves: [1 2 6] [12 14] [25 28]
17. CONSTRUCTING A B-TREE (CONT.)
7, 52, 16, 48 get added to the leaf nodes:
Root:   [8 17]
Leaves: [1 2 6 7] [12 14 16] [25 28 48 52]
18. CONSTRUCTING A B-TREE (CONT.)
Adding 68 causes us to split the rightmost leaf, promoting 48 to the root, and adding 3 causes us to split the leftmost leaf, promoting 3 to the root; 26, 29, 53, 55 then go into the leaves:
Root:   [3 8 17 48]
Leaves: [1 2] [6 7] [12 14 16] [25 26 28 29] [52 53 55 68]
19. CONSTRUCTING A B-TREE (CONT.)
Adding 45 causes a split of the leaf [25 26 28 29].
But we observe that this does not cause the problem of leaves at different heights.
Rather, we promote 28 to go to the root.
However, the root is already full: [3 8 17 48]
So, this causes the root to split: 17 then becomes the new root.
21. SUMMARY: INSERTING INTO A B-TREE
Attempt to insert the new key into a leaf.
If this would result in that leaf becoming too big, split the leaf into two, promoting the middle key to the leaf’s parent.
If this would result in the parent becoming too big, split the parent into two, promoting the middle key.
This strategy might have to be repeated all the way to the top.
If necessary, the root is split in two and the middle key is promoted to a new root, making the tree one level higher.
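The insertion strategy can be sketched in Python as follows. This is one possible implementation under stated assumptions, not the lecture's own code: the `Node` and `BTree` classes and the `levels` helper are hypothetical, the order is odd, keys are distinct, and a node is split as soon as it holds m keys. The demo replays the key sequence of the construction example, using 6 as the sixth key to match the tree diagrams.

```python
import bisect

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []      # empty list means leaf

class BTree:
    def __init__(self, order=5):
        self.order = order                  # Knuth's order: max children
        self.root = Node()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                           # root overflowed: grow one level
            middle, right = split
            self.root = Node([middle], [self.root, right])

    def _insert(self, node, key):
        i = bisect.bisect_left(node.keys, key)
        if node.children:                   # internal node: descend first
            split = self._insert(node.children[i], key)
            if split is None:
                return None
            middle, right = split
            node.keys.insert(i, middle)     # promoted key moves up
            node.children.insert(i + 1, right)
        else:
            node.keys.insert(i, key)        # leaf: place the key
        if len(node.keys) < self.order:     # at most order - 1 keys: done
            return None
        mid = len(node.keys) // 2           # too big: split around the middle key
        middle = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return middle, right

    def levels(self):
        """Keys grouped per node, one list per tree level (for printing)."""
        out, level = [], [self.root]
        while level:
            out.append([n.keys for n in level])
            level = [c for n in level for c in n.children]
        return out

tree = BTree(order=5)
for k in [1, 12, 8, 2, 25, 6, 14, 28, 17, 7, 52, 16, 48, 68, 3, 26, 29, 53, 55, 45]:
    tree.insert(k)
for row in tree.levels():
    print(row)
```

After the last key (45), the printed levels show 17 alone in the root, with [3 8] and [28 48] on the level below it.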
22. EXERCISE: INSERTING INTO A B-TREE
Insert the following keys into a 5-way B-tree:
13, 27, 51, 3, 2, 14, 28, 1, 7, 71, 89, 37, 41, 44
23. REMOVAL FROM A B-TREE: 4 SCENARIOS
Scenario 1:
• The key to delete is in a leaf node, and removing it doesn’t cause that leaf node to have too few keys. Then simply remove the key.
Scenario 2:
• The key to delete is not in a leaf, and moving its successor or predecessor up does not cause the leaf node to have too few keys. Then replace the key with that successor or predecessor. (We are guaranteed by the nature of a B-tree that its predecessor or successor will be in a leaf.)
Scenario 3:
• The key to delete is in a leaf node, deleting it will leave the leaf with too few keys, and we can borrow from an adjacent leaf node.
Scenario 4:
• The key to delete is in a leaf node, deleting it will leave the leaf with too few keys, and we cannot borrow from an adjacent leaf node. Then the lacking leaf and one of its neighbours can be combined with the key that separates them in their shared parent (the opposite of promoting a key), and the new leaf will have the correct number of keys. If this step leaves the parent with too few keys, we repeat the process up to the root itself, if required.
24. SIMPLE LEAF DELETION (Scenario 1)
Root:   [12 29 52]
Leaves: [2 7 9] [15 22] [31 43] [56 69 72]
We want to delete 2: since there are enough keys in the node, we can just delete it.
25. SIMPLE LEAF DELETION (CONT.) (Scenario 1)
Root: [12 29 52]
Leaves: [7 9] [15 22] [31 43] [56 69 72]
That’s it: we deleted 2 and we are done.
26. SIMPLE NON-LEAF DELETION (Scenario 2)
Root: [12 29 52]
Leaves: [7 9] [15 22] [31 43] [56 69 72]
We want to delete 52. So, we delete it, and see that the successor (56) can be moved up.
Rule: borrow the predecessor or (in this case) the successor.
27. SIMPLE NON-LEAF DELETION (CONT.) (Scenario 2)
Root: [12 29 56]
Leaves: [7 9] [15 22] [31 43] [69 72]
Done. 52 is gone and 56 has been promoted to the non-leaf node. The leaf nodes still meet the minimum-key constraint.
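The non-leaf deletion above can be sketched as a small Python helper. This is an illustrative sketch only: `Node` and `replace_with_successor` are hypothetical names, and the helper handles just the step shown on the slides (overwrite the key with its in-order successor, which is the smallest key in the subtree to its right).

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = children or []  # empty list means leaf


def replace_with_successor(node, i):
    """Overwrite node.keys[i] with its in-order successor.

    The successor is the leftmost key of the subtree right of keys[i],
    and it always lives in a leaf. Returns that leaf so the caller can
    check whether it now has too few keys.
    """
    leaf = node.children[i + 1]
    while leaf.children:
        leaf = leaf.children[0]
    node.keys[i] = leaf.keys.pop(0)
    return leaf


# The tree from the slide, before deleting 52.
root = Node([12, 29, 52],
            [Node([7, 9]), Node([15, 22]), Node([31, 43]), Node([56, 69, 72])])
leaf = replace_with_successor(root, root.keys.index(52))
print(root.keys, leaf.keys)  # [12, 29, 56] [69, 72]
```

If the returned leaf had dropped below the minimum, the borrow or merge steps of Scenarios 3 and 4 would apply to it.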
28. TOO FEW KEYS IN NODE, BUT WE CAN BORROW FROM SIBLINGS (Scenario 3)
Root: [12 29]
Leaves: [7 9] [15 22] [31 43 56 69]
We want to delete 22. That will lead to too few keys in the node (constraint 2). But we can borrow from the adjacent node (via the root).
Rule: demote the root key and promote the leaf key.
29. TOO FEW KEYS IN NODE, BUT WE CAN BORROW FROM SIBLINGS (CONT.) (Scenario 3)
Root: [12 31]
Leaves: [7 9] [15 29] [43 56 69]
Done. 22 is gone. 29 came down from the parent node, and 31 has gone up from the right adjacent node.
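The borrow step (demote the separating parent key, promote the sibling's nearest key) might look like the sketch below. The names `Node` and `borrow_from_right` are hypothetical, and only borrowing from the right sibling is shown; borrowing from the left is the mirror image.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = children or []  # empty list means leaf


def borrow_from_right(parent, i):
    """Child i is one key short: rotate a key through the parent.

    The separator parent.keys[i] comes down into child i, and the
    smallest key of child i+1 goes up to take its place.
    """
    left, right = parent.children[i], parent.children[i + 1]
    left.keys.append(parent.keys[i])
    parent.keys[i] = right.keys.pop(0)
    if right.children:  # for internal nodes the subtree moves as well
        left.children.append(right.children.pop(0))


# The tree from the slide, before deleting 22.
root = Node([12, 29], [Node([7, 9]), Node([15, 22]), Node([31, 43, 56, 69])])
root.children[1].keys.remove(22)  # leaf [15] now has too few keys
borrow_from_right(root, 1)
print(root.keys, [c.keys for c in root.children])
# [12, 31] [[7, 9], [15, 29], [43, 56, 69]]
```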
30. TOO FEW KEYS IN NODE AND ITS SIBLINGS (Scenario 4)
Root: [12 29 56]
Leaves: [7 9] [15 22] [31 43] [69 72]
We want to delete 72. This will lead to too few keys in this node (constraint 2). We cannot borrow from the adjacent sibling as it only has two. So, we need to combine 31, 43, 56 and 69 into one node.
31. TOO FEW KEYS IN NODE AND ITS SIBLINGS (CONT.) (Scenario 4)
Root: [12 29]
Leaves: [7 9] [15 22] [31 43 56 69]
Done. 72 is gone. 31, 43, 56 and 69 are combined into one node.
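The merge step can be sketched the same way. Again this is only an illustration with hypothetical names (`Node`, `merge_with_left`): the deficient node, its left sibling and the parent key between them are combined into one node, which is the opposite of a split.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = children or []  # empty list means leaf


def merge_with_left(parent, i):
    """Merge child i into child i-1, pulling down the key between them."""
    left, right = parent.children[i - 1], parent.children[i]
    separator = parent.keys.pop(i - 1)
    left.keys += [separator] + right.keys
    left.children += right.children  # no-op for leaves
    parent.children.pop(i)
    # The parent has lost a key; the caller must check it for underflow
    # and repeat the borrow/merge step one level up if necessary.


# The tree from the slide, before deleting 72.
root = Node([12, 29, 56],
            [Node([7, 9]), Node([15, 22]), Node([31, 43]), Node([69, 72])])
root.children[3].keys.remove(72)  # leaf [69] now has too few keys
merge_with_left(root, 3)          # sibling [31 43] has none to spare
print(root.keys, [c.keys for c in root.children])
# [12, 29] [[7, 9], [15, 22], [31, 43, 56, 69]]
```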
32. ANALYSIS OF B-TREES
The maximum number of items in a B-tree of order m and height h:
root: $m - 1$
level 1: $m(m - 1)$
level 2: $m^2(m - 1)$
. . .
level h: $m^h(m - 1)$
So, the total number of items is
$(1 + m + m^2 + m^3 + \dots + m^h)(m - 1) = \frac{m^{h+1} - 1}{m - 1}\,(m - 1) = m^{h+1} - 1$
When m = 5 and h = 2 this gives $5^3 - 1 = 124$.
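As a quick check of the capacity formula, the level-by-level sum can be computed directly and compared with the closed form. The function name `max_items` is just an illustrative choice.

```python
def max_items(m, h):
    """Maximum number of keys in a B-tree of order m and height h.

    Level k holds at most m**k nodes, each with at most m - 1 keys.
    """
    return sum(m**level * (m - 1) for level in range(h + 1))


# Levels hold 4, 20 and 100 keys for m = 5, h = 2.
print(max_items(5, 2))               # 124
print(max_items(5, 2) == 5**3 - 1)   # True
```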
33. ANALYSIS OF B-TREES (CONT.)
• Since there is a lower bound on the number of child nodes of non-root nodes, a B-tree is at least 50% “full”.
• On average it is about 70 to 75% full (analyses of random insertions give roughly 69%).
• The advantage of not being 100% full is that there are empty spaces for insertions to happen without going all the way to the root.
34. 2-3 TREES
• If m = 3, the B-tree is called a 2-3 tree.
• For in-memory access, a 2-3 tree may be a good alternative to a red-black or AVL tree.
35. CONCLUSIONS & RECAP OF CENTRAL IDEA
• For small in-memory data structures, BSTs, arrays, hash maps, etc. work well.
• When we exceed the size of RAM, or for persistence reasons, the problem becomes quite different.
• The cost of each disk transfer is high but doesn't depend much on the amount of data transferred, especially if adjacent items are transferred.
• B-trees are a great (and very widely used) data structure for disk access.
• A B-tree of order m allows each non-root node to have from ⌈m/2⌉ up to m children. This flexibility allows for gaps, so that (i) some new elements can be stored in leaves with no other changes, and (ii) some elements can be deleted easily without changes propagating to the root.
• A B-tree of order 101 and height 3 can hold $101^4 - 1$ items (approximately 100 million), and any item can be accessed with 3 disk reads (assuming we hold the root in memory).
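The order-101 figure can be checked with a few lines of Python. This is a rough sketch under the slide's own assumptions (every node completely full, root held in memory, one disk read per level below the root); `max_items` and `min_height` are hypothetical helper names.

```python
def max_items(order, height):
    """Closed-form capacity of a completely full B-tree."""
    return order ** (height + 1) - 1


def min_height(n_items, order):
    """Smallest height whose full capacity can hold n_items keys."""
    height = 0
    while max_items(order, height) < n_items:
        height += 1
    return height


print(max_items(101, 3))             # 104060400, roughly 100 million
print(min_height(100_000_000, 101))  # 3 levels below the cached root
```

A real tree is not completely full, so the height for a given number of items can be somewhat larger than this best case.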
36. CONCLUSIONS AND RECAP (CONT.)
• If we take m = 3, we get a 2-3 tree, in which non-leaf nodes have two or three children (i.e., one or two keys).
• B-trees are always balanced (since the leaves are all at the same level), so 2-3 trees make a good type of balanced tree.