This document discusses extendible hashing, which is a hashing technique for dynamic files that allows efficient insertion and deletion of records. It works by using a directory to map hash values to buckets, and dynamically expanding the directory size and number of buckets as needed to accommodate new records. When a bucket overflows, it is split into two buckets, and the directory is expanded to distinguish them. The directory size can also be contracted when buckets can be combined due to deletions. Alternative approaches like dynamic hashing and linear hashing that address the same problem of dynamic files are also overviewed.
The document discusses various file allocation methods and disk scheduling algorithms. There are three main file allocation methods - contiguous allocation, linked allocation, and indexed allocation. Contiguous allocation suffers from fragmentation but allows fast sequential access. Linked allocation does not have external fragmentation but is slower. Indexed allocation supports direct access but has higher overhead. For disk scheduling, algorithms like FCFS, SSTF, SCAN, C-SCAN, and LOOK are described. SSTF provides the lowest average seek time, while SCAN and C-SCAN have higher throughput but longer wait times.
The document discusses various types of constraints in SQL including column level constraints like NOT NULL, UNIQUE, DEFAULT, and CHECK constraints as well as table level constraints like PRIMARY KEY and FOREIGN KEY. It provides examples of how to define these constraints when creating or altering tables and explains how each constraint enforces integrity rules and data validation. Constraints are used to impose rules on data values and relationships between columns and tables.
2005: Natural Computing - Concepts and Applications, by Leandro de Castro
The document discusses natural computing, which encompasses computing inspired by nature, simulating natural phenomena using computers, and using natural materials for computing. It surveys ideas from neurocomputing, evolutionary computing, swarm intelligence, immunocomputing, and artificial life. These fields take inspiration from neural networks, evolution, collective animal behavior, the immune system, and the synthesis of life-like behaviors to develop new algorithms and applications. The goal is to develop more robust, adaptive, and fault-tolerant computing approaches.
This week's session is on SQL Views: what they are, how to create them, how to insert, update and delete data through them along with other key details to know!
Watch the video at:
http://www.aaronbuma.com/2016/01/views/
The document discusses abstract data types (ADTs) and several common data structures, including stacks, queues, linked lists, trees, and their applications. An ADT defines a data type and operations on that data type without specifying how those operations are implemented. Common programming languages define simple ADTs like integers, while more complex ADTs like lists, stacks and queues must be explicitly defined. The key operations and applications of each data structure are described.
The document discusses various concurrency control techniques used in database management systems to ensure transaction isolation. It covers locking techniques like two-phase locking and timestamp ordering. Locking involves associating locks like read/write locks with data items. The two-phase locking protocol defines rules for acquiring and releasing locks in two distinct phases. Timestamp ordering assigns unique timestamps to transactions and ensures conflicting operations are executed based on timestamp order to guarantee serializability.
This document provides an overview of Android application development training on accessing and manipulating data using SQLite databases on Android. It covers topics like what SQLite is, creating and connecting to databases, setting database properties, creating tables, inserting, updating, and deleting records from the databases using ContentValues and SQLiteDatabase methods. Code examples are provided for each topic.
The document discusses various database recovery techniques including log-based recovery, shadow paging recovery, and recovery with concurrent transactions. Log-based recovery uses a log to record transactions and supports either deferred or immediate database modification. Shadow paging maintains a shadow page table to allow recovery to a previous state. Checkpointing improves recovery performance. Recovery for concurrent transactions uses undo and redo lists constructed during the recovery process.
This document provides an overview of timestamp protocols in database management systems. It discusses how timestamps are generated and used to order transactions. The basic timestamp ordering protocol checks timestamps on read and write operations to ensure serializability. Strict timestamp ordering delays some transactions to ensure schedules are both serializable and strict. Multiversion timestamp ordering uses multiple versions of data items to allow reads to always succeed while maintaining serializability.
The document discusses various searching and sorting algorithms. It describes linear search, binary search, selection sort, bubble sort, and heapsort. For each algorithm, it provides pseudocode examples and analyzes their performance in terms of number of comparisons required in the worst case. Linear search requires N comparisons in the worst case, while binary search requires log N comparisons. Selection sort and bubble sort both require approximately N^2 comparisons, while heapsort requires 1.5NlogN comparisons.
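To make the comparison counts above concrete, here is a hedged Python sketch (not the document's own pseudocode) of binary search that counts its comparisons:

```python
def binary_search(items, target):
    """Search a sorted list; return (index, comparisons) or (-1, comparisons)."""
    lo, hi = 0, len(items) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if items[mid] == target:
            return mid, comparisons
        if items[mid] < target:
            lo = mid + 1          # discard the lower half
        else:
            hi = mid - 1          # discard the upper half
    return -1, comparisons
```

On a 1000-element sorted list the loop runs at most about log2(1000) ≈ 10 times, versus up to 1000 probes for linear search, which is the worst-case gap the summary describes.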
This document discusses various concepts related to protection in operating systems. It covers the goals of protection which include preventing unauthorized access and enforcing access policies. The principle of least privilege is introduced which dictates that users and programs be given only necessary privileges. Access control and its basic terminology like objects, access rights and access control policies are defined. Implementation techniques for access control like access matrix, access control lists, capability lists and language-based approaches are described at a high level. The document provides an overview of key protection concepts in operating systems.
This document provides an introduction to querying XML documents using XPath and XQuery. It begins with an overview of XML and its tree structure. It then covers the basics of XPath, including path expressions and functions. XQuery is introduced as a more powerful query language that incorporates XPath and allows restructuring results. Examples are provided to demonstrate XPath and XQuery expressions for retrieving, filtering, joining, and aggregating data from XML documents. Built-in functions, sorting, and nested queries in XQuery are also discussed.
Two Phase Commit is a protocol that ensures transactions are either fully committed or aborted across multiple database sites. It uses a coordinator node that initiates a prepare phase where other nodes log transaction details and agree/disagree to commit. If all agree, the coordinator initiates a commit phase where nodes commit and acknowledge. This guarantees consistency if a node fails before completion.
This document discusses data abstraction and abstract data types (ADTs). It defines an ADT as a collection of data along with a set of operations on that data. An ADT specifies what operations can be performed but not how they are implemented. This allows data structures to be developed independently from solutions and hides implementation details behind the ADT's operations. The document provides examples of list ADTs and an array-based implementation of a list ADT in C++.
Hashing is the process of converting a given key into another value. A hash function is used to generate the new value according to a mathematical algorithm. The result of a hash function is known as a hash value or simply, a hash.
Exception handling in Python allows programmers to handle errors and exceptions that occur during runtime. The try/except block handles exceptions, with code in the try block executing normally and code in the except block executing if an exception occurs. Finally blocks ensure code is always executed after a try/except block. Programmers can define custom exceptions and raise exceptions using the raise statement.
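A minimal sketch of the constructs that summary names (the exception class and function here are invented for illustration):

```python
class TooSmallError(Exception):
    """Hypothetical custom exception for illustration."""

def checked_sqrt(x):
    if x < 0:
        raise TooSmallError(f"cannot take sqrt of {x}")
    return x ** 0.5

log = []
try:
    log.append(checked_sqrt(9))    # executes normally
    log.append(checked_sqrt(-1))   # raises, control jumps to except
except TooSmallError as exc:
    log.append(str(exc))
finally:
    log.append("cleanup")          # always runs, exception or not
```

After this runs, `log` holds the normal result, the caught error message, and the cleanup marker, showing all three paths through try/except/finally.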
Unit I Advanced Java Programming Course, by parveen837153
This document provides information about an Advanced Java Programming course taught by Dr. S.SHAIK PARVEEN. It includes details about the course such as prerequisites, objectives, units, and basic Java syntax concepts covered. The document outlines topics like variable declarations, operators, control flow statements, arrays, and object-oriented programming concepts in Java. It aims to teach students advanced Java programming skills like implementing object-oriented principles, working with classes, methods, and threads, as well as creating applets, GUI components, and Java beans.
Traversal is the process of visiting all the nodes of a tree, optionally printing their values. Because all nodes are connected via edges (links), traversal always starts from the root (head) node; unlike an array, a tree does not allow random access to a node.
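As a hedged sketch (the summary gives no code of its own), an inorder traversal follows links from the root and visits every node:

```python
class Node:
    """A binary tree node linked to its children by edges."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def inorder(node, out=None):
    """Visit the left subtree, then the node, then the right subtree."""
    if out is None:
        out = []
    if node is not None:
        inorder(node.left, out)
        out.append(node.value)
        inorder(node.right, out)
    return out

# Starting from the root is mandatory: it is the only entry point.
root = Node(2, Node(1), Node(3))
```

For this three-node tree the traversal reaches every node even though no node can be addressed directly.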
The document discusses how to create and run ActiveX controls in Visual Basic. It provides steps to create a simple calculator ActiveX control and system clock ActiveX control. The key steps include starting Visual Basic, clicking on the ActiveX control, giving the project a name, adding coding for functionality, adding the component to the toolbox, dragging it onto a form, setting the project properties, and executing the ActiveX control.
VB.NET: An introduction to Namespaces in .NET framework, by Richa Handa
VB.NET namespaces organize code by grouping related type names and reducing name collisions. Namespaces are commonly used to specify which .NET framework libraries are needed for a program. Code can be organized into hierarchies with namespaces nested within other namespaces. For example, the Button class is contained within the System.Windows.Forms namespace, which is part of the larger System namespace that contains many commonly used namespaces like System.IO and System.Collections.
This document discusses hashing and different techniques for implementing dictionaries using hashing. It begins by explaining that dictionaries store elements using keys to allow for quick lookups. It then discusses different data structures that can be used, focusing on hash tables. The document explains that hashing allows for constant-time lookups on average by using a hash function to map keys to table positions. It discusses collision resolution techniques like chaining, linear probing, and double hashing to handle collisions when the hash function maps multiple keys to the same position.
This document provides an introduction to hashing and hash tables. It defines hashing as a technique that uses a hash function to map keys to table indices for fast retrieval. It gives an example of mapping list values to array indices using the modulo operation. The document discusses hash tables and their search, insert, and delete operations in O(1) average time. It describes collisions that occur when the hash function maps multiple keys to the same position, and resolution techniques like separate chaining and linear probing.
The document discusses various indexing techniques used to improve data access performance in databases, including ordered indices like B-trees and B+-trees, as well as hashing techniques. It covers the basic concepts, data structures, operations, advantages and disadvantages of each approach. B-trees and B+-trees store index entries in sorted order to support range queries efficiently, while hashing distributes entries uniformly across buckets using a hash function but does not support ranges.
The document discusses building graphical user interfaces (GUIs) in Java. It covers using the Abstract Window Toolkit (AWT) or Swing for GUI components, laying out components, adding event listeners, and drawing graphics. Key topics include choosing between AWT and Swing, using layout managers, implementing listeners for user interactions, and methods for drawing shapes.
Objects in JavaScript can be created using object literals, the new keyword, or Object.create(). Objects are collections of properties and methods that are mutable and manipulated by reference. Arrays are objects that represent ordered collections of values of any type and are created using array literals or the Array constructor. Common array methods include concat, join, pop, push, reverse, and sort. The Math object provides common mathematical functions like pow, round, ceil, floor, random, and trigonometric functions.
The document discusses various topics related to secondary storage and file organization in databases:
1) Secondary storage devices like magnetic disks are used to permanently store large databases and provide high storage capacity compared to main memory.
2) Files are organized on disks using various methods like heap files, sorted files, and hashing to allow efficient retrieval, insertion, and deletion of records.
3) RAID (Redundant Array of Independent Disks) technology improves disk performance using data striping across multiple disks and reliability using disk mirroring.
This document discusses different searching methods like sequential, binary, and hashing. It defines searching as finding an element within a list. Sequential search scans the list from the start until the element is found or the end is reached, with an efficiency of O(n) in the worst case. Binary search works on sorted arrays by eliminating half of the remaining elements at each step, with an efficiency of O(log n). Hashing maps keys to table positions using a hash function, allowing searches, inserts, and deletes in O(1) time on average. Good hash functions distribute keys uniformly and generate different hashes for similar keys.
Weka is a popular open source machine learning tool developed at the University of Waikato. It contains algorithms for data preprocessing, classification, regression, clustering, association rules, and visualization. Weka supports various data formats and can be used through a graphical user interface, command line interface, or integrated into other Java code. It contains tools for exploring and analyzing data, applying machine learning algorithms, and evaluating experimental results.
Extendible Hashing Example
Extendible hashing handles bucket overflow by splitting the overflowing bucket into two and, when the split requires one more distinguishing bit than the directory currently has, doubling the directory size. Each doubling adds one bit to the global depth, so the directory grows only as far as the data actually requires.
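A toy Python sketch of that split-and-double behavior (the bucket layout and capacity here are illustrative assumptions, not the document's design):

```python
class ExtendibleHash:
    """Toy extendible hash table: a directory of bucket references indexed by
    the low global_depth bits of the key's hash."""

    def __init__(self, bucket_capacity=2):
        self.capacity = bucket_capacity
        self.global_depth = 1
        self.directory = [{"depth": 1, "items": {}}, {"depth": 1, "items": {}}]

    def _index(self, key):
        return hash(key) & ((1 << self.global_depth) - 1)

    def insert(self, key, value):
        bucket = self.directory[self._index(key)]
        if key in bucket["items"] or len(bucket["items"]) < self.capacity:
            bucket["items"][key] = value
            return
        # Overflow: split this bucket; double the directory only if the
        # bucket's local depth already equals the global depth.
        if bucket["depth"] == self.global_depth:
            self.directory += self.directory   # doubling adds one index bit
            self.global_depth += 1
        bucket["depth"] += 1
        new_bucket = {"depth": bucket["depth"], "items": {}}
        bit = 1 << (bucket["depth"] - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = new_bucket  # redirect half of the slots
        old_items, bucket["items"] = bucket["items"], {}
        for k, v in old_items.items():          # redistribute the entries
            self.directory[self._index(k)]["items"][k] = v
        self.insert(key, value)                 # retry the failed insert

    def get(self, key):
        return self.directory[self._index(key)]["items"].get(key)

h = ExtendibleHash(bucket_capacity=2)
for k in range(8):
    h.insert(k, k * 10)   # small ints hash to themselves in CPython
```

After eight inserts into two-slot buckets, the directory has doubled to global depth 2: it grew exactly as far as the overflows demanded.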
This document discusses depth-first search (DFS) algorithms and their applications to finding spanning trees and articulation points in graphs. It begins by explaining how DFS generalizes preorder tree traversal and avoids cycles in arbitrary graphs by marking visited vertices. DFS takes O(V+E) time on graphs. The document then explains how DFS can be used to find a depth-first spanning tree and identify biconnected components and articulation points via numbering schemes like num(v) and low(v).
Dijkstra's algorithm is used to find the shortest path between a starting vertex and any other vertex in a graph with positive edge weights. It works by maintaining a distance label for each vertex, with the starting vertex's label set to 0. It then iteratively selects the unprocessed vertex with the smallest distance label and relaxes any incident edges that improve neighboring vertices' distance labels, until all vertices have been processed. Storing predecessor vertices allows reconstruction of the shortest path.
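The distance labels, relaxation step, and predecessor storage described above can be sketched in Python with a binary heap (the example graph is invented for illustration):

```python
import heapq

def dijkstra(graph, source):
    """Distance labels and predecessors from source; weights must be non-negative.
    graph: {vertex: [(neighbor, weight), ...]} adjacency lists."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry, skip
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd              # relax the edge (u, v)
                prev[v] = u               # predecessor for path reconstruction
                heapq.heappush(heap, (nd, v))
    return dist, prev

# Hypothetical weighted graph for illustration.
graph = {"A": [("B", 4), ("C", 1)], "C": [("B", 2), ("D", 5)], "B": [("D", 1)]}
dist, prev = dijkstra(graph, "A")
```

Following `prev` back from any vertex to the source reconstructs the shortest path, as the summary notes.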
This document discusses different graph traversal algorithms: depth-first traversal, breadth-first traversal, and their implementations. Depth-first traversal uses a stack and can output nodes in either preorder or postorder. Breadth-first traversal uses a queue and outputs nodes level-by-level. Pseudocode and examples are provided for both algorithms. Review questions ask the reader to trace the output order of different traversals on sample graphs.
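The stack-versus-queue contrast in that summary can be sketched as follows (a minimal Python illustration, not the document's own pseudocode):

```python
from collections import deque

def bfs(adj, start):
    """Breadth-first: a queue yields level-by-level output."""
    visited, order, queue = {start}, [], deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj.get(u, []):
            if v not in visited:
                visited.add(v)
                queue.append(v)
    return order

def dfs(adj, start):
    """Depth-first with an explicit stack; appending on visit gives preorder."""
    visited, order, stack = set(), [], [start]
    while stack:
        u = stack.pop()
        if u in visited:
            continue
        visited.add(u)
        order.append(u)
        for v in reversed(adj.get(u, [])):
            stack.append(v)   # reversed so the first neighbor is popped first
    return order

adj = {1: [2, 3], 2: [4], 3: [4], 4: []}   # a small example graph
```

On this graph BFS emits nodes level by level while DFS follows one branch to the bottom before backtracking, which is exactly the difference the review questions ask readers to trace.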
This document discusses dictionaries and hashing techniques for implementing dictionaries. It describes dictionaries as data structures that map keys to values. The document then discusses using a direct access table to store key-value pairs, but notes this has problems with negative keys or large memory usage. It introduces hashing to map keys to table indices using a hash function, which can cause collisions. To handle collisions, the document proposes chaining where each index is a linked list of key-value pairs. Finally, it covers common hash functions and analyzing load factor and collision probability.
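A hedged sketch of the chaining scheme that summary proposes, where each table index holds a list of key-value pairs:

```python
class ChainedHashTable:
    """Toy dictionary via chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, slots=8):
        self.table = [[] for _ in range(slots)]
        self.count = 0

    def _slot(self, key):
        return hash(key) % len(self.table)

    def put(self, key, value):
        chain = self.table[self._slot(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)   # existing key: update in place
                return
        chain.append((key, value))        # colliding keys share the chain
        self.count += 1

    def get(self, key, default=None):
        for k, v in self.table[self._slot(key)]:
            if k == key:
                return v
        return default

    def load_factor(self):
        """Entries per slot; expected chain length under uniform hashing."""
        return self.count / len(self.table)

t = ChainedHashTable(slots=4)
t.put("a", 1)
t.put("b", 2)
t.put("a", 3)   # second "a" updates rather than duplicates
```

The load factor here is the quantity the document analyzes: it approximates the average chain length a lookup must scan.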
The document discusses various backtracking algorithms and problems. It begins with an overview of backtracking as a general algorithm design technique for problems that involve traversing decision trees and exploring partial solutions. It then provides examples of specific problems that can be solved using backtracking, including the N-Queens problem, map coloring problem, and Hamiltonian circuits problem. It also discusses common terminology and concepts in backtracking algorithms like state space trees, pruning nonpromising nodes, and backtracking when partial solutions are determined to not lead to complete solutions.
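The N-Queens problem mentioned above is a compact example of pruning nonpromising partial solutions; here is a hedged Python sketch:

```python
def n_queens(n):
    """Enumerate N-Queens solutions by backtracking.
    cols[r] is the column of the queen placed in row r."""
    solutions = []

    def safe(cols, row, col):
        for r, c in enumerate(cols):
            if c == col or abs(c - col) == abs(r - row):
                return False   # same column or diagonal: prune this branch
        return True

    def place(cols):
        row = len(cols)
        if row == n:
            solutions.append(list(cols))   # a complete solution
            return
        for col in range(n):
            if safe(cols, row, col):
                cols.append(col)
                place(cols)    # extend the partial solution
                cols.pop()     # backtrack and try the next column
    place([])
    return solutions
```

Each recursive call corresponds to descending one level of the state space tree; `safe` rejects nonpromising nodes before they are expanded.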
Gandhi responds to criticism of his views on ahimsa (non-violence) by Lala Lajpat Rai. He defines ahimsa and distinguishes between its negative and positive forms. Gandhi was inspired by religious figures and scriptures. He provides examples of how ahimsa requires courage and can be more effective than violence. True followers of ahimsa are not cowards and do not harm others for trivial reasons. Gandhi believes ahimsa is the remedy for all evil and following it can make India "the abode of gods" once more.
This document discusses rule-based classification. It describes how rule-based classification models use if-then rules to classify data. It covers extracting rules from decision trees and directly from training data. Key points include using sequential covering algorithms to iteratively learn rules that each cover positive examples of a class, and measuring rule quality based on both coverage and accuracy to determine the best rules.
The document discusses concepts related to entity-relationship modeling and database design. It covers:
1. Key concepts in entity-relationship modeling like entities, attributes, relationships and keys.
2. Different types of attributes, relationships and keys.
3. Storage concepts like primary and secondary storage, buffering, and placing records on disks.
4. File organization techniques like hashing, B-trees and file operations.
4. The Relational Data Model and Relational Database Constraints (Kumar)
The document discusses the relational data model and constraints in relational databases. It begins by defining key concepts in the relational model such as relations, tuples, attributes, domains and relation schemas. It then covers relational constraints including key constraints, entity integrity constraints, and referential integrity constraints. Examples are provided to illustrate these concepts and constraints. The chapter aims to provide an overview of the formal relational model and constraints that must hold in relational databases.
Huffman coding is a technique for lossless data compression based upon the frequency of occurrence of each symbol in a file.
In Huffman coding every symbol is encoded as a sequence of 0s and 1s, with more frequent symbols receiving shorter codes, which reduces the size of the file.
With a fixed-length binary representation, the number of bits required to represent each character depends upon the number of characters that have to be represented. Using one bit we can represent two characters, i.e., 0 represents the first character and 1 represents the second character.
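The fixed-length counting argument above can be sketched in C. This is a hypothetical helper (`bits_needed` is not from any of the documents summarized here): the smallest b with 2^b >= n.

```c
/* Minimum number of bits for a fixed-length code over n distinct
   characters: the smallest b with 2^b >= n. With 1 bit we can
   represent 2 characters, with 2 bits 4, and so on. */
int bits_needed(int n)
{
    int bits = 0;
    while ((1 << bits) < n)
        bits++;
    return bits;
}
```

For the 26 letters of the alphabet this gives 5 bits per character; Huffman coding improves on this fixed-length bound whenever symbol frequencies are skewed.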
Dijkstra's algorithm is a graph search algorithm that solves the single-source shortest path problem for a graph with non-negative edge costs, producing a shortest-path tree. This algorithm is often used in routing and as a subroutine in other graph algorithms.
This document summarizes several disk scheduling algorithms used in operating systems to efficiently manage disk drive access. It describes the First Come First Served (FCFS), Shortest Seek Time First (SSTF), SCAN, C-SCAN, and C-LOOK algorithms. Each algorithm is illustrated with an example request queue to show how it would schedule and service the requests to minimize the total disk head movement and seek time. The document concludes by noting that SSTF and LOOK are commonly used default algorithms but performance depends on the workload, and different algorithms may perform better for systems with heavy disk loads.
The document discusses disk scheduling algorithms used by operating systems to efficiently access disk drives. It describes common algorithms like First Come First Served (FCFS), Shortest Seek Time First (SSTF), SCAN, C-SCAN, and C-LOOK. It also summarizes RAID levels which improve reliability and performance using multiple disks, and different methods for attaching disks like host attached, network attached storage, and storage area networks.
The document discusses process management and the First Come First Serve (FCFS) CPU scheduling algorithm. It covers:
1) FCFS is the simplest scheduling algorithm that allocates the CPU to the process that requests it first. It is implemented using a FIFO queue where new processes are added to the tail.
2) FCFS can result in long waiting times for processes. An example is provided where the average waiting time is 17ms.
3) FCFS is non-preemptive and not suitable for time-sharing systems as it does not allow regular intervals of CPU allocation between users.
The document discusses various types of operands and addressing modes in x86 assembly language. It describes three basic operand types: immediate, register, and memory. It covers different addressing modes for accessing memory operands, including direct, register indirect, indexed, based, and based-indexed addressing. Examples are provided to illustrate how to copy a string and sum elements of an array using these addressing modes. Key registers used for 32-bit and 16-bit memory addressing are also outlined.
Given a connected, undirected graph, a spanning tree of that graph is a subgraph that is a tree and connects all the vertices together. A single graph can have many different spanning trees.
Extendible hashing allows a hash table to dynamically expand by using an extendible index table. The index table directs lookups to buckets, each holding a fixed number of items. When a bucket fills, it splits into two buckets and the index expands accordingly. This allows the hash table size to increase indefinitely with added items while avoiding rehashing and maintaining fast access through the index.
The document discusses different methods for implementing directories and allocating disk space for files in a file system. There are two main directory implementation methods - linear list and hash table. Linear list allows simple programming but slow searches, while hash table can greatly decrease search time through mapping file names to list locations. For file allocation, contiguous allocation groups file blocks together but causes fragmentation, while linked and indexed allocation scatter blocks but require seeking between blocks. Indexed allocation supports direct access through a file index block but has higher overhead than linked allocation.
The document summarizes a lecture on DBMS internals including hash-based indexing and external sorting. It discusses static hashing which uses a fixed number of buckets and can develop long overflow chains. Extendible hashing is then introduced which uses a directory of pointers to buckets and dynamically doubles the directory and splits buckets as needed when inserting entries. The key aspects are that it can gracefully handle insertions and deletions without performance degradation and requires fewer disk I/Os than static hashing.
This document discusses spatial indexing techniques for multidimensional point data. It describes grid files which partition space into grid cells, each associated with a disk page. It also covers tree-based methods like the kd-tree which partitions space recursively based on dimension values. Z-ordering and space-filling curves like the Hilbert curve are presented as mapping multidimensional points to a linear ordering to enable range queries on a B-tree. The document compares techniques and analyzes properties like the number of disk accesses for range queries.
Hashing is a common technique for implementing dictionaries that provides constant-time operations by mapping keys to table positions using a hash function, though collisions require resolution strategies like separate chaining or open addressing. Popular hash functions include division and cyclic shift hashing to better distribute keys across buckets. Both open hashing using linked lists and closed hashing using linear probing can provide average constant-time performance for dictionary operations depending on load factor.
Hash tables store records in a bucket array using hash functions. Main memory hash tables store records directly in buckets, while secondary storage hash tables store records in blocks associated with buckets. Records are inserted by computing their hash value and storing in the corresponding bucket block. Hash tables can be static or dynamic, with dynamic tables like extendible hashing allowing the number of buckets to grow. Extendible hashing uses a level of indirection, doubles the number of buckets during growth, and splits blocks as needed during insertion. kd-trees and quad trees are data structures for multi-dimensional data that partition space using splitting planes or hyperplanes.
This document describes Hive, an open-source data warehousing solution built on top of Hadoop. Hive supports queries expressed in a SQL-like declarative language called HiveQL, which are compiled into map-reduce jobs executed on Hadoop. Hive organizes data into tables partitioned across directories and files in HDFS. It includes a system catalog called Hive Metastore for storing schemas and statistics to optimize queries.
DB2 runs in five address spaces, each of which performs essential functions:
- DSNMSTR controls connections to other systems and performs logging, recovery, and system management.
- DSNDBM1 supports data definition, manipulation, and retrieval.
- IRLMPROC controls concurrent data access and maintains integrity through locking.
- DSNDIST enables remote access to distributed databases.
- DSNSPAS provides an isolated environment to execute stored procedures.
This document provides an outline for exploring data in R. It discusses loading packages like tidyverse and reading data into R from files. Common functions for exploring data frames are demonstrated, like dim(), str(), and table(). Data manipulation with dplyr is explained through functions like filter(), select(), arrange(), and mutate() to subset, select variables, sort, and create new variables. Pipe operators (%>%) can combine multiple data transformation steps into a readable workflow.
VTU 3rd Sem UNIX and Shell Programming Solved Papers (vtunotesbysree)
This document contains information about a UNIX and Shell Programming exam, including:
- The exam is for a 4th semester BE degree and covers UNIX and Shell Programming topics.
- It has two parts (A and B) and students must answer 5 full questions selecting at least 2 from each part.
- Part 1 covers topics like UNIX architecture, parent-child relationships, file systems, and file permissions.
- Part 2 covers topics like grep commands, sed editing, regular expressions, shell features, AWK and Perl programming.
The document discusses indexing and hashing techniques in database management systems. It begins by explaining the basic concept of indexing, noting that indexes work similarly to book indexes by allowing efficient searching for records. It then lists several factors for evaluating indexing techniques, such as access time, insertion/deletion time, and space overhead. The document goes on to explain multi-level indexing with an example involving multiple index levels to handle very large files. It also differentiates between dense and sparse indexes, noting sparse indexes require less space and maintenance overhead. The document concludes by explaining hash file organization with an example using a hash function to map records to disk blocks.
This document provides information about dictionaries and hash tables. It defines dictionaries as dynamic sets that support operations like insertion, deletion, and searching. Hash tables are described as an efficient implementation of dictionaries that map keys to array positions using a hash function. The document discusses hash functions, collisions, open and closed addressing techniques to handle collisions, and qualities of good hash functions.
This document discusses indexing mechanisms used to speed up data access in databases. It begins by introducing ordered indices, which store keys in sorted order, and hash indices, which distribute keys uniformly across buckets. B+-tree indices are then presented as an alternative to indexed-sequential files that can efficiently handle insertions and deletions without full reorganizations. The structure and operations of B+-trees, including insertion, deletion, and queries, are explained in detail over multiple pages.
This document provides an overview of the Linux file system. It describes the four types of items that can be stored in a Linux file system: ordinary files, directory files, device files, and links. It then discusses the typical directory structure, with directories like /bin, /home, and /usr. The rest of the document outlines important commands for directory and file handling, such as ls, cd, cp, and rm. It also covers making hard and soft links, specifying multiple filenames, setting file permissions, and finding/sorting files.
Hashing and File Structures in Data Structure.pdf (JaithoonBibi)
Hashing is a technique for storing data in an array such that each element is assigned a unique location based on its key value. This allows for constant time retrieval but collisions can occur when two elements hash to the same location. Collision resolution techniques like chaining, linear probing, quadratic probing, and double hashing are used to handle collisions. File structures like sequential, indexed, and relative organization are used to store records on storage devices efficiently with different access methods. Indexing uses a separate index file to speed up retrieval by mapping keys to record locations.
The document discusses backtracking algorithms. It begins by defining backtracking as a methodical way to try different sequences of decisions to solve a problem until a solution is found. It then provides examples of backtracking for finding a maze path and coloring a map. The key aspects of a backtracking algorithm are that it uses depth-first search to recursively explore choices, pruning paths that do not lead to solutions.
The document discusses the divide and conquer algorithm design strategy. It begins by explaining the general concept of divide and conquer, which involves splitting a problem into subproblems, solving the subproblems, and combining the solutions. It then provides pseudocode for a generic divide and conquer algorithm. Finally, it gives examples of divide and conquer algorithms like quicksort, binary search, and matrix multiplication.
This document discusses randomized data structures and algorithms. It begins by motivating randomized data structures as a way to transform average case runtimes into expected runtimes that are not dependent on specific inputs. It then provides examples of randomized data structures like treaps and randomized skip lists that provide efficient operations like insertion, deletion, and search in expected logarithmic time. It also discusses how randomization can be applied in algorithms like primality testing.
This document discusses randomized data structures and algorithms. It begins by motivating randomized data structures by noting that some data structures like binary search trees have average case performance but worst case inputs. Randomizing the data structure removes dependency on inputs and provides expected case performance. The document then discusses treaps and randomized skip lists as examples of randomized data structures that provide efficient expected case performance for operations like insertion, deletion, and search. It also covers topics like randomized number generation, primality testing, and how randomization can transform average case runtimes into expected case runtimes.
Skip lists are a data structure for implementing dictionaries. They consist of multiple sorted lists, with the top list containing all elements and lower lists being subsequences. Searching works by dropping down lists until finding the target element or determining it is absent. Insertion and deletion use a randomized algorithm adding/removing elements from the appropriate lists. Analysis shows the expected space is O(n) and search, insertion and deletion times are O(log n), with these bounds also holding with high probability. Skip lists provide fast, simple dictionary implementation in practice.
Dynamic programming is an algorithm design paradigm that can be applied to problems exhibiting optimal substructure and overlapping subproblems. It works by breaking down a problem into subproblems and storing the results of already solved subproblems, rather than recomputing them multiple times. This allows for an efficient bottom-up approach. Examples where dynamic programming can be applied include the matrix chain multiplication problem, the 0-1 knapsack problem, and finding the longest common subsequence between two strings.
Dynamic programming is a technique for solving problems with overlapping subproblems and optimal substructure. It works by breaking problems down into smaller subproblems and storing the results in a table to avoid recomputing them. Examples where it can be applied include the knapsack problem, longest common subsequence, and computing Fibonacci numbers efficiently through bottom-up iteration rather than top-down recursion. The technique involves setting up recurrences relating larger instances to smaller ones, solving the smallest instances, and building up the full solution using the stored results.
This document summarizes a talk on dynamic graph algorithms. It begins with an introduction to dynamic graph algorithms, which involve maintaining a graph structure and answering queries efficiently as the graph undergoes a sequence of edge insertions and deletions. It then discusses several examples of fully dynamic algorithms for problems like connectivity, minimum spanning trees, and graph spanners. A key data structure introduced is the Euler tour tree, which represents a dynamic tree as a one-dimensional structure to support efficient updates and queries. The document concludes by outlining a fully dynamic randomized algorithm for maintaining connectivity under edge updates with polylogarithmic update time, using a hierarchical approach with multiple levels of edge partitions and ET trees.
This document discusses divide-and-conquer algorithms and their time complexities. It begins with examples of finding the maximum of a set and binary search. It then presents the general steps of a divide-and-conquer algorithm and analyzes time complexity. Several algorithms are discussed including quicksort, merge sort, 2D maxima finding, closest pair problem, convex hull problem, and matrix multiplication. Strategies like divide, conquer, and merge are used to solve problems recursively in fewer comparisons than brute force methods. Many algorithms have a time complexity of O(n log n).
This document discusses the merge sort algorithm for sorting a sequence of numbers. It begins by introducing the divide and conquer approach, which merge sort uses. It then provides an example of how merge sort works, dividing the sequence into halves, sorting the halves recursively, and then merging the sorted halves together. The document proceeds to provide pseudocode for the merge sort and merge algorithms. It analyzes the running time of merge sort using recursion trees, determining that it runs in O(n log n) time. Finally, it covers techniques for solving recurrence relations that arise in algorithms like divide and conquer approaches.
This document discusses divide-and-conquer algorithms and their time complexities. It begins with examples of finding the maximum of a set and binary search. It then presents the general steps of a divide-and-conquer algorithm and analyzes time complexity. Several algorithms are discussed including quicksort, merge sort, 2D maxima finding, closest pair problem, convex hull problem, and matrix multiplication. Strategies like divide, conquer, and merge are used to solve problems recursively in fewer comparisons than brute force methods. Many algorithms have a time complexity of O(n log n).
The document discusses greedy algorithms and provides examples of problems that can be solved using greedy techniques. It introduces the coin changing problem and activity selection problem. For activity selection, it demonstrates that a greedy approach of always selecting the activity with the earliest finish time results in an optimal solution. It provides pseudo-code for a greedy algorithm and proves that the greedy solution is optimal for the activity selection problem by showing there is always an optimal solution that makes the greedy choice and combining the greedy choice with the optimal solution to the remaining subproblem yields an optimal solution to the original problem.
The document discusses various optimization problems that can be solved using the greedy method. It begins by explaining that the greedy method involves making locally optimal choices at each step that combine to produce a globally optimal solution. Several examples are then provided to illustrate problems that can and cannot be solved with the greedy method. These include shortest path problems, minimum spanning trees, activity-on-edge networks, and Huffman coding. Specific greedy algorithms like Kruskal's algorithm, Prim's algorithm, and Dijkstra's algorithm are also covered. The document concludes by noting that the greedy method can only be applied to solve a small number of optimization problems.
This document discusses greedy algorithms and provides examples of their use. It begins by defining characteristics of greedy algorithms, such as making locally optimal choices that reduce a problem into smaller subproblems. The document then covers designing greedy algorithms, proving their optimality, and analyzing examples like the fractional knapsack problem and minimum spanning tree algorithms. Specific greedy algorithms covered in more depth include Kruskal's and Prim's minimum spanning tree algorithms and Huffman coding.
The document outlines various data structures and algorithms for implementing dictionaries and hash tables, including:
- Separate chaining, which handles collisions by storing elements that hash to the same value in a linked list. Find, insert, and delete take average time of O(1).
- Open addressing techniques like linear probing and quadratic probing, which handle collisions by probing to alternate locations until an empty slot is found. These have faster search but slower inserts and deletes.
- Double hashing, which uses a second hash function to determine probe distances when collisions occur, reducing clustering compared to linear probing.
This document discusses hashing and hash tables. It begins by introducing hash tables and describing how hashing works by mapping keys to array indices using a hash function to allow for fast insertion, deletion and search operations in O(1) average time. However, hash tables do not support ordering of elements efficiently. The document then discusses issues with hash functions such as collisions when different keys map to the same index. It describes techniques for collision resolution including separate chaining, where each index points to a linked list, and open addressing techniques like linear probing, quadratic probing and double hashing that resolve collisions by probing alternate indices in the array.
This document summarizes key points about extendible hashing and discusses its use for a spelling dictionary case study. Extendible hashing is a hashing technique that optimizes disk accesses for huge datasets by storing hash buckets in disk blocks. It uses a directory to hash to the correct bucket. The document explains how to insert keys, split buckets, and rehash the table. It also discusses solutions for the spelling dictionary case study, comparing storage and time efficiency of sorted arrays, open hashing, and closed hashing with linear probing.
The document discusses searching data structures like binary search trees and linked lists. It provides pseudocode for iterative searching algorithms on both ordered and unordered linked lists. The algorithms traverse the list by iterating through each node until the target value is found or the end is reached. For ordered lists, searching can stop early if the current node value is greater than the target. Tracing examples are provided to demonstrate searching for a value in sample linked lists.
1) Tree data structures involve nodes that can have zero or more child nodes and at most one parent node. Binary trees restrict nodes to having zero, one, or two children.
2) Binary search trees have the property that all left descendants of a node are less than the node's value and all right descendants are greater. This property allows efficient searches, inserts, and deletes that take O(log n) time on average.
3) Trees can become unbalanced over many insertions and deletions, affecting performance of operations. Various self-balancing binary search tree data structures use tree rotations to maintain balance.
This document discusses binary search trees (BSTs) and their use for dynamic sets. It covers BST operations like search, insert, find minimum/maximum, and successor/predecessor. It also discusses how BSTs can be used to sort in O(n log n) time by inserting elements in order and performing an inorder traversal, similar to quicksort. Maintaining a height of O(log n) for BSTs is discussed as an area for future improvement.
This document provides an overview of wound healing, its functions, stages, mechanisms, factors affecting it, and complications.
A wound is a break in the integrity of the skin or tissues, which may be associated with disruption of the structure and function.
Healing is the body’s response to injury in an attempt to restore normal structure and functions.
Healing can occur in two ways: Regeneration and Repair
There are 4 phases of wound healing: hemostasis, inflammation, proliferation, and remodeling. This document also describes the mechanism of wound healing. Factors that affect healing include infection, uncontrolled diabetes, poor nutrition, age, anemia, the presence of foreign bodies, etc.
Complications of wound healing like infection, hyperpigmentation of scar, contractures, and keloid formation.
1. File Structures SNU-OOPSLA Lab. 1
Chap12. Extendible Hashing
Seoul National University, Dept. of Computer Engineering
Object-Oriented Systems Laboratory (SNU-OOPSLA-LAB)
Prof. Hyoung-Joo Kim
File Structures by Folk, Zoellick and Riccardi
Chapter Objectives
Describe the problem solved by extendible hashing and related
approaches
Explain how extendible hashing works; show how it combines
tries with conventional, static hashing
Use the buffer, file, and index classes of previous chapters to
implement extendible hashing, including deletion
Review studies of extendible hashing performance
Examine alternative approaches to the same problem, including
dynamic hashing, linear hashing, and hashing schemes that
control splitting by allowing for overflow buckets
Contents
12.1 Introduction
12.2 How extendible hashing works
12.3 Implementation
12.4 Deletion
12.5 Extendible hashing performance
12.6 Alternative approaches
12.1 Introduction
Dynamic files
undergo a lot of growth
Static hashing
described in chapter 11 (direct hashing)
typically worse than B-Tree for dynamic files
eventually requires file reorganization
Extendible hashing
hashing for dynamic file
Fagin, Nievergelt, Pippenger, and Strong (ACM TODS 1979)
Overview (1)
Direct access (hashing) files have static size, so not
suitable for files whose size is unknown in advance
Dynamic file structure is desired which retains the feature
of fast retrieval by primary key, and which also expands
and contracts as the number of records in the file
fluctuates (without reorganizing the whole file)
Similar motivation!
Indexed-sequential File ==> B tree
Hashing ==> Extendible Hashing
Overview (2)
Extendible Hashing
[Diagram: primary key → hashing function H(key) → extract first d digits → table look-up in the directory (index) → file pointer to the bucket]
12.2 How Extendible Hashing works
Idea from tries (radix searching)
The branching factor of the tree is equal to the # of
alternative symbols in each position of the key
e.g.) Radix 26 trie: able, abrahms, adams, anderson, andrews, baird
Use the first n characters for branching
[Trie diagram: branching letter by letter over the keys able, abrahms, adams, anderson, andrews, baird]
Extendible Hashing
H maps keys to a fixed address space, whose size is the largest prime less than a power of 2 (65531 < 2^16)
File pointers point to blocks of records known as buckets; an entire bucket is read by one physical data transfer, and buckets may be added to or removed from the file dynamically
The d bits are used as an index into a directory array containing 2^d entries, which usually resides in primary memory
The value d, the directory size (2^d), and the number of buckets change automatically as the file expands and contracts
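The indexing step can be sketched as follows. This is a simplified sketch, assuming 16-bit hash addresses; `make_address` is a hypothetical name, and the textbook's address function additionally reverses the bit order, which is omitted here:

```c
/* Sketch: use the most significant d bits of a 16-bit hash address
   as an index into a directory of 2^d entries. */
int make_address(unsigned int hashval, int d)
{
    if (d == 0)
        return 0;                          /* one-entry directory */
    return (int)((hashval & 0xFFFFu) >> (16 - d));
}
```

With d = 2, all addresses beginning with the same two bits map to the same directory entry, which is exactly what lets several entries share one bucket.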
Turning the trie into a directory
Using Trie for extendible hashing
(1) Use Radix 2 Trie :
Keys in A : beginning with 0
Keys in B : beginning with 10
Keys in C : beginning with 11
(2) Retrieving from secondary storage the buckets containing
keys, instead of individual keys
[Radix-2 trie diagram: 0 → A, 10 → B, 11 → C]
Representation of Trie (1)
Tree is not preferable (directory is not big)
A flattened array
1. Make a complete full binary tree
2. Collapse it into the directory structure
[Diagram: the trie (0 → A, 10 → B, 11 → C) is extended to a complete binary tree and collapsed into a four-entry directory: 00 → A, 01 → A, 10 → B, 11 → C]
Representation of Trie (2)
Directory is a complete binary tree
Directory entry : a pointer to the associated bucket
Given an address beginning with the bits 10, all directory entries whose index begins with 10 point to the same bucket
Introduced for uniform distribution
Retrieve a record
Steps in retrieving a record with a given key
find H(given key)
extract first d bits of H(given key)
use this value as an index into the directory to find a pointer
use this pointer to read a bucket into primary memory
locate the desired record within the bucket (scan)
Expansion & Contraction (1)
A pair of buddy buckets with the same value of d' which
share a common value of the first d'-1 bits of H(key) can
be combined if their average load < 50%, so that all records
fit into one bucket
File contraction is the reverse of expansion; the directory
can be halved and d decremented whenever every pair
of directory cells points to the same bucket
Splitting to Handle Overflow (1)
When overflow occurs
e.g.1) Overflow of bucket A
Split A into A and D
Use an additional, previously unused bit of the hash address
No need to expand the directory
Before:        After:
00 -> A        00 -> A
01 -> A        01 -> D
10 -> B        10 -> B
11 -> C        11 -> C
Splitting to Handle Overflow (2)
e.g.2) Overflow of bucket B
No additional unused bits are available
(need to expand the directory)
1. Divide B using 3 bits of the hash address
2. Make a complete full binary tree
3. Collapse it into the directory structure
Before:
00 -> A
01 -> A
10 -> B
11 -> C
1. Result of the overflow of bucket B:
A keeps prefix 0, B splits into B (prefix 100) and D (prefix 101),
C keeps prefix 11

2. Complete binary tree: every leaf is pushed to depth 3, so A hangs
under 000, 001, 010, 011 and C under 110, 111

3. Directory:
000 -> A
001 -> A
010 -> A
011 -> A
100 -> B
101 -> D
110 -> C
111 -> C
Creating Addresses
Function Hash(key)
Fold/add hashing algorithm
Do not MOD the hash value by the address space, since no fixed
address space exists
Output from the hash function for a number of keys:
bill 0000 0011 0110 1100
lee 0000 0100 0010 1000
pauline 0000 1111 0110 0101
alan 0100 1100 1010 0010
julie 0010 1110 0000 1001
mike 0000 0111 0100 1101
elizabeth 0010 1100 0110 1010
mark 0000 1010 0000 0111
int Hash (char * key)
{
    int sum = 0;
    int len = strlen(key);
    if (len % 2 == 1) len++; // make len even; key[len] is '\0'
    for (int j = 0; j < len; j += 2) // fold and add character pairs
        sum = (sum + 100 * key[j] + key[j+1]) % 19937;
    return sum;
}
Figure 12.7 Function Hash(key) returns an integer hash value for key
for a 15-bit address (19937 < 2^15)
int MakeAddress (char * key, int depth)
{
    int retval = 0;
    int hashVal = Hash(key);
    // reverse the low-order bits
    for (int j = 0; j < depth; j++)
    {
        retval = retval << 1;
        int lowbit = hashVal & 1;
        retval = retval | lowbit;
        hashVal = hashVal >> 1;
    }
    return retval;
}
Figure 12.9 Function MakeAddress(key, depth)
class Bucket : protected TextIndex
{ protected:
    Bucket (Directory & dir, int maxKeys = defaultMaxKeys);
    int Insert (char * key, int recAddr);
    int Remove (char * key);
    Bucket * Split ();
    int NewRange (int & newStart, int & newEnd);
    int Redistribute (Bucket & newBucket);
    int FindBuddy ();
    int TryCombine ();
    int Combine (Bucket * buddy, int buddyIndex);
    int Depth;
    Directory & Dir;
    int BucketAddr;
    friend class Directory;
    friend class BucketBuffer;
};
Figure 12.10 Main members of class Bucket
class Directory
{ public:
    Directory (…); ~Directory ();
    int Open (…); int Create (…); int Close ();
    int Insert (…); int Delete (…); int Search (…);
 protected:
    int DoubleSize ();
    int Collapse ();
    int InsertBucket (…);
    int Find (…);
    int StoreBucket (…);
    int LoadBucket (…);
    …
};
12.4 Deletion
When to combine buckets
Buddy buckets: buckets that are siblings at the leaf level
of the tree ("buddy" means something like friend)
e.g., B and D in the splitting example above are buddy buckets
Examine the directory to see if we can make changes there
Shrink the directory if none of the buckets requires the depth
of address information that is currently available in the
directory
Buddy Bucket
Given a bucket with an address uvwxy, where u,
v, w, x, and y have values of either 0 or 1, the
buddy bucket, if it exists, has the address uvwxz,
such that
z = y XOR 1
If enough keys are deleted, the contents of buddy
buckets can be combined into a single bucket
Collapsing the Directory
Collapse conditions
If the directory has only a single cell, downsizing is impossible
If any pair of directory cells does not point to the
same bucket, collapsing is impossible
Allocating space
Allocate half the size of the original directory
Copy the bucket reference shared by each cell pair to a single
cell in the new directory
12.5 Extendible Hashing Performance
Time : O(1)
If the directory can be kept in RAM: a single access
Otherwise: two accesses are necessary
Space utilization of the buckets
r (# of records), b (block size), N (# of blocks)
Utilization = r / bN
Average utilization ≈ 0.69
Space utilization for the directory
How large a directory should we expect to have,
given an expected number of keys?
Expected value for the directory size, by Flajolet (1983):
Estimated directory size = (3.92 / b) × r^(1 + 1/b)
Space utilization for buckets
r : # of records, b : block size, N : # of blocks
Utilization = r / bN
Periodic and fluctuating
With uniformly distributed addresses, all the buckets tend to fill up at the
same time -> split at the same time
As buckets fill up : 90%
After a concentrated series of splits : 50%
On average, N ≈ r / (b ln 2), so
Utilization = r / bN ≈ ln 2 = 0.69
Average utilization of 69%
B-tree space utilization, for comparison
Normal B-tree : 67%, B-tree with redistribution in insertion : 85%
12.6 Alternative Approaches (1): Dynamic Hashing
Similar to extendible hashing
Use a directory to track bucket addresses
Extend the directory through the use of tries
Start with a hash function that covers an address space of
a fixed size
When overflow occurs, the bucket splits, forming the leaves of
a trie that grows down from the original address node
Alternative Approaches (2): Dynamic Hashing
Two kinds of nodes
External node: references a data bucket
Internal node: points to two child index nodes
When a node has split children, it changes from an external
node to an internal node
Two hash functions
Apply the first hash function to the original address space
If an external node is found: the search is complete
If an internal node is found: apply the second hash function
[Figure: dynamic hashing tries growing down from the original
address space (nodes 1-4): (a) four external nodes 1, 2, 3, 4;
(b) node 4 splits into external nodes 40 and 41; (c) node 2 splits
into 20 and 21, and node 41 splits into 410 and 411.]
Dynamic Hashing vs. Extendible Hashing (1)
Overflow handling
Both schemes extend the hash function locally, as a binary search
trie
Both schemes use a directory structure
Dynamic hashing: a linked structure
Extendible hashing: a perfect tree expressible as an array
Space utilization
The same in both schemes (space utilization : 69%)
Dynamic Hashing vs. Extendible Hashing (2)
Growth of directory
Dynamic hashing: slower, more gradual growth
Extendible hashing: extends the directory by doubling it
Actual size of an index node
Larger in dynamic hashing than a directory cell in extendible
hashing (because of the child pointers)
Page faults
Dynamic hashing: possibly more than one page fault (with a linked
structure for the directory)
Extendible hashing: a single page fault
Alternative Approaches (3): Linear Hashing
Unlike extendible hashing and dynamic hashing, linear hashing does
not use a directory
The actual address space is extended one bucket at a time as
buckets overflow
Because the extension of the address space does not necessarily
correspond to the bucket that is overflowing,
linear hashing necessarily involves the use of overflow buckets, even
as the address space expands
No directory: avoids the additional seek resulting from an extra layer
Uses more bits of the hashed value as the space grows
h_d(k) : depth-d hash function (using function MakeAddress)
[Figure: the growth of address space in linear hashing (1).
(a) Four buckets a, b, c, d at addresses 00-11. (b)-(d) The space
is extended one bucket at a time: new buckets A (100), B (101),
and C (110) are added, while overflow chains w, x, y hang off
buckets that have not yet been split.]
[Figure: the growth of address space in linear hashing (2).
(e) The address space has doubled: buckets a, b, c, d, A, B, C, D
at addresses 00-111, with one overflow chain x still attached.]
Alternative Approaches (4): Approaches to Controlling Splitting
Postpone splitting to increase space utilization
B-tree: redistribution rather than splitting
Hashing: placing records in chains of overflow buckets to
postpone splitting
Triggering event for splitting
Linear hashing
Every time any bucket overflows
The overflowing bucket itself is not necessarily the one split
Litwin (1980): split on the overall load factor of the file
Below 2 seeks on average, 75% ~ 80% storage utilization
Alternative Approaches (5): Approaches to Controlling Splitting (cont.)
Postponing splitting for extendible hashing
Use chained overflow buckets
Avoid doubling the directory space
1.1 seeks on average, 76% ~ 81% storage utilization
Let's Review!!!
12.1 Introduction
12.2 How extendible hashing works
12.3 Implementation
12.4 Deletion
12.5 Extendible hashing performance
12.6 Alternative approaches