CSCI-B 561 Advanced Database
Concepts Project Report
Buffer Trees - Utility and Applications for
External Memory Data Processing
Milind Gokhale
mgokhale@indiana.edu
November 16, 2014
1 ABSTRACT
Nowadays, because of the very large volumes of data being handled, dependence on external memory
for data processing has increased tremendously. However, there are few generic external memory
tools designed for processing the data of a database that resides in external memory.
This report focuses on the basics of the buffer tree and on some of its possible uses as a
generic tool for processing data in external memory. We introduce the I/O bottleneck in external
memory data processing and the motivation for creating buffer trees. We then describe the buffer
tree data structure and the observations from experiments conducted on buffer trees in [1].
Finally, we list some possible applications of buffer trees and conclude.
2 INTRODUCTION
Today users have plenty of high-quality, high-resolution data available through various
technologies, and more data is continually being generated in many domains and fields. As a
result, moving huge data sets between the external memory and the internal memory of computers
has become commonplace. However, there is a vast difference between data access speeds in
internal and external memory. Internal memory is very fast, while external memory is about
$10^5$ to $10^6$ times slower than main memory at random access. This has created a growing
demand for high-performance input and output mechanisms to move huge amounts of data between
fast internal memory and slower external storage. I/O bandwidth is a bottleneck in many
large-scale applications such as multimedia, GIS, land information systems, seismic databases,
virtual reality applications, satellite imagery, digital libraries and real-time online
applications.
3 MOTIVATION
Communication between internal memory and external memory is a bottleneck. The current
approaches to addressing this issue are [1]:
1. Increasing secondary storage device parallelism, thus improving the bandwidth
between secondary memory and main memory.
2. Exploiting locality of reference through the organization of the data and the processing
sequence.
3. Overlapping I/O with computation, e.g. using prefetching.
Much work has been done on designing external versions of data structures that were originally
designed for internal memory. Most of these data structures are used in an on-line setting,
where queries must be answered immediately and within a good worst-case number of I/Os. Because
they are used in the on-line setting, they often do not take advantage of the available main
memory [4].
Often, however, the problem or system operates in a batch setting, where similar processing
operations are performed on many data sets. Problems in which the sequence of operations on the
data structure is known in advance are called batched dynamic problems. Bulk operations help in
processing such batched dynamic problems. A bulk operation is a collection of individual
operations that are executed consecutively without being interrupted by other requests [3]. In
industry, bulk order processing, end-of-day job processing, pre-provisioning of logical
resources, and temporal and spatial database processing are some of the biggest batched problems
in which bulk operations are performed. Although queries in batched problems need not be
answered instantly as in the on-line setting, there are often tight service level agreements for
processing an enormous number of records in a rather short time.
An important paradigm for batched problems in the internal memory setting is to use a dynamic
data structure to process a sequence of updates [2]. For example, to sort N items, we can insert
them one by one into a priority queue and then perform a sequence of N deletemin operations.
However, if the same paradigm is applied naively in the external memory (EM) setting, with a
B-tree as the dynamic data structure, the resulting I/O performance is sub-optimal [2]. For
example, if we use a B-tree as the priority queue for sorting, each update and query operation
takes $O(\log_B N)$ I/Os, for a total of $O(N \log_B N)$ I/Os. This is larger than the optimal
sorting bound $Sort(N) = O(n \log_m n)$, with n = N/B and m = M/B as defined in Section 4, by a
substantial factor of roughly B. L. Arge developed the buffer tree data structure to support
batched dynamic operations, such as those arising in sweep-line algorithms, where the queries do
not have to be answered right away or in any particular order [4].
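To make the size of this gap concrete, the following is a minimal sketch that evaluates both bounds with all constant factors ignored. The function names and the parameter values are illustrative assumptions, not figures from [1], [2] or [4].

```python
import math

def btree_sort_ios(N: int, B: int) -> float:
    """Approximate I/Os for sorting through N B-tree operations: N * log_B(N)."""
    return N * math.log(N, B)

def optimal_sort_ios(N: int, M: int, B: int) -> float:
    """Approximate optimal sorting bound n * log_m(n), with n = N/B and m = M/B."""
    n, m = N / B, M / B
    return n * math.log(n, m)

# Assumed example sizes: 10^9 elements, 10^6 elements of memory, 10^3 per block.
N, M, B = 10**9, 10**6, 10**3
ratio = btree_sort_ios(N, B) / optimal_sort_ios(N, M, B)
print(f"B-tree approach needs about {ratio:.0f}x more I/Os (B = {B})")
```

For these assumed sizes the ratio comes out on the order of B, in line with the "factor of roughly B" stated above.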
4 BUFFERING METHOD AND BUFFER TREE
The I/O model: we list the terms used to denote the components of the I/O model.
i. N = number of elements in the problem instance.
ii. M = number of elements that fit into main memory.
iii. B = number of elements per block.
iv. n = N/B = number of blocks in the problem instance.
v. m = M/B = number of blocks that fit into main memory.
vi. M < N.
vii. 0 < B < M/2.
viii. An I/O operation is a swap of B elements from the internal memory with B consecutive
elements from external memory.
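The parameters listed above can be collected in a small helper, shown here as a sketch. The class name `IOModel` is hypothetical, and rounding n up to a whole number of blocks is an assumption made for convenience; the report simply writes n = N/B.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IOModel:
    N: int  # number of elements in the problem instance
    M: int  # number of elements that fit into main memory
    B: int  # number of elements per block

    def __post_init__(self):
        # Constraints vi and vii of the model.
        if not self.M < self.N:
            raise ValueError("expected M < N")
        if not 0 < self.B < self.M / 2:
            raise ValueError("expected 0 < B < M/2")

    @property
    def n(self) -> int:
        """Number of blocks in the problem instance (rounded up, by assumption)."""
        return -(-self.N // self.B)

    @property
    def m(self) -> int:
        """Number of blocks that fit into main memory."""
        return self.M // self.B

model = IOModel(N=1_000_000, M=10_000, B=100)
print(model.n, model.m)
```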
4.1 WHAT IS THE BUFFERING TECHNIQUE?
The main idea of the buffering technique is to perform operations on an external tree data
structure in a lazy manner [4]. This is achieved by associating main-memory-sized buffers with
the internal nodes of the tree: each internal node of the structure is assigned a buffer of m
blocks. When inserting an element, we do not search down the tree for the relevant leaf right
away. Instead, we wait until we have collected a block of insertions (or other operations) and
then insert this block into the buffer of the root. When a buffer runs full, a buffer overflow
is said to have occurred, and the buffer-emptying process pushes the elements in the buffer one
level down, into the buffers on the next level of the tree [4]. One consequence of this laziness
is that several insertions and deletions of the same element can be present in the tree at the
same time [4].
4.2 BUFFER TREE
4.2.1 Data Structure
Figure 1
The buffer tree is an (a,b)-tree with a = m/4 and b = m, built over n leaves containing B
elements each, and extended with a buffer of size m blocks attached to each node [3]. All leaves
are on the same level, and all nodes have a fan-out between m/4 and m, i.e. between a and b.
Internal nodes are the nodes that do not have leaves as children; the remaining nodes, whose
children are leaves, are called leaf nodes. The height of the tree is $O(\log_m n)$ and the
structure uses O(n) blocks of space for the N elements.
Figure 2
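As a rough illustration of the height bound, the sketch below computes how many levels are needed above n leaves when every node has fan-out between a = m/4 and b = m. The function name and the example values are assumptions, and special cases such as a root with smaller fan-out are ignored.

```python
import math

def height_bounds(n: int, m: int) -> tuple[int, int]:
    """Return (fewest, most) levels above n leaves for fan-out in [m/4, m]."""
    a, b = m / 4, m
    return math.ceil(math.log(n, b)), math.ceil(math.log(n, a))

print(height_bounds(10**6, 64))  # -> (4, 5)
```

Both values differ only by a constant factor, which is why the height is $O(\log_m n)$ for any fan-out in this range.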
4.2.2 Operations on Buffer Tree
a. Update:
A request element is created, consisting of the record to be inserted or deleted, a flag
indicating the type of operation, and an automatically generated timestamp [1]. Requests
are collected in internal memory until a block of B requests has been formed. The block of
request elements is then inserted into the buffer of the root using one I/O.
b. Buffer-emptying process:
If the buffer of the root contains fewer than m/2 blocks, nothing needs to be done. If the
root buffer holds more than m/2 blocks, it is emptied using the buffer-emptying process [1].
The buffer-emptying process at an internal node requires O(m) I/Os, since we load m/2
blocks into internal memory and distribute the elements among the $\Theta(m)$ children of
that node. Throughout, the process maintains the invariant that the buffers of the nodes on
the path from the root to the leaf node with a full buffer are all empty. The
buffer-emptying process is not applied recursively to all internal nodes; it is applied
along the path from the root to a leaf node. This prevents different rebalancing operations
from interfering with each other. The deletion of a block may initiate several
buffer-emptying processes at the node involved. The buffer-emptying process can be
protected from interference by other processes by using dummy blocks [1].
c. Rebalancing:
The buffer-emptying process at a leaf node may require rebalancing the underlying
(a,b)-tree. An (a,b)-tree is rebalanced by a series of "split" operations in the case of
insertions, and "fuse" or "share" operations in the case of deletions [1]. Before
performing a rebalance operation, we ensure that the buffers of the corresponding nodes
are empty. This is achieved by running the buffer-emptying process at the nodes involved.
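The update step described in (a) can be sketched as follows. The names `Request`, `RequestCollector` and `resolve` are hypothetical, the root buffer is modelled as an in-memory list of blocks, and the rule that the newest request for a key wins is one plausible use of the timestamp, stated here as an assumption rather than as the method of [1].

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Request:
    key: int
    op: str         # flag: "insert" or "delete"
    timestamp: int  # generated automatically by the collector

class RequestCollector:
    """Gathers requests in memory and writes them to the root buffer block by block."""

    def __init__(self, block_size: int):
        self.block_size = block_size  # B
        self.pending = []             # requests not yet forming a full block
        self.root_buffer = []         # list of blocks (stand-in for the root's buffer)
        self.ios = 0                  # one I/O per block written
        self._clock = count()

    def submit(self, key: int, op: str) -> None:
        self.pending.append(Request(key, op, next(self._clock)))
        if len(self.pending) == self.block_size:
            self.root_buffer.append(self.pending)
            self.pending = []
            self.ios += 1

    def root_needs_emptying(self, m: int) -> bool:
        # The root buffer is emptied once it holds more than m/2 blocks.
        return len(self.root_buffer) > m / 2

def resolve(requests):
    """Return the keys that remain after applying requests in timestamp order."""
    latest = {}
    for r in sorted(requests, key=lambda r: r.timestamp):
        latest[r.key] = r  # a newer request for the same key overrides an older one
    return sorted(k for k, r in latest.items() if r.op == "insert")

collector = RequestCollector(block_size=2)
collector.submit(7, "insert")
collector.submit(3, "insert")
collector.submit(7, "delete")
everything = [r for block in collector.root_buffer for r in block] + collector.pending
print(collector.ios, resolve(everything))
```

Here the first two requests fill one block and cost one I/O, while the third stays in memory until a second block is complete.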
4.2.3 I/O complexity analysis of the buffer tree operations [1]
a. Update:
Amortized cost charged to each update element on insertion into the root buffer:
$O((\log_m n)/B)$ I/Os.
Cost attributed to each block in the buffer of a node v: O(height of the subtree rooted
at v) I/Os.
b. Buffer emptying:
The buffer-emptying process at any node costs O(m) I/Os. Ignoring the cost of rebalancing,
the total cost of all buffer-emptying processes on internal nodes is bounded by
$O(n \log_m n)$ I/Os.
c. Rebalancing:
The total number of rebalance operations required in an (a,b)-tree with b > 2a, over k
update operations on an initially empty (a,b)-tree, is bounded by k/(b/2 - a).
d. Theorem:
The total cost of an arbitrary sequence of intermixed insert and delete operations on an
initially empty buffer tree is $O(n \log_m n)$ I/Os, i.e. the amortized cost of an operation
is $O((\log_m n)/B)$ I/Os. The tree uses O(n) space [4].
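The step from the total bound to the amortized bound in the theorem can be spelled out as follows, under the assumption (not stated explicitly above) that the sequence consists of N operations, so that n = N/B:

$$
\frac{O(n \log_m n)}{N} \;=\; O\!\left(\frac{(N/B)\,\log_m n}{N}\right) \;=\; O\!\left(\frac{\log_m n}{B}\right) \text{ I/Os per operation.}
$$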
5 EXPERIMENTS AND COMPARISONS
As reported in [1], several tests were performed on buffer trees in order to obtain meaningful
performance results. Factors such as contention with other processes for machine resources and
virtual memory effects were controlled.
5.1 COMPARISON TO QUICK SORT
Figure 3 [1]
Sorting with the buffer tree (buffer tree sort, BTS) outperformed the built-in internal memory
quicksort on large inputs. Although the internal quicksort requires less running time for
smaller inputs, it ran out of memory for large input sets. According to the test results in [1],
shown in Figure 3, the internal quicksort failed for lack of internal memory on problem sizes
larger than 2.8 million items.
5.2 BUFFER TREE TUNING
Figure 4: Number of block pushes for different b [1]
Figure 5: Running time of BTS for different b [1]
The values of a and b in terms of m were found to be important for the performance of buffer
tree sort (BTS), as seen in Figure 4. Adjusting the parameter b reduces the number of I/O
operations performed by BTS. The values (a, b) = (m/32, m/8) gave the best performance in terms
of running time (almost linear). In addition to choosing good values of a and b, reducing the
fan-out increases the expected size of the data in a block push during the buffer-emptying
process, and thus reduces the required number of I/O operations, as seen in Figure 5.
6 APPLICATIONS
Buffer trees can be put to use in several places. They can serve as a subroutine in the standard
sweep algorithm to obtain an optimal external memory algorithm for orthogonal segment
intersection. They can also be extended to implement segment trees in external memory in a
batched dynamic setting, by reducing the node degree to $\Theta(\sqrt{m})$ and by introducing
multi-slabs in each node. Buffer trees also provide a natural amortized implementation of the
priority queue for time-forward processing applications such as discrete event simulation,
sweeping, and list ranking.
6.1 SORTING
First, the N items are inserted into the buffer tree. Then a write/empty operation is performed:
the buffer-emptying process is started at the root of the buffer tree and carried out all the
way down to the leaves. Finally, the leaves of the buffer tree are read sequentially from left
to right to obtain the elements in sorted order. This can be done within the cost of building
the buffer tree data structure, i.e. $O(n \log_m n)$ I/Os. Hence the corollary: N elements can
be sorted in $O(n \log_m n)$ I/O operations using the buffer tree [1]. In practice, however,
other factors such as CPU time can also affect the running time.
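The three steps (insert, empty all buffers, read the leaves left to right) can be illustrated with a deliberately simplified in-memory toy. It is not the buffer tree of [1] or [4]: routing keys are fixed in advance from the key range, there is no rebalancing, leaves are unbounded, and the names `Node` and `buffer_tree_sort` and all parameter defaults are assumptions.

```python
import bisect

class Node:
    """Toy node over the key range [lo, hi) with fixed routing keys and a buffer."""

    def __init__(self, lo, hi, fanout, depth, capacity):
        self.buffer = []          # elements waiting to be pushed down
        self.capacity = capacity  # buffer size in elements (stands in for m blocks)
        self.items = []           # used by leaves only
        self.splitters = []
        self.children = []
        if depth > 0:
            step = (hi - lo) / fanout
            self.splitters = [lo + step * i for i in range(1, fanout)]
            self.children = [
                Node(lo + step * i, lo + step * (i + 1), fanout, depth - 1, capacity)
                for i in range(fanout)
            ]

    def insert_block(self, block):
        self.buffer.extend(block)
        if len(self.buffer) > self.capacity:  # buffer overflow
            self.empty_buffer()

    def empty_buffer(self):
        pending, self.buffer = self.buffer, []
        if not self.children:                 # leaf: keep the elements
            self.items.extend(pending)
            return
        groups = [[] for _ in self.children]
        for x in pending:                     # distribute among the children
            groups[bisect.bisect_right(self.splitters, x)].append(x)
        for child, group in zip(self.children, groups):
            if group:
                child.insert_block(group)

    def flush(self):
        """Empty every buffer from this node down to the leaves."""
        self.empty_buffer()
        for child in self.children:
            child.flush()

    def leaves(self):
        if not self.children:
            yield self
        for child in self.children:
            yield from child.leaves()

def buffer_tree_sort(values, block_size=4, fanout=4, depth=2, capacity=8):
    if not values:
        return []
    root = Node(min(values), max(values) + 1, fanout, depth, capacity)
    for i in range(0, len(values), block_size):  # insert one block at a time
        root.insert_block(values[i:i + block_size])
    root.flush()                                 # the write/empty step
    result = []
    for leaf in root.leaves():                   # read the leaves left to right
        result.extend(sorted(leaf.items))        # a toy leaf is sorted in memory
    return result

print(buffer_tree_sort([9, 2, 7, 2, 5, 1, 8]))
```

The toy shows only the lazy push-down and the left-to-right read; the I/O bound of the corollary depends on the rebalanced (a,b)-tree that this sketch omits.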
6.2 PRIORITY QUEUES
In an ordinary search tree, the leftmost leaf contains the smallest element. In a buffer tree,
the smallest element need not be in the leftmost leaf, because it may still reside in a buffer.
A buffer tree can be used to maintain a priority queue in external memory by allowing update
operations on the priority queue and adding a deletemin operation. To extract the minimum
element, the buffer-emptying process is first performed on all the nodes on the path from the
root to the leftmost leaf. After that, the leftmost leaf contains the B smallest elements, and
the children of the leftmost leaf node together contain at least the Bm/4 smallest elements. So
when a deletemin operation is executed, at least Bm/4 deletemin operations can be answered
without additional I/Os, which reduces the amortized cost of the operation. Thus we have
Theorem 2: the total cost of an arbitrary sequence of N insert, delete and deletemin operations
on an initially empty buffer tree is $O(n \log_m n)$ I/Os. However, a buffer tree used in this
way does not support changing the priorities of elements already in the priority queue.
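The batching effect behind deletemin can be sketched with a toy in-memory queue. The class `BatchedMinQueue` is hypothetical: a sorted list stands in for the leaves, `pending` stands in for the node buffers, `batch` plays the role of the roughly Bm/4 smallest elements kept in memory, and each call to `_refill` stands in for one buffer-emptying pass along the leftmost path.

```python
import heapq

class BatchedMinQueue:
    def __init__(self, batch: int):
        self.batch = batch      # number of smallest elements fetched per refill
        self.pending = []       # buffered inserts not yet pushed down
        self.external = []      # sorted list standing in for the leaves
        self.front = []         # in-memory heap holding the current smallest elements
        self.front_max = None   # largest key fetched by the last refill
        self.refills = 0        # counts the simulated expensive passes

    def insert(self, x):
        # Keys no larger than front_max belong among the in-memory minima.
        if self.front and x <= self.front_max:
            heapq.heappush(self.front, x)
        else:
            self.pending.append(x)

    def _refill(self):
        merged = sorted(self.external + self.pending)
        if not merged:
            raise IndexError("deletemin from an empty queue")
        self.pending = []
        self.front, self.external = merged[:self.batch], merged[self.batch:]
        self.front_max = self.front[-1]  # a sorted list is already a valid heap
        self.refills += 1

    def deletemin(self):
        if not self.front:
            self._refill()
        return heapq.heappop(self.front)

q = BatchedMinQueue(batch=3)
for x in [5, 1, 4, 2, 3]:
    q.insert(x)
first = q.deletemin()                      # triggers the first refill
q.insert(0)                                # small key joins the in-memory front
rest = [q.deletemin() for _ in range(5)]
print(first, rest, q.refills)
```

In this run six deletemin calls are served by only two refills, which mirrors how a batch of deletemin operations shares the cost of a single buffer-emptying pass.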
7 CONCLUSION
We conclude that although generic internal memory data structures serve well for various
problems, the growing dependence on external memory calls for data structures designed for
processing databases in external memory. Simply transforming internal memory data structures to
work on external memory does not yield optimal structures, because the results do not use the
internal memory effectively. The buffer tree uses both external and internal memory effectively
and achieves optimal I/O performance for batched operations. Considering the theory and the
tests performed on the buffer tree in [1], the buffer tree as a generic data structure appears
to perform well both in theory and in practice. Because the buffer tree takes advantage of the
large internal memory, it gives good amortized performance when processing batched dynamic
operations.
8 FUTURE DIRECTION
For any external memory algorithm that uses a buffer tree, the actual running time may be
improved by tuning various other parameters. Measuring I/O efficiency experimentally is an
important topic that can be explored further, both for the known parameters and for parameters
that are currently unknown.
REFERENCES
[1] D. Hutchinson, A. Maheshwari, J.-R. Sack, and R. Velicescu, “Early experiences in
implementing the buffer tree,” in Proceedings of the Workshop on Algorithm Engineering,
Springer-Verlag, 1997.
[2] J. S. Vitter, “Algorithms and data structures for external memory,” Foundations and
Trends in Theoretical Computer Science, vol. 2, no. 4, 2006.
[3] J. van den Bercken, B. Seeger, and P. Widmayer, “A generic approach to bulk loading
multidimensional index structures,” in Proceedings of the International Conference on Very
Large Databases, pp. 406–415, 1997.
[4] L. Arge, “The buffer tree: A technique for designing batched external data structures,”
Algorithmica, vol. 37, no. 1, pp. 1–24, 2003.

More Related Content

What's hot

Clustering: Large Databases in data mining
Clustering: Large Databases in data miningClustering: Large Databases in data mining
Clustering: Large Databases in data miningZHAO Sam
 
Corba concepts & corba architecture
Corba concepts & corba architectureCorba concepts & corba architecture
Corba concepts & corba architecturenupurmakhija1211
 
Frequent itemset mining methods
Frequent itemset mining methodsFrequent itemset mining methods
Frequent itemset mining methodsProf.Nilesh Magar
 
Data mining techniques unit III
Data mining techniques unit IIIData mining techniques unit III
Data mining techniques unit IIImalathieswaran29
 
Information retrieval-systems notes
Information retrieval-systems notesInformation retrieval-systems notes
Information retrieval-systems notesBAIRAVI T
 
Handwriting Recognition Using Deep Learning and Computer Version
Handwriting Recognition Using Deep Learning and Computer VersionHandwriting Recognition Using Deep Learning and Computer Version
Handwriting Recognition Using Deep Learning and Computer VersionNaiyan Noor
 
Diffusion Deformable Model for 4D Temporal Medical Image Generation
Diffusion Deformable Model for 4D Temporal Medical Image GenerationDiffusion Deformable Model for 4D Temporal Medical Image Generation
Diffusion Deformable Model for 4D Temporal Medical Image GenerationBoahKim2
 
Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...IIIT Hyderabad
 
Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders Akash Goel
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersSungchul Kim
 
PageRank_algorithm_Nfaoui_El_Habib
PageRank_algorithm_Nfaoui_El_HabibPageRank_algorithm_Nfaoui_El_Habib
PageRank_algorithm_Nfaoui_El_HabibEl Habib NFAOUI
 
Cost estimation for Query Optimization
Cost estimation for Query OptimizationCost estimation for Query Optimization
Cost estimation for Query OptimizationRavinder Kamboj
 
FP304 DATABASE SYSTEM PAPER FINAL EXAM AGAIN
FP304 DATABASE SYSTEM  PAPER FINAL EXAM AGAINFP304 DATABASE SYSTEM  PAPER FINAL EXAM AGAIN
FP304 DATABASE SYSTEM PAPER FINAL EXAM AGAINSyahriha Ruslan
 

What's hot (20)

Clustering: Large Databases in data mining
Clustering: Large Databases in data miningClustering: Large Databases in data mining
Clustering: Large Databases in data mining
 
Corba concepts & corba architecture
Corba concepts & corba architectureCorba concepts & corba architecture
Corba concepts & corba architecture
 
bnp tropo
bnp tropo bnp tropo
bnp tropo
 
Frequent itemset mining methods
Frequent itemset mining methodsFrequent itemset mining methods
Frequent itemset mining methods
 
Distributed Systems Naming
Distributed Systems NamingDistributed Systems Naming
Distributed Systems Naming
 
Data mining techniques unit III
Data mining techniques unit IIIData mining techniques unit III
Data mining techniques unit III
 
Information retrieval-systems notes
Information retrieval-systems notesInformation retrieval-systems notes
Information retrieval-systems notes
 
Handwriting Recognition Using Deep Learning and Computer Version
Handwriting Recognition Using Deep Learning and Computer VersionHandwriting Recognition Using Deep Learning and Computer Version
Handwriting Recognition Using Deep Learning and Computer Version
 
Description of data
Description of dataDescription of data
Description of data
 
Diffusion Deformable Model for 4D Temporal Medical Image Generation
Diffusion Deformable Model for 4D Temporal Medical Image GenerationDiffusion Deformable Model for 4D Temporal Medical Image Generation
Diffusion Deformable Model for 4D Temporal Medical Image Generation
 
File tracking system
File tracking systemFile tracking system
File tracking system
 
Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...
 
Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders
 
Pass 1 flowchart
Pass 1 flowchartPass 1 flowchart
Pass 1 flowchart
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
 
PageRank_algorithm_Nfaoui_El_Habib
PageRank_algorithm_Nfaoui_El_HabibPageRank_algorithm_Nfaoui_El_Habib
PageRank_algorithm_Nfaoui_El_Habib
 
Semi join
Semi joinSemi join
Semi join
 
In-Memory DataBase
In-Memory DataBaseIn-Memory DataBase
In-Memory DataBase
 
Cost estimation for Query Optimization
Cost estimation for Query OptimizationCost estimation for Query Optimization
Cost estimation for Query Optimization
 
FP304 DATABASE SYSTEM PAPER FINAL EXAM AGAIN
FP304 DATABASE SYSTEM  PAPER FINAL EXAM AGAINFP304 DATABASE SYSTEM  PAPER FINAL EXAM AGAIN
FP304 DATABASE SYSTEM PAPER FINAL EXAM AGAIN
 

Viewers also liked

Ecological Landscaping
Ecological LandscapingEcological Landscaping
Ecological Landscapingtierramor
 
Rhs level 2 certificate year 1 week 28 presentation
Rhs level 2 certificate year 1 week 28 presentationRhs level 2 certificate year 1 week 28 presentation
Rhs level 2 certificate year 1 week 28 presentationvikkis
 
Introduction to garden planning and design session 4
Introduction to garden planning and design session 4Introduction to garden planning and design session 4
Introduction to garden planning and design session 4vikkis
 
Introduction to garden planning and design session 2 slides
Introduction to garden planning and design session 2 slidesIntroduction to garden planning and design session 2 slides
Introduction to garden planning and design session 2 slidesvikkis
 
Week 29 2012 presentation
Week 29 2012 presentationWeek 29 2012 presentation
Week 29 2012 presentationvikkis
 
Elements of landscape design
Elements of landscape designElements of landscape design
Elements of landscape designDiksha Sharma
 
Casestudy landscape ip park.
Casestudy landscape ip park.Casestudy landscape ip park.
Casestudy landscape ip park.Antima Kuda
 
Plants care gurgaon slides
Plants care  gurgaon slidesPlants care  gurgaon slides
Plants care gurgaon slidesanilshakia
 

Viewers also liked (15)

Ecological Landscaping
Ecological LandscapingEcological Landscaping
Ecological Landscaping
 
Rhs level 2 certificate year 1 week 28 presentation
Rhs level 2 certificate year 1 week 28 presentationRhs level 2 certificate year 1 week 28 presentation
Rhs level 2 certificate year 1 week 28 presentation
 
Introduction to garden planning and design session 4
Introduction to garden planning and design session 4Introduction to garden planning and design session 4
Introduction to garden planning and design session 4
 
Introduction to garden planning and design session 2 slides
Introduction to garden planning and design session 2 slidesIntroduction to garden planning and design session 2 slides
Introduction to garden planning and design session 2 slides
 
Computer memory
Computer memoryComputer memory
Computer memory
 
Week 29 2012 presentation
Week 29 2012 presentationWeek 29 2012 presentation
Week 29 2012 presentation
 
Landscape Architecture
Landscape ArchitectureLandscape Architecture
Landscape Architecture
 
Virtual memory ppt
Virtual memory pptVirtual memory ppt
Virtual memory ppt
 
Elements of landscape design
Elements of landscape designElements of landscape design
Elements of landscape design
 
Casestudy landscape ip park.
Casestudy landscape ip park.Casestudy landscape ip park.
Casestudy landscape ip park.
 
Presentation on memory
Presentation on memoryPresentation on memory
Presentation on memory
 
Computer memory
Computer memoryComputer memory
Computer memory
 
Plants care gurgaon slides
Plants care  gurgaon slidesPlants care  gurgaon slides
Plants care gurgaon slides
 
Landscaping Architecture
Landscaping ArchitectureLandscaping Architecture
Landscaping Architecture
 
Landscape Design and Principles
Landscape Design and PrinciplesLandscape Design and Principles
Landscape Design and Principles
 

Similar to Buffer Trees - Utility and Applications for External Memory Data Processing

AN EFFICIENT RECOVERY SCHEME FOR BUFFER-BASED B-TREE INDEXES ON FLASH MEMORY
AN EFFICIENT RECOVERY SCHEME FOR BUFFER-BASED B-TREE INDEXES ON FLASH MEMORYAN EFFICIENT RECOVERY SCHEME FOR BUFFER-BASED B-TREE INDEXES ON FLASH MEMORY
AN EFFICIENT RECOVERY SCHEME FOR BUFFER-BASED B-TREE INDEXES ON FLASH MEMORYcsandit
 
Algorithms for External Memory Sorting
Algorithms for External Memory SortingAlgorithms for External Memory Sorting
Algorithms for External Memory SortingMilind Gokhale
 
GR-FB Block Cleaning Scheme in Flash Memory
GR-FB Block Cleaning Scheme in Flash MemoryGR-FB Block Cleaning Scheme in Flash Memory
GR-FB Block Cleaning Scheme in Flash MemoryIDES Editor
 
A survey of data recovery on flash memory
A survey of data recovery on flash memory A survey of data recovery on flash memory
A survey of data recovery on flash memory IJECEIAES
 
A Seminar Presentation on Compiler Techniques for Energy Reduction in High-P...
A Seminar Presentation onCompiler Techniques for Energy Reduction in High-P...A Seminar Presentation onCompiler Techniques for Energy Reduction in High-P...
A Seminar Presentation on Compiler Techniques for Energy Reduction in High-P...shashank wake
 
Plam15 slides.potx
Plam15 slides.potxPlam15 slides.potx
Plam15 slides.potxVlad Lesin
 
database backup and recovery
database backup and recoverydatabase backup and recovery
database backup and recoverysdrhr
 
Methodology for Optimizing Storage on Cloud Using Authorized De-Duplication –...
Methodology for Optimizing Storage on Cloud Using Authorized De-Duplication –...Methodology for Optimizing Storage on Cloud Using Authorized De-Duplication –...
Methodology for Optimizing Storage on Cloud Using Authorized De-Duplication –...IRJET Journal
 
b tree file system report
b tree file system reportb tree file system report
b tree file system reportDinesh Gupta
 
Bt0070, operating systems
Bt0070, operating systemsBt0070, operating systems
Bt0070, operating systemssmumbahelp
 
Comparison of In-memory Data Platforms
Comparison of In-memory Data PlatformsComparison of In-memory Data Platforms
Comparison of In-memory Data PlatformsAmir Mahdi Akbari
 
data stage-material
data stage-materialdata stage-material
data stage-materialRajesh Kv
 
Analysis of Allocation Algorithms in Memory Management
Analysis of Allocation Algorithms in Memory ManagementAnalysis of Allocation Algorithms in Memory Management
Analysis of Allocation Algorithms in Memory Managementijtsrd
 
Query processing and optimization
Query processing and optimizationQuery processing and optimization
Query processing and optimizationArif A.
 
Data and File Structure Lecture Notes
Data and File Structure Lecture NotesData and File Structure Lecture Notes
Data and File Structure Lecture NotesFellowBuddy.com
 
Software Systems Modularization
Software Systems ModularizationSoftware Systems Modularization
Software Systems Modularizationchiao-fan yang
 
Enery efficient data prefetching
Enery efficient data prefetchingEnery efficient data prefetching
Enery efficient data prefetchingHimanshu Koli
 

Similar to Buffer Trees - Utility and Applications for External Memory Data Processing (20)

G143
G143G143
G143
 
AN EFFICIENT RECOVERY SCHEME FOR BUFFER-BASED B-TREE INDEXES ON FLASH MEMORY
AN EFFICIENT RECOVERY SCHEME FOR BUFFER-BASED B-TREE INDEXES ON FLASH MEMORYAN EFFICIENT RECOVERY SCHEME FOR BUFFER-BASED B-TREE INDEXES ON FLASH MEMORY
AN EFFICIENT RECOVERY SCHEME FOR BUFFER-BASED B-TREE INDEXES ON FLASH MEMORY
 
Buffering.pptx
Buffering.pptxBuffering.pptx
Buffering.pptx
 
Algorithms for External Memory Sorting
Algorithms for External Memory SortingAlgorithms for External Memory Sorting
Algorithms for External Memory Sorting
 
GR-FB Block Cleaning Scheme in Flash Memory
GR-FB Block Cleaning Scheme in Flash MemoryGR-FB Block Cleaning Scheme in Flash Memory
GR-FB Block Cleaning Scheme in Flash Memory
 
A survey of data recovery on flash memory
A survey of data recovery on flash memory A survey of data recovery on flash memory
A survey of data recovery on flash memory
 
A Seminar Presentation on Compiler Techniques for Energy Reduction in High-P...
A Seminar Presentation onCompiler Techniques for Energy Reduction in High-P...A Seminar Presentation onCompiler Techniques for Energy Reduction in High-P...
A Seminar Presentation on Compiler Techniques for Energy Reduction in High-P...
 
Plam15 slides.potx
Plam15 slides.potxPlam15 slides.potx
Plam15 slides.potx
 
Opetating System Memory management
Opetating System Memory managementOpetating System Memory management
Opetating System Memory management
 
database backup and recovery
database backup and recoverydatabase backup and recovery
database backup and recovery
 
Methodology for Optimizing Storage on Cloud Using Authorized De-Duplication –...
Methodology for Optimizing Storage on Cloud Using Authorized De-Duplication –...Methodology for Optimizing Storage on Cloud Using Authorized De-Duplication –...
Methodology for Optimizing Storage on Cloud Using Authorized De-Duplication –...
 
b tree file system report
b tree file system reportb tree file system report
b tree file system report
 
Bt0070, operating systems
Bt0070, operating systemsBt0070, operating systems
Bt0070, operating systems
 
Comparison of In-memory Data Platforms
Comparison of In-memory Data PlatformsComparison of In-memory Data Platforms
Comparison of In-memory Data Platforms
 
data stage-material
data stage-materialdata stage-material
data stage-material
 
Analysis of Allocation Algorithms in Memory Management
Analysis of Allocation Algorithms in Memory ManagementAnalysis of Allocation Algorithms in Memory Management
Analysis of Allocation Algorithms in Memory Management
 
Query processing and optimization
Query processing and optimizationQuery processing and optimization
Query processing and optimization
 
Data and File Structure Lecture Notes
Data and File Structure Lecture NotesData and File Structure Lecture Notes
Data and File Structure Lecture Notes
 
Software Systems Modularization
Software Systems ModularizationSoftware Systems Modularization
Software Systems Modularization
 
Enery efficient data prefetching
Enery efficient data prefetchingEnery efficient data prefetching
Enery efficient data prefetching
 

Buffer Trees - Utility and Applications for External Memory Data Processing

CSCI-B 561 Advanced Database Concepts Project Report
Milind Gokhale
mgokhale@indiana.edu
November 16, 2014

1 ABSTRACT
Because data volumes are now very large, dependence on external memory for data processing has increased tremendously. However, there are few generic external memory tools designed for processing the data in a database on external memory.
This report focuses on the basics of the buffer tree and on its potential as a generic tool for processing data in external memory. We introduce the bottleneck problem in external memory data processing and the motivation for creating buffer trees. We then describe the buffer tree data structure and the observations from experiments conducted on buffer trees in [1]. Finally, we list some possible applications of buffer trees and conclude.

2 INTRODUCTION
Today users have plenty of high-quality, high-resolution data available through various technologies, and more data keeps being generated across domains and fields. As a result, moving huge data sets between the external and internal memory of computers has become commonplace. However, there is a vast difference in data access speed between the two: external memory is about 10^5 to 10^6 times slower than main memory at random access. This has created a growing demand for high-performance input and output mechanisms to move large data sets between fast internal memory and slower external storage. I/O bandwidth is a bottleneck in many large-scale applications such as multimedia, GIS, land information seismic databases, virtual reality applications, satellite imagery, digital libraries and real-time online applications.

3 MOTIVATION
Communication between internal memory and external memory is a bottleneck. The present methodologies for addressing this issue are [1]:
1. Increasing secondary storage device parallelism, thus improving the bandwidth between secondary memory and main memory.
2. Exploiting locality of reference through the organization of the data and the processing sequence.
3. Overlapping I/O with computation, e.g. by prefetching.

Much work has been done on designing external versions of data structures originally designed for internal memory. These data structures are mostly used in an on-line setting, where queries must be answered immediately and within a good worst-case number of I/Os. Because they are used in the on-line setting, they often do not take advantage of the available main memory [4].
Often, however, the problem or the system has a batch setting, where similar processing operations are performed on many data sets. Problems in which the sequence of operations on the data structure is known in advance are called batched dynamic problems. Bulk operations help in processing such problems. A bulk operation is a collection of individual operations that are executed consecutively without being interrupted by other requests [3].
In industry, bulk order processing, end-of-day job processing, pre-provisioning of logical resources, and temporal and spatial database processing are some of the biggest batched problems where bulk operations are performed. In batched problems the queries need not be answered instantly as in the on-line setting, but there are tight service level agreements to finish processing an enormous number of records in a rather short time.
An important paradigm for batched problems in the internal memory setting is to use a dynamic data structure to process a sequence of updates [2]. For example, to sort N items, we can insert them one by one into a priority queue and then perform a sequence of N deletemin operations.
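The priority-queue sorting paradigm just described can be sketched in internal memory as follows. This is only an illustration: Python's `heapq` stands in for the dynamic data structure, and the function name `pq_sort` is a hypothetical choice, not something taken from the cited papers.

```python
import heapq


def pq_sort(items):
    """Sort by N inserts followed by N deletemin operations on a priority queue."""
    heap = []
    for x in items:
        heapq.heappush(heap, x)  # one insert per item
    # One deletemin per item; each pop returns the current minimum.
    return [heapq.heappop(heap) for _ in range(len(heap))]


if __name__ == "__main__":
    print(pq_sort([5, 1, 4, 2, 3]))
```

Every push and pop here touches the structure immediately, which is harmless in internal memory but is exactly the access pattern that becomes expensive once each operation costs I/Os.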
However, if the same paradigm is used naively in the external memory (EM) setting, with a B-tree as the dynamic data structure, it results in suboptimal I/O performance [2]. For example, if we use the B-tree as a priority queue in sorting, each update and query operation takes O(log_B N) I/Os, for a total of O(N log_B N) I/Os, which is larger than the optimal sorting bound Sort(N) by a substantial factor of roughly B. L. Arge developed an elegant buffer tree data structure to support batched dynamic operations, such as those arising in sweep-line algorithms, where the queries do not have to be answered right away or in any particular order [4].

4 BUFFERING METHOD AND BUFFER TREE
The I/O model: we use the following terms to denote the components of the I/O model.
i. N = number of elements in the problem instance.
ii. M = number of elements that can fit into the main memory.
iii. B = number of elements per block.
iv. n = N/B = number of blocks in the problem.
v. m = M/B = number of blocks that fit into the main memory.
vi. M < N
vii. 0 < B < M/2
viii. An I/O operation is a swap of B elements from internal memory with B consecutive elements from external memory.

4.1 WHAT IS THE BUFFERING TECHNIQUE?
The main idea of the buffering technique is to perform operations on an external tree data structure in a lazy manner [4]. This is achieved by associating main-memory-sized buffers with the internal nodes of the tree: each internal node of the structure is assigned a buffer of m blocks.
When inserting an element, we do not search down the tree for the relevant leaf right away. Instead we wait until we have collected a block of insertions (or other operations) and then insert this block into the buffer of the root. If a buffer runs full, a buffer overflow is said to have occurred, and the buffer emptying process pushes the elements in the buffer one level down to the buffers on the next level of the tree [4].
A consequence of this laziness is that several insertions and deletions of the same element can be present in the tree at the same time [4].

4.2 BUFFER TREE
4.2.1 Data Structure
Figure 1
The buffer tree is an (a,b)-tree with a = m/4 and b = m over n leaves containing B elements each, extended with a buffer of m blocks attached to each node [3]. All leaves are on the same level and all nodes have a fan-out between m/4 and m, i.e. between a and b. Internal nodes are the nodes that do not have leaves as children; all other nodes are leaf nodes. The height of the tree is O(log_m n) and the structure uses O(n) blocks of space.
Figure 2
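The model parameters and the derived buffer tree quantities above can be collected in a few lines of code. The sketch below is an assumption-laden illustration: the class name `IOModel` and its methods are hypothetical, block counts are rounded to integers, and the height estimate ignores the constants hidden in O(log_m n).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IOModel:
    """Parameters of the I/O model, all counted in elements."""
    N: int  # elements in the problem instance
    M: int  # elements that fit in main memory
    B: int  # elements per block

    def __post_init__(self):
        # Constraints vi and vii of the model: M < N and 0 < B < M/2.
        if not (0 < self.B < self.M / 2 and self.M < self.N):
            raise ValueError("need 0 < B < M/2 and M < N")

    @property
    def n(self) -> int:
        # Blocks in the problem; rounding up is an assumption of this sketch.
        return -(-self.N // self.B)

    @property
    def m(self) -> int:
        # Blocks that fit in main memory.
        return self.M // self.B

    def fanout_bounds(self) -> tuple:
        # (a, b) = (m/4, m), the fan-out range of the buffer tree.
        return self.m // 4, self.m

    def height_estimate(self) -> int:
        # Smallest h with m**h >= n: a constant-free stand-in for O(log_m n).
        h, reach = 1, self.m
        while reach < self.n:
            reach *= self.m
            h += 1
        return h


if __name__ == "__main__":
    model = IOModel(N=10**9, M=10**6, B=10**3)
    print(model.n, model.m, model.fanout_bounds(), model.height_estimate())
```

With a fan-out in the hundreds or thousands, even a billion elements give a tree only a few levels high, which is why pushing whole blocks down one level at a time is cheap per element.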
4.2.2 Operations on Buffer Tree
a. Update: A request element is created consisting of the record to be inserted or deleted, a flag indicating the type of operation, and an automatically generated timestamp [1]. Requests are collected in internal memory until a block of B requests has been formed. The block of request elements is then inserted into the buffer of the root using one I/O.
b. Buffer emptying process: If the buffer of the root contains fewer than m/2 blocks, nothing needs to be done. If it contains more than m/2 blocks, the buffer is emptied using the buffer emptying process [1]. A buffer emptying process at an internal node requires O(m) I/Os, since we load m/2 blocks into internal memory and distribute the elements among the Θ(m) children of that node. Throughout, the process maintains the invariant that the buffers of the nodes on the path from the root to the leaf node with the full buffer are all empty. The buffer emptying process is not applied to all internal nodes recursively; it is applied along the path from the root to a leaf node. This prevents different rebalancing operations from interfering with each other. The deletion of a block may trigger several buffer emptying processes at the node involved. The buffer emptying process can be protected from interference by other processes by using dummy blocks [1].
c. Rebalancing: A buffer emptying process at a leaf may require rebalancing the underlying (a,b)-tree. An (a,b)-tree is rebalanced by performing a series of "split" operations in the case of an insertion and "fuse" or "share" operations in the case of a deletion [1]. Before performing a rebalance operation, we ensure that the buffers of the corresponding nodes are empty. This is achieved by running the buffer emptying process at the nodes involved.

4.2.3 I/O complexity analysis of the buffer tree operations [1]
a. Update: In the amortized analysis, each update element is assigned O((log_m n)/B) I/Os of credit when it is inserted into the root buffer. Each block in the buffer of a node v carries credit proportional to the height of the tree rooted at v, i.e. O(height of the tree rooted at v).
b. Buffer emptying: A buffer emptying process at any node costs O(m) I/Os. Ignoring the cost of rebalancing, the total cost of all buffer emptying processes on internal nodes is therefore bounded by O(n log_m n) I/Os.
c. Rebalancing: The total number of rebalance operations required over k update operations on an initially empty (a,b)-tree with b > 2a is bounded by k/(b/2 - a).
d. Theorem: The total cost of an arbitrary sequence of N intermixed insert and delete operations on an initially empty buffer tree is O(n log_m n) I/Os, i.e. the amortized cost of an operation is O((log_m n)/B) I/Os. The tree uses O(n) space [4].
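To get a feel for the gap between the O(n log_m n) bound of the theorem and the O(N log_B N) cost of the naive B-tree approach from Section 3, the two expressions can be evaluated with all hidden constants dropped. This is a back-of-the-envelope sketch, not a measurement: the function names and the sample values of N, M and B are assumptions chosen for illustration.

```python
import math


def naive_btree_sort_ios(N, B):
    """N updates and queries at O(log_B N) I/Os each, constants dropped."""
    return N * math.log(N, B)


def buffer_tree_sort_ios(N, M, B):
    """n * log_m(n) with n = N/B and m = M/B, constants dropped."""
    n = N / B
    m = M / B
    return n * math.log(n, m)


if __name__ == "__main__":
    N, M, B = 10**9, 10**6, 10**3  # hypothetical sizes, in elements
    naive = naive_btree_sort_ios(N, B)
    buffered = buffer_tree_sort_ios(N, M, B)
    print(f"naive B-tree: {naive:.3g} I/Os")
    print(f"buffer tree:  {buffered:.3g} I/Os")
    print(f"ratio:        {naive / buffered:.0f}")
```

For these sample values the ratio comes out at about 1500, which is on the order of B = 1000 and matches the "factor of roughly B" claimed for the naive approach.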
5 EXPERIMENTS AND COMPARISONS
As reported in [1], several tests were performed on buffer trees in order to obtain meaningful performance results. Factors such as contention with other processes for machine resources and virtual memory effects were controlled.

5.1 COMPARISON TO QUICK SORT
Figure 3 [1]
Sorting with the buffer tree (buffer tree sort, BTS) outperformed the built-in internal memory quicksort on large inputs. For smaller inputs the internal quicksort requires less running time, but for large input sets it ran out of memory. According to the test results in [1], shown in Figure 3, the internal quicksort failed for lack of internal memory at problem sizes larger than 2.8 million items.

5.2 BUFFER TREE TUNING
Figure 4: Number of block pushes for different b [1]
Figure 5: Running time of BTS for different b [1]
The values of a and b in terms of m were found to be important for the performance of buffer tree sort (BTS), as seen in Figure 4. Adjusting the parameter b reduces the number of I/O operations performed by BTS. The values (a, b) = (m/32, m/8) gave the best performance in terms of running time (almost linear). Along with well-chosen values of a and b, reducing the fan-out increases the expected amount of data in a block push during the buffer emptying process and thus reduces the required number of I/O operations, as seen in Figure 5.

6 APPLICATIONS
Buffer trees can be utilized in several places. They can be used as a subroutine in the standard sweep algorithm to obtain an optimal external memory algorithm for orthogonal segment intersection. They can also be extended to implement segment trees in external memory in a batched dynamic setting by reducing the node degree to Θ(√m) and by introducing multi-slabs in each node. Buffer trees also provide a natural amortized implementation of the priority queue for time-forward processing applications such as discrete event simulation, sweeping, and list ranking.

6.1 SORTING
First the N items are inserted into the buffer tree. Then all buffers are emptied: the buffer emptying process is started at the root of the buffer tree and continues all the way down to the leaves. Finally the leaves of the buffer tree are read sequentially from left to right to obtain the elements in sorted order. This can be done within the cost of building the buffer tree data structure, i.e. O(n log_m n) I/Os. Hence the corollary: N elements can be sorted in O(n log_m n) I/O operations using the buffer tree [1]. In practice, however, other factors such as CPU time can also affect the running time.

6.2 PRIORITY QUEUES
In an ordinary search tree, the leftmost leaf contains the smallest element. In a buffer tree, the smallest element need not be in the leftmost leaf, because it may still sit in a buffer. A buffer tree can be used to maintain a priority queue in external memory by allowing update operations on the priority queue and adding a deletemin operation. To extract the minimum element, the buffer emptying process is first performed on all the nodes on the path from the root to the leftmost leaf. After this, the leftmost leaf contains the B smallest elements, and the children of the leftmost leaf node together contain at least the Bm/4 smallest elements. So when a deletemin operation is executed, at least Bm/4 deletemin operations can be answered without additional I/Os, which reduces the amortized cost per operation. Thus we have Theorem 2: The total cost of an arbitrary sequence of N insert, delete and deletemin operations on an initially empty buffer tree is O(n log_m n) I/Os. However, a buffer tree used in this way does not support changing the priorities of elements already in the priority queue.
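The lazy update path and the sorting procedure of Section 6.1 can be illustrated with a deliberately simplified, in-memory toy. It is not the buffer tree of [1] or [4]: it has a single root buffer over a fixed set of leaf buckets, the splitter keys are given in advance, there is no rebalancing, and duplicate keys collapse because the leaves are sets. The class name `ToyBufferTree` and all of its methods are hypothetical.

```python
import bisect
from itertools import count


class ToyBufferTree:
    """One buffered root over fixed leaf buckets (assumption: no rebalancing)."""

    def __init__(self, splitters, buffer_capacity):
        self.splitters = sorted(splitters)
        # Leaf i holds keys in [splitters[i-1], splitters[i]).
        self.leaves = [set() for _ in range(len(self.splitters) + 1)]
        self.buffer = []          # pending (timestamp, op, key) requests
        self.capacity = buffer_capacity
        self.clock = count()      # automatically generated timestamps
        self.flushes = 0

    def _request(self, key, op):
        self.buffer.append((next(self.clock), op, key))
        if len(self.buffer) >= self.capacity:
            self.flush()          # buffer overflow triggers emptying

    def insert(self, key):
        self._request(key, "insert")

    def delete(self, key):
        self._request(key, "delete")

    def flush(self):
        """Empty the root buffer by pushing requests down to the leaves."""
        if not self.buffer:
            return
        for _, op, key in sorted(self.buffer):  # apply in timestamp order
            leaf = self.leaves[bisect.bisect_right(self.splitters, key)]
            if op == "insert":
                leaf.add(key)
            else:
                leaf.discard(key)
        self.buffer.clear()
        self.flushes += 1

    def sorted_keys(self):
        """Empty all buffers, then read the leaves from left to right."""
        self.flush()
        return [k for leaf in self.leaves for k in sorted(leaf)]


if __name__ == "__main__":
    tree = ToyBufferTree(splitters=[10, 20], buffer_capacity=4)
    for key in [25, 3, 17, 10, 8, 30]:
        tree.insert(key)
    tree.delete(17)
    print(tree.sorted_keys(), tree.flushes)
```

The point of the sketch is the batching: seven requests reach the leaves in two buffer emptying passes instead of seven separate root-to-leaf searches, and the timestamps ensure the delete of 17 is applied after its insert.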
7 CONCLUSION
We conclude that although generic internal memory data structures serve well for various problems, the growing dependence on external memory calls for data structures designed for processing databases in external memory. Simply transforming internal memory data structures to work on external memory does not give optimal structures, because they do not use internal memory effectively. The buffer tree uses both external and internal memory effectively to give optimal running time performance. Considering the theory and the tests performed on the buffer tree in [1], the buffer tree appears to perform well as a generic data structure in both theory and practice. Because the buffer tree takes advantage of the large internal memory, it gives good amortized performance in processing batched dynamic operations.

8 FUTURE DIRECTION
For any external memory algorithm that uses the buffer tree, the actual running time may be improved by tuning various other parameters. Measuring I/O efficiency experimentally is an important topic that can be explored further, both for known parameters and for parameters that are currently unknown.

REFERENCES
[1] D. Hutchinson, A. Maheshwari, J.-R. Sack, and R. Velicescu, “Early experiences in implementing the buffer tree,” in Proceedings of the Workshop on Algorithm Engineering, Springer-Verlag, 1997.
[2] J. S. Vitter, “Algorithms and data structures for external memory,” Foundations and Trends in Theoretical Computer Science, vol. 2, no. 4, 2006.
[3] J. van den Bercken, B. Seeger, and P. Widmayer, “A generic approach to bulk loading multidimensional index structures,” in Proceedings of the International Conference on Very Large Databases, pp. 406–415, 1997.
[4] L. Arge, “The buffer tree: A technique for designing batched external data structures,” Algorithmica, vol. 37, no. 1, pp. 1–24, 2003.