Cache Conscious Indexes


One way to improve the mining of huge data sets is to improve the performance of the algorithm's search process.

Published in: Technology, Education
  • Pointer elimination is important in cache optimization, but removing pointers completely introduces some restrictions, so we use partial elimination.
  • Cache Conscious Indexes

    1. Cache Sensitive B+ Tree for Large Data Set
        • Presented by: Sumit Lole, M. Tech. (CS), 1st Sem, School of Computer Science & IT
        • Guided by: Mrs. Shraddha Masih
    2. OUTLINE
        • Introduction
        • Related Work
        • Cache Sensitive B+-Trees
        • Conclusion
    3. INTRODUCTION
        • Any mining algorithm must cope with an influx of huge data volumes, which poses a real challenge to its space and time requirements. Unless the data are arranged in a compact and efficient way, algorithms running with limited primary storage fail to produce output within a reasonable time.
    4. Challenges in Data Mining
        • A fundamental challenge is to extend data mining to large data sets.
        • The transactions to be processed can be larger than main memory.
        • The most basic approach is to manipulate the data until it fits into memory.
        • The data mining algorithms themselves have to be scaled.
    5. RELATED WORK
        • Computational and programming models used in high-performance computing reduce the cost of computing.
        • The programming models used in high-performance computing fall into two basic categories:
            • Data Parallelism
            • Task Parallelism
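
To make the two categories concrete, here is a minimal Python sketch built on the standard-library `concurrent.futures` module. The function names and the toy transaction list are hypothetical and chosen only for illustration. A thread pool is used to show the structure of each model; CPU-bound work in CPython would more likely use processes.

```python
from concurrent.futures import ThreadPoolExecutor


def count_items(chunk):
    """The single operation that every worker applies to its own partition."""
    return sum(len(transaction) for transaction in chunk)


def data_parallel_count(transactions, workers=2):
    """Data parallelism: split the data, run the SAME function on each part."""
    size = max(1, (len(transactions) + workers - 1) // workers)
    chunks = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_items, chunks))


def task_parallel_stats(transactions):
    """Task parallelism: run DIFFERENT functions concurrently on the same data."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        longest = pool.submit(lambda: max(len(t) for t in transactions))
        distinct = pool.submit(
            lambda: len({item for t in transactions for item in t})
        )
        return longest.result(), distinct.result()


# Hypothetical toy data set of market-basket transactions.
TRANSACTIONS = [["milk", "bread"], ["milk"], ["bread", "eggs", "milk"], ["eggs"]]
```

Calling `data_parallel_count(TRANSACTIONS)` sums item counts over two partitions, while `task_parallel_stats(TRANSACTIONS)` computes the longest transaction and the number of distinct items side by side.
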
    6. Basic Approaches for Scaling Data Intensive Computing
        • Manipulate the data so that it fits into memory
        • Reduce the time to access out-of-memory data
        • Use several processors
        • Precomputing
    7. Reduce the Time to Access Out-of-Memory Data
        • Efficient disk access improves the performance of the core algorithms, so that they can be scaled to large data sets.
        • One way is to use specialized data structures to access data on disk.
    8. Comparison between B+-Trees and CSS-Trees
        • B+-tree
            • full pointers
            • more cache accesses and more cache misses
            • efficient for update operations, e.g. insertion and deletion
        • CSS-tree
            • no pointers
            • fewer cache accesses and fewer cache misses
            • acceptable for static data updated in batches
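
A CSS-tree can drop pointers because its nodes sit in one array and a child's position is computed instead of stored. The sketch below shows that index arithmetic for a full tree whose nodes hold `m` keys and are numbered level by level starting at 0. The function names are hypothetical, and the code covers only the addressing, not the construction of the tree.

```python
def css_child(node, branch, m):
    """Array index of the branch-th child (0-based) of `node`.

    Assumes a full tree with m keys (m + 1 children) per node,
    stored level by level in a single array with the root at index 0.
    """
    if not 0 <= branch <= m:
        raise ValueError("branch must be in the range 0..m")
    return node * (m + 1) + 1 + branch


def css_parent(node, m):
    """Inverse of css_child: array index of the parent of `node`."""
    if node == 0:
        raise ValueError("the root has no parent")
    return (node - 1) // (m + 1)
```

With `m = 4`, the children of node 0 occupy indices 1 to 5 and the children of node 1 start at index 6. No pointer is read during the descent, which is why such a layout suits static data but has to be rebuilt when the data changes.
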
    9. CACHE SENSITIVE B+ TREES
        • Cache Sensitive B+-Trees with One Child Pointer
        • Full CSB+-Trees
    10. Cache Sensitive B+-Trees with One Pointer
        • Similar to a B+-tree
        • All the child nodes of any given node are put into a node group, which the parent addresses with a single pointer
        • Nodes within a node group are stored contiguously and can be accessed using an offset to the first node in the group
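
The node-group layout described above can be sketched with Python's `struct` module, packing fixed-size nodes back to back in one buffer. The 64-byte node size, the 4-byte fields, the field order and all function names are assumptions made for this illustration, not details taken from the slides.

```python
import struct

# Assumed layout: key count, first-child slot, then 14 key slots (4-byte ints).
NODE_FORMAT = "<16i"
NODE_SIZE = struct.calcsize(NODE_FORMAT)  # one assumed cache line
MAX_KEYS = 14
NO_CHILD = -1  # marks a leaf in this sketch


def write_node(buf, slot, keys, first_child=NO_CHILD):
    """Pack one node into the buffer at the given slot number."""
    if len(keys) > MAX_KEYS:
        raise ValueError("too many keys for one node")
    padded = list(keys) + [0] * (MAX_KEYS - len(keys))
    struct.pack_into(NODE_FORMAT, buf, slot * NODE_SIZE,
                     len(keys), first_child, *padded)


def read_node(buf, slot):
    """Return (keys, first_child) for the node stored at the given slot."""
    fields = struct.unpack_from(NODE_FORMAT, buf, slot * NODE_SIZE)
    nkeys, first_child = fields[0], fields[1]
    return list(fields[2:2 + nkeys]), first_child


def child_slot(first_child, offset):
    """One stored pointer plus an offset reaches any child in the group."""
    return first_child + offset


# A root in slot 0 whose three children form one node group in slots 1..3.
buf = bytearray(4 * NODE_SIZE)
write_node(buf, 0, [20, 40], first_child=1)
write_node(buf, 1, [5, 10, 20])
write_node(buf, 2, [25, 30, 40])
write_node(buf, 3, [50, 60])
```

Here `read_node(buf, child_slot(1, 1))` fetches the second child of the root, although the root stores only the slot of its first child.
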
    11. Cache Sensitive B+-Trees with One Pointer (cont’d)
        • Cache misses are reduced because a cache line can hold more keys than in a B+-Tree node, so a single node can satisfy one more level of comparisons.
        • A CSB+-Tree can support incremental updates in a way similar to a B+-Tree.
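
The gain in keys per cache line is simple arithmetic. The sketch below assumes a 64-byte node with 4-byte keys, 4-byte pointers and a 4-byte key counter; these sizes and the function names are illustrative assumptions, and other sizes give other numbers.

```python
def bplus_keys_per_node(node_bytes, key_bytes=4, ptr_bytes=4, count_bytes=4):
    """B+-tree node: a key counter, m keys and m + 1 child pointers."""
    return (node_bytes - count_bytes - ptr_bytes) // (key_bytes + ptr_bytes)


def csb_keys_per_node(node_bytes, key_bytes=4, ptr_bytes=4, count_bytes=4):
    """CSB+-tree node: a key counter, ONE first-child pointer and m keys."""
    return (node_bytes - count_bytes - ptr_bytes) // key_bytes


def css_keys_per_node(node_bytes, key_bytes=4):
    """CSS-tree node: keys only, no pointers and no counter."""
    return node_bytes // key_bytes
```

Under these assumptions a 64-byte node holds 7 keys in a B+-tree, 14 in a CSB+-tree and 16 in a CSS-tree. Doubling the keys per node corresponds to one extra level of binary comparisons answered from a single cache line.
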
    12. Full CSB+-Tree
        • Motivation: reduce the split cost
        • Method:
            • pre-allocate space for a full node group
            • shift part of the node group along by one node when a node splits
        • Result:
            • the split cost is reduced, but the space requirement increases
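
The split method above can be sketched as follows. A node group is modelled as a pre-allocated Python list with empty slots at the end, and each node as a sorted list of keys. The function name and this representation are hypothetical, and the sketch leaves out the separator key that the parent would also need to receive.

```python
def split_in_group(group, used, i):
    """Split the node in slot i of a pre-allocated node group.

    group: list of fixed capacity; slots used..len(group)-1 are free (None)
    used:  number of occupied slots
    Returns the new number of occupied slots.
    """
    if used == len(group):
        raise OverflowError("node group is full; the parent itself must split")
    node = group[i]
    mid = len(node) // 2
    left, right = node[:mid], node[mid:]
    # Shift the nodes after slot i along by one; no new group is allocated.
    for j in range(used, i + 1, -1):
        group[j] = group[j - 1]
    group[i], group[i + 1] = left, right
    return used + 1


# Capacity for five nodes, three in use.
group = [[1, 2, 3, 4], [10, 11], [20, 21], None, None]
used = split_in_group(group, 3, 0)
```

Because the free slots already exist, the split only moves nodes inside the group. The unused slots are the extra space the slide refers to.
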
    13. Operations on CSB+-Tree: Search
        • Determine the rightmost key K in the node that is smaller than the search key
        • Get the address of the child node by adding the offset of K to the first-child pointer
        • Repeat from the first step until the search key is found or there are no more nodes to check
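
The search steps above can be sketched in Python as follows. The `Node` class, the flat `NODES` list standing in for memory, and the sample keys are hypothetical. The sketch assumes each separator key equals the largest key in the subtree to its left, so counting the keys strictly smaller than the search key gives the child offset directly.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    keys: List[int]
    # Index of the first node of the child group; None marks a leaf.
    first_child: Optional[int] = None


def csb_search(nodes, root, key):
    """Return True if `key` is stored in a leaf reachable from `root`."""
    idx = root
    while True:
        node = nodes[idx]
        # Number of keys strictly smaller than the search key.
        pos = bisect_left(node.keys, key)
        if node.first_child is None:
            return pos < len(node.keys) and node.keys[pos] == key
        # Children are contiguous, so one pointer plus an offset is enough.
        idx = node.first_child + pos


# Root at index 0; its three children form one node group at indices 1..3.
NODES = [
    Node([20, 40], first_child=1),
    Node([5, 10, 20]),
    Node([25, 30, 40]),
    Node([50, 60]),
]
```

For example, `csb_search(NODES, 0, 30)` finds one key smaller than 30 in the root, follows offset 1 to the second leaf and finds 30 there, while a search for 45 ends in the third leaf without a match.
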
    14. CONCLUSION
        • Using specialized data structures to access data on disk can greatly improve the efficiency of mining algorithms.
        • CSB+-Trees are more cache conscious than B+-Trees because of partial pointer elimination.
        • CSB+-Trees support efficient incremental updates.
        • Cache conscious index structures such as Cache Sensitive Search Trees (CSS-Trees) perform lookups much faster because they incur fewer cache misses.