APPLICATIONS OF ADVANCED
        DATA STRUCTURES




1                     9/3/2012
Index-Sequential File Organization

       Index-sequential files are data files whose records are
        ordered sequentially on a search key.

       The main disadvantage is that the performance of both
        lookups and sequential scans degrades as the file grows.

       The degradation can be fixed by reorganizing the file, but
        reorganization requires a lot of overhead space, so
        frequent reorganization is undesirable.



       An index speeds up certain queries or searches
        because it stores information about where data is
        stored on the disk. The index points directly to the
        location of a record on the disk and can be used to
        avoid searching a large file.
       The DBMS represents data as records in a table.
        However, a disk stores data in blocks, or pages.
        Many records may be placed in one block, or one
        record may span many blocks.
       The computer transfers data between main memory and
        the disk one whole block at a time.

       The problem for the DBMS is to decide in which
        block each record should be placed and what
        information should be stored in addition to the record
        to allow the record to be retrieved easily.
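
A minimal sketch of the idea, not how any particular DBMS does it: records are sorted on the key and packed into fixed-size blocks, and the extra information kept is one index entry per block (its smallest key). The names `pack_into_blocks`, `build_index`, `lookup` and the block size of 3 are assumptions made for illustration, and the record payloads are made up.

```python
from bisect import bisect_right

BLOCK_SIZE = 3  # records per block (hypothetical; real blocks are sized in bytes)


def pack_into_blocks(records, block_size=BLOCK_SIZE):
    """Sort records on their key and place them into fixed-size blocks."""
    ordered = sorted(records, key=lambda record: record[0])
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]


def build_index(blocks):
    """Keep one index entry per block: the smallest key stored in it."""
    return [block[0][0] for block in blocks]


def lookup(key, index, blocks):
    """Use the index to choose a single block, then scan only that block."""
    position = bisect_right(index, key) - 1
    if position < 0:
        return None  # key is smaller than every indexed key
    for record_key, payload in blocks[position]:
        if record_key == key:
            return payload
    return None


# Made-up records: (key, payload)
records = [(k, f"row-{k}") for k in (4162, 2014, 3904, 3147, 2019, 4161, 3718)]
blocks = pack_into_blocks(records)
index = build_index(blocks)

print(index)                        # [2014, 3718, 4162]
print(lookup(3904, index, blocks))  # row-3904
```

Only one data block is read per lookup, which is the point of keeping the index alongside the records.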




       When the number of indexed values is large, the
        index will not fit in one block, so its contents
        must be placed in two or more blocks.
       The solution to this problem is to create an index of
        an index. That is, the single index is split into a
        number of blocks and a new index is created that
        indexes each block.
       The B+-tree structure is such an index of an index,
        called a multi-level index.
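
The "index of an index" idea can be sketched in a few lines. This is an illustration under simplifying assumptions, not a B+-tree: the index is a sorted list of keys, it is split into blocks of a fixed number of entries, and the outer index stores the first key of each inner block. The function names and the block size are hypothetical.

```python
from bisect import bisect_right


def build_two_level_index(sorted_keys, per_block):
    """Split the single index into blocks, then index those blocks."""
    inner = [sorted_keys[i:i + per_block]
             for i in range(0, len(sorted_keys), per_block)]
    outer = [block[0] for block in inner]  # first key of each inner block
    return outer, inner


def find(key, outer, inner):
    """Return (inner block number, slot) for key, or None if absent.

    Touches one outer block and one inner block, however large the index is.
    """
    block_no = bisect_right(outer, key) - 1
    if block_no < 0:
        return None
    block = inner[block_no]
    slot = bisect_right(block, key) - 1
    if slot >= 0 and block[slot] == key:
        return block_no, slot
    return None


keys = [2014, 2019, 3147, 3718, 3904, 4161, 4162, 6144, 7329]
outer, inner = build_two_level_index(keys, per_block=3)

print(outer)                     # [2014, 3718, 4162]
print(find(4161, outer, inner))  # (1, 2)
print(find(5000, outer, inner))  # None
```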




Dynamic Multilevel Indexes Using B-Trees
                and B+-Trees
       Because of the insertion and deletion problem, most
        multi-level indexes use B-tree or B+-tree data
        structures, which leave space in each tree node (disk
        block) to allow for new index entries.

       These data structures are variations of search trees
        that allow efficient insertion and deletion of search
        values.

       In B-tree and B+-tree data structures, each node
        corresponds to a disk block.

       Each node is kept between half full and completely full.


Dynamic Multilevel Indexes Using B-Trees
                and B+-Trees
    An insertion into a node that is not full is quite
     efficient; if a node is full, the insertion causes a split
     into two nodes.
    Splitting may propagate to other tree levels.
    A deletion is quite efficient if the node does not become
     less than half full.
    If a deletion causes a node to become less than half
     full, it must borrow entries from a neighboring node or
     be merged with one.
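
The insertion case above can be sketched for a single leaf. This is a simplified illustration, assuming a leaf is just a sorted Python list and that the caller handles the parent; `insert_into_leaf` and `capacity` are hypothetical names, and propagation of the split to higher levels is left out.

```python
from bisect import insort


def insert_into_leaf(leaf, key, capacity):
    """Insert key into a sorted leaf.

    Returns (left, right, separator). When the leaf still fits within
    capacity, right and separator are None. Otherwise the leaf is split in
    half and the first key of the right half is the separator that the
    parent would need to store (which may in turn overflow the parent).
    """
    insort(leaf, key)  # keep the leaf sorted
    if len(leaf) <= capacity:
        return leaf, None, None
    middle = len(leaf) // 2
    left, right = leaf[:middle], leaf[middle:]
    return left, right, right[0]


# Leaf with room: no split.
print(insert_into_leaf([3718, 3904], 3800, capacity=3))
# ([3718, 3800, 3904], None, None)

# Full leaf: split into two nodes, each at least half full.
print(insert_into_leaf([7917, 8003, 8193], 8100, capacity=3))
# ([7917, 8003], [8100, 8193], 8100)
```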


Figure: The nodes of a B+-tree. (a) Internal node of a B+-tree with q – 1
search values. (b) Leaf node of a B+-tree with q – 1 search values and
q – 1 data pointers.




[Figure: B+-tree index over names]
 root:       EMBRY
 Index set:  BOLEN, CAMP | FABER, FOLKS
 Leaf blocks, numbered left to right:
     1  ADAMS-BERNE
     2  BOLEN-CAGE
     3  CAMP-DUTTON
     4  EMBRY-EVANS
     5  FABER-FOLK
     6  FOLKS-GADDIS


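B+ Tree Result
[Figure: three-level B+-tree on empno; each group below is one node]
 First level (root level):  6144
 Second level:              3718 4161 | 7409 7422 7917
 Third level (leaf level):  2014 2019 3147 | 3718 3904 | 4161 4162 |
                            6144 7329 | 7409 7418 | 7422 7602 |
                            7917 8003 8193

 The leaf entry 7409 points to the table row:
     empno   lastname   job     …
     7409    vicky      CLERK   …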
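
A search over the tree in this figure can be sketched as follows. The key values and the row for 7409 come from the figure; the `Leaf` and `Internal` classes and the `search` function are assumptions made for illustration, and rows for the other keys are omitted because the figure does not show them.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Leaf:
    keys: list
    rows: dict = field(default_factory=dict)  # key -> record (stands in for a data pointer)


@dataclass
class Internal:
    keys: list
    children: list  # always one more child than keys


def search(node, key):
    """Descend from the root to the leaf that would hold key."""
    while isinstance(node, Internal):
        # Keys equal to a separator live in the subtree to its right.
        node = node.children[bisect_right(node.keys, key)]
    return node


root = Internal([6144], [
    Internal([3718, 4161], [
        Leaf([2014, 2019, 3147]),
        Leaf([3718, 3904]),
        Leaf([4161, 4162]),
    ]),
    Internal([7409, 7422, 7917], [
        Leaf([6144, 7329]),
        Leaf([7409, 7418],
             {7409: {"empno": 7409, "lastname": "vicky", "job": "CLERK"}}),
        Leaf([7422, 7602]),
        Leaf([7917, 8003, 8193]),
    ]),
])

leaf = search(root, 7409)
print(leaf.keys)        # [7409, 7418]
print(leaf.rows[7409])  # {'empno': 7409, 'lastname': 'vicky', 'job': 'CLERK'}
```

Every lookup visits exactly three nodes here, one per level, regardless of which key is requested.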


Advantages of Using B+-Trees in a Database
    high fanout / low depth

    simple and consistent block storage

    high key density
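
The link between fanout and depth can be checked with a little arithmetic. The sketch below uses a simplified model in which every level multiplies the number of reachable entries by the fanout; the fanout of 100 and the one million keys are assumed figures for illustration, not measurements.

```python
def levels_needed(num_keys, fanout):
    """Smallest number of levels h such that fanout ** h >= num_keys."""
    levels, reach = 1, fanout
    while reach < num_keys:
        reach *= fanout
        levels += 1
    return levels


print(levels_needed(1_000_000, 2))    # 20  (binary tree)
print(levels_needed(1_000_000, 100))  # 3   (node with 100 pointers per block)
```

Since each level costs one block read, a high fanout turns about twenty reads into about three under these assumptions.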




ABOUT GOOGLE SEARCH:
   Normally in a Google search:
       Every word matters (except stop words). All the other words
        that you type in the search box are used by Google.

       Word order also matters, as the first word entered
        dictates which results are shown first.

       The search is case-insensitive, i.e. Google does not
        distinguish between CAPITAL and capital.

       Generally, punctuation and special characters such as
        ~, !, @, #, $, (, ), {, }, [, ] are ignored.
   Google ignores some words (stop words) such as I, a,
    about, an, are, the, etc.
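
The rules above can be illustrated with a small query-normalization sketch. This is not Google's implementation; it only applies the three rules as stated on this slide, using the stop words and special characters listed here. The names `STOP_WORDS`, `IGNORED_CHARS` and `normalize_query` are hypothetical.

```python
# Only the examples given on the slide; a real stop-word list is longer.
STOP_WORDS = {"i", "a", "about", "an", "are", "the"}
IGNORED_CHARS = "~!@#$(){}[]"


def normalize_query(query):
    """Drop special characters, fold case, and remove stop words."""
    without_special = query.translate(str.maketrans("", "", IGNORED_CHARS))
    words = without_special.lower().split()
    return [word for word in words if word not in STOP_WORDS]


print(normalize_query("The CAPITAL of [France]!"))
# ['capital', 'of', 'france']
```

Word order is preserved in the returned list, so a later ranking step could still give the first word more weight.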
EARLIER GOOGLE SEARCH:
 We had to:
    Use the words that we think are most likely to appear on
     the page.

    Use descriptive words. The accuracy of results depends
     on the uniqueness of the description.

    Use as few words as possible, since a combination of
     many words may limit the search results.




“Google took a much more active role in
     leading searchers to not just the answer,
           but also the question itself.”
ABOUT GOOGLE INSTANT:
    When the user begins typing a query into the
     Google search box, Google displays a short list of
     predicted queries that are related to the letters
     typed so far.
    As the user types, these predictions may change
     depending on the characters being entered.
    Not only the suggestions but also the search results
     keep changing as the user types, without the Enter key
     being pressed.
    15 new technologies contribute to Google Instant
     functionality.

DATA STRUCTURE IN GOOGLE INSTANT (GI):

[Figure: a row of letter slots, a b c d e f g … z]




This is a trie for keys
  “A”,
  “to”,
  “tea”,
  “ted”,
  “ten”,
  “i”,
  “in”, and
  “inn”.


 When we need to autocomplete the starting
  characters “te”, we need to get the output tea, ted and
  ten.
 Instead of checking a regular expression match against all
  the words in the database, the trie makes use of
  transitions.
 The first character is t, so from the root we take the
  transition for 't' and reach node 't'; then, at node 't',
  we take the transition for the next character and reach
  node 'e'.
 At that point, we follow all paths from node
  'e' to the leaf nodes, which gives the paths t->e->a,
  t->e->d and t->e->n.
 This is the basic algorithm behind implementing an
  efficient autocomplete.
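
The steps above can be sketched as a small trie, using the keys from the earlier slide. This is a minimal illustration, not the structure Google Instant actually uses; the class and method names are assumptions. An `is_key` flag is added because a key can end at an inner node (for example "in", which is a prefix of "inn"), not only at a leaf.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> TrieNode (one transition per character)
        self.is_key = False  # True when a stored key ends at this node


class Trie:
    def __init__(self, keys=()):
        self.root = TrieNode()
        for key in keys:
            self.insert(key)

    def insert(self, key):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.is_key = True

    def complete(self, prefix):
        """Return every stored key that starts with prefix, sorted."""
        # Step 1: follow one transition per prefix character.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Step 2: walk every path below the node reached by the prefix.
        results = []
        stack = [(node, prefix)]
        while stack:
            current, word = stack.pop()
            if current.is_key:
                results.append(word)
            for ch, child in current.children.items():
                stack.append((child, word + ch))
        return sorted(results)


trie = Trie(["A", "to", "tea", "ted", "ten", "i", "in", "inn"])
print(trie.complete("te"))  # ['tea', 'ted', 'ten']
print(trie.complete("in"))  # ['in', 'inn']
```

The cost of step 1 depends only on the length of the prefix, not on how many words are stored, which is why no scan over the whole word list is needed.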
FASTER SEARCHES:
    Before Google Instant, the typical searcher took
     more than 9 seconds to enter a search term.

    We can see many examples of searches that take
     30-90 seconds to type.

    Using Google Instant can save 2-5 seconds per
     search.

    If everyone uses Google Instant globally, Google
     estimates that this will save more than 3.5 billion
     seconds a day (that's 11 hours saved every second).
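
The "11 hours saved every second" figure follows from the 3.5 billion seconds a day estimate; a quick check of the arithmetic (the variable names are just for illustration):

```python
SECONDS_SAVED_PER_DAY = 3.5e9    # estimate quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

# Seconds of typing saved during each second of the day, converted to hours.
saved_per_second = SECONDS_SAVED_PER_DAY / SECONDS_PER_DAY
hours_saved_per_second = saved_per_second / 3600

print(round(hours_saved_per_second, 2))  # 11.25
```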
SMARTER PREDICTIONS:

    Even when we don't know exactly what we are
     looking for, predictions guide our search.

    The top prediction is shown in grey text in the
     search box, so that we can stop typing as soon
     as we see what we need.




INSTANT RESULTS:

    As we start typing the query, the results appear
     at once.

    Before GI, we had to type a full search term,
     hit Enter, and hope for the right result.

    Now the results appear instantly, helping us reach
     what we are searching for more easily and quickly.

    It's really amazing that GI goes through more than
     6000 words per second.
CONTRIBUTING FACTORS:

    Query volume.

    Geography of searchers.

    Keywords or phrases mentioned.





