Your SlideShare is downloading. ×
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Select Operation- disk access and Indexing
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Select Operation- disk access and Indexing

545

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
545
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
15
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide
  • 3 7 1 . 2 .3 4.5.7 9.10 1F.1M2F2M3F3M ….
  • Transcript

    • 1.
      • Select Operation- disk access
      • and Indexing
      • *Some info on slides from Dr. S. Son, U. Va
    • 2. Disk access
      • DBs traditionally stored on disk
      • Cheaper to store on disk than in memory
      • Costs for:
        • Seek time, latency, data transfer time
      •   Disk access is page (block) oriented
      • 2 - 4 KB page size
    • 3. Access time
      • Access time is the time to randomly access a page
      • System initially determines if page in memory buffer (page tables, etc.)
      • Large disparity between disk access and memory access
    • 4. Select operation using table scan
      • If read the entire table for a select – table scan
      • Improvements to table scan of disk:
        • Parallel access
        • Sequential prefetch
    • 5. Parallel access
      • Linear search - all data rows read in from disk
        • I/O parallelism can be used (Raid)
          • multiple I/O read requests satisfied at the same time
          • stripe the data across different disks       
        • Problems with parallelism?
          • must balance disk arm load to gain maximum parallelism
          • requires the same total number of random I/O's, but using devices for a shorter time
    • 6. Sequential prefetch I/O
      • Retrieve one disk page after another (on same track) - typically 32
      • Seek time no longer a problem
      • Must know in advance to read 32 successive pages
      • Speed up of I/O by a factor of ≈ 10 (500 I/O's per second vs. 70)
    • 7. Access time
      • Seek time – average 8-10 ms, as low as 4 ms server
      • Latency time – 2-4 ms, as low as 1 or less
      • Data transfer time – .4-2 ms
    • 8. Access time
      • RIO            Seq. Prefetch .010             .010                    Seek - disk arm to cylinder .002             .002                    Latency - platter to sector . 0015            .048                  Data transfer - Page .0135           .060                   1 page vs. 32 pages
      • .43* seconds vs  .060 seconds 32 pages for both
      • * .00135X32=.43
    • 9. Access time for fast I/O
      • RIO            Seq. Prefetch .004             .004                    Seek - disk arm to cylinder .001             .001                    Latency - platter to sector . 0005            .016                 Data transfer - Page .0055           .021                   1 page vs. 32 pages
      • .176* seconds  .021 seconds 32 pages for both
      • * .0055X32=.176
    • 10. Organizing disk space
      • How to store data so minimize access time if read the entire table?
    • 11. Disk allocation
      • Disk Resource Allocation for Databases (DBA has control)
      • Goal – contiguous sectors on disk - want data as close together as possible  to minimize seek time
      • No standard SQL approach, but general way to deal with allocation
      • Some OS allow specification of size of file and disk device
    • 12. Types of Files
      • Heap files (unordered – sequential)
      • Sorted files (ordered – sort key)
      • Hash files (hash key, hash function)
        • Internal, external, file expansion
        • B+-trees
      • Raid technology (parallelizing)
      • Storage area networks – ERP (enterprise resource planning) and DW (data warehouses)
        • Storage devices configured as nodes in network – can attach/detach
    • 13. Tablespace
      • Tablespace is:
      • Allocation medium for tables and indexes for ORACLE, DB2, etc.
      • Can put >1 table in a table space if accessed together
      • Tablespace corresponds to 1 or more OS files and can span disk devices
      • Usually relations cannot span disk devices
    • 14. DB storage structures
      • DB CAP Database
      • Table- tspace 1 system
      • space
      • OS files fname1 fname2 fname3
      • Tables Cust agents prods orders orindx
      • Segments data data data data index
      • Extents
    • 15. Tablespace
      • ORACLE DB's contain several tablespaces, including one called system -      data description +  indexes + user-defined tables
      • default tablespace given to each user
      • if multiple tablespaces - better control over load balancing
      • can take some disk space off-line
    • 16. Extent
      • Relation composed of 1 or more extents
      • Extent - contiguous storage on disk
      • when data segment or index segment first created, given an initial extent from tablespace 10KB (5 pages)
      • if need more space given next contiguous extent
    • 17. Extent
      • Can increase the size by a positive % (cannot decrease)
        • initial n - size of initial extent
        • next n - size of next
        • max extents - maximum number of extents
        • min extents - number of extents initially allocated
        • pct increase n - % by which next extent grows over previous one
    • 18. Oracle create tablespace
      • http://www.adp-gmbh.ch/ora/sql/create_tablespace.html
    • 19. Create table
      • Create table statement - can specify tablespace, no. of extents
        • When initial extent full, new extent allocated
        • pctfree - determine how much space in a page can be used for inserts of new rows
          • if pctfree =10%, inserts stop when page is 90% full
              • Uses another page
        • pctused – determines when new inserts start again
          • if fall below certain percentage of total, default pctused = 40%                  pctfree + pctused < 100
          • For more info: http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_7002.htm
    • 20. Rows
      • Row layout on each disk page
      • 1 2 3… N Row N Row N-1 … Row 1
      • Header info Row directory free space data rows
      • Header -
      • Row directory – row number and page byte offset
        • Row number is row number in page – book calls it slot#
          • Page byte offset – with varchar, row size not constant
      • To identify a particular row use RID (RowID) –
      • page #, slot # [file#]
      • slot# is number in row directory (logical #)
    • 21. Differences in DBMSs re: rows
      • ROWID can be retrieved in ORACLE but not DB2 (violates relational model rule)
      • ORACLE
          • rows can be split between pages (row record fragmentation)
          • Can have rows from multiple tables on same page, more info
      • DB2, no splitting, entire row moved to new page, need forwarding pointer
    • 22. Select operation using Indexes
      • Alternative to table scan
    • 23. Binary Search
      • “ Find all students with gpa > 3.0 ”
        • If data is in sorted file, do binary search to find first such student, then scan to find others.
        • Cost of binary search can be quite high.
    • 24. Binary Search
      • Binary search on disk
        • optimal for comparisons - not optimal for disk-based look-up
        • must keep data in order
        • may be reading values from same page at different times
    • 25. Indexing
      • Instead: Create an `index’ file
      • Keyed access retrieval method
      • index is a sorted file - sorted by index key
      • index entries:
          • index key pointer  (RID)
          •   
      • pointer is RID  
      • index resides on disk, partially memory resident when accessed
    • 26. Index File Page 1 Page 2 Page N Page 3 Data File k2 kN k1 Index File
    • 27. Tree-based index
      • B-tree – balanced tree
      • Nodes point to data (RIDs) and also point to other nodes in tree
    • 28. B+-tree
      • Most commonly used index structure type in DBs today
      • Based on B-tree
      • Good for equality and range searches
      • B+ tree : dynamic, adjusts gracefully under inserts and deletes.
      • Used to minimize disk I/O
      • available in DB2, ORACLE also has hash cluster, Ingres has heap structure, B-tree, isam (chain together new nodes)
    • 29. Structure of B+ Trees
      • leaf level pointers to data (RIDs)
      • the remaining are directory (index) nodes that point to other index nodes Fig.
      Index Entries Data Entries (&quot;Sequence set&quot;) (Direct search)
    • 30. Characteristics of B+ Tree
      • Insert/delete at log F N cost; keep tree height-balanced . (F = fanout, N = # leaf pages)
      • Minimum 50% occupancy (except for root). Each node contains d <= m <= 2 d entries. The parameter d is called the order of the tree.
      • Supports equality and range-searches efficiently
    • 31. Cost of I/O for B+-tree
      • Assume number of entries in each index node fits on one page - one node is one page
      • If tree with depth of 3, 3 I/Os to get pointer to data B+-tree structured to get most out of every disk page read
      • Read in index node, can make multiple probes to same page if remains in memory
        • likely since frequent access to upper -level nodes of actively used B+-trees
    • 32. B+ Trees in Practice
      • Typical order: 100.
      • Typical fill-factor: 2/3 full (66.6%)
        • average fanout = 133
      • Typical capacities:
        • Height 4: 133 4 = 312,900,700 records
        • Height 3: 133 3 = 2,352,637 records
      • Can often hold top levels in buffer pool:
        • Level 1 = 1 page = 8 Kbytes
        • Level 2 = 133 pages = 1 Mbyte
        • Level 3 = 17,689 pages = 133 MBytes
    • 33. B+-tree
      • B+ tree has a directory structure that allows retrieval of a range of values efficiently
        • search for leftmost index entry S i such that
      • X <= S i
      • Index entries always placed in sequence by value - can use sequential prefetch on index
      • Index entries shorter than data rows and require proportionately less I/O
    • 34. B+-tree
      • Balancing of B+-trees - insert, delete
      • Nodes usually not full
      • utilities to reorganize to lower disk I/O
      • Most systems allow nodes to become depopulated- no automatic algorithm to balance
      • Average node below root level 71% full in active growing B+-trees
      • Insert/delete
    • 35. Inserting into B+ Tree
      • Find correct leaf L.
      • Put data entry onto L .
        • If L has enough space, done !
        • Else, must split L (into L and a new node L2)
          • Redistribute entries evenly, copy up middle key.
          • Insert index entry pointing to L2 into parent of L .
      • This can happen recursively
        • To split index node , redistribute entries evenly, but push up middle key. (Contrast with leaf splits.)
      • Splits “grow” tree; root split increases height.
        • Tree growth: gets wider or one level taller at top.
        • Algorithm
    • 36. Deleting from B+ tree
      • Start at root, find leaf L where entry belongs.
      • Remove the entry.
        • If L is at least half-full, done!
        • If L has only d-1 entries,
          • Try to re-distribute , borrowing from sibling (adjacent node with same parent as L) .
          • If re-distribution fails, merge L and sibling.
      • If merge occurred, must delete entry (pointing to L or sibling) from parent of L .
      • Merge could propagate to root, decreasing height.
      • Algorithm
    • 37. Duplicate key values
      • Duplicate key values in index
      • leaf nodes have sibling pointers
      • but a delete of a row that has a heavily duplicated key entails a long search through the leaf-level of the B+-tree
      • Index compression - with multiple duplicates
      • | header info | PrX keyval RID RID ... RID | PrX keyval RID…RID|
      • where PrX is count of RID values
    • 38. Create Index
      •     Options:        
      • multiple columns         tablespace         storage - initial extents, etc.         percent free default = 10
      • % of each page left unfilled
      • free page (1 free page for every n index pages)
      •      Can control % of B+-tree node pages left unfilled when index created, refers to initial creation
    • 39.     Why use an index? 
      • If use a select (or join) on the same attribute frequently
      • want a way to improve performance - use indexes
        • For example:
            • Select from Employee
            • where ssn = 333445555
      • Instead of reading the entire file until ssn is found, it would be nice if we had a pointer to that employee
    • 40. Types of indexes (textbook)
      • Primary index - key field is a candidate key (must be unique) – data file ordered by key field
      • Clustering index - key field is not unique, data file is ordered – all records with same values on same pages
      • Secondary index - non-clustering index – data file not ordered
        • First record in the data page (or block) is called the anchor record
          • Non-dense index - pointer in index entry points to anchor
          • Dense index - pointer to every record in the file
    • 41. Non-clustered indexes
      • Non-clustered index (secondary index)
        • key field is a non ordering field - it is not used to physically order the data file
        • the index itself is still ordered
        • How many non-clustering indexes can a table have?
    • 42. Clustered Indexes
      • Placing rows on disk in order by some common index key value        (remember the index itself is always sorted)
        • Clustered index - (primary and clustering)
        • key field is an ordering field - all the data with the same values for the key field physically placed on the same pages on the disk.
        • If primary key, data ordered on a page by key field
        • Usually assume disk pages themselves also clustered on the disk
        • How many clustering indexes can a table have?
    • 43. Clustering
      • Efficiency advantage        read in a page, get all of the rows with
      • the same value
      • clustering is useful for range queries         e.g.  between keyval1 and keyval2
    • 44. Example
      • http://www.dba-oracle.com/oracle_tip_hash_index_cluster_table.htm
    • 45. Clustering
      • Can only cluster table by 1 clustering index at a time
      • In SQL server
        • creates clustered index on PK automatically if no other clustered index on table and PK nonclustered index not specified
      • In DB2 –
        • if the table is empty, rows sorted as placed on disk
        • subsequent insertions not clustered, must use REORG
      • In Oracle-
        • Cluster index – now available for PK in 10g
        • Define a cluster to create cluster index for 2 tables
    • 46. Indexes vs. table scan
      • To illustrate the difference between table scan, secondary index (non clustered) and clustered index
        • Assume 10 M customers, 200 cities
        • 2KB/page, row = 100 bytes, 20 rows/page
        •             Select *             From Customers             Where city = Birmingham
        • 1/200 * 10M if assume selectivity = 1/200
        • 50,000 customers in a city
    • 47. Rules of Thumb for I/O
      • Random I/O – 160 pages/second, .00625
      • Sequential prefetch I/O – 1600 pages/second, .000625
      • Will discuss later:
      • List prefetch I/O – 400 pages/second, .0025
    • 48. Table Scan
      • Table Scan - read entire table
      • If used an random I/O
      • 10,000,000/20 = 500,000 pages 
      • 500,000*R = 3125
      • Instead, it makes more sense to use:
      • sequential prefetch read 32 pages at a time
      • 500,000*S = 312
    • 49. Clustering Index
      • Clustering Index –
      • All entries for B'ham clustered on same pages
      • 50,000/20 = 2500 pages (with 20 rows per page)
      • Assume: 3 upper nodes of the tree  
      • Assume: 1000 index entries per leaf node, read 50000/1000 index pages
      •  
      • (3 + 50 + 2500) * ? = 2,553 * ?
      • If assume ?=R, then 2,553*R=16
      • Makes more sense to assume (3+50+2500) * S = 1.6
    • 50. Secondary Index
      • Secondary Index–
      • In the worst case 1 entry for B'ham per page
      • 50,000 pages (10M/200)
      • 3 upper nodes of the tree  
      • Assume 1000 index entries per leaf node, read 50000/1000 index pages
      • (3 + 50 + 50,000)*? = 50, 053 * ?
      • If assume ?=R then 50,053*R=312.8
      • Better to assume (3+50)*S + 50,000*R=312.53
    • 51. List Prefetch
      • Create list of data pages to access
      • Pages not necessarily in contiguous sequential order
      • system orders pages to minimize disk I/O
        • E.g. elevator algorithm for disk request scheduling
      • 50, 053 * L = 125.1
      • Best to assume (3+50)*S+50,000*L=125.03
    • 52. % Free
      • Redo the previous calculations assuming relations created with 50% free option specified.
    • 53. Creating Indexes
      • When determining what indexes to create consider: workload - mix of queries and frequencies of requests             20% of requests are updates, etc.
      •             can create lots of indexes but:                 cost to create                 insertions                 initial load time high if a large table                 index entries can become longer and longer as multiple columns included
    • 54. Multiple Indexes
      • More than one index on a relation            
        • e.g. class - one index, gender - one index
    • 55. Composite Index
      • One index based on more than one attribute   Create Index index_name on Table (col1, col2,... coln)
      •     Composite index entry - values for each attribute             class, gender             entry in index is:  C1, C2, RID
      • What would B+ tree look like?
    • 56. Threads
      • Thread results from a fork of a computer program, usually contained inside a process
        • Multiple threads inside same process, share resources, address space and memory
        • Processes do not share these resources
        • Thread have their own stack, copy of registers, PC and local thread storage
      • Some languages support multiple threads, but do not execute at the same time
        • Kernel threads can run concurrently
    • 57. Parallel computing
      • Form of computation in which many calculations carried out simultaneously
        • Divide large problem into smaller ones
        • data, instruction level and task parallelism
        • SISD, SIMD, MISD, MIMD
      • Dominant paradigm in the form of multicore processors
      • Parallel computer – shared or distributed memory
      • Parallel program difficult to write due to
        • Software bugs, race conditions
        • Communication and synchronization
      • Multiple processing elements working concurrently
        • Single computer with multiple processors, networked computers, special hardware, etc.
    • 58.
      • Multithreading
        • Model to allow multiple threads within single process
        • Can execute in parallel on multiprocessor system
      • Process, kernel thread, user thread, fiber (cooperatively scheduled, can run in any thread in the same process)
      • Subtasks in a parallel program are called threads
        • Lightweight version of threads – fibers
        • Bigger versions – processes
    • 59.
      • Parallel computing – model of computation
        • Can utilize processes, multithreading to implement

    ×