CS 542 -- Query Optimization
CS 542 -- Query Optimization Presentation Transcript

  • 1. CS 542 Database Management Systems
    Query Optimization
    J Singh
    March 28, 2011
  • 2. Outline
    Convert SQL query to a parse tree
    Semantic checking: attributes, relation names, types
    Convert to a logical query plan (relational algebra expression)
    deal with subqueries
    Improve the logical query plan
    use algebraic transformations
    group together certain operators
    evaluate logical plan based on estimated size of relations
    Convert to a physical query plan
    search the space of physical plans
    choose order of operations
    complete the physical query plan
  • 3. Desired Endpoint
    σ x=1 AND y=2 AND z<5 (R)
    R ⋈ S ⋈ U
    Example Physical Query Plans
    two-pass
    hash-join
    101 buffers
    Filter(x=1 AND z<5)
    materialize
    IndexScan(R,y=2)
    two-pass
    hash-join
    101 buffers
    TableScan(U)
    TableScan(R)
    TableScan(S)
  • 4. Physical Plan Selection
    Governed by disk I/O, which in turn is governed by:
    The particular operation being performed
    Size of intermediate results, as derived last week (sec 16.4 of book)
    Physical operator implementation used, e.g., one- or two-pass
    Operation ordering, esp. join ordering
    Operation output: materialized or pipelined.
  • 5. Index-based physical plans (p1)
    Selection example. What is the cost of a=v(R) assuming
    B(R) = 2000
    T(R) = 100,000
    V(R, a) = 20
    Table scan (assuming R is clustered):
    B(R) = 2,000 I/Os
    Index based selection:
    If index is clustering: B(R) / V(R,a) = 100 I/Os
    If index is unclustered: T(R) / V(R,a) = 5,000 I/Os
    For small V(R, a), table scan can be faster than an unclustered index
    Heuristics that pick indexed over not-indexed can lead you astray
    Determine the cost of both methods and let the algorithm decide
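A minimal Python sketch of the comparison above (the cost formulas are taken from this slide; integer division stands in for rounding):

```python
# Cost of σ a=v(R) under each access method, per the slide's formulas.
def table_scan_cost(B):            # clustered R: read every block
    return B

def clustering_index_cost(B, V):   # matching tuples packed into B/V blocks
    return B // V

def unclustered_index_cost(T, V):  # one I/O per matching tuple
    return T // V

B_R, T_R, V_Ra = 2000, 100_000, 20
costs = {
    "table scan": table_scan_cost(B_R),                      # 2,000
    "clustering index": clustering_index_cost(B_R, V_Ra),    # 100
    "unclustered index": unclustered_index_cost(T_R, V_Ra),  # 5,000
}
best = min(costs, key=costs.get)   # let the numbers, not a heuristic, decide
```

As the slide warns, for small V(R, a) the unclustered index (5,000 I/Os) loses to a plain table scan (2,000 I/Os).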
  • 6. Index-based physical plans (p2)
    Example: Join if S has an index on the join attribute
    For each tuple in R, fetch corresponding tuple(s) from S
    Assume R is clustered. Cost:
    If index on S is clustering: B(R) + T(R) B(S) / V(S,a)
    If index on S is unclustered: B(R) + T(R) T(S) / V(S,a)
    Another case: when R is output of another Iterator. Cost:
    B(R) is accounted for in the iterator
    If index on S is clustering: T(R) B(S) / V(S,a)
    If index on S is unclustered: T(R) T(S) / V(S,a)
    If S is not indexed but fits in memory: B(S)
    A number of other cases
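The cases above can be collected into one helper. This is a sketch: the B_R term is charged only when R is read from disk, and the parameter values below are illustrative assumptions, not from the slide.

```python
# Index-join cost: read R, then for each of R's T_R tuples probe S's index.
# B_R defaults to 0 for the case where R arrives from another iterator.
def index_join_cost(T_R, B_S, T_S, V_Sa, clustering, B_R=0):
    probe = B_S // V_Sa if clustering else T_S // V_Sa   # blocks vs. tuples per probe
    return B_R + T_R * probe

# illustrative parameters (assumptions for demonstration only):
cost_clustering = index_join_cost(1_000, B_S=500, T_S=10_000, V_Sa=50,
                                  clustering=True, B_R=100)    # 100 + 1000*10
cost_unclustered = index_join_cost(1_000, B_S=500, T_S=10_000, V_Sa=50,
                                   clustering=False, B_R=100)  # 100 + 1000*200
```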
  • 7. Index-based physical plans (p3)
    Index-based join if both R and S have a sorted index (B+ tree) on the join attribute
    Then perform a merge join
    called zig-zag join
    Cost: B(R) + B(S)
  • 8. Grand Summary of Physical Plans (p1)
    Scans and Selects
    Index: N = None, C = Clustering, NC = Non-clustered
  • 9. Grand Summary of Physical Plans (p2)
    Joins
    Index: N = None, C = Clustering, NC = Non-clustered
    Relation fits in memory: F = Yes, NF = No
  • 10. Physical plans at non-leaf Operators (p1)
    What if the input of the operator is from another operator?
    For Select, cost = 0.
    Cost of pipelining is assumed to be zero
    The number of tuples emitted is reduced
    For Join, when R is from an operator and S from a table:
    B(R) is accounted for in the iterator
    If index on S is clustering: T(R) B(S) / V(S,a)
    If index on S is unclustered: T(R) T(S) / V(S,a)
    If S is not indexed but fits in memory: B(S)
    If S is not indexed and doesn’t fit: k*B(S) for k chunks
    If S is not indexed and doesn’t fit: 3*B(S) for sort- or hash-join
  • 11. Physical plans at non-leaf Operators (p2)
    For Join, when R and S are both from operators, the cost depends on whether the results are sorted by the join attribute(s)
    If yes, we use the zig-zag algorithm and the cost is zero. Why?
    If either relation will fit in memory, the cost is zero. Why?
    At most, the cost is 2*(B(R) + B(S)). Why?
  • 12. Example (787)
    Product(pname, maker), Company(cname, city)
    Select Product.pname
    From Product, Company
    Where Product.maker=Company.cname
    and Company.city = “Seattle”
    How do we execute this query ?
  • 13. Example (787)
    Product(pname, maker), Company(cname, city)
    Select Product.pname
    From Product, Company
    Where Product.maker=Company.cname
    and Company.city = “Seattle”
    Logical Plan
    Clustering Indices:
    Product.pname
    Company.cname
    Unclustered Indices:
    Product.maker
    Company.city
    maker=cname
    σ city=“Seattle”
    Product(pname,maker)
    Company(cname,city)
  • 14. Example (787) Physical Plans
    Physical Plan 1
    Physical Plans 2a and 2b
    Merge-join
    Index-based join
    Index-based selection
    maker=cname
    σ city=“Seattle”
    cname=maker
    σ city=“Seattle”
    Product(pname,maker)
    Company(cname,city)
    Product(pname,maker)
    Company(cname,city)
    Index-scan
    Scan and sort (2a), index scan (2b)
  • 15. Evaluate (787) Physical Plans
    Physical Plan 1
    Tuples:
    T(city='Seattle'(Company)) = T(Company) / V(Company, City)
    Cost:
    T(city='Seattle'(Company)) * T(Product) / V(Product, maker)
    or, simplifying,
    T(Company) / V(Company, City) * T(Product) / V(Product, maker)
    Total Cost:
    2a: 3B(Product) + B(Company)
    2b: T(Product) + B(Company)
    Merge-join
    maker=cname
    σ city=“Seattle”
    Product(pname,maker)
    Company(cname,city)
    Index-scan
    Scan and sort (2a)index scan (2b)
  • 16. Final Evaluation
    Plan Costs:
    Plan 1: T(Company) / V(Company, city) × T(Product) / V(Product, maker)
    Plan 2a: B(Company) + 3B(Product)
    Plan 2b: B(Company) + T(Product)
    Which is better?
    It depends on the data
  • 17. Example (787) Evaluation Results
    Common assumptions:
    T(Company) = 5,000 B(Company) = 500 M = 100
    T(Product) = 100,000 B(Product) = 1,000
    Assume V(Product, maker) ≈ T(Company)
    Case 2:
    V(Company, city) << T(Company)
    V(Company, city) = 20
    Plan 1: 250 × 20 = 5,000
    Plan 2a: 3,500
    Plan 2b: 100,500
    Case 1:
    V(Company, city) ≈ T(Company)
    V(Company, city) = 5,000
    Plan 1: 1 × 20 = 20
    Plan 2a: 3,500
    Plan 2b: 100,500
    Reference from previous page:
    • Plan 1: T(Company)/V(Company,city) × T(Product)/V(Product,maker)
    • Plan 2a: B(Company) + 3B(Product)
    • Plan 2b: B(Company) + T(Product)
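The plan costs on this slide can be checked numerically; this sketch uses the common parameters stated on the slide:

```python
# Parameters from the evaluation slide.
T_Company, B_Company = 5_000, 500
T_Product, B_Product = 100_000, 1_000
V_Product_maker = T_Company          # slide assumes V(Product, maker) ≈ T(Company)

def plan1_cost(V_Company_city):
    return (T_Company // V_Company_city) * (T_Product // V_Product_maker)

plan2a_cost = B_Company + 3 * B_Product   # 3,500
plan2b_cost = B_Company + T_Product       # 100,500

case2 = plan1_cost(20)      # 250 * 20 = 5,000 -> plan 2a wins
case1 = plan1_cost(5_000)   # 1 * 20 = 20      -> plan 1 wins
```

The winning plan flips between the two cases, which is exactly the "it depends on the data" lesson.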
  • Lessons
    Need to consider several physical plans
    even for one, simple logical plan
    No magic “best” plan: depends on the data
    In order to make the right choice
    need to have statistics over the data
    the B’s, the T’s, the V’s
  • 20. Query Optimization
    Have a SQL query Q
    Create a plan P
    Find equivalent plans P = P’ = P’’ = …
    Choose the “cheapest”.
    HOW ??
  • 21. Logical Query Plan
    SELECT P.buyer
    FROM Purchase P, Person Q
    WHERE P.buyer=Q.name AND
    Q.city=‘seattle’ AND
    Q.phone > ‘5430000’
    Plan
    π buyer
    σ City=‘seattle’ AND phone>’5430000’
    ⋈ Buyer=name
    In class:
    find a “better” plan P’
    Person
    Purchase
  • 22. CS 542 Database Management Systems
    Query Optimization – Choosing the Order of Operations
    J Singh
    March 28, 2011
  • 23. Outline
    Convert SQL query to a parse tree
    Semantic checking: attributes, relation names, types
    Convert to a logical query plan (relational algebra expression)
    deal with subqueries
    Improve the logical query plan
    use algebraic transformations
    group together certain operators
    evaluate logical plan based on estimated size of relations
    Convert to a physical query plan
    search the space of physical plans
    choose order of operations
    complete the physical query plan
  • 24. Join Trees
    Recall that the following are equivalent:
    • R ⋈ S ⋈ U
    • R ⋈ (S ⋈ U)
    • (R ⋈ S) ⋈ U
    • S ⋈ (R ⋈ U)
    But they are not equivalent from an execution viewpoint.
    Considerable research has gone into picking the best order for Joins
  • 29. Join Trees
    R1 ⋈R2 ⋈ …⋈Rn
    Join tree:
    Definitions
    A plan = a join tree
    A partial plan = a subtree of a join tree
    R3
    R1
    R2
    R4
  • 30. Left & Right Join Arguments
    The argument relations in joins determine the cost of the join
    In Physical Query Plans, the left argument of the join is
    Called the build relation
    Assumed to be smaller
    Stored in main-memory
  • 31. Left & Right Join Arguments
    The right argument of the join is
    Called the probe relation
    Read a block at a time
    Its tuples are matched with those of the build relation
    The join algorithms which distinguish between the arguments are:
    One-pass join
    Nested-loop join
    Index join
  • 32. Types of Join Trees
    Right-deep
    Left-deep
    Bushy
    R3
    R4
    R1
    R2
    R5
    R3
    R2
    R4
    R5
    R2
    R4
    R3
    R1
    Many different orders, very important to pick the right one
    R5
    R1
  • 33. Optimization Algorithms
    Heuristic based
    Cost based
    Dynamic programming: System R
    Rule-based optimizations: DB2, SQL-Server
  • 34. Dynamic Programming
    Given: a query R1 ⋈R2 ⋈… ⋈Rn
    Assume we have a function cost() that gives us the cost of a join tree
    Find the best join tree for the query
  • 35. Dynamic Programming
    Problem Statement
    Given: a query R1 ⋈ R2 ⋈… ⋈Rn
    Assume we have a function cost() that gives us the cost of a join tree
    Find the best join tree for the query
    Idea: for each subset of {R1, …, Rn}, compute the best plan for that subset
    Algorithm: In increasing order of set cardinality, compute the cost for
    Step 1: for {R1}, {R2}, …, {Rn}
    Step 2: for {R1,R2}, {R1,R3}, …, {Rn-1, Rn}

    Step n: for {R1, …, Rn}
    It is a bottom-up strategy
    Skipping further details of the algorithm
    Read from book if interested
    Will not be on the exam
  • 36. Dynamic Programming Algorithm
    When computing R1 ⋈ R2 ⋈ … ⋈ Rn,
    Best Plan (R1 ⋈ R2 ⋈ … ⋈ Rn) = min-cost plan among:
    • Best Plan (R2 ⋈ R3 ⋈ … ⋈ Rn) ⋈ R1
    • Best Plan (R1 ⋈ R3 ⋈ … ⋈ Rn) ⋈ R2
    • …
    • Best Plan (R1 ⋈ R2 ⋈ … ⋈ Rn-1) ⋈ Rn
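The recurrence above can be sketched as a bottom-up dynamic program over subsets of relations. This is a minimal illustration, not the book's algorithm in full: `join_cost` is a placeholder stand-in for a real cost model.

```python
from itertools import combinations

def best_plan(relations, join_cost):
    # base case: a single relation has cost 0 and is its own "tree"
    best = {frozenset([r]): (0, r) for r in relations}
    for size in range(2, len(relations) + 1):          # increasing set cardinality
        for subset in map(frozenset, combinations(relations, size)):
            candidates = []
            for r in subset:                           # peel one relation off
                rest = subset - {r}
                sub_cost, sub_tree = best[rest]
                candidates.append((sub_cost + join_cost(rest, r), (sub_tree, r)))
            best[subset] = min(candidates, key=lambda c: c[0])
    return best[frozenset(relations)]

# toy cost model (illustrative assumption): joining r onto a partial plan
# costs the number of relations already in the plan
cost, tree = best_plan(["R1", "R2", "R3"], lambda rest, r: len(rest))
```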
  • Reducing the Search Space
    Left-deep trees vs. bushy trees
    Combinatoric explosion of the number of possible trees
    Computing the cost of all possible trees is not feasible
    For a 6-way Join, we can have
    More than 30,000 bushy trees
    6!, or 720 left-deep trees
    Left-deep trees leave their result in memory, making it possible to pipeline efficiently
    Trees without Cartesian product
    Example: R(A,B) ⋈S(B,C) ⋈ T(C,D)
    Plan: (R(A,B) ⋈T(C,D)) ⋈S(B,C) has a Cartesian product
    Most query optimizers will not consider it
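The two counts quoted above can be verified: for n relations there are n! left-deep trees, and Catalan(n−1) tree shapes times n! leaf orderings for bushy trees (a standard combinatorial fact):

```python
from math import comb, factorial

def catalan(n):
    return comb(2 * n, n) // (n + 1)

def left_deep_trees(n):       # one tree per ordering of the n leaves
    return factorial(n)

def bushy_trees(n):           # tree shapes times leaf orderings
    return catalan(n - 1) * factorial(n)

six_way_left_deep = left_deep_trees(6)   # 720 = 6!
six_way_bushy = bushy_trees(6)           # 30,240 — the "more than 30,000"
```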
  • 40. Outline
    Convert SQL query to a parse tree
    Semantic checking: attributes, relation names, types
    Convert to a logical query plan (relational algebra expression)
    deal with subqueries
    Improve the logical query plan
    use algebraic transformations
    group together certain operators
    evaluate logical plan based on estimated size of relations
    Convert to a physical query plan
    search the space of physical plans
    choose order of operations
    complete the physical query plan
    Three topics
    Choosing the physical implementations (e.g., select and join methods)
    Decisions regarding materialized vs pipelined
    Notation for physical query plans
  • 41. Choosing a Selection Method
    Algorithm for each selection operator
    1. Can we use an existing index on an attribute?
    If yes, index-scan. (Otherwise table-scan)
    2. After retrieving all tuples that satisfy the condition in (1), filter them with the remaining selection conditions
    In other words,
    When computing c1  c2  …  cn(R), we index-scan on ci, then filter the result on all other ci, where j  i.
    The next 2 pages show an example where we examine several options and pick the best one
  • 42. Selection Method Example (p1)
    Selection: σ x=1 AND y=2 AND z<5 (R)
    Where parameters of R are:
    T(R) = 5,000 B(R) = 200
    V(R, x) = 100 V(R, y) = 500
    Relation R is clustered
    x and y have non-clustering indices
    z is a clustering index
  • 43. Selection Method Example (p2)
    Selection options:
    Table-scan, then filter on x, y, z.
    Cost is B(R) = 200 since R is clustered.
    Use index on x = 1, then filter on y, z.
    Cost is 50 since T(R) / V(R, x) = 5,000/100 = 50 tuples, and x is not clustering.
    Use index on y = 2, then filter on x, z.
    Cost is 10 since T(R) / V(R, y) = 5,000/500 = 10 tuples, and y is not clustering.
    Index-scan on the clustering index with z < 5, then filter on x, y.
    Cost is about B(R)/3 ≈ 67.
    Therefore:
    First retrieve all tuples with y = 2 (option 3)
    Then filter for x and z
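The four option costs compute as follows (a direct transcription of the slide's numbers):

```python
T_R, B_R = 5_000, 200
V_x, V_y = 100, 500

options = {
    "table scan":          B_R,             # 200: R is clustered
    "index on x, filter":  T_R // V_x,      # 50 tuples, x unclustered
    "index on y, filter":  T_R // V_y,      # 10 tuples, y unclustered
    "index on z, filter":  round(B_R / 3),  # ~67 blocks, z clustering, z < 5
}
best = min(options, key=options.get)        # the index on y wins
```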
  • 44. Outline
    Convert SQL query to a parse tree
    Semantic checking: attributes, relation names, types
    Convert to a logical query plan (relational algebra expression)
    deal with subqueries
    Improve the logical query plan
    use algebraic transformations
    group together certain operators
    evaluate logical plan based on estimated size of relations
    Convert to a physical query plan
    search the space of physical plans
    choose order of operations
    complete the physical query plan
    Three topics
    Choosing the physical implementations (e.g., select and join methods)
    Decisions regarding materialized vs pipelined
    Notation for physical query plans
  • 45. Pipelining Versus Materialization
    Materialization
    store the (intermediate) result of each operation on disk
    Pipelining
    Interleave the execution of several operations; the tuples produced by one operation are passed directly to the operations that use them
    store the (intermediate) result of each operation in a main-memory buffer
    Prefer Pipelining where possible
    Sometimes not possible, as the following example shows
    Next few pages, a fully worked-out example
  • 46. R⋈S⋈U Example (p1)
    Consider physical query plan for the expression
    (R(w, x) ⋈ S(x, y)) ⋈ U(y, z)
    Assumption
    R occupies 5,000 blocks, S and U each 10,000 blocks.
    The intermediate result R ⋈ S occupies k blocks for some k.
    Both joins will be implemented as hash-joins, either one-pass or two-pass depending on k
    There are 101 buffers available.
  • 47. R⋈S⋈U Example (p2)
    When joining R ⋈ S, neither relation fits in buffers
    Need two-pass hash-join to partition R
    How many hash buckets for R?
    100 at most
    The 2nd pass hash-join uses 51 buffers, leaving 50 buffers for joining result of R ⋈ S with U.
    Why 51?
  • 48. R⋈S⋈U Example (p3)
    Case 1: Suppose k ≤ 49, i.e., the result of R ⋈ S occupies at most 49 blocks.
    Steps
    Pipeline R ⋈ S into 49 buffers
    Organize them for lookup as a hash table
    Use the one remaining buffer to read each block of U in turn
    Execute the second join as a one-pass join.
    The total number of I/O’s is 55,000
    45,000 for two-pass hash join of R and S
    10,000 to read U for one-pass hash join of (R⋈ S) ⋈U.
  • 49. R⋈S⋈U Example (p4)
    Case 2: Suppose 49 < k < 5,000. We can still pipeline, but need another strategy where the intermediate result joins with U in a 50-bucket, two-pass hash-join. Steps are:
    Before starting on R ⋈ S, hash U into 50 buckets of 200 blocks each.
    Perform the two-pass hash-join of R and S using 51 buffers as in case 1, placing the result in the 50 remaining buffers to form 50 buckets for the join of R ⋈ S with U.
    Finally, join R ⋈ S with U bucket by bucket.
    The number of disk I/O’s is:
    20,000 to read U and write its tuples into buckets
    45,000 for two-pass hash-join R ⋈ S
    k to write out the buckets of R ⋈ S
    k+10,000 to read the buckets of R ⋈ S and U in the final join
    The total cost is 75,000+2k.
  • 50. R⋈S⋈U Example (p5)
    Case 3: Suppose k ≥ 5,000. We cannot perform a two-pass join in the 50 available buffers if the result of R ⋈ S is pipelined. We are forced to materialize the relation R ⋈ S.
    The number of disk I/O’s is:
    45,000 for the two-pass hash-join of R and S
    k to store R ⋈ S on disk
    30,000 + 3k for the two-pass join of R ⋈ S with U
    The total cost is 75,000+4k.
  • 51. R⋈S⋈U Example (p6)
    In summary, costs of physical plan as function of R ⋈ S size.
    Pause and Reflect
    It’s all about the expected size of the intermediate result R ⋈ S
    What would have happened if
    We guessed 45 but had 55? Guessed 55 but only had 45?
    Guessed 4,500 but had 5,500? Guessed 5,500 but only had 4,500?
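The three cases combine into one cost curve in k (a direct transcription of pages p3–p5), which makes the guessing game above concrete:

```python
# Total disk I/O for the (R ⋈ S) ⋈ U plan as a function of k = B(R ⋈ S).
def total_cost(k):
    if k <= 49:                # case 1: one-pass second join, pipelined
        return 55_000
    elif k < 5_000:            # case 2: 50-bucket two-pass second join, pipelined
        return 75_000 + 2 * k
    else:                      # case 3: must materialize R ⋈ S
        return 75_000 + 4 * k

# Nearby guesses that straddle a case boundary land on very different plans:
c45, c55 = total_cost(45), total_cost(55)            # 55,000 vs 75,110
c4500, c5500 = total_cost(4_500), total_cost(5_500)  # 84,000 vs 97,000
```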
  • 52. Outline
    Convert SQL query to a parse tree
    Semantic checking: attributes, relation names, types
    Convert to a logical query plan (relational algebra expression)
    deal with subqueries
    Improve the logical query plan
    use algebraic transformations
    group together certain operators
    evaluate logical plan based on estimated size of relations
    Convert to a physical query plan
    search the space of physical plans
    choose order of operations
    complete the physical query plan
    Three topics
    Choosing the physical implementations (e.g., select and join methods)
    Decisions regarding materialized vs pipelined
    Notation for physical query plans
  • 53. Notation for Physical Query Plans
    Several types of operators:
    Operators for leaves
    (Physical) operators for Selection
    (Physical) Sorts Operators
    Other Relational-Algebra Operations
    In practice, each DBMS uses its own internal notation for physical query plans
  • 54. PQP Notation
    Leaves:Replace a leaf in an LQP by
    TableScan(R): Read all blocks
    SortScan(R, L): Read in order according to L
    IndexScan(R, C): Scan R using an index on attribute A, for a condition C of the form A θ c
    IndexScan(R, A): Scan R using an index on attribute A
    Selects: Replace a Select in an LQP by one of the leaf operators plus:
    Filter(D) for condition D
    Sorts: Replace a leaf-level sort with SortScan as shown above. For other operations,
    Sort(L): Sort a relation that is not stored
    Other Operators: Operation- and algorithm-specific (e.g., Hash-Join)
    Also need to specify # passes, buffer sizes, etc.
  • 55. We have Arrived at the Desired Endpoint
    σ x=1 AND y=2 AND z<5 (R)
    R ⋈ S ⋈ U
    Example Physical Query Plans
    two-pass
    hash-join
    101 buffers
    Filter(x=1 AND z<5)
    materialize
    IndexScan(R,y=2)
    two-pass
    hash-join
    101 buffers
    TableScan(U)
    TableScan(R)
    TableScan(S)
  • 56. Outline
    Convert SQL query to a parse tree
    Semantic checking: attributes, relation names, types
    Convert to a logical query plan (relational algebra expression)
    deal with subqueries
    Improve the logical query plan
    use algebraic transformations
    group together certain operators
    evaluate logical plan based on estimated size of relations
    Convert to a physical query plan
    search the space of physical plans
    choose order of operations
    complete the physical query plan
  • 57. Optimization Issues and Proposals
    The “fuzz” in estimation of sizes
    Parametric Query Optimization
    Specify alternatives to the execution engine so it may respond to conditions at runtime
    Multiple-query optimization
    Take concurrent execution of several queries into account
    Combinatoric explosion of options when doing an n-way Join
    Becomes really expensive around n > 15
    Alternative optimizations have been proposed for special situations, but no general framework
    Rule-based optimizers
    Randomized plan generation
  • 58. CS 542 Database Management Systems
    Distributed Query Execution
    Source: Carsten Binnig, Univ of Zurich, 2006
    J Singh
    March 28, 2011
  • 59. Motivation
    Algorithms based on Semi-Joins have been proposed as techniques for query optimization
    They shine in Distributed and Parallel Databases
    Good opportunity to explore them in that context
    Semi-join by example: R ⋉ S keeps exactly those tuples of R that join with at least one tuple of S
    Semi-join formal definition: R ⋉ S = π attributes(R) (R ⋈ S)
  • 60. Distributed / Parallel Join Processing
    Scenario:
    How to compute A ⋈B?
    Table A resides on Node 1
    Table B resides on Node 2
    Node 1
    Node 2
    Table A
    Table B
  • 61. Naïve approach (1)
    Idea: Use standard join and fetch table page-wise from remote node if necessary (send- and receive-operators)
    Example:
    Join is executed on node 2 using a Nested-Loop-Join
    Outer loop: Request page of table A from node 1 (remote)
    Inner loop: For each page iterate over table B and produce output
    => Random access of pages on node 1 (due to network delay)
    Node 1
    Node 2
    Request
    Table A
    Page A1
    Table B
    Send
  • 62. Naïve approach (2)
    Idea: Ship one table completely to the other node
    Example:
    Ship complete table A from node 1 to node 2
    Join table A and B locally on node 2
    • Avoid random page access on node 1
    Node 1
    Node 2
    Table A
    Table A
    Table B
    Ship
  • 63. Naïve Approach: Implications
    Problems:
    High cost for shipping data
    Network cost roughly the same as I/O cost for a hard disk (or even worse because of unpredictability of network delay)
    Shipping A roughly equivalent to a full table scan
    (Trivial) Optimizations:
    Always ship the smaller table to the other side
    If the query contains a selection, apply the selection before sending A
    Note: the bigger table may become the smaller table (after selection)
  • 64. Semi-join Approach (p1)
    Idea: Before shipping a table, reduce the data that is shipped to only those tuples that are relevant for the join
    Example: Join on A.id=B.id and table A should be shipped to node 2
    Node 1
    Node 2
    Table A
    Table B
  • 65. Semi-join Approach (p2)
    (1) Compute projection B.id of table B on node 2
    (2) Ship column B.id to node 1
    Node 1
    Node 2
    Table A
    Table B
    Ship
  • 66. Semi-join Approach (p3)
    (3) Execute semi-join of B.id and table A on A.id=B.id (to select only relevant tuples of table A => table A’)
    (4) Send result of semi-join (table A’) to node 2
    Node 1
    Node 2
    Table A
    Table B
    Table A’
    Ship
  • 67. Semi-join Approach (p4)
    (5) Join the shipped table A’ locally on node 2 with table B
    => Optimization of this approach: If node 1 holds a join index (e.g., type 1 with A.id -> {B.RID}) we can start with step (3)
    Node 1
    Node 2
    Table A
    Table B
    Table A’
    Ship
  • 68. Semi-join Approach Discussion
    This strategy works well if semi-join reduces size of the table that needs to be shipped
    Assume all rows of Table A are needed anyway => none of the rows of table A can be discarded
    Then this approach is more costly than shipping the entire table A in the first place!
    Consequence:
    Need to decide whether this method makes sense based on semi-join selectivity
    => Cost-based optimization must decide this
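A toy single-process sketch of steps (1)–(5) above. Table layout and field names are illustrative assumptions; a real system ships the projected column and the reduced table over the network between nodes.

```python
def semi_join_ship(table_a, table_b):
    b_ids = {row["id"] for row in table_b}          # (1) project B.id on node 2
    # (2) ship b_ids to node 1; (3) semi-join on node 1 reduces A to A':
    a_reduced = [row for row in table_a if row["id"] in b_ids]
    # (4) ship a_reduced (A') to node 2; (5) local join with B:
    return [(a, b) for a in a_reduced for b in table_b if a["id"] == b["id"]]

A = [{"id": 1, "w": "a"}, {"id": 2, "w": "b"}, {"id": 9, "w": "c"}]
B = [{"id": 1, "y": "p"}, {"id": 2, "y": "q"}]
result = semi_join_ship(A, B)   # the id=9 tuple is never shipped
```

If every row of A joins, A' = A and the extra round trip only adds cost, which is the selectivity point made above.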
  • 69. Bloom-join Approach (p1)
    Algorithm same as semi-join approach
    Ship a bloom-filter instead of (foreign) key column
    Use bloom-filter technique to compress data
    Goal: only send a small bit list (to reduce network I/O) instead of all keys of column (as bit-vector)
    Problems:
    A superset of tuples that might join will be sent back (same problem as in bloom-filters for bitmap-indexes)
    => More tuples must be sent over network and thus net gain depends on good hash function
  • 70. Bloom-join Approach (p2)
    (1) Compute bloom filter BL of size n for column B.id of table B on node 2 with n << |B.id| (e.g., by B.id % n)
    (2) Ship bloom filter B.id’ to node 1
    Node 1
    Node 2
    Table A
    Table B
    Ship
  • 71. Bloom-join Approach (p3)
    (3) Probe bloom filter B.id’ with tuples from table A to get a superset of possible join candidates (=> table A’)
    (4) Send result (table A’) to node 2 (table A’ might contain join candidates that do not have a partner in table B)
    (5) Join the shipped table A’ locally on node 2 with table B
    Node 1
    Node 2
    Table A
    Table B
    Table A’
    Ship
    Probe
  • 72. Bloom-join Approach Discussion
    Communication cost much reduced
    But have to deal with false positives
    Widely used in NoSQL databases
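A toy sketch of the Bloom-join steps, using the slide's simple modulo hash. This is an illustration only: a single hash function and a plain bit list stand in for a real Bloom filter, which uses several hash functions.

```python
def bloom_join(table_a, table_b, n=8):
    bits = [False] * n
    for row in table_b:                       # (1) build n-bit filter on node 2
        bits[row["id"] % n] = True
    # (2) ship bits to node 1; (3) probe on node 1 — may keep false positives:
    a_candidates = [row for row in table_a if bits[row["id"] % n]]
    # (4) ship candidates (A') to node 2; (5) exact join drops false positives:
    b_ids = {row["id"] for row in table_b}
    return [a for a in a_candidates if a["id"] in b_ids]

A = [{"id": 1}, {"id": 9}, {"id": 4}]   # id 9 collides with id 1 (mod 8)
B = [{"id": 1}]
matches = bloom_join(A, B)              # id 9 is shipped, then discarded
```

The id 9 tuple is the false positive from the discussion: it survives the probe, costs network I/O, and is only eliminated by the exact join on node 2.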
  • 73. Project Rubric