Working with Dimensional data in Distributed Hash Tables

Recently a new class of database technologies has developed offering massively scalable distributed hash table functionality. Relative to more traditional relational database systems, these systems are simple to operate and capable of managing massive data sets. These characteristics come at a cost though: an impoverished query language that, in practice, can handle little more than exact-match lookups at scale.

This talk will explore the real world technical challenges we faced at SimpleGeo while building a web-scale spatial database on top of Apache Cassandra. Cassandra is a distributed database that falls into the broad category of second-generation systems described above. We chose Cassandra after carefully considering desirable database characteristics based on our prior experiences building large scale web applications. Cassandra offers operational simplicity, decentralized operations, no single points of failure, online load balancing and re-balancing, and linear horizontal scalability.

Unfortunately, Cassandra fell far short of providing the sort of sophisticated spatial queries we needed. We developed a short term solution that was good enough for most use cases, but far from optimal. Long term, our challenge was to bridge the gap without compromising any of the desirable qualities that led us to choose Cassandra in the first place.

The result is a robust general purpose mechanism for overlaying sophisticated data structures on top of distributed hash tables. By overlaying a spatial tree, for example, we’re able to durably persist massive amounts of spatial data and service complex nearest-neighbor and multidimensional range queries across billions of rows fast enough for an online consumer facing application. We continue to improve and evolve the system, but we’re eager to share what we’ve learned so far.


  1. DIMENSIONAL DATA IN DISTRIBUTED HASH TABLES
     featuring Mike Malone
     STRANGE LOOP 2010 - Monday, October 18, 2010
  2. SIMPLEGEO
     We originally began as a mobile gaming startup, but quickly discovered that the location services and infrastructure needed to support our ideas didn't exist. So we took matters into our own hands and began building it ourselves.
     Matt Galligan, CEO & co-founder - Joe Stump, CTO & co-founder
  3. ABOUT ME
     MIKE MALONE, INFRASTRUCTURE ENGINEER
     mike@simplegeo.com - @mjmalone
     For the last 10 months I've been developing a spatial database at the core of SimpleGeo's infrastructure.
  4. REQUIREMENTS
     GOALS (word cloud): Multidimensional, Efficiency, Query Speed, Complex Queries, Decentralization, Locality, Spatial, Performance, Fault Tolerance, Availability, Distributedness, Replication, Reliability, Simplicity, Resilience, Conceptual Integrity, Scalability, Operational, Consistency, Data, Coherent, Concurrency, Query Volume, Durability, Linear
  5. CONFLICT
     Integrity vs Availability
     Locality vs Distributedness
     Reductionism vs Emergence
  6. DATABASE CANDIDATES
     THE TRANSACTIONAL RELATIONAL DATABASES
     They're theoretically pure, well understood, and mostly standardized behind a relatively clean abstraction
     They provide robust contracts that make it easy to reason about the structure and nature of the data they contain
     They're battle hardened, robust, durable, etc.
     OTHER STRUCTURED STORAGE OPTIONS
     "Plain I see you, western youths, see you tramping with the foremost, Pioneers! O pioneers!"
  7. INTEGRITY vs AVAILABILITY
  8. ACID
     These terms are not formally defined - they're a framework, not mathematical axioms
     ATOMICITY: Either all of a transaction's actions are visible to another transaction, or none are
     CONSISTENCY: Application-specific constraints must be met for the transaction to succeed
     ISOLATION: Two concurrent transactions will not see one another's transactions while "in flight"
     DURABILITY: The updates made to the database in a committed transaction will be visible to future transactions
  9. ACID HELPS
     ACID is a sort-of-formal contract that makes it easy to reason about your data, and that's good
     IT DOES SOMETHING HARD FOR YOU: with ACID, you're guaranteed to maintain a persistent global state as long as you've defined proper constraints and your logical transactions result in a valid system state
 10. CAP THEOREM
     At PODC 2000 Eric Brewer told us there were three desirable DB characteristics. But we can only have two.
     CONSISTENCY: Every node in the system contains the same data (e.g., replicas are never out of date)
     AVAILABILITY: Every request to a non-failing node in the system returns a response
     PARTITION TOLERANCE: System properties (consistency and/or availability) hold even when the system is partitioned and data is lost
 11. CAP THEOREM IN 30 SECONDS
     (figure: CLIENT, SERVER, REPLICA)
 12. CAP THEOREM IN 30 SECONDS
     The client sends a write to the server.
 13. CAP THEOREM IN 30 SECONDS
     The server replicates the write to the replica.
 14. CAP THEOREM IN 30 SECONDS
     The replica acks the write back to the server.
 15. CAP THEOREM IN 30 SECONDS
     The server tells the client the write was accepted.
 16. CAP THEOREM IN 30 SECONDS
     If the replica fails and the server refuses the write: UNAVAILABLE!
 17. CAP THEOREM IN 30 SECONDS
     If the replica fails and the server accepts the write anyway: INCONSISTENT!
 18. ACID HURTS
     Certain aspects of ACID encourage (require?) implementors to do "bad things"
     "Unfortunately, ANSI SQL's definition of isolation... relies in subtle ways on an assumption that a locking scheme is used for concurrency control, as opposed to an optimistic or multi-version concurrency scheme. This implies that the proposed semantics are ill-defined."
     - Joseph M. Hellerstein and Michael Stonebraker, Anatomy of a Database System
 19. BALANCE
     IT'S A QUESTION OF VALUES
     For traditional databases CAP consistency is the holy grail: it's maximized at the expense of availability and partition tolerance
     At scale, failures happen: when you're doing something a million times a second, a one-in-a-million failure happens every second
     We're witnessing the birth of a new religion...
     - CAP consistency is a luxury that must be sacrificed at scale in order to maintain availability when faced with failures
 20. APACHE CASSANDRA
     A DISTRIBUTED HASH TABLE WITH SOME TRICKS
     Peer-to-peer gossip supports fault tolerance and decentralization with a simple operational model
     Random hash-based partitioning provides auto-scaling and efficient online rebalancing - read and write throughput increases linearly when new nodes are added
     Pluggable replication strategy for multi-datacenter replication, making the system resilient to failure, even at the datacenter level
     Tunable consistency allows us to adjust durability to the value of the data being written
 21. CASSANDRA DATA MODEL
     {
       "users": {                                  <- column family
         "alice": {                                <- key
           "city": ["St. Louis", 1287040737182],   <- column (name, value, timestamp)
           "name": ["Alice", 1287080340940],
         },
         ...
       },
       "locations": {
         ...
       },
     }
 22. DISTRIBUTED HASH TABLE
     DESTROY LOCALITY (BY HASHING) TO ACHIEVE GOOD BALANCE AND SIMPLIFY LINEAR SCALABILITY
     (figure: the same data model, with keys "alice" and "bob" hashed to scattered positions (s3b..., 3e8...) on a ring of tokens running 0 to fffff...)
 23. HASH TABLE
     SUPPORTED QUERIES
     EXACT MATCH
     RANGE, PROXIMITY, ANYTHING THAT'S NOT EXACT MATCH (not supported)
 24. LOCALITY vs DISTRIBUTEDNESS
 25. THE ORDER PRESERVING PARTITIONER
     CASSANDRA'S PARTITIONING STRATEGY IS PLUGGABLE
     The partitioner maps keys to nodes
     The random partitioner destroys locality by hashing
     The order preserving partitioner retains locality, storing keys in natural lexicographical order around the ring
     (figure: keys alice, bob, and sam placed in order around a ring of tokens a, h, m, u, z)
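Both partitioners on this slide share one shape: map a key to a token, then find the first node token at or past it on the ring. A minimal sketch, with hypothetical node tokens (the slide's a/h/m/u ring); this is illustrative scaffolding, not Cassandra's implementation:

```python
import bisect
import hashlib

# Node tokens placed around the ring (hypothetical values from the slide).
TOKENS = ["a", "h", "m", "u"]

def random_token(key):
    # RandomPartitioner-style: hashing destroys locality but balances load.
    return hashlib.md5(key.encode()).hexdigest()

def order_preserving_token(key):
    # OrderPreservingPartitioner: the key IS the token, so lexicographic
    # neighbors ("alice", "bob") land near one another on the ring.
    return key

def node_for(token, tokens=TOKENS):
    # First node token at or past this token, wrapping around the ring.
    i = bisect.bisect_left(tokens, token) % len(tokens)
    return tokens[i]

print(node_for(order_preserving_token("alice")))  # -> h
print(node_for(order_preserving_token("sam")))    # -> u
```

With the order-preserving token function, "alice" and "bob" both land on the same node, which is exactly the locality that range queries need and the hash function destroys.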
 26. ORDER PRESERVING PARTITIONER
     EXACT MATCH
     RANGE - on a single dimension
     PROXIMITY - ?
 27. SPATIAL DATA
     IT'S INHERENTLY MULTIDIMENSIONAL
     (figure: a point x at coordinates (2, 2); both axes are needed to locate it)
 28. DIMENSIONALITY REDUCTION
     WITH SPACE-FILLING CURVES
     (figure: a first-iteration Z-curve visits the four grid cells in the order 1, 2, 3, 4)
 29. Z-CURVE
     SECOND ITERATION
     (figure: the Z-curve refined one level, a Z within each quadrant)
 30. Z-VALUE
     (figure: the cell containing the point x has z-value 14)
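The z-value on this slide is just the Morton code of the cell: the column and row indices with their bits interleaved. A toy encoder (hypothetical helper, not SimpleGeo's code); assuming the x bit takes the more significant position of each pair, the marked cell at column 3, row 2 of the second-iteration grid gets z-value 14 as shown:

```python
def z_value(x, y, bits=2):
    """Morton code: interleave the bits of the cell's column and row."""
    z = 0
    for i in range(bits):
        z |= ((y >> i) & 1) << (2 * i)      # y bit -> even position
        z |= ((x >> i) & 1) << (2 * i + 1)  # x bit -> odd position
    return z

print(z_value(3, 2))  # -> 14
```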
 31. GEOHASH
     SIMPLE TO COMPUTE
     Interleave the bits of decimal coordinates (equivalent to binary encoding of pre-order traversal!)
     Base32 encode the result (e.g., the bits 01101 encode as the character "e")
     AWESOME CHARACTERISTICS
     Arbitrary precision
     Human readable
     Sorts lexicographically
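The slide's recipe can be sketched directly. In this hypothetical helper (not SimpleGeo's actual code) each bit records one binary-search halving step, longitude and latitude alternating, and every 5 bits become one base32 character:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet

def geohash(lat, lon, precision=8):
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    bits = []
    for i in range(precision * 5):
        # Even bit positions refine longitude, odd positions latitude.
        rng, val = (lon_range, lon) if i % 2 == 0 else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits.append(1)
            rng[0] = mid  # keep the upper half
        else:
            bits.append(0)
            rng[1] = mid  # keep the lower half
    # Group into 5-bit chunks and base32-encode.
    chars = []
    for i in range(0, len(bits), 5):
        n = 0
        for b in bits[i:i + 5]:
            n = (n << 1) | b
        chars.append(BASE32[n])
    return "".join(chars)

# The Moonrise Hotel coordinates from the data-model slide:
print(geohash(38.6554420, -90.2992910))  # -> 9yzgcjn0
```

Note how the output matches the "9yzgcjn0" key prefix used in the record index on the next slide.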
 32. DATA MODEL
     {
       "record-index": {
         "9yzgcjn0:moonrise hotel": {    <- key is <geohash>:<id>
           "": ["", 1287040737182],
         },
         ...
       },
       "records": {
         "moonrise hotel": {
           "latitude": ["38.6554420", 1287040737182],
           "longitude": ["-90.2992910", 1287040737182],
           ...
         },
       },
     }
 33. BOUNDING BOX
     E.G., MULTIDIMENSIONAL RANGE
     (figure: "Gimme stuff in the big box!" becomes separate scans over the numbered grid cells: "Gimme 2 to 3" and "Gimme 4 to 5")
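A bounding box turns into a handful of one-dimensional range scans: cover the box with grid cells, compute each cell's z-value, sort, and merge contiguous runs. A hypothetical sketch, using the same bit-interleaving convention as the Z-curve slides:

```python
def z_value(x, y, bits=2):
    """Morton code: interleave the bits of the cell's column and row."""
    z = 0
    for i in range(bits):
        z |= ((y >> i) & 1) << (2 * i)
        z |= ((x >> i) & 1) << (2 * i + 1)
    return z

def covering_ranges(x0, y0, x1, y1, bits=2):
    """Contiguous z-value ranges covering the box [x0..x1] x [y0..y1]."""
    zs = sorted(z_value(x, y, bits)
                for x in range(x0, x1 + 1)
                for y in range(y0, y1 + 1))
    ranges, start, prev = [], zs[0], zs[0]
    for z in zs[1:]:
        if z == prev + 1:
            prev = z                      # extend the current run
        else:
            ranges.append((start, prev))  # close it and start a new one
            start = prev = z
    ranges.append((start, prev))
    return ranges

# A box that straddles the curve comes back as two scans, which is why
# the client on the slide has to issue two separate "gimme" requests.
print(covering_ranges(1, 0, 2, 1))  # -> [(2, 3), (8, 9)]
```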
 34. SPATIAL DATA
     STILL MULTIDIMENSIONAL
     DIMENSIONALITY REDUCTION ISN'T PERFECT
     Clients must
     - Pre-process to compose multiple queries
     - Post-process to filter and merge results
     Degenerate cases can be bad, particularly for nearest-neighbor queries
 35. Z-CURVE LOCALITY
     (figure: the second-iteration Z-curve)
 36. Z-CURVE LOCALITY
     (figure: two points x that are close in space...)
 37. Z-CURVE LOCALITY
     (figure: ...can be far apart along the curve)
 38. Z-CURVE LOCALITY
     (figure: points o that are near the query x along the curve aren't necessarily near it in space)
 39. THE WORLD IS NOT BALANCED
     (figure: night-lights map of Earth; credit: C. Mayhew & R. Simmon (NASA/GSFC), NOAA/NGDC, DMSP Digital Archive)
 40. TOO MUCH LOCALITY
     (figure: four nodes, 1 through 4, with San Francisco's data concentrated on one of them)
 41. TOO MUCH LOCALITY
     (figure: the San Francisco node absorbs all the traffic)
 42. TOO MUCH LOCALITY
     (figure: an idle node: "I'm sad.")
 43. TOO MUCH LOCALITY
     (figure: the idle nodes: "I'm bored." "I'm sad." "Me too." "Let's play xbox.")
 44. A TURNING POINT
 45. HELLO, DRAWING BOARD
     SURVEY OF DISTRIBUTED P2P INDEXING
     An overlay-dependent index works directly with nodes of the peer-to-peer network, defining its own overlay
     An over-DHT index overlays a more sophisticated data structure on top of a peer-to-peer distributed hash table
 46. ANOTHER LOOK AT POSTGIS
     MIGHT WORK, BUT
     The relational transaction management system (which we'd want to change) and access methods (which we'd have to change) are tightly coupled (necessarily?) to other parts of the system
     Could work at a higher level and treat PostGIS as a black box
     - Now we're back to implementing a peer-to-peer network with failure recovery, fault detection, etc... and Cassandra already had all that.
     - It's probably clear by now that I think these problems are more difficult than actually storing structured data on disk
 47. LET'S TAKE A STEP BACK
 48.-51. EARTH
     (build: the globe, progressively subdivided into a spatial tree)
 52. EARTH, TREE, RING
     (figure: the subdivided earth as a tree, mapped onto the ring)
 53. DATA MODEL
     {
       "record-index": {
         "layer-name:37.875,-90:40.25,-101.25": {
           "38.6554420,-90.2992910:moonrise hotel": ["", 1287040737182],
           ...
         },
       },
       "record-index-meta": {
         "layer-name:37.875,-90:40.25,-101.25": {
           "split": ["false", 1287040737182],
         },
         "layer-name:37.875,-90:42.265,-101.25": {
           "split": ["true", 1287040737182],
           "child-left": ["layer-name:37.875,-90:40.25,-101.25", 1287040737182],
           "child-right": ["layer-name:40.25,-90:42.265,-101.25", 1287040737182],
         },
       },
     }
 54.-55. DATA MODEL
     (the same data model, shown again as build slides)
 56. SPLITTING
     IT'S PRETTY MUCH JUST A CONCURRENT TREE
     Splitting shouldn't lock the tree for reads or writes, and failures shouldn't cause corruption
     - Splits are optimistic, idempotent, and fail-forward
     - Instead of locking, writes are replicated to the splitting node and the relevant child[ren] while a split operation is taking place
     - Cleanup occurs after the split is completed and all interested nodes are aware that the split has occurred
     - Cassandra writes are idempotent, so splits are too - if a split fails, it is simply retried
     Split size: a tunable knob for balancing locality and distributedness
     The other hard problem with concurrent trees is rebalancing - we just don't do it! (more on this later)
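The fail-forward split described above can be sketched as an ordered sequence of idempotent writes: create the children first, copy records down, and flip the parent's split flag last. This is a much-simplified toy with hypothetical names (an in-memory dict stands in for the record-index-meta column family; the real system works against Cassandra with replication and timestamps):

```python
def split(meta, node_key, left_key, right_key, partition):
    """Optimistic, idempotent, fail-forward split. Running it twice, or
    crashing halfway and retrying, converges to the same end state."""
    node = meta[node_key]
    # 1. Create the children first; setdefault makes re-creation harmless.
    meta.setdefault(left_key, {"split": "false", "records": {}})
    meta.setdefault(right_key, {"split": "false", "records": {}})
    # 2. Copy the parent's records into the proper child. During the split,
    #    new writes go to the parent AND the relevant child, so readers of
    #    either see a complete picture - no locking required.
    for rid, rec in node["records"].items():
        child = left_key if partition(rec) else right_key
        meta[child]["records"][rid] = rec
    # 3. Flip the flag last. A reader that still thinks this node is a leaf
    #    is merely stale, not wrong; cleanup of the parent's records happens
    #    after every interested node has seen the split.
    node["split"] = "true"
    node["child-left"] = left_key
    node["child-right"] = right_key

# Splitting a node at latitude 40.25, echoing the slide's key names:
meta = {"parent": {"split": "false",
                   "records": {"moonrise hotel": 38.65, "uptown": 41.0}}}
split(meta, "parent", "left", "right", lambda lat: lat < 40.25)
```

Because step 3 happens only after the children are complete, a retry after any partial failure simply redoes harmless work, which is what makes "if a split fails, it is simply retried" safe.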
 57. THE ROOT IS HOT
     MIGHT BE A DEAL BREAKER
     For a tree to be useful, it has to be traversed
     - Typically, tree traversal starts at the root
     - The root is the only discoverable node in our tree
     Traversing through the root meant reading the root for every read or write below it - unacceptable
     - Lots of academic solutions - most promising was a skip graph, but that required O(n log(n)) data - also unacceptable
     - Minimum tree depth was proposed, but then you just get multiple hot-spots at your minimum-depth nodes
 58. BACK TO THE BOOKS
     LOTS OF ACADEMIC WORK ON THIS TOPIC
     But academia is obsessed with provable, deterministic, asymptotically optimal algorithms
     And we only need something that is probably fast enough most of the time (for some value of "probably" and "most of the time")
     - And if the probably-good-enough algorithm is, you know... tractable... one might even consider it qualitatively better!
 59. REDUCTIONISM vs EMERGENCE
 60. We have / We want
     (figure: the structure we have beside the structure we want)
 61. THINKING HOLISTICALLY
     WE OBSERVED THAT
     Once a node in the tree exists, it doesn't go away
     Node state may change, but that state only really matters locally - thinking a node is a leaf when it really has children is not fatal
     SO... WHAT IF WE JUST CACHED NODES THAT WERE OBSERVED IN THE SYSTEM!?
 62. CACHE IT
     STUPID SIMPLE SOLUTION
     Keep an LRU cache of nodes that have been traversed
     Start traversals at the most selective relevant node
     If that node doesn't satisfy you, traverse up the tree
     Along with your result set, return a list of nodes that were traversed so the caller can add them to its cache
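The two slides above can be sketched together: a tiny LRU cache of observed nodes, plus a traversal that starts at the most selective cached node and only walks up toward the root when that node can't answer. All names here (the PARENT map, the node labels) are hypothetical scaffolding, not SimpleGeo's code:

```python
from collections import OrderedDict

# Toy tree: node -> parent (None for the root).
PARENT = {"root": None, "west": "root", "east": "root", "sf": "west"}

class NodeCache:
    """LRU cache of tree nodes observed in earlier traversals."""
    def __init__(self, capacity=128):
        self.nodes = OrderedDict()
        self.capacity = capacity

    def add(self, node):
        self.nodes[node] = True
        self.nodes.move_to_end(node)          # mark most recently used
        if len(self.nodes) > self.capacity:
            self.nodes.popitem(last=False)    # evict least recently used

    def start_node(self, path_to_query):
        # Most selective (deepest) cached node on the query's root path.
        for node in reversed(path_to_query):
            if node in self.nodes:
                self.nodes.move_to_end(node)
                return node
        return path_to_query[0]               # nothing cached: use the root

def traverse(cache, path_to_query, satisfies):
    """Walk up from the best cached node until some node satisfies the
    query. Also return the nodes touched, so the caller can warm its
    cache, as the slide suggests."""
    node = cache.start_node(path_to_query)
    touched = []
    while node is not None and not satisfies(node):
        touched.append(node)
        node = PARENT[node]
    for n in touched:
        cache.add(n)
    return node, touched
```

On the happy path the starting node answers immediately and nothing above it is read, which is where the "zero read overhead" claim on the next slides comes from.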
 63.-65. TRAVERSAL
     NEAREST NEIGHBOR
     (build: the query point x among neighboring points o, with the traversal expanding outward until enough neighbors are found)
 66. KEY CHARACTERISTICS
     PERFORMANCE
     Best case on the happy path (everything cached) has zero read overhead
     Worst case, with nothing cached, O(log(n)) read overhead
     RE-BALANCING SEEMS UNNECESSARY!
     Makes the worst case "more worser", but so far so good
 67. DISTRIBUTED TREE
     SUPPORTED QUERIES
     EXACT MATCH
     RANGE
     PROXIMITY
     SOMETHING ELSE I HAVEN'T EVEN HEARD OF
 68. DISTRIBUTED TREE
     SUPPORTED QUERIES
     (stamped across the list: MULTIPLE DIMENSIONS!)
     EXACT MATCH
     RANGE
     PROXIMITY
     SOMETHING ELSE I HAVEN'T EVEN HEARD OF
 69. PEACE
     Integrity and Availability
     Locality and Distributedness
     Reductionism and Emergence
 70. QUESTIONS?
     MIKE MALONE, INFRASTRUCTURE ENGINEER
     mike@simplegeo.com - @mjmalone
 71. (closing slide)