SlideShare a Scribd company logo
1 of 71
Download to read offline
o   o
                           o oo



            DIMENSIONAL DATA IN
            DISTRIBUTED HASH TABLES
                                   fturg Mike Male




                                           STRANGE LOOP 2010
Monday, October 18, 2010
SIMPLEGEO

                             We originally began as a mobile
                                gaming startup, but quickly
                           discovered that the location services
                           and infrastructure needed to support
                            our ideas didn’t exist. So we took
                             matters into our own hands and
                                began building it ourselves.


                             Mt Gaig          Joe Stump
                              CEO & co-founder   CTO & co-founder




                                                     STRANGE LOOP 2010
Monday, October 18, 2010
ABOUT ME

                           MIKE MALONE
                           INFRASTRUCTURE ENGINEER
                           mike@simplegeo.com
                           @mjmalone


        For the last 10 months I’ve been developing a spatial
        database at the core of SimpleGeo’s infrastructure.




                                                  STRANGE LOOP 2010
Monday, October 18, 2010
REQUIREMENTS
                                                Multidimensional
                            Efficiency   Query Speed         Complex Queries
                                                                 Decentralization
                 Locality                            Spatial
                                  Performance
   Fault Tolerance                                           Availability     Distributedness
                                               GOALS
  Replication
                            Reliability                          Simplicity
      Resilience                                                               Conceptual
                                  Integrity                Scalability
                                                                         Operational
                Consistency                         Data
                                         Coherent                             Concurrency
                                                               Query Volume
                            Durability
                                                    Linear

                                                                         STRANGE LOOP 2010
Monday, October 18, 2010
CONFLICT
        Integrity vs Availability

        Locality vs Distributedness

        Reductionism vs Emergence




                                      STRANGE LOOP 2010
Monday, October 18, 2010
DATABASE CANDIDATES
     THE TRANSACTIONAL RELATIONAL DATABASES
           They’re theoretically pure, well understood, and mostly
           standardized behind a relatively clean abstraction
           They provide robust contracts that make it easy to reason
           about the structure and nature of the data they contain
           They’re battle hardened, robust, durable, etc.
     OTHER STRUCTURED STORAGE OPTIONS
           Plain I see you, western youths, see you tramping with the
           foremost, Pioneers! O pioneers!




                                                            STRANGE LOOP 2010
Monday, October 18, 2010
INTEGRITY
                               - vs -
                           AVAILABILITY




                                          STRANGE LOOP 2010
Monday, October 18, 2010
ACID
        These terms are not formally defined - they’re a
        framework, not mathematical axioms

     ATOMICITY
         Either all of a transaction’s actions are visible to another transaction, or none are

     CONSISTENCY
         Application-specific constraints must be met for transaction to succeed

     ISOLATION
         Two concurrent transactions will not see one another’s transactions while “in flight”

     DURABILITY
         The updates made to the database in a committed transaction will be visible to
         future transactions



                                                                            STRANGE LOOP 2010
Monday, October 18, 2010
ACID HELPS
        ACID is a sort-of-formal contract that makes it
        easy to reason about your data, and that’s good

     IT DOES SOMETHING HARD FOR YOU
           With ACID, you’re guaranteed to maintain a persistent global
           state as long as you’ve defined proper constraints and your
           logical transactions result in a valid system state




                                                          STRANGE LOOP 2010
Monday, October 18, 2010
CAP THEOREM
        At PODC 2000 Eric Brewer told us there were three
        desirable DB characteristics. But we can only have two.

     CONSISTENCY
         Every node in the system contains the same data (e.g., replicas are
         never out of date)

     AVAILABILITY
         Every request to a non-failing node in the system returns a response

     PARTITION TOLERANCE
         System properties (consistency and/or availability) hold even when
         the system is partitioned and data is lost

                                                             STRANGE LOOP 2010
Monday, October 18, 2010
CAP THEOREM IN 30 SECONDS

          CLIENT           SERVER        REPLICA




                                    STRANGE LOOP 2010
Monday, October 18, 2010
CAP THEOREM IN 30 SECONDS

          CLIENT                  SERVER        REPLICA
                           wre




                                           STRANGE LOOP 2010
Monday, October 18, 2010
CAP THEOREM IN 30 SECONDS

          CLIENT                  SERVER   plice        REPLICA
                           wre




                                                     STRANGE LOOP 2010
Monday, October 18, 2010
CAP THEOREM IN 30 SECONDS

          CLIENT                  SERVER   plice        REPLICA
                           wre



                                             ack




                                                     STRANGE LOOP 2010
Monday, October 18, 2010
CAP THEOREM IN 30 SECONDS

          CLIENT                    SERVER   plice        REPLICA
                            wre



                           aept              ack




                                                       STRANGE LOOP 2010
Monday, October 18, 2010
CAP THEOREM IN 30 SECONDS

          CLIENT                   SERVER               REPLICA
                           wre
                                                FAIL!

                           ni
                                  UNAVAILAB!




                                                   STRANGE LOOP 2010
Monday, October 18, 2010
CAP THEOREM IN 30 SECONDS

          CLIENT                    SERVER              REPLICA
                            wre
                                                FAIL!

                           aept
                                    CSTT!




                                                   STRANGE LOOP 2010
Monday, October 18, 2010
ACID HURTS
        Certain aspects of ACID encourage (require?)
        implementors to do “bad things”

     Unfortunately, ANSI SQL’s definition of isolation...
         relies in subtle ways on an assumption that a locking scheme is
         used for concurrency control, as opposed to an optimistic or
         multi-version concurrency scheme. This implies that the
         proposed semantics are ill-defined.
                                    Joseph M. Hellerstein and Michael Stonebraker
                                                   Anatomy of a Database System




                                                              STRANGE LOOP 2010
Monday, October 18, 2010
BALANCE
     IT’S A QUESTION OF VALUES
           For traditional databases CAP consistency is the holy grail: it’s
           maximized at the expense of availability and partition
           tolerance
           At scale, failures happen: when you’re doing something a
           million times a second a one-in-a-million failure happens every
           second
           We’re witnessing the birth of a new religion...
            • CAP consistency is a luxury that must be sacrificed at scale in order to
              maintain availability when faced with failures




                                                                    STRANGE LOOP 2010
Monday, October 18, 2010
APACHE CASSANDRA
     A DISTRIBUTED HASH TABLE WITH SOME TRICKS
           Peer-to-peer gossip supports fault tolerance and decentralization
           with a simple operational model
           Random hash-based partitioning provides auto-scaling and
           efficient online rebalancing - read and write throughput increases
           linearly when new nodes are added
           Pluggable replication strategy for multi-datacenter replication
           making the system resilient to failure, even at the datacenter level
           Tunable consistency allow us to adjust durability with the value of
           data being written




                                                                STRANGE LOOP 2010
Monday, October 18, 2010
CASSANDRA
     DATA MODEL
     {                     column family
          “users”: {            key
              “alice”: {
                  “city”: [“St. Louis”, 1287040737182],
                                                          columns
                                                          (name, value, timestamp)
                  “name”: [“Alice” 1287080340940],
                                   ,
              },
              ...
          },
          “locations”: {
          },
          ...
     }


                                                                 STRANGE LOOP 2010
Monday, October 18, 2010
DISTRIBUTED HASH TABLE
     DESTROY LOCALITY (BY HASHING) TO ACHIEVE GOOD
     BALANCE AND SIMPLIFY LINEAR SCALABILITY

      {
                          column family
          “users”: {                                                     fffff 0
                                  key
             “
             alice”: {
                      “city”: [“St. Louis”, 1287040737182],
                                                              columns
                      “name”: [“Alice” 1287080340940],
                                       ,
                },
                ...
          },

      }
          ...
                                                 alice
                                                 bob s3b
                                                                  3e8           STRANGE LOOP 2010
Monday, October 18, 2010
HASH TABLE
     SUPPORTED QUERIES

                    EXACT MATCH
                    RANGE
                    PROXIMITY
                    ANYTHING THAT’S NOT
                    EXACT MATCH


                                          STRANGE LOOP 2010
Monday, October 18, 2010
LOCALITY
                                - vs -
                           DISTRIBUTEDNESS




                                             STRANGE LOOP 2010
Monday, October 18, 2010
THE ORDER PRESERVING
     PARTITIONER
     CASSANDRA’S PARTITIONING
     STRATEGY IS PLUGGABLE
           Partitioner maps keys to nodes
           Random partitioner destroys locality by hashing
           Order preserving partitioner retains locality, storing
           keys in natural lexicographical order around ring        z a
                                                                                      alice
                                                                    a
                                                                                      bob
                                                         u                      h
                                             sam                    m
                                                                          STRANGE LOOP 2010
Monday, October 18, 2010
ORDER PRESERVING PARTITIONER

                 EXACT MATCH
                 RANGE
                           On a single dimension
        ? PROXIMITY


                                            STRANGE LOOP 2010
Monday, October 18, 2010
SPATIAL DATA
     IT’S INHERENTLY MULTIDIMENSIONAL



                           2       x        2, 2


                           1

                               1       2


                                           STRANGE LOOP 2010
Monday, October 18, 2010
DIMENSIONALITY REDUCTION
     WITH SPACE-FILLING CURVES



                           1   2




                           3   4



                                   STRANGE LOOP 2010
Monday, October 18, 2010
Z-CURVE
     SECOND ITERATION




                           STRANGE LOOP 2010
Monday, October 18, 2010
Z-VALUE




                                14
                           x




                               STRANGE LOOP 2010
Monday, October 18, 2010
GEOHASH
     SIMPLE TO COMPUTE
           Interleave the bits of decimal coordinates
           (equivalent to binary encoding of pre-order
           traversal!)
           Base32 encode the result
     AWESOME CHARACTERISTICS
           Arbitrary precision
           Human readable
           Sorts lexicographically


                                      01101
                                                  e      STRANGE LOOP 2010
Monday, October 18, 2010
DATA MODEL
     {
         “record-index”: {
                                       key
                                            <geohash>:<id>
            “9yzgcjn0:moonrise hotel”: {
                “”: [“”, 1287040737182],
            },
            ...
         },
         “records”: {
            “moonrise hotel”: {
                “latitude”: [“38.6554420”, 1287040737182],
                “longitude”: [“-90.2992910”, 1287040737182],
                ...
             }
         }
     }



                                                               STRANGE LOOP 2010
Monday, October 18, 2010
BOUNDING BOX
     E.G., MULTIDIMENSIONAL RANGE

            Gie stuff  bg box!    Gie 2  3


                           1   2


                           3   4


                                      Gie 4  5
                                    STRANGE LOOP 2010
Monday, October 18, 2010
SPATIAL DATA
     STILL MULTIDIMENSIONAL
     DIMENSIONALITY REDUCTION ISN’T PERFECT
           Clients must
            • Pre-process to compose multiple queries
            • Post-process to filter and merge results
           Degenerate cases can be bad, particularly for nearest-neighbor
           queries




                                                           STRANGE LOOP 2010
Monday, October 18, 2010
Z-CURVE LOCALITY




                           STRANGE LOOP 2010
Monday, October 18, 2010
Z-CURVE LOCALITY



                               x
                           x


                                   STRANGE LOOP 2010
Monday, October 18, 2010
Z-CURVE LOCALITY



                               x
                           x


                                   STRANGE LOOP 2010
Monday, October 18, 2010
Z-CURVE LOCALITY



                                        x
                                o o o       x
                                 o
                               o o
                           o
                                                STRANGE LOOP 2010
Monday, October 18, 2010
THE WORLD
     IS NOT BALANCED




                           Credit: C. Mayhew & R. Simmon (NASA/GSFC), NOAA/NGDC, DMSP Digital Archive



                                                                         STRANGE LOOP 2010
Monday, October 18, 2010
TOO MUCH LOCALITY


                           1        2

                    SAN FRANCISCO


                           3        4




                                        STRANGE LOOP 2010
Monday, October 18, 2010
TOO MUCH LOCALITY


                           1        2

                    SAN FRANCISCO


                           3        4




                                        STRANGE LOOP 2010
Monday, October 18, 2010
TOO MUCH LOCALITY


                           1        2   I’m sad.


                    SAN FRANCISCO


                           3        4




                                                   STRANGE LOOP 2010
Monday, October 18, 2010
TOO MUCH LOCALITY
                                                   I’m b.


                           1        2   I’m sad.               Me o.

                    SAN FRANCISCO


                           3        4

                                                    Let’s py xbox.


                                                   STRANGE LOOP 2010
Monday, October 18, 2010
A TURNING POINT



                                STRANGE LOOP 2010
Monday, October 18, 2010
HELLO, DRAWING BOARD
     SURVEY OF DISTRIBUTED P2P INDEXING
           An overlay-dependent index works directly with nodes of the
           peer-to-peer network, defining its own overlay
           An over-DHT index overlays a more sophisticated data
           structure on top of a peer-to-peer distributed hash table




                                                           STRANGE LOOP 2010
Monday, October 18, 2010
ANOTHER LOOK AT POSTGIS
     MIGHT WORK, BUT
           The relational transaction management system (which we’d
           want to change) and access methods (which we’d have to
           change) are tightly coupled (necessarily?) to other parts of
           the system
           Could work at a higher level and treat PostGIS as a black box
            • Now we’re back to implementing a peer-to-peer network with failure
              recovery, fault detection, etc... and Cassandra already had all that.
                • It’s probably clear by now that I think these problems are more
                  difficult than actually storing structured data on disk




                                                                     STRANGE LOOP 2010
Monday, October 18, 2010
LET’S TAKE A STEP BACK




                           STRANGE LOOP 2010
Monday, October 18, 2010
EARTH




                           STRANGE LOOP 2010
Monday, October 18, 2010
EARTH




                           STRANGE LOOP 2010
Monday, October 18, 2010
EARTH




                           STRANGE LOOP 2010
Monday, October 18, 2010
EARTH




                           STRANGE LOOP 2010
Monday, October 18, 2010
EARTH, TREE, RING




                           STRANGE LOOP 2010
Monday, October 18, 2010
DATA MODEL
     {
         “record-index”: {
            “layer-name:37     .875, -90:40.25, -101.25”: {
                “38.6554420, -90.2992910:moonrise hotel”: [“” 1287040737182],
                                                                    ,
                ...
            },
         },
         “record-index-meta”: {
            “layer-name:37     .875, -90:40.25, -101.25”: {
                “split”: [“false”, 1287040737182],
             }
             “layer-name: 37     .875, -90:42.265, -101.25” {
                 “split”: [“true”, 1287040737182],
                 “child-left”: [“layer-name:37 .875, -90:40.25, -101.25” 1287040737182]
                                                                       ,
                 “child-right”: [“layer-name:40.25, -90:42.265, -101.25” 1287040737182]
                                                                          ,
              }
         }
     }




                                                                                          STRANGE LOOP 2010
Monday, October 18, 2010
DATA MODEL
     {
         “record-index”: {
            “layer-name:37     .875, -90:40.25, -101.25”: {
                “38.6554420, -90.2992910:moonrise hotel”: [“” 1287040737182],
                                                                    ,
                ...
            },
         },
         “record-index-meta”: {
            “layer-name:37     .875, -90:40.25, -101.25”: {
                “split”: [“false”, 1287040737182],
             }
             “layer-name: 37     .875, -90:42.265, -101.25” {
                 “split”: [“true”, 1287040737182],
                 “child-left”: [“layer-name:37 .875, -90:40.25, -101.25” 1287040737182]
                                                                       ,
                 “child-right”: [“layer-name:40.25, -90:42.265, -101.25” 1287040737182]
                                                                          ,
              }
         }
     }




                                                                                          STRANGE LOOP 2010
Monday, October 18, 2010
DATA MODEL
     {
         “record-index”: {
            “layer-name:37     .875, -90:40.25, -101.25”: {
                “38.6554420, -90.2992910:moonrise hotel”: [“” 1287040737182],
                                                                    ,
                ...
            },
         },
         “record-index-meta”: {
            “layer-name:37     .875, -90:40.25, -101.25”: {
                “split”: [“false”, 1287040737182],
             }
             “layer-name: 37     .875, -90:42.265, -101.25” {
                 “split”: [“true”, 1287040737182],
                 “child-left”: [“layer-name:37 .875, -90:40.25, -101.25” 1287040737182]
                                                                       ,
                 “child-right”: [“layer-name:40.25, -90:42.265, -101.25” 1287040737182]
                                                                          ,
              }
         }
     }




                                                                                          STRANGE LOOP 2010
Monday, October 18, 2010
SPLITTING
     IT’S PRETTY MUCH JUST A CONCURRENT TREE
           Splitting shouldn’t lock the tree for reads or writes and failures
           shouldn’t cause corruption
            • Splits are optimistic, idempotent, and fail-forward
            • Instead of locking, writes are replicated to the splitting node and the
              relevant child[ren] while a split operation is taking place
                • Cleanup occurs after the split is completed and all interested nodes are
                  aware that the split has occurred
                • Cassandra writes are idempotent, so splits are too - if a split fails, it is
                  simply be retried
           Split size: A Tunable knob for balancing locality and distributedness
           The other hard problem with concurrent trees is rebalancing - we
           just don’t do it! (more on this later)


                                                                               STRANGE LOOP 2010
Monday, October 18, 2010
THE ROOT IS HOT
     MIGHT BE A DEAL BREAKER
           For a tree to be useful, it has to be traversed
            • Typically, tree traversal starts at the root
            • Root is the only discoverable node in our tree
           Traversing through the root meant reading the root for every
           read or write below it - unacceptable
            • Lots of academic solutions - most promising was a skip graph, but
              that required O(n log(n)) data - also unacceptable
            • Minimum tree depth was propsed, but then you just get multiple hot-
              spots at your minimum depth nodes




                                                                  STRANGE LOOP 2010
Monday, October 18, 2010
BACK TO THE BOOKS
     LOTS OF ACADEMIC WORK ON THIS TOPIC
           But academia is obsessed with provable, deterministic,
           asymptotically optimal algorithms
           And we only need something that is probably fast enough
           most of the time (for some value of “probably” and “most of
           the time”)
            • And if the probably good enough algorithm is, you know... tractable...
              one might even consider it qualitatively better!




                                                                    STRANGE LOOP 2010
Monday, October 18, 2010
REDUCTIONISM
                               - vs -
                            EMERGENCE




                                          STRANGE LOOP 2010
Monday, October 18, 2010
We have   We want




                                    STRANGE LOOP 2010
Monday, October 18, 2010
THINKING HOLISTICALLY
     WE OBSERVED THAT
           Once a node in the tree exists, it doesn’t go away
           Node state may change, but that state only really matters
           locally - thinking a node is a leaf when it really has children is
           not fatal
     SO... WHAT IF WE JUST CACHED NODES THAT
     WERE OBSERVED IN THE SYSTEM!?




                                                                 CLIENT - XXX 2010
Monday, October 18, 2010
CACHE IT
     STUPID SIMPLE SOLUTION
           Keep an LRU cache of nodes that have been traversed
           Start traversals at the most selective relevant node
           If that node doesn’t satisfy you, traverse up the tree
           Along with your result set, return a list of nodes that were
           traversed so the caller can add them to its cache




                                                             STRANGE LOOP 2010
Monday, October 18, 2010
TRAVERSAL
     NEAREST NEIGHBOR


                           o   o
                           o xo




                                   STRANGE LOOP 2010
Monday, October 18, 2010
TRAVERSAL
     NEAREST NEIGHBOR


                           o   o
                           o xo




                                   STRANGE LOOP 2010
Monday, October 18, 2010
TRAVERSAL
     NEAREST NEIGHBOR


                           o   o
                           o xo




                                   STRANGE LOOP 2010
Monday, October 18, 2010
KEY CHARACTERISTICS
     PERFORMANCE
           Best case on the happy path (everything cached) has zero
           read overhead
           Worst case, with nothing cached, O(log(n)) read overhead
     RE-BALANCING SEEMS UNNECESSARY!
           Makes worst case more worser, but so far so good




                                                            CLIENT - XXX 2010
Monday, October 18, 2010
DISTRIBUTED TREE
     SUPPORTED QUERIES

                    EXACT MATCH
                    RANGE
                    PROXIMITY
                    SOMETHING ELSE I HAVEN’T
                    EVEN HEARD OF


                                        STRANGE LOOP 2010
Monday, October 18, 2010
DISTRIBUTED TREE
     SUPPORTED QUERIES
                                        MUL
                    EXACT MATCH      DI P
                    RANGE               NS
                                            NS!
                    PROXIMITY
                    SOMETHING ELSE I HAVEN’T
                    EVEN HEARD OF


                                        STRANGE LOOP 2010
Monday, October 18, 2010
PEACE
        Integrity and Availability

        Locality and Distributedness

        Reductionism and Emergence




                                       STRANGE LOOP 2010
Monday, October 18, 2010
QUESTIONS?

                           MIKE MALONE
                           INFRASTRUCTURE ENGINEER
                           mike@simplegeo.com
                           @mjmalone




                                               STRANGE LOOP 2010
Monday, October 18, 2010
STRANGE LOOP 2010
Monday, October 18, 2010

More Related Content

Viewers also liked

Transaction states and properties
Transaction states and propertiesTransaction states and properties
Transaction states and propertiesChetan Mahawar
 
Concurrency (Distributed computing)
Concurrency (Distributed computing)Concurrency (Distributed computing)
Concurrency (Distributed computing)Sri Prasanna
 
Transactions (Distributed computing)
Transactions (Distributed computing)Transactions (Distributed computing)
Transactions (Distributed computing)Sri Prasanna
 
Deadlock detection in distributed systems
Deadlock detection in distributed systemsDeadlock detection in distributed systems
Deadlock detection in distributed systemsReza Ramezani
 
Deadlock detection and recovery by saad symbian
Deadlock detection and recovery by saad symbianDeadlock detection and recovery by saad symbian
Deadlock detection and recovery by saad symbiansaad symbian
 
Deadlock in distribute system by saeed siddik
Deadlock in distribute system by saeed siddikDeadlock in distribute system by saeed siddik
Deadlock in distribute system by saeed siddikSaeed Siddik
 
Computer security ethics_and_privacy
Computer security ethics_and_privacyComputer security ethics_and_privacy
Computer security ethics_and_privacyArdit Meti
 
The Revolution from Transistor to Digital Electronics
The Revolution from Transistor to Digital ElectronicsThe Revolution from Transistor to Digital Electronics
The Revolution from Transistor to Digital ElectronicsSubhajit Bhattacharjee
 
Stack implementation using c
Stack implementation using cStack implementation using c
Stack implementation using cRajendran
 
Distributed Databases
Distributed DatabasesDistributed Databases
Distributed Databaseselliando dias
 
Deadlock detection
Deadlock detectionDeadlock detection
Deadlock detectionNadia Nahar
 

Viewers also liked (19)

Deadlock
DeadlockDeadlock
Deadlock
 
Transaction states and properties
Transaction states and propertiesTransaction states and properties
Transaction states and properties
 
Concurrency (Distributed computing)
Concurrency (Distributed computing)Concurrency (Distributed computing)
Concurrency (Distributed computing)
 
Distributed Transaction
Distributed TransactionDistributed Transaction
Distributed Transaction
 
Transactions (Distributed computing)
Transactions (Distributed computing)Transactions (Distributed computing)
Transactions (Distributed computing)
 
Deadlock detection in distributed systems
Deadlock detection in distributed systemsDeadlock detection in distributed systems
Deadlock detection in distributed systems
 
Deadlock detection and recovery by saad symbian
Deadlock detection and recovery by saad symbianDeadlock detection and recovery by saad symbian
Deadlock detection and recovery by saad symbian
 
Deadlock in distribute system by saeed siddik
Deadlock in distribute system by saeed siddikDeadlock in distribute system by saeed siddik
Deadlock in distribute system by saeed siddik
 
Computer security ethics_and_privacy
Computer security ethics_and_privacyComputer security ethics_and_privacy
Computer security ethics_and_privacy
 
The Revolution from Transistor to Digital Electronics
The Revolution from Transistor to Digital ElectronicsThe Revolution from Transistor to Digital Electronics
The Revolution from Transistor to Digital Electronics
 
Computer ethics
Computer ethicsComputer ethics
Computer ethics
 
Seminar algorithms of web
Seminar algorithms of webSeminar algorithms of web
Seminar algorithms of web
 
Stack implementation using c
Stack implementation using cStack implementation using c
Stack implementation using c
 
Computer security and
Computer security andComputer security and
Computer security and
 
Distributed Databases
Distributed DatabasesDistributed Databases
Distributed Databases
 
Dbms acid
Dbms acidDbms acid
Dbms acid
 
Deadlock detection
Deadlock detectionDeadlock detection
Deadlock detection
 
OS - Deadlock
OS - DeadlockOS - Deadlock
OS - Deadlock
 
Hash tables
Hash tablesHash tables
Hash tables
 

Similar to Working with Dimensional data in Distributed Hash Tables

Lean analytics for startups - Leweb2010
Lean analytics for startups - Leweb2010Lean analytics for startups - Leweb2010
Lean analytics for startups - Leweb2010Alistair Croll
 
T-DOSE 2010 - Agile Enterprise, CLouds and Devops
T-DOSE 2010 - Agile Enterprise, CLouds and DevopsT-DOSE 2010 - Agile Enterprise, CLouds and Devops
T-DOSE 2010 - Agile Enterprise, CLouds and DevopsChef Software, Inc.
 
Scientific Applications with Python
Scientific Applications with PythonScientific Applications with Python
Scientific Applications with PythonEnthought, Inc.
 
Akka scalaliftoff london_2010
Akka scalaliftoff london_2010Akka scalaliftoff london_2010
Akka scalaliftoff london_2010Skills Matter
 
Riak Core: Building Distributed Applications Without Shared State
Riak Core: Building Distributed Applications Without Shared StateRiak Core: Building Distributed Applications Without Shared State
Riak Core: Building Distributed Applications Without Shared StateRusty Klophaus
 
Ignite: Devops - Why Should You Care
Ignite: Devops - Why Should You CareIgnite: Devops - Why Should You Care
Ignite: Devops - Why Should You CareJoshua L. Davis
 
BCS-JCF September 2010
BCS-JCF September 2010BCS-JCF September 2010
BCS-JCF September 2010Ian Burgess
 
GoLightly: Building VM-based language runtimes in Go
GoLightly: Building VM-based language runtimes in GoGoLightly: Building VM-based language runtimes in Go
GoLightly: Building VM-based language runtimes in GoEleanor McHugh
 

Similar to Working with Dimensional data in Distributed Hash Tables (8)

Lean analytics for startups - Leweb2010
Lean analytics for startups - Leweb2010Lean analytics for startups - Leweb2010
Lean analytics for startups - Leweb2010
 
T-DOSE 2010 - Agile Enterprise, CLouds and Devops
T-DOSE 2010 - Agile Enterprise, CLouds and DevopsT-DOSE 2010 - Agile Enterprise, CLouds and Devops
T-DOSE 2010 - Agile Enterprise, CLouds and Devops
 
Scientific Applications with Python
Scientific Applications with PythonScientific Applications with Python
Scientific Applications with Python
 
Akka scalaliftoff london_2010
Akka scalaliftoff london_2010Akka scalaliftoff london_2010
Akka scalaliftoff london_2010
 
Riak Core: Building Distributed Applications Without Shared State
Riak Core: Building Distributed Applications Without Shared StateRiak Core: Building Distributed Applications Without Shared State
Riak Core: Building Distributed Applications Without Shared State
 
Ignite: Devops - Why Should You Care
Ignite: Devops - Why Should You CareIgnite: Devops - Why Should You Care
Ignite: Devops - Why Should You Care
 
BCS-JCF September 2010
BCS-JCF September 2010BCS-JCF September 2010
BCS-JCF September 2010
 
GoLightly: Building VM-based language runtimes in Go
GoLightly: Building VM-based language runtimes in GoGoLightly: Building VM-based language runtimes in Go
GoLightly: Building VM-based language runtimes in Go
 

Recently uploaded

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 

Recently uploaded (20)

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 

Working with Dimensional data in Distributed Hash Tables

  • 1. o o o oo DIMENSIONAL DATA IN DISTRIBUTED HASH TABLES fturg Mike Male STRANGE LOOP 2010 Monday, October 18, 2010
  • 2. SIMPLEGEO We originally began as a mobile gaming startup, but quickly discovered that the location services and infrastructure needed to support our ideas didn’t exist. So we took matters into our own hands and began building it ourselves. Mt Gaig Joe Stump CEO & co-founder CTO & co-founder STRANGE LOOP 2010 Monday, October 18, 2010
  • 3. ABOUT ME MIKE MALONE INFRASTRUCTURE ENGINEER mike@simplegeo.com @mjmalone For the last 10 months I’ve been developing a spatial database at the core of SimpleGeo’s infrastructure. STRANGE LOOP 2010 Monday, October 18, 2010
  • 4. REQUIREMENTS Multidimensional Efficiency Query Speed Complex Queries Decentralization Locality Spatial Performance Fault Tolerance Availability Distributedness GOALS Replication Reliability Simplicity Resilience Conceptual Integrity Scalability Operational Consistency Data Coherent Concurrency Query Volume Durability Linear STRANGE LOOP 2010 Monday, October 18, 2010
  • 5. CONFLICT Integrity vs Availability Locality vs Distributedness Reductionism vs Emergence STRANGE LOOP 2010 Monday, October 18, 2010
  • 6. DATABASE CANDIDATES THE TRANSACTIONAL RELATIONAL DATABASES They’re theoretically pure, well understood, and mostly standardized behind a relatively clean abstraction They provide robust contracts that make it easy to reason about the structure and nature of the data they contain They’re battle hardened, robust, durable, etc. OTHER STRUCTURED STORAGE OPTIONS Plain I see you, western youths, see you tramping with the foremost, Pioneers! O pioneers! STRANGE LOOP 2010 Monday, October 18, 2010
  • 7. INTEGRITY - vs - AVAILABILITY STRANGE LOOP 2010 Monday, October 18, 2010
  • 8. ACID These terms are not formally defined - they’re a framework, not mathematical axioms ATOMICITY Either all of a transaction’s actions are visible to another transaction, or none are CONSISTENCY Application-specific constraints must be met for transaction to succeed ISOLATION Two concurrent transactions will not see one another’s transactions while “in flight” DURABILITY The updates made to the database in a committed transaction will be visible to future transactions STRANGE LOOP 2010 Monday, October 18, 2010
  • 9. ACID HELPS ACID is a sort-of-formal contract that makes it easy to reason about your data, and that’s good IT DOES SOMETHING HARD FOR YOU With ACID, you’re guaranteed to maintain a persistent global state as long as you’ve defined proper constraints and your logical transactions result in a valid system state STRANGE LOOP 2010 Monday, October 18, 2010
  • 10. CAP THEOREM At PODC 2000 Eric Brewer told us there were three desirable DB characteristics. But we can only have two. CONSISTENCY Every node in the system contains the same data (e.g., replicas are never out of date) AVAILABILITY Every request to a non-failing node in the system returns a response PARTITION TOLERANCE System properties (consistency and/or availability) hold even when the system is partitioned and data is lost STRANGE LOOP 2010 Monday, October 18, 2010
  • 11. CAP THEOREM IN 30 SECONDS CLIENT SERVER REPLICA STRANGE LOOP 2010 Monday, October 18, 2010
  • 12. CAP THEOREM IN 30 SECONDS CLIENT SERVER REPLICA wre STRANGE LOOP 2010 Monday, October 18, 2010
  • 13. CAP THEOREM IN 30 SECONDS CLIENT SERVER plice REPLICA wre STRANGE LOOP 2010 Monday, October 18, 2010
  • 14. CAP THEOREM IN 30 SECONDS CLIENT SERVER plice REPLICA wre ack STRANGE LOOP 2010 Monday, October 18, 2010
  • 15. CAP THEOREM IN 30 SECONDS CLIENT SERVER plice REPLICA wre aept ack STRANGE LOOP 2010 Monday, October 18, 2010
  • 16. CAP THEOREM IN 30 SECONDS CLIENT SERVER REPLICA wre FAIL! ni UNAVAILAB! STRANGE LOOP 2010 Monday, October 18, 2010
  • 17. CAP THEOREM IN 30 SECONDS CLIENT SERVER REPLICA wre FAIL! aept CSTT! STRANGE LOOP 2010 Monday, October 18, 2010
  • 18. ACID HURTS Certain aspects of ACID encourage (require?) implementors to do “bad things” Unfortunately, ANSI SQL’s definition of isolation... relies in subtle ways on an assumption that a locking scheme is used for concurrency control, as opposed to an optimistic or multi-version concurrency scheme. This implies that the proposed semantics are ill-defined. Joseph M. Hellerstein and Michael Stonebraker Anatomy of a Database System STRANGE LOOP 2010 Monday, October 18, 2010
  • 19. BALANCE IT’S A QUESTION OF VALUES For traditional databases CAP consistency is the holy grail: it’s maximized at the expense of availability and partition tolerance At scale, failures happen: when you’re doing something a million times a second a one-in-a-million failure happens every second We’re witnessing the birth of a new religion... • CAP consistency is a luxury that must be sacrificed at scale in order to maintain availability when faced with failures STRANGE LOOP 2010 Monday, October 18, 2010
  • 20. APACHE CASSANDRA A DISTRIBUTED HASH TABLE WITH SOME TRICKS Peer-to-peer gossip supports fault tolerance and decentralization with a simple operational model Random hash-based partitioning provides auto-scaling and efficient online rebalancing - read and write throughput increases linearly when new nodes are added Pluggable replication strategy for multi-datacenter replication making the system resilient to failure, even at the datacenter level Tunable consistency allow us to adjust durability with the value of data being written STRANGE LOOP 2010 Monday, October 18, 2010
  • 21. CASSANDRA DATA MODEL { column family “users”: { key “alice”: { “city”: [“St. Louis”, 1287040737182], columns (name, value, timestamp) “name”: [“Alice” 1287080340940], , }, ... }, “locations”: { }, ... } STRANGE LOOP 2010 Monday, October 18, 2010
  • 22. DISTRIBUTED HASH TABLE DESTROY LOCALITY (BY HASHING) TO ACHIEVE GOOD BALANCE AND SIMPLIFY LINEAR SCALABILITY { column family “users”: { fffff 0 key “ alice”: { “city”: [“St. Louis”, 1287040737182], columns “name”: [“Alice” 1287080340940], , }, ... }, } ... alice bob s3b 3e8 STRANGE LOOP 2010 Monday, October 18, 2010
  • 23. HASH TABLE SUPPORTED QUERIES EXACT MATCH RANGE PROXIMITY ANYTHING THAT’S NOT EXACT MATCH STRANGE LOOP 2010 Monday, October 18, 2010
  • 24. LOCALITY - vs - DISTRIBUTEDNESS STRANGE LOOP 2010 Monday, October 18, 2010
  • 25. THE ORDER PRESERVING PARTITIONER CASSANDRA’S PARTITIONING STRATEGY IS PLUGGABLE Partitioner maps keys to nodes Random partitioner destroys locality by hashing Order preserving partitioner retains locality, storing keys in natural lexicographical order around ring z a alice a bob u h sam m STRANGE LOOP 2010 Monday, October 18, 2010
  • 26. ORDER PRESERVING PARTITIONER EXACT MATCH RANGE On a single dimension ? PROXIMITY STRANGE LOOP 2010 Monday, October 18, 2010
  • 27. SPATIAL DATA IT’S INHERENTLY MULTIDIMENSIONAL 2 x 2, 2 1 1 2 STRANGE LOOP 2010 Monday, October 18, 2010
  • 28. DIMENSIONALITY REDUCTION WITH SPACE-FILLING CURVES 1 2 3 4 STRANGE LOOP 2010 Monday, October 18, 2010
  • 29. Z-CURVE SECOND ITERATION STRANGE LOOP 2010 Monday, October 18, 2010
  • 30. Z-VALUE 14 x STRANGE LOOP 2010 Monday, October 18, 2010
  • 31. GEOHASH SIMPLE TO COMPUTE Interleave the bits of decimal coordinates (equivalent to binary encoding of pre-order traversal!) Base32 encode the result AWESOME CHARACTERISTICS Arbitrary precision Human readable Sorts lexicographically 01101 e STRANGE LOOP 2010 Monday, October 18, 2010
  • 32. DATA MODEL { “record-index”: { key <geohash>:<id> “9yzgcjn0:moonrise hotel”: { “”: [“”, 1287040737182], }, ... }, “records”: { “moonrise hotel”: { “latitude”: [“38.6554420”, 1287040737182], “longitude”: [“-90.2992910”, 1287040737182], ... } } } STRANGE LOOP 2010 Monday, October 18, 2010
  • 33. BOUNDING BOX E.G., MULTIDIMENSIONAL RANGE Gie stuff  bg box! Gie 2  3 1 2 3 4 Gie 4  5 STRANGE LOOP 2010 Monday, October 18, 2010
  • 34. SPATIAL DATA STILL MULTIDIMENSIONAL DIMENSIONALITY REDUCTION ISN’T PERFECT Clients must • Pre-process to compose multiple queries • Post-process to filter and merge results Degenerate cases can be bad, particularly for nearest-neighbor queries STRANGE LOOP 2010 Monday, October 18, 2010
  • 35. Z-CURVE LOCALITY STRANGE LOOP 2010 Monday, October 18, 2010
  • 36. Z-CURVE LOCALITY x x STRANGE LOOP 2010 Monday, October 18, 2010
  • 37. Z-CURVE LOCALITY x x STRANGE LOOP 2010 Monday, October 18, 2010
  • 38. Z-CURVE LOCALITY x o o o x o o o o STRANGE LOOP 2010 Monday, October 18, 2010
  • 39. THE WORLD IS NOT BALANCED Credit: C. Mayhew & R. Simmon (NASA/GSFC), NOAA/NGDC, DMSP Digital Archive STRANGE LOOP 2010 Monday, October 18, 2010
  • 40. TOO MUCH LOCALITY 1 2 SAN FRANCISCO 3 4 STRANGE LOOP 2010 Monday, October 18, 2010
  • 41. TOO MUCH LOCALITY 1 2 SAN FRANCISCO 3 4 STRANGE LOOP 2010 Monday, October 18, 2010
  • 42. TOO MUCH LOCALITY 1 2 I’m sad. SAN FRANCISCO 3 4 STRANGE LOOP 2010 Monday, October 18, 2010
  • 43. TOO MUCH LOCALITY I’m b. 1 2 I’m sad. Me o. SAN FRANCISCO 3 4 Let’s py xbox. STRANGE LOOP 2010 Monday, October 18, 2010
  • 44. A TURNING POINT STRANGE LOOP 2010 Monday, October 18, 2010
  • 45. HELLO, DRAWING BOARD SURVEY OF DISTRIBUTED P2P INDEXING An overlay-dependent index works directly with nodes of the peer-to-peer network, defining its own overlay An over-DHT index overlays a more sophisticated data structure on top of a peer-to-peer distributed hash table STRANGE LOOP 2010 Monday, October 18, 2010
  • 46. ANOTHER LOOK AT POSTGIS MIGHT WORK, BUT The relational transaction management system (which we’d want to change) and access methods (which we’d have to change) are tightly coupled (necessarily?) to other parts of the system Could work at a higher level and treat PostGIS as a black box • Now we’re back to implementing a peer-to-peer network with failure recovery, fault detection, etc... and Cassandra already had all that. • It’s probably clear by now that I think these problems are more difficult than actually storing structured data on disk STRANGE LOOP 2010 Monday, October 18, 2010
  • 47. LET’S TAKE A STEP BACK STRANGE LOOP 2010 Monday, October 18, 2010
  • 48. EARTH STRANGE LOOP 2010 Monday, October 18, 2010
  • 49. EARTH STRANGE LOOP 2010 Monday, October 18, 2010
  • 50. EARTH STRANGE LOOP 2010 Monday, October 18, 2010
  • 51. EARTH STRANGE LOOP 2010 Monday, October 18, 2010
  • 52. EARTH, TREE, RING STRANGE LOOP 2010 Monday, October 18, 2010
  • 53. DATA MODEL { “record-index”: { “layer-name:37 .875, -90:40.25, -101.25”: { “38.6554420, -90.2992910:moonrise hotel”: [“” 1287040737182], , ... }, }, “record-index-meta”: { “layer-name:37 .875, -90:40.25, -101.25”: { “split”: [“false”, 1287040737182], } “layer-name: 37 .875, -90:42.265, -101.25” { “split”: [“true”, 1287040737182], “child-left”: [“layer-name:37 .875, -90:40.25, -101.25” 1287040737182] , “child-right”: [“layer-name:40.25, -90:42.265, -101.25” 1287040737182] , } } } STRANGE LOOP 2010 Monday, October 18, 2010
  • 54. DATA MODEL { “record-index”: { “layer-name:37 .875, -90:40.25, -101.25”: { “38.6554420, -90.2992910:moonrise hotel”: [“” 1287040737182], , ... }, }, “record-index-meta”: { “layer-name:37 .875, -90:40.25, -101.25”: { “split”: [“false”, 1287040737182], } “layer-name: 37 .875, -90:42.265, -101.25” { “split”: [“true”, 1287040737182], “child-left”: [“layer-name:37 .875, -90:40.25, -101.25” 1287040737182] , “child-right”: [“layer-name:40.25, -90:42.265, -101.25” 1287040737182] , } } } STRANGE LOOP 2010 Monday, October 18, 2010
  • 55. DATA MODEL { “record-index”: { “layer-name:37 .875, -90:40.25, -101.25”: { “38.6554420, -90.2992910:moonrise hotel”: [“” 1287040737182], , ... }, }, “record-index-meta”: { “layer-name:37 .875, -90:40.25, -101.25”: { “split”: [“false”, 1287040737182], } “layer-name: 37 .875, -90:42.265, -101.25” { “split”: [“true”, 1287040737182], “child-left”: [“layer-name:37 .875, -90:40.25, -101.25” 1287040737182] , “child-right”: [“layer-name:40.25, -90:42.265, -101.25” 1287040737182] , } } } STRANGE LOOP 2010 Monday, October 18, 2010
  • 56. SPLITTING IT’S PRETTY MUCH JUST A CONCURRENT TREE Splitting shouldn’t lock the tree for reads or writes and failures shouldn’t cause corruption • Splits are optimistic, idempotent, and fail-forward • Instead of locking, writes are replicated to the splitting node and the relevant child[ren] while a split operation is taking place • Cleanup occurs after the split is completed and all interested nodes are aware that the split has occurred • Cassandra writes are idempotent, so splits are too - if a split fails, it is simply be retried Split size: A Tunable knob for balancing locality and distributedness The other hard problem with concurrent trees is rebalancing - we just don’t do it! (more on this later) STRANGE LOOP 2010 Monday, October 18, 2010
  • 57. THE ROOT IS HOT MIGHT BE A DEAL BREAKER For a tree to be useful, it has to be traversed • Typically, tree traversal starts at the root • Root is the only discoverable node in our tree Traversing through the root meant reading the root for every read or write below it - unacceptable • Lots of academic solutions - most promising was a skip graph, but that required O(n log(n)) data - also unacceptable • Minimum tree depth was propsed, but then you just get multiple hot- spots at your minimum depth nodes STRANGE LOOP 2010 Monday, October 18, 2010
  • 58. BACK TO THE BOOKS LOTS OF ACADEMIC WORK ON THIS TOPIC But academia is obsessed with provable, deterministic, asymptotically optimal algorithms And we only need something that is probably fast enough most of the time (for some value of “probably” and “most of the time”) • And if the probably good enough algorithm is, you know... tractable... one might even consider it qualitatively better! STRANGE LOOP 2010 Monday, October 18, 2010
  • 59. REDUCTIONISM - vs - EMERGENCE STRANGE LOOP 2010 Monday, October 18, 2010
  • 60. We have We want STRANGE LOOP 2010 Monday, October 18, 2010
  • 61. THINKING HOLISTICALLY WE OBSERVED THAT Once a node in the tree exists, it doesn’t go away Node state may change, but that state only really matters locally - thinking a node is a leaf when it really has children is not fatal SO... WHAT IF WE JUST CACHED NODES THAT WERE OBSERVED IN THE SYSTEM!? CLIENT - XXX 2010 Monday, October 18, 2010
  • 62. CACHE IT STUPID SIMPLE SOLUTION Keep an LRU cache of nodes that have been traversed Start traversals at the most selective relevant node If that node doesn’t satisfy you, traverse up the tree Along with your result set, return a list of nodes that were traversed so the caller can add them to its cache STRANGE LOOP 2010 Monday, October 18, 2010
  • 63. TRAVERSAL NEAREST NEIGHBOR o o o xo STRANGE LOOP 2010 Monday, October 18, 2010
  • 64. TRAVERSAL NEAREST NEIGHBOR o o o xo STRANGE LOOP 2010 Monday, October 18, 2010
  • 65. TRAVERSAL NEAREST NEIGHBOR o o o xo STRANGE LOOP 2010 Monday, October 18, 2010
  • 66. KEY CHARACTERISTICS PERFORMANCE Best case on the happy path (everything cached) has zero read overhead Worst case, with nothing cached, O(log(n)) read overhead RE-BALANCING SEEMS UNNECESSARY! Makes worst case more worser, but so far so good CLIENT - XXX 2010 Monday, October 18, 2010
  • 67. DISTRIBUTED TREE SUPPORTED QUERIES EXACT MATCH RANGE PROXIMITY SOMETHING ELSE I HAVEN’T EVEN HEARD OF STRANGE LOOP 2010 Monday, October 18, 2010
  • 68. DISTRIBUTED TREE SUPPORTED QUERIES MUL EXACT MATCH DI P RANGE NS NS! PROXIMITY SOMETHING ELSE I HAVEN’T EVEN HEARD OF STRANGE LOOP 2010 Monday, October 18, 2010
  • 69. PEACE Integrity and Availability Locality and Distributedness Reductionism and Emergence STRANGE LOOP 2010 Monday, October 18, 2010
  • 70. QUESTIONS? MIKE MALONE INFRASTRUCTURE ENGINEER mike@simplegeo.com @mjmalone STRANGE LOOP 2010 Monday, October 18, 2010
  • 71. STRANGE LOOP 2010 Monday, October 18, 2010