Lucandra: Lucene + Cassandra
  Jake Luciani
  http://github/tjake/Lucandra
  http://twitter.com/tjake
What we'll cover today:
  • Search use-cases
  • Problems scaling and maintaining Lucene/Solr
  • Cassandra
  • Lucandra
  • Lucandra in Action
  • Q&A
Types of search apps:  
Types of search apps:  
Lucene/Solr Scaling Problems
  • Writes are expensive on a live system: Merge, Reopen, Optimize, Sorting
  • "Too many open files"
  • Solr replication has too many moving parts
  • Scaling writes requires client-side sharding
  • Lots of grid management -> ZooKeeper?
  • Backups? Monitoring? Failures? Ops Team? Oh my!
  • This sounds a lot like MySQL, doesn't it?
Cassandra - Love Child of BigTable and Dynamo
  • Peer to peer (easy to add new nodes)
  • CAP configurable
  • Multi-level TreeMap (sorta)
  • Pluggable replication/sorting
  • Writes are very fast!
  • Low latency
  • Integrates with Hadoop
  • Major adoption and development
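To make the client-side sharding point concrete, here is a minimal sketch of what the application ends up owning when it has to route writes itself. The shard host names and the `shard_for` helper are hypothetical, invented for illustration; nothing here is Solr's API.

```python
import hashlib

# Hypothetical shard hosts; in a real deployment the client has to keep
# this list in sync with the cluster by hand.
SHARDS = ["solr-1:8983", "solr-2:8983", "solr-3:8983"]


def shard_for(doc_id, shards=SHARDS):
    """Pick the shard that owns doc_id by hashing the id modulo the shard count."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


# The same id always routes to the same shard...
target = shard_for("doc-42")

# ...but adding a shard changes the modulus, so many ids now map elsewhere
# and their documents would have to be reindexed or moved.
grown = SHARDS + ["solr-4:8983"]
ids = ["doc-%d" % i for i in range(1000)]
moved = sum(1 for i in ids if shard_for(i) != shard_for(i, grown))
```

With modulo routing, growing the cluster relocates most documents, which is one reason the slide lumps sharding in with the rest of the operational burden.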
Cassandra's Data Model
  { "bloghost.com" : {                                  // Keyspace
      "Posts" : {                                       // ColumnFamily
        "tjake.bloghost.com" : {                        // Key
          "20100426-Lucandra" : "lucandra talk today!"  // Column
        }
      },
      "Comments" : {                                    // SuperColumnFamily
        "tjake.bloghost.com" : {                        // Key
          "20100426-Lucandra-1" : {                     // SuperColumn
            "From" : "Otis", "Comment" : "Don't Suck!"  // Columns
          },
          "20100426-Lucandra-2" : {                     // SuperColumn
            "From" : "Jake", "Comment" : "O.K."         // Columns
          }
        }
      }
    }
  }
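The "multi-level TreeMap (sorta)" idea can be sketched with plain nested dictionaries. This is only an in-memory illustration of the layout on the slide, not the Cassandra client API; `get_column` and `slice_columns` are hypothetical helpers, and sorting by column name stands in for Cassandra's pluggable column ordering.

```python
# In-memory stand-in for the keyspace on the slide (not a Cassandra client).
keyspace = {
    "Posts": {  # ColumnFamily: row key -> {column name: value}
        "tjake.bloghost.com": {
            "20100426-Lucandra": "lucandra talk today!",
        },
    },
    "Comments": {  # SuperColumnFamily: row key -> {super column: {column: value}}
        "tjake.bloghost.com": {
            "20100426-Lucandra-1": {"From": "Otis", "Comment": "Don't Suck!"},
            "20100426-Lucandra-2": {"From": "Jake", "Comment": "O.K."},
        },
    },
}


def get_column(ks, column_family, key, column):
    """Look up one column (or super column) by walking the nested maps."""
    return ks[column_family][key][column]


def slice_columns(ks, column_family, key, start, finish):
    """Return (name, value) pairs with start <= name <= finish, in name order."""
    row = ks[column_family][key]
    return [(name, row[name]) for name in sorted(row) if start <= name <= finish]


post = get_column(keyspace, "Posts", "tjake.bloghost.com", "20100426-Lucandra")
comments = slice_columns(
    keyspace, "Comments", "tjake.bloghost.com",
    "20100426-Lucandra-1", "20100426-Lucandra-2",
)
```

Because column names sort, prefixing them with a date (as the slide does) makes "all comments on one post" a contiguous range within a single row.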
Cassandra - Partitioning
Cassandra - Scale Up / Scale Down
Cassandra - Replication
Solr/Lucene Components
Lucandra Components
How is an index stored?
  { "Lucandra" : {
      "Docs" : {
        "Index1/Doc1" : { "Field1" : "T1 T2 T1", ... },
        "Index1/Doc2" : { "Field1" : "T3 T1", ... }
      },
      "TermVectors" : {
        "Index1/Field1/T1" : { "Doc1" : [0, 2], "Doc2" : [1] },
        "Index1/Field1/T2" : { "Doc1" : [1] },
        "Index1/Field1/T3" : { "Doc2" : [0] }
      }
    }
  }
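A rough sketch of how documents could be turned into this layout: each document is stored under an "index/doc" key, and each term gets a row keyed "index/field/term" whose columns are document ids holding zero-based term positions. The `index_document` and `search_term` helpers and the whitespace tokenizer are assumptions for illustration; they are not Lucandra's actual code, which goes through Lucene analyzers and Cassandra over Thrift.

```python
from collections import defaultdict


def index_document(store, index, doc_id, fields):
    """Store the raw fields, then record each term's positions per document."""
    store["Docs"]["%s/%s" % (index, doc_id)] = dict(fields)
    for field, text in fields.items():
        # Assumption: naive whitespace tokenization with zero-based positions.
        for position, term in enumerate(text.split()):
            row_key = "%s/%s/%s" % (index, field, term)
            store["TermVectors"][row_key].setdefault(doc_id, []).append(position)


def search_term(store, index, field, term):
    """Return the ids of documents containing the term: a single row lookup."""
    row = store["TermVectors"].get("%s/%s/%s" % (index, field, term), {})
    return sorted(row)


store = {"Docs": {}, "TermVectors": defaultdict(dict)}
index_document(store, "Index1", "Doc1", {"Field1": "T1 T2 T1"})
index_document(store, "Index1", "Doc2", {"Field1": "T3 T1"})

hits = search_term(store, "Index1", "Field1", "T1")
```

Putting the index name at the front of every key is what lets many small indexes share one cluster: a term query only ever touches the row for that index, field, and term.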
Lucandra Deployed
Lucandra In Action: Sparse.ly and Wikassandra
sparse.ly - Twitter search for friends only
  • ~4k indexes on 2 boxes
Wikassandra - Search Wikipedia
  • 4 node cluster
  • 3k writes per sec (over Thrift from a single node)
  • Solr interface
