Cassandra integrations

Talk given at the Hadoop DC meetup, Dec 2010, about how Cassandra integrates with other systems such as Hadoop and Solr.

Published in: Technology
  • Cassandra integrations

    1. Cassandra Integrations: Hadoop, Solr, and more
       T Jake Luciani, jake@riptano.com, @tjake
    2. Cassandra 101
       • Peer-to-peer database
       • Sorted, tree-like data model
       • Pluggable partitioning/replication strategies
       • Multi-master writes
       • Tunable consistency
       • Low latency
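
To make the "tunable consistency" bullet concrete: with N replicas, a read that contacts R replicas is guaranteed to see a write acknowledged by W replicas whenever R + W > N. The sketch below is plain arithmetic rather than Cassandra client code, and the function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    n = 3
    q = quorum(n)
    # Quorum reads plus quorum writes overlap; single-replica reads and writes may not.
    print(q, overlaps(q, q, n), overlaps(1, 1, n))
```

Lowering R or W trades that overlap guarantee for lower latency, which is the knob the slide refers to.
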
    3. Cassandra Partitioners
    4. Cassandra Scaling
    5. Cassandra Replication
    6. Cassandra Data Model
    7. For End Users
       cassandra-cli has been rebuilt in 0.7 to support easy interaction with Cassandra.
           [default@unknown] create keyspace Keyspace1;
           [default@unknown] use Keyspace1;
           [default@Keyspace1] create column family Users with comparator=UTF8Type;
           [default@Keyspace1] set Users[jsmith][first] = John;
           [default@Keyspace1] set Users[jsmith][last] = Smith;
           [default@Keyspace1] set Users[jsmith][age] = long(42);
           [default@Keyspace1] get Users[jsmith];
           => (column=last, value=Smith, timestamp=1287604215498000)
           => (column=first, value=John, timestamp=1287604214111000)
           => (column=age, value=42, timestamp=1287604216661000)
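
The CLI session above writes named columns into a row and reads them back with timestamps. A minimal in-memory sketch of that shape, assuming last-write-wins on the timestamp, might look like the following. `ColumnFamily` here is a hypothetical toy class, not a Cassandra or client-library API, and values are kept as plain strings.

```python
import time
from typing import Dict, List, Optional, Tuple

Column = Tuple[str, int]  # (value, timestamp in microseconds)


class ColumnFamily:
    """Toy column family: row key -> column name -> (value, timestamp)."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Column]] = {}

    def set(self, key: str, column: str, value: str,
            timestamp: Optional[int] = None) -> None:
        ts = timestamp if timestamp is not None else int(time.time() * 1_000_000)
        row = self.rows.setdefault(key, {})
        current = row.get(column)
        # Last write wins: an older timestamp never overwrites a newer one.
        if current is None or ts >= current[1]:
            row[column] = (value, ts)

    def get(self, key: str) -> List[Tuple[str, str, int]]:
        row = self.rows.get(key, {})
        # Columns come back ordered by name (Python string order).
        return [(name, value, ts) for name, (value, ts) in sorted(row.items())]


if __name__ == "__main__":
    users = ColumnFamily()
    users.set("jsmith", "first", "John", 1287604214111000)
    users.set("jsmith", "last", "Smith", 1287604215498000)
    users.set("jsmith", "age", "42", 1287604216661000)
    for column in users.get("jsmith"):
        print(column)
```

The toy sorts columns with Python's string ordering, so the order it prints may differ from what the CLI shows.
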
    8. For Developers
       • The Thrift API supports 16 languages but is hard to use. Prefer well-supported abstractions:
         • Java - hector (http://github.com/rantav/hector)
         • Python - pycassa (http://github.com/pycassa/pycassa)
         • Ruby - fauna C* (http://github.com/fauna/cassandra)
         • PHP - phpcassa (http://github.com/thobbs/phpcassa)
       • CQL - SQL-like interface (in development)
         • select "col1", "col2" from ColumnFamily1 where key="row1"
    9. For Batch Analytics
       • Hadoop Integration
         • Cassandra-specific InputFormat and OutputFormat
         • Locality (TaskTrackers on C* nodes)
         • Streaming support (output)
         • Pig support
         • Hive support (HIVE-1434)
    10. Hadoop+Cassandra Deployed
    11. DEMOS!
    12. For Real-Time
        • Flume Sink for Cassandra
          • https://github.com/thobbs/flume-cassandra-plugin
        • Lucene/Solr Integration
          • https://github.com/tjake/Lucandra
    13. Solr Components
    14. Lucandra
        • A Lucene IndexReader and Writer that communicate directly with Cassandra
        • Replaces the Lucene index file format with Cassandra's data model
          • Multi-master
          • Replication
          • Real-time
          • Can manage millions of small indexes
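
One way to picture an inverted index stored as rows and columns instead of index files is sketched below. The row-key layout (`index/field/term`) and the class name `ToyTermIndex` are assumptions made for illustration. The transcript does not include the contents of the "Lucandra Data Model" slide, so this should not be read as Lucandra's actual schema.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ToyTermIndex:
    """Inverted index kept as rows of columns rather than index files."""

    def __init__(self, index_name: str) -> None:
        self.index_name = index_name
        # Row key "index/field/term" -> {doc_id: [term positions]}
        self.rows: Dict[str, Dict[str, List[int]]] = defaultdict(dict)

    def _row_key(self, field: str, term: str) -> str:
        return f"{self.index_name}/{field}/{term}"

    def add_document(self, doc_id: str, field: str, text: str) -> None:
        # Each term becomes a row; each document becomes a column in that row.
        for position, term in enumerate(text.lower().split()):
            row = self.rows[self._row_key(field, term)]
            row.setdefault(doc_id, []).append(position)

    def search(self, field: str, term: str) -> Set[str]:
        # A term lookup is a single row read; the column names are the matches.
        return set(self.rows.get(self._row_key(field, term.lower()), {}))


if __name__ == "__main__":
    index = ToyTermIndex("talks")
    index.add_document("d1", "title", "Cassandra integrations")
    index.add_document("d2", "title", "Solr integrations")
    print(sorted(index.search("title", "integrations")))
```

Because each write lands as an ordinary column update, a document is searchable as soon as it is written, which is the "real-time" property the slide lists.
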
    15. Lucandra Data Model
    16. Solandra
        • Embeds Solr in the Cassandra node
          • Removes the RPC layer
          • Same JVM, in-memory reads/writes
          • Solr becomes aware of the Cassandra ring (index locality)
          • Manage N Solr schemas via Cassandra
        • New IndexManager caps the number of docs in a given index; indexes larger than the cap are split into sub-indexes
        • Keep all data for a sub-index on one node
        • Only one component to maintain!
        • Use the Solr+C* ring to shuttle queries to nodes containing sub-indexes (without the user's knowledge)
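
The capping and sub-index placement described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Solandra's implementation: documents are assumed to be numbered sequentially, the `index~sub_index` key format is made up, and nodes are picked by hashing modulo the node count rather than by a real token ring.

```python
import hashlib
from typing import List, Tuple


def sub_index_for(doc_number: int, cap: int) -> int:
    """Sub-index that holds a document when each sub-index stores at most `cap` docs."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    return doc_number // cap


def node_for(index_name: str, sub_index: int, nodes: List[str]) -> str:
    """Pick one node per sub-index so all of its data stays together (assumed scheme)."""
    key = f"{index_name}~{sub_index}".encode("utf-8")
    token = int(hashlib.md5(key).hexdigest(), 16)
    return nodes[token % len(nodes)]


def place(index_name: str, doc_number: int, cap: int,
          nodes: List[str]) -> Tuple[int, str]:
    """Return (sub-index, node) for a document."""
    shard = sub_index_for(doc_number, cap)
    return shard, node_for(index_name, shard, nodes)


if __name__ == "__main__":
    ring = ["node-a", "node-b", "node-c"]
    for doc in (0, 999, 1000, 2500):
        print(doc, place("products", doc, 1000, ring))
```

Documents 0 and 999 fall in the same sub-index and therefore on the same node, while document 1000 starts a new sub-index that may live elsewhere. A query then only needs to be routed to the nodes that own the relevant sub-indexes.
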
    17. Solandra Deployed
    18. DEMOS!
