Cassandra integrations

Talk given at the Hadoop DC meetup, December 2010, on how Cassandra integrates with other systems such as Hadoop and Solr.

Transcript

  • 1. Cassandra Integrations: Hadoop, Solr, and more. T Jake Luciani, jake@riptano.com, @tjake
  • 2. Cassandra 101
      • Peer-to-peer database
      • Sorted, tree-like data model
      • Pluggable partitioning/replication strategies
      • Multi-master writes
      • Tunable consistency
      • Low latency
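"Tunable consistency" on slide 2 is usually explained with quorum arithmetic: with a replication factor of N, a read from R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The following is a minimal sketch of that arithmetic only. The function names are hypothetical and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Size of a majority of replicas for the given replication factor."""
    return replication_factor // 2 + 1


def reads_overlap_writes(n: int, w: int, r: int) -> bool:
    """True when any r replicas read must include at least one of the w replicas written."""
    return r + w > n


if __name__ == "__main__":
    n = 3
    q = quorum(n)                                  # 2 of 3 replicas
    print(reads_overlap_writes(n, w=q, r=q))       # quorum writes with quorum reads
    print(reads_overlap_writes(n, w=1, r=1))       # single-replica writes and reads
```

Lower values of W and R trade that overlap guarantee for lower latency, which is the tuning knob the slide refers to.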
  • 3. Cassandra Partitioners
  • 4. Cassandra Scaling
  • 5. Cassandra Replication
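Slides 3 to 5 are diagrams that did not survive in the transcript. As a rough illustration of the general idea behind hash partitioning and ring replication, the sketch below hashes a row key to a token with MD5, finds the first node whose token is at or past it (wrapping around the ring), and takes the following nodes as additional replicas. The token arithmetic, the evenly spaced ring, and all names are simplifying assumptions, not Cassandra's actual implementation.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token space used by this sketch (an assumption)


def token_for(key: str) -> int:
    """Hash a row key to a position on the ring."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def evenly_spaced_ring(names):
    """Assign each node an evenly spaced token; returns a list sorted by token."""
    step = RING_SIZE // len(names)
    return [(i * step, name) for i, name in enumerate(names)]


def replicas_for(key, ring, replication_factor):
    """Return the nodes holding `key`: the owner plus the next nodes clockwise."""
    tokens = [token for token, _ in ring]
    # First node whose token is >= the key's token; wrap to node 0 past the end.
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


if __name__ == "__main__":
    ring = evenly_spaced_ring(["node1", "node2", "node3", "node4"])
    print(replicas_for("jsmith", ring, replication_factor=3))
```

Because placement depends only on the key's token and the node tokens, any node can compute where a row lives without consulting a master.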
  • 6. Cassandra Data Model
  • 7. For End Users: cassandra-cli has been rebuilt in 0.7 to support easy interaction with Cassandra.
        [default@unknown] create keyspace Keyspace1;
        [default@unknown] use Keyspace1;
        [default@Keyspace1] create column family Users with comparator=UTF8Type;
        [default@Keyspace1] set Users[jsmith][first] = John;
        [default@Keyspace1] set Users[jsmith][last] = Smith;
        [default@Keyspace1] set Users[jsmith][age] = long(42);
        [default@Keyspace1] get Users[jsmith];
        => (column=last, value=Smith, timestamp=1287604215498000)
        => (column=first, value=John, timestamp=1287604214111000)
        => (column=age, value=42, timestamp=1287604216661000)
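The CLI session on slide 7 can be read as operations on a nested map: column family, then row key, then column name, with each column carrying a value and a microsecond timestamp. The sketch below models that shape in memory. The class and its methods are hypothetical, the newest-timestamp-wins rule is simplified, and returning columns sorted by name is an assumption that does not try to reproduce the listing order shown on the slide.

```python
import time


class ColumnFamily:
    """In-memory stand-in: row key -> {column name: (value, timestamp)}."""

    def __init__(self):
        self._rows = {}

    def set(self, row_key, column, value, timestamp=None):
        """Store a column; a write with an older timestamp never replaces a newer one."""
        if timestamp is None:
            timestamp = int(time.time() * 1_000_000)  # microseconds since the epoch
        row = self._rows.setdefault(row_key, {})
        current = row.get(column)
        # Ties go to the later call in this sketch.
        if current is None or timestamp >= current[1]:
            row[column] = (value, timestamp)

    def get(self, row_key):
        """Return (column, value, timestamp) tuples ordered by column name."""
        row = self._rows.get(row_key, {})
        return [(name, value, ts) for name, (value, ts) in sorted(row.items())]


if __name__ == "__main__":
    users = ColumnFamily()
    users.set("jsmith", "first", "John", 1287604214111000)
    users.set("jsmith", "last", "Smith", 1287604215498000)
    users.set("jsmith", "age", 42, 1287604216661000)
    for column in users.get("jsmith"):
        print(column)
```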
  • 8. For Developers
      • Thrift API supports 16 languages but is hard to use. Prefer well-supported abstractions.
          • Java - hector (http://github.com/rantav/hector)
          • Python - pycassa (http://github.com/pycassa/pycassa)
          • Ruby - fauna C* (http://github.com/fauna/cassandra)
          • PHP - phpcassa (http://github.com/thobbs/phpcassa)
      • CQL - SQL-like interface (in development)
          • select "col1", "col2" from ColumnFamily1 where key="row1"
  • 9. For Batch Analytics
      • Hadoop Integration
          • Cassandra-specific InputFormat and OutputFormat
          • Locality (TaskTrackers on C* nodes)
          • Streaming Support (output)
          • Pig Support
          • Hive Support (HIVE-1434)
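To make the InputFormat and OutputFormat roles on slide 9 concrete, here is a small word-count sketch in plain Python. It does not use the Hadoop or Cassandra APIs: `read_rows` stands in for an input format that hands rows to mappers, and the returned dictionary stands in for rows written back through an output format. All function names and the sample data are hypothetical.

```python
from collections import defaultdict


def read_rows(column_family):
    """Stand-in for an InputFormat: yields (row_key, columns) pairs."""
    for row_key, columns in column_family.items():
        yield row_key, columns


def map_row(row_key, columns, column_name):
    """Map step: emit (word, 1) for every word in the chosen column."""
    for word in columns.get(column_name, "").split():
        yield word.lower(), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def run_job(source, column_name):
    """Run map and reduce, then shape results as rows with a single 'count' column."""
    pairs = (
        pair
        for row_key, columns in read_rows(source)
        for pair in map_row(row_key, columns, column_name)
    )
    counts = reduce_counts(pairs)
    # Stand-in for an OutputFormat: one output row per word.
    return {word: {"count": total} for word, total in counts.items()}


if __name__ == "__main__":
    source = {
        "row1": {"text": "hello world"},
        "row2": {"text": "Hello Cassandra"},
    }
    print(run_job(source, "text"))
```

In a real deployment the locality point on the slide matters: running the map step on the node that already holds the rows avoids moving them across the network.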
  • 10. Hadoop+Cassandra Deployed
  • 11. DEMOS!
  • 12. For Real-Time
      • Flume Sink for Cassandra
          • https://github.com/thobbs/flume-cassandra-plugin
      • Lucene/Solr Integration
          • https://github.com/tjake/Lucandra
  • 13. Solr Components
  • 14. Lucandra
      • A Lucene IndexReader and Writer that communicates directly with Cassandra
      • Replaces Lucene index file format with Cassandra's data model
          • Multi-master
          • Replication
          • Real-time
          • Can manage millions of small indexes
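One way to picture an inverted index stored in a row/column model, as slide 14 describes, is to use one row per term and one column per matching document. The sketch below assumes a row key of (index name, field, term) and columns mapping document ID to term frequency. That layout is an illustrative assumption, since the data model diagram from slide 15 is not in the transcript, and the class is hypothetical rather than Lucandra's code.

```python
from collections import defaultdict


class TermIndex:
    """Inverted index kept as rows: (index, field, term) -> {doc_id: term frequency}."""

    def __init__(self, index_name):
        self.index_name = index_name
        self.rows = defaultdict(dict)

    def _row_key(self, field, term):
        return (self.index_name, field, term)

    def add_document(self, doc_id, field, text):
        """Write one column per (term, document), counting repeated terms."""
        for term in text.lower().split():
            row = self.rows[self._row_key(field, term)]
            row[doc_id] = row.get(doc_id, 0) + 1

    def search(self, field, term):
        """Read a single row and return the matching document IDs in sorted order."""
        row = self.rows.get(self._row_key(field, term.lower()), {})
        return sorted(row)


if __name__ == "__main__":
    index = TermIndex("articles")
    index.add_document(1, "body", "Cassandra and Solr")
    index.add_document(2, "body", "Solr on Hadoop")
    print(index.search("body", "solr"))
```

Because each write is just a column insert, a document becomes searchable as soon as its columns are written, which is one reading of the "real-time" point on the slide.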
  • 15. Lucandra Data Model
  • 16. Solandra
      • Embeds Solr in the Cassandra node
          • Removes RPC layer
          • Same JVM, in-memory reads/writes
          • Solr becomes aware of the Cassandra ring (index locality)
          • Manage N Solr schemas in Cassandra
      • New IndexManager caps the number of docs in a given index. Indexes larger than the cap are split into sub-indexes
      • Keep all data for a sub-index on one node
      • Only one component to maintain!
      • Use Solr+C* ring to shuttle queries to nodes containing sub-indexes (without the user's knowledge)
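The capping and splitting behavior described on slide 16 can be sketched as follows: documents are numbered in order, every `max_docs_per_sub_index` documents start a new sub-index, and each sub-index key is hashed to exactly one node so that all of its data stays together. The class below is a hypothetical illustration. The key format, the hashing scheme, and the method names are assumptions and not Solandra's implementation.

```python
import hashlib


class IndexManager:
    """Caps documents per sub-index and pins each sub-index to a single node."""

    def __init__(self, index_name, max_docs_per_sub_index, nodes):
        if max_docs_per_sub_index <= 0:
            raise ValueError("max_docs_per_sub_index must be positive")
        self.index_name = index_name
        self.cap = max_docs_per_sub_index
        self.nodes = list(nodes)
        self.next_doc_number = 0

    def sub_index_key(self, doc_number):
        """Documents 0..cap-1 go to sub-index 0, the next cap documents to 1, and so on."""
        return f"{self.index_name}~{doc_number // self.cap}"

    def node_for(self, sub_index_key):
        """Hash the sub-index key so every document in it lands on the same node."""
        digest = hashlib.md5(sub_index_key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest, "big") % len(self.nodes)]

    def add_document(self):
        """Assign the next document number and report where it is stored."""
        doc_number = self.next_doc_number
        self.next_doc_number += 1
        key = self.sub_index_key(doc_number)
        return doc_number, key, self.node_for(key)

    def sub_indexes(self):
        """All sub-index keys a query over this index would need to visit."""
        count = -(-self.next_doc_number // self.cap)  # ceiling division
        return [f"{self.index_name}~{i}" for i in range(count)]


if __name__ == "__main__":
    manager = IndexManager("books", max_docs_per_sub_index=2, nodes=["n1", "n2", "n3"])
    for _ in range(5):
        print(manager.add_document())
    print(manager.sub_indexes())
```

A query over the whole index would fan out to the node of each key returned by `sub_indexes()`, which corresponds to the query-shuttling step in the last bullet.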
  • 17. Solandra Deployed
  • 18. DEMOS!