TOO BIGTO FAILAn Introduction to Distributed Search with Cassandra and SolrOpenSource Connections@PatriciaGorlapgorla@o19s...
ABOUT MESystems AnalystProgrammingInformation Retrieval
Created at Facebook topower inbox searchDistributed data store runon commodity serversHighly availableNo one single point ...
WHO USES CASSANDRA?
SEARCH + CASSANDRA, 1• First implementation: Solandra (originally Lucandra)• Based on Solr• Replaced Lucene index with Cas...
SEARCH + CASSANDRA, 2• DataStax Enterprise Search• Uses native Lucene index• All data is retrieved from Cassandra
Datastax Enterprise Search ClusterDistributedLinearly ScalableHighly AvailableEventually ConsistentFull-text searchAggrega...
WRITINGTO CLUSTER• Write to either Cassandra clients or Solr API• Write process is the same• True atomic updates to Cassan...
Information is distributed among Cassandra nodes.
Data can be written directly to Cassandra
Data is distributed according to row key hash and replicationfactor
DSE first writes toCassandra
And then updates theSolr index
The quorum responds with success / failure
Data is now distributed evenly
READING FROM CLUSTER• Read either Cassandra-side or through Solr API• Cassandra: fast reads*• Solr: full-text search• Read...
Query is sent to node
Node uses GOSSIP to find who has the information
QUERYING CASSANDRA• Can query Solr or Cassandra directly• Limited syntax with CQL, can use solr_query parameter
Querying Cassandradirectly
Cassandra retrievesinformation from columnfamily
Querying Solr index
Solr checks its index, andqueries Cassandra forstored data
Cassandra node checks itscolumn family
Data is always synced (aslong as node is in sync)
Both nodes respond with information
Updates can be committed and searched over in real time
PRODUCTION USE• Will want a mix of analytics, search nodes
An OLTP - OLAP integrated solution
TRADEOFFS• Cassandra is no longer schema-optional: changing the Solrschema requires reindex (standard for Solr)• No multi-...
CONCLUSION• Useful for real-time text-only searches• DSE is promising, but needs to address drawbacks
Q&A@PatriciaGorlapgorla@o19s.com
Upcoming SlideShare
Loading in...5
×

An Introduction to Distributed Search with Cassandra and Solr

14,238

Published on

An Introduction to Distributed Search with Cassandra and Solr by Patricia Gorla

Published in: Technology
1 Comment
13 Likes
Statistics
Notes
No Downloads
Views
Total Views
14,238
On Slideshare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
170
Comments
1
Likes
13
Embeds 0
No embeds

No notes for slide

An Introduction to Distributed Search with Cassandra and Solr

  1. 1. TOO BIGTO FAILAn Introduction to Distributed Search with Cassandra and SolrOpenSource Connections@PatriciaGorlapgorla@o19s.com
  2. 2. ABOUT MESystems AnalystProgrammingInformation Retrieval
  3. 3. Created at Facebook topower inbox searchDistributed data store runon commodity serversHighly availableNo one single point of failureCASSANDRA
  4. 4. WHO USES CASSANDRA?
  5. 5. SEARCH + CASSANDRA, 1• First implementation: Solandra (originally Lucandra)• Based on Solr• Replaced Lucene index with Cassandra column families
  6. 6. SEARCH + CASSANDRA, 2• DataStax Enterprise Search• Uses native Lucene index• All data is retrieved from Cassandra
  7. 7. Datastax Enterprise Search ClusterDistributedLinearly ScalableHighly AvailableEventually ConsistentFull-text searchAggregation
  8. 8. WRITINGTO CLUSTER• Write to either Cassandra clients or Solr API• Write process is the same• True atomic updates to Cassandra
  9. 9. Information is distributed among Cassandra nodes.
  10. 10. Data can be written directly to Cassandra
  11. 11. Data is distributed according to row key hash and replicationfactor
  12. 12. DSE first writes toCassandra
  13. 13. And then updates theSolr index
  14. 14. The quorum responds with success / failure
  15. 15. Data is now distributed evenly
  16. 16. READING FROM CLUSTER• Read either Cassandra-side or through Solr API• Cassandra: fast reads*• Solr: full-text search• Read direction affects performance• Data is stored in Cassandra
  17. 17. Query is sent to node
  18. 18. Node uses GOSSIP to find who has the information
  19. 19. QUERYING CASSANDRA• Can query Solr or Cassandra directly• Limited syntax with CQL, can use solr_query parameter
  20. 20. Querying Cassandradirectly
  21. 21. Cassandra retrievesinformation from columnfamily
  22. 22. Querying Solr index
  23. 23. Solr checks its index, andqueries Cassandra forstored data
  24. 24. Cassandra node checks itscolumn family
  25. 25. Data is always synced (aslong as node is in sync)
  26. 26. Both nodes respond with information
  27. 27. Updates can be committed and searched over in real time
  28. 28. PRODUCTION USE• Will want a mix of analytics, search nodes
  29. 29. An OLTP - OLAP integrated solution
  30. 30. TRADEOFFS• Cassandra is no longer schema-optional: changing the Solrschema requires reindex (standard for Solr)• No multi-valued fields or composite columns• All fields must be strings: no date facets for Solr, no countersfor Cassandra
  31. 31. CONCLUSION• Useful for real-time text-only searches• DSE is promising, but needs to address drawbacks
  32. 32. Q&A@PatriciaGorlapgorla@o19s.com
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×