An Introduction to Distributed Search with Cassandra and Solr

  • 10,004 views
Uploaded on

An Introduction to Distributed Search with Cassandra and Solr by Patricia Gorla

An Introduction to Distributed Search with Cassandra and Solr by Patricia Gorla

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
No Downloads

Views

Total Views
10,004
On Slideshare
0
From Embeds
0
Number of Embeds
3

Actions

Shares
Downloads
126
Comments
1
Likes
6

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. TOO BIGTO FAILAn Introduction to Distributed Search with Cassandra and SolrOpenSource Connections@PatriciaGorlapgorla@o19s.com
  • 2. ABOUT MESystems AnalystProgrammingInformation Retrieval
  • 3. Created at Facebook topower inbox searchDistributed data store runon commodity serversHighly availableNo one single point of failureCASSANDRA
  • 4. WHO USES CASSANDRA?
  • 5. SEARCH + CASSANDRA, 1• First implementation: Solandra (originally Lucandra)• Based on Solr• Replaced Lucene index with Cassandra column families
  • 6. SEARCH + CASSANDRA, 2• DataStax Enterprise Search• Uses native Lucene index• All data is retrieved from Cassandra
  • 7. Datastax Enterprise Search ClusterDistributedLinearly ScalableHighly AvailableEventually ConsistentFull-text searchAggregation
  • 8. WRITINGTO CLUSTER• Write to either Cassandra clients or Solr API• Write process is the same• True atomic updates to Cassandra
  • 9. Information is distributed among Cassandra nodes.
  • 10. Data can be written directly to Cassandra
  • 11. Data is distributed according to row key hash and replicationfactor
  • 12. DSE first writes toCassandra
  • 13. And then updates theSolr index
  • 14. The quorum responds with success / failure
  • 15. Data is now distributed evenly
  • 16. READING FROM CLUSTER• Read either Cassandra-side or through Solr API• Cassandra: fast reads*• Solr: full-text search• Read direction affects performance• Data is stored in Cassandra
  • 17. Query is sent to node
  • 18. Node uses GOSSIP to find who has the information
  • 19. QUERYING CASSANDRA• Can query Solr or Cassandra directly• Limited syntax with CQL, can use solr_query parameter
  • 20. Querying Cassandradirectly
  • 21. Cassandra retrievesinformation from columnfamily
  • 22. Querying Solr index
  • 23. Solr checks its index, andqueries Cassandra forstored data
  • 24. Cassandra node checks itscolumn family
  • 25. Data is always synced (aslong as node is in sync)
  • 26. Both nodes respond with information
  • 27. Updates can be committed and searched over in real time
  • 28. PRODUCTION USE• Will want a mix of analytics, search nodes
  • 29. An OLTP - OLAP integrated solution
  • 30. TRADEOFFS• Cassandra is no longer schema-optional: changing the Solrschema requires reindex (standard for Solr)• No multi-valued fields or composite columns• All fields must be strings: no date facets for Solr, no countersfor Cassandra
  • 31. CONCLUSION• Useful for real-time text-only searches• DSE is promising, but needs to address drawbacks
  • 32. Q&A@PatriciaGorlapgorla@o19s.com