An Introduction to Distributed Search with Datastax Enterprise Search

  • 2,209 views
Uploaded on

This is an overview of distributed search with Cassandra and Solr.

This is an overview of distributed search with Cassandra and Solr.

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
2,209
On Slideshare
0
From Embeds
0
Number of Embeds
2

Actions

Shares
Downloads
23
Comments
0
Likes
4

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. TOO BIGTO FAILAn Introduction to Distributed Search with Cassandra and SolrOpenSource Connections@PatriciaGorlapgorla@o19s.com
  • 2. ABOUT MESystems AnalystProgrammingInformation Retrieval
  • 3. Created at Facebook topower inbox searchDistributed data store runon commodity serversHighly availableNo one single point of failureCASSANDRA
  • 4. WHO USES CASSANDRA?
  • 5. SEARCH + CASSANDRA, 1• First implementation: Solandra (originally Lucandra)• Replaced Lucene index with Cassandra column families
  • 6. SEARCH + CASSANDRA, 2• DataStax Enterprise Search• Uses native Lucene index• All data is retrieved from Cassandra
  • 7. Datastax Enterprise Search ClusterDistributedLinearly ScalableHighly AvailableEventually ConsistentFull-text searchAggregation
  • 8. SETTING UPTHE SCHEMA<fields><field name="id" type="string" indexed="true" stored="true"/><field name="name" type="text" indexed="true" stored="true"/><field name="body" type="text" indexed="true" stored="true"/><field name="title" type="text" indexed="true" stored="true"/><field name="date" type="string" indexed="true" stored="true"/></fields>
  • 9. WRITINGTO CLUSTER• Write to either Cassandra clients or Solr API• Write process is the same• True atomic updates to Cassandra
  • 10. Cassandra nodes are set up according to row-key hash.
  • 11. Data can be written directly to Cassandra
  • 12. Data is distributed according to row key hash and replicationfactor
  • 13. DSE first writes toCassandra
  • 14. And then updates thesecondary index on Solr
  • 15. The quorum responds with success / failure
  • 16. Data is now distributed evenly
  • 17. READING FROM CLUSTER• Read either Cassandra-side or through Solr API• Cassandra: fast reads*• Solr: full-text search• Read direction affects performance• Data is stored in Cassandra
  • 18. Query is sent to node
  • 19. Node uses gossip to find who has the information
  • 20. QUERYING CASSANDRA• Can query Solr or Cassandra directly• Limited syntax with CQL, can use solr_query parameter
  • 21. Querying Cassandradirectly
  • 22. Cassandra retrievesinformation from columnfamily
  • 23. Querying Solr index
  • 24. Row-key hashes arestored in Solr, andCassandra is queried forstored data
  • 25. Cassandra node sendsrequest to node with thecorresponding hash,returns information
  • 26. Data is always synced
  • 27. Both nodes respond with information
  • 28. Updates can be committed and searched over in real time
  • 29. PRODUCTION USE• Will want a mix of analytics, search nodes
  • 30. An OLTP - OLAP integrated solution
  • 31. TRADEOFFS• Changing the Solr schema requires reindex (standard for Solr)• No multi-valued fields or composite columns
  • 32. Q&A@PatriciaGorlapgorla@o19s.como19s.com/blog