Your SlideShare is downloading. ×
0
TOO BIGTO FAILAn Introduction to Distributed Search with Cassandra and SolrOpenSource Connections@PatriciaGorlapgorla@o19s...
ABOUT MESystems AnalystProgrammingInformation Retrieval
Created at Facebook topower inbox searchDistributed data store runon commodity serversHighly availableNo one single point ...
WHO USES CASSANDRA?
SEARCH + CASSANDRA, 1• First implementation: Solandra (originally Lucandra)• Replaced Lucene index with Cassandra column f...
SEARCH + CASSANDRA, 2• DataStax Enterprise Search• Uses native Lucene index• All data is retrieved from Cassandra
Datastax Enterprise Search ClusterDistributedLinearly ScalableHighly AvailableEventually ConsistentFull-text searchAggrega...
SETTING UPTHE SCHEMA<fields><field name="id" type="string" indexed="true" stored="true"/><field name="name" type="text" in...
WRITINGTO CLUSTER• Write to either Cassandra clients or Solr API• Write process is the same• True atomic updates to Cassan...
Cassandra nodes are set up according to row-key hash.
Data can be written directly to Cassandra
Data is distributed according to row key hash and replicationfactor
DSE first writes toCassandra
And then updates thesecondary index on Solr
The quorum responds with success / failure
Data is now distributed evenly
READING FROM CLUSTER• Read either Cassandra-side or through Solr API• Cassandra: fast reads*• Solr: full-text search• Read...
Query is sent to node
Node uses gossip to find who has the information
QUERYING CASSANDRA• Can query Solr or Cassandra directly• Limited syntax with CQL, can use solr_query parameter
Querying Cassandradirectly
Cassandra retrievesinformation from columnfamily
Querying Solr index
Row-key hashes arestored in Solr, andCassandra is queried forstored data
Cassandra node sendsrequest to node with thecorresponding hash,returns information
Data is always synced
Both nodes respond with information
Updates can be committed and searched over in real time
PRODUCTION USE• Will want a mix of analytics, search nodes
An OLTP - OLAP integrated solution
TRADEOFFS• Changing the Solr schema requires reindex (standard for Solr)• No multi-valued fields or composite columns
Q&A@PatriciaGorlapgorla@o19s.como19s.com/blog
Upcoming SlideShare
Loading in...5
×

An Introduction to Distributed Search with Datastax Enterprise Search

3,017

Published on

This is an overview of distributed search with Cassandra and Solr.

Published in: Technology

Transcript of "An Introduction to Distributed Search with Datastax Enterprise Search"

  1. 1. TOO BIGTO FAILAn Introduction to Distributed Search with Cassandra and SolrOpenSource Connections@PatriciaGorlapgorla@o19s.com
  2. 2. ABOUT MESystems AnalystProgrammingInformation Retrieval
  3. 3. Created at Facebook topower inbox searchDistributed data store runon commodity serversHighly availableNo one single point of failureCASSANDRA
  4. 4. WHO USES CASSANDRA?
  5. 5. SEARCH + CASSANDRA, 1• First implementation: Solandra (originally Lucandra)• Replaced Lucene index with Cassandra column families
  6. 6. SEARCH + CASSANDRA, 2• DataStax Enterprise Search• Uses native Lucene index• All data is retrieved from Cassandra
  7. 7. Datastax Enterprise Search ClusterDistributedLinearly ScalableHighly AvailableEventually ConsistentFull-text searchAggregation
  8. 8. SETTING UPTHE SCHEMA<fields><field name="id" type="string" indexed="true" stored="true"/><field name="name" type="text" indexed="true" stored="true"/><field name="body" type="text" indexed="true" stored="true"/><field name="title" type="text" indexed="true" stored="true"/><field name="date" type="string" indexed="true" stored="true"/></fields>
  9. 9. WRITINGTO CLUSTER• Write to either Cassandra clients or Solr API• Write process is the same• True atomic updates to Cassandra
  10. 10. Cassandra nodes are set up according to row-key hash.
  11. 11. Data can be written directly to Cassandra
  12. 12. Data is distributed according to row key hash and replicationfactor
  13. 13. DSE first writes toCassandra
  14. 14. And then updates thesecondary index on Solr
  15. 15. The quorum responds with success / failure
  16. 16. Data is now distributed evenly
  17. 17. READING FROM CLUSTER• Read either Cassandra-side or through Solr API• Cassandra: fast reads*• Solr: full-text search• Read direction affects performance• Data is stored in Cassandra
  18. 18. Query is sent to node
  19. 19. Node uses gossip to find who has the information
  20. 20. QUERYING CASSANDRA• Can query Solr or Cassandra directly• Limited syntax with CQL, can use solr_query parameter
  21. 21. Querying Cassandradirectly
  22. 22. Cassandra retrievesinformation from columnfamily
  23. 23. Querying Solr index
  24. 24. Row-key hashes arestored in Solr, andCassandra is queried forstored data
  25. 25. Cassandra node sendsrequest to node with thecorresponding hash,returns information
  26. 26. Data is always synced
  27. 27. Both nodes respond with information
  28. 28. Updates can be committed and searched over in real time
  29. 29. PRODUCTION USE• Will want a mix of analytics, search nodes
  30. 30. An OLTP - OLAP integrated solution
  31. 31. TRADEOFFS• Changing the Solr schema requires reindex (standard for Solr)• No multi-valued fields or composite columns
  32. 32. Q&A@PatriciaGorlapgorla@o19s.como19s.com/blog
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×