• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
An Introduction to Distributed Search with Cassandra and Solr
 

An Introduction to Distributed Search with Cassandra and Solr

on

  • 8,540 views

An Introduction to Distributed Search with Cassandra and Solr by Patricia Gorla

An Introduction to Distributed Search with Cassandra and Solr by Patricia Gorla

Statistics

Views

Total Views
8,540
Views on SlideShare
8,539
Embed Views
1

Actions

Likes
3
Downloads
91
Comments
1

1 Embed 1

http://localhost 1

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel

11 of 1 previous next

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    An Introduction to Distributed Search with Cassandra and Solr An Introduction to Distributed Search with Cassandra and Solr Presentation Transcript

    • TOO BIGTO FAILAn Introduction to Distributed Search with Cassandra and SolrOpenSource Connections@PatriciaGorlapgorla@o19s.com
    • ABOUT MESystems AnalystProgrammingInformation Retrieval
    • Created at Facebook topower inbox searchDistributed data store runon commodity serversHighly availableNo one single point of failureCASSANDRA
    • WHO USES CASSANDRA?
    • SEARCH + CASSANDRA, 1• First implementation: Solandra (originally Lucandra)• Based on Solr• Replaced Lucene index with Cassandra column families
    • SEARCH + CASSANDRA, 2• DataStax Enterprise Search• Uses native Lucene index• All data is retrieved from Cassandra
    • Datastax Enterprise Search ClusterDistributedLinearly ScalableHighly AvailableEventually ConsistentFull-text searchAggregation
    • WRITINGTO CLUSTER• Write to either Cassandra clients or Solr API• Write process is the same• True atomic updates to Cassandra
    • Information is distributed among Cassandra nodes.
    • Data can be written directly to Cassandra
    • Data is distributed according to row key hash and replicationfactor
    • DSE first writes toCassandra
    • And then updates theSolr index
    • The quorum responds with success / failure
    • Data is now distributed evenly
    • READING FROM CLUSTER• Read either Cassandra-side or through Solr API• Cassandra: fast reads*• Solr: full-text search• Read direction affects performance• Data is stored in Cassandra
    • Query is sent to node
    • Node uses GOSSIP to find who has the information
    • QUERYING CASSANDRA• Can query Solr or Cassandra directly• Limited syntax with CQL, can use solr_query parameter
    • Querying Cassandradirectly
    • Cassandra retrievesinformation from columnfamily
    • Querying Solr index
    • Solr checks its index, andqueries Cassandra forstored data
    • Cassandra node checks itscolumn family
    • Data is always synced (aslong as node is in sync)
    • Both nodes respond with information
    • Updates can be committed and searched over in real time
    • PRODUCTION USE• Will want a mix of analytics, search nodes
    • An OLTP - OLAP integrated solution
    • TRADEOFFS• Cassandra is no longer schema-optional: changing the Solrschema requires reindex (standard for Solr)• No multi-valued fields or composite columns• All fields must be strings: no date facets for Solr, no countersfor Cassandra
    • CONCLUSION• Useful for real-time text-only searches• DSE is promising, but needs to address drawbacks
    • Q&A@PatriciaGorlapgorla@o19s.com