Solr & Cassandra: Searching Cassandra with DataStax Enterprise

Wait! Back away from the Cassandra secondary index. It’s OK for some use cases, but it’s not an easy button. “But I need to search through a bunch of columns to look for the data… and I can’t model that in C*, even after watching all of Patrick McFadin’s data modeling videos. What do I do?” The answer, dear developer, is in DSE Search. With its easy Solr API, Lucene indexes (and fault tolerance), you can search data stored in your Cassandra database to your heart’s content. Take my hand. I will show you how.

Published in: Technology

  1. 1. Searching Cassandra with Solr: An Introductory Technical Overview of DataStax Enterprise Search
 Rachel Pedreschi, Lead Technical Evangelist, DataStax Enterprise, @rachelpedreschi
  2. 2. Confidential What is Search?
  4. 4. Tokens: "The bright blue butterfly hangs on the breeze." becomes [the] [bright] [blue] [butterfly] [hangs] [on] [the] [breeze]
  5. 5. Terms (Credit: https://developer.apple.com/library/mac/documentation/userexperience/conceptual/SearchKitConcepts/searchKit_basics/searchKit_basics.html)
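The token and term slides can be illustrated with a small sketch. This is not how Lucene or Solr implement analysis; it is a minimal Python illustration that splits text into lowercase tokens and builds an inverted index from terms to document ids. The function names and the second document are hypothetical and exist only for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {
    1: "The bright blue butterfly hangs on the breeze.",  # sentence from the slide
    2: "A blue sky",  # hypothetical second document, not from the slides
}
tokens = tokenize(docs[1])
index = build_inverted_index(docs)
```

Looking up `index["blue"]` returns every document id containing that term, which is the basic lookup a search engine performs before any scoring.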
  6. 6. It can be lonely for Solr
  7. 7. Cassandra (C*): ✓ Highly available ✓ Linear scalability ✓ Low latency OLTP queries
  8. 8. + =
  9. 9. Like… High Availability
  10. 10. Data Partitioning. Diagram: Application and the Data Center 1 ring (node tokens 10, 20, 30, 40, 50, 60, 70, 80); hash(key) => token(43)
  11. 11. Replication. Diagram: Application and the Data Center 1 ring (node tokens 10, 20, 30, 40, 50, 60, 70, 80); hash(key) => token(43); replication factor = 3
  12. 12. Multi-Data Center Replication. Diagram: Application and the Data Center 1 ring (node tokens 10, 20, 30, 40, 50, 60, 70, 80), replication factor = 3; Data Center 2 ring (node tokens 11, 21, 31, 41, 51, 61, 71, 81), replication factor = 3; hash(key) => token(43)
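The partitioning and replication slides can be sketched in a few lines of Python. This is a simplified model, not Cassandra's implementation: it assumes one token per node, ignores racks and virtual nodes, and places replicas by walking clockwise from the first node whose token is greater than or equal to the key's token. The `replicas` function and the ring constants are hypothetical names; the token values come from the slides.

```python
from bisect import bisect_left

DC1_RING = [10, 20, 30, 40, 50, 60, 70, 80]  # node tokens shown for Data Center 1
DC2_RING = [11, 21, 31, 41, 51, 61, 71, 81]  # node tokens shown for Data Center 2


def replicas(key_token, ring, replication_factor=3):
    """Return the node tokens that hold a key.

    The first replica is the first node whose token is >= the key's token
    (wrapping around the ring); the rest are the next nodes clockwise.
    """
    ring = sorted(ring)
    start = bisect_left(ring, key_token) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(replication_factor)]


# hash(key) => token(43), as on the slides
dc1_nodes = replicas(43, DC1_RING)
dc2_nodes = replicas(43, DC2_RING)
```

Under these assumptions, a key hashed to token 43 lands on three nodes in each data center, so losing one node still leaves copies available.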
  13. 13. How does DSE integrate Solr? Diagram labels: C*, C*/Solr
  15. 15. SELECT * FROM killrvideo.videos WHERE solr_query='{ "q": "{!edismax qf=\"name^2 tags^1 description\"}datastax" }';
 SELECT id, value FROM keyspace.table WHERE token(id) >= -3074457345618258601 AND token(id) <= 3074457345618258603 AND solr_query='id:*';
  16. 16. Vocab (Cassandra term => Solr term): Column Family / Table => Core; Row => Document; Column => Field; SSTable => Index
  17. 17. Diagram: a write on a Cassandra node. Each write request: the Client sends Write <3, Betty, Blue, 63> to the Coordinator, and the node sends back an Acknowledge. Node memory: Memtable (corresponds to a CQL table) holding partition key1 first:Oscar last:Orange level:42; partition key2 first:Ricky last:Red; partition key3 first:Betty last:Blue level:63. Node file system: CommitLog (append only) and SSTables. Periodically: flush current state to an SSTable. Periodically: compact related SSTables (Compaction).
  18. 18. Diagram: a write on a DSE Search node. Each write request: the Client sends Write <1,blue, 2,3> through the Coordinator and Shard Router. Node memory: Ram Buffer holding index entries (1 best 1; 2 bright 2,3; 3 blue 2,3). Node file system: Segments. Periodically: flushes current state to a Segment (soft commit); Merge (STW). On C* Memtable flush, in-memory segments hard commit to disk.
  19. 19. Diagram: the same DSE Search node, with Coordinator and Shard Router. Node memory: Ram Buffer (1 best 1; 2 bright 2,3; 3 blue 2,3). Node file system: Segments. Labels: Not Searchable, Searchable.
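The indexing diagrams suggest that a write is buffered in memory first and becomes visible to searches only once it has been flushed into a segment. The toy model below illustrates that idea under the assumption that the "Not Searchable" label refers to the RAM buffer and "Searchable" to segments. It is not Lucene's or DSE's implementation, and the `ToyIndex` class and its method names are hypothetical.

```python
class ToyIndex:
    """Toy model: writes are buffered in memory and only become searchable
    after a soft commit moves the buffer into a segment."""

    def __init__(self):
        self.ram_buffer = {}  # term -> set of doc ids, not yet searchable
        self.segments = []    # list of {term: set of doc ids}, searchable

    def write(self, doc_id, text):
        """Add each whitespace-separated term of the text to the RAM buffer."""
        for term in text.lower().split():
            self.ram_buffer.setdefault(term, set()).add(doc_id)

    def soft_commit(self):
        """Turn the current buffer into a new searchable segment."""
        if self.ram_buffer:
            self.segments.append(self.ram_buffer)
            self.ram_buffer = {}

    def search(self, term):
        """Return sorted doc ids for a term, looking only at segments."""
        hits = set()
        for segment in self.segments:
            hits |= segment.get(term.lower(), set())
        return sorted(hits)
```

In this model, a document written with `write(2, "bright blue")` is not returned by `search("blue")` until `soft_commit()` has run, which mirrors the delay between indexing and visibility.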
  20. 20. And… Scalability
  21. 21. Diagram: Application and the Data Center 1 ring (node tokens 10, 20, 30, 40, 50, 60, 70, 80)
  22. 22. Diagram: Application and the Data Center 1 ring (node tokens 10, 20, 30, 40, 50, 60, 70, 80)
  23. 23. Diagram: Application and the Data Center 1 ring after nodes are added (node tokens 8, 16, 24, 32, 40, 48, 56, 64, 72, 80)
  24. 24. Even… Improved Performance
  25. 25. Standard Solr Indexing; DSE Search Live Indexing
  27. 27. Let’s go code diving!
  28. 28. Behind the scenes…
 // Videos by id
 CREATE TABLE videos (
     videoid uuid,
     userid uuid,
     name text,
     description text,
     location text,
     location_type int,
     preview_image_location text,
     tags set<text>,
     added_date timestamp,
     PRIMARY KEY (videoid)
 );
 // Index for tag keywords
 CREATE TABLE videos_by_tag (
     tag text,
     videoid uuid,
     added_date timestamp,
     userid uuid,
     name text,
     preview_image_location text,
     tagged_date timestamp,
     PRIMARY KEY (tag, videoid)
 );
 Slide callouts: "Not a great idea", "Possible Index"
  29. 29. // Videos by id
 CREATE TABLE videos (
     videoid uuid,
     userid uuid,
     name text,
     description text,
     location text,
     location_type int,
     preview_image_location text,
     tags set<text>,
     added_date timestamp,
     PRIMARY KEY (videoid)
 );
 Slide callouts: "And this?", "This?", "This?"
  31. 31. 1) Spin up a new C* cluster with search enabled using the DSE installer.
 $ sudo service dse cassandra -s
 2) Run your schema DDL to create the C* keyspace and tables.
 3) Run dsetool on the videos table.
 $ dsetool create_core killrvideo.videos generateResources=true
 4) Use the Solr Admin to check sanity and make sure you have a core.
 5) Write a CQL query with a Solr search in it.
 SELECT * FROM killrvideo.videos WHERE solr_query='{ "q": "{!edismax qf=\"name^2 tags^1 description\"}datastax" }';
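The query in step 5 nests double quotes inside a JSON string, and those inner quotes need backslash escapes for the JSON to be valid. One way to avoid hand-escaping is to let a JSON library build the value. The sketch below uses only Python's standard `json` module; the helper name `build_solr_query_cql` is hypothetical, and the table and field names are taken from the slides.

```python
import json


def build_solr_query_cql(table, search_terms, query_fields):
    """Return a CQL SELECT string whose solr_query value is valid JSON."""
    # edismax local params: qf lists the fields to search, with optional boosts.
    local_params = '{!edismax qf="%s"}' % " ".join(query_fields)
    # json.dumps escapes the inner double quotes for us.
    solr_json = json.dumps({"q": local_params + search_terms})
    # CQL string literals escape a single quote by doubling it.
    cql_literal = solr_json.replace("'", "''")
    return "SELECT * FROM %s WHERE solr_query='%s';" % (table, cql_literal)


cql = build_solr_query_cql(
    "killrvideo.videos", "datastax", ["name^2", "tags^1", "description"]
)
```

String-built CQL like this is fine for experimenting in a shell, but for application code it is probably safer to pass the JSON as a bound parameter, if your driver and DSE version support that.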
  32. 32. Search all of the things in 5 easy steps…
  33. 33. Resources: www.killrvideo.com, https://github.com/LukeTillman/killrvideo-csharp, www.datastax.com
  34. 34. Questions?
  35. 35. 50% off Priority Pass: RachelP50; 25% off Certification: RachelPCert
  36. 36. Thank You @RachelPedreschi
