
Enabling Search in your Cassandra Application with DataStax Enterprise


Published in: Technology


  1. 1. Enabling Search in your Cassandra Application with DataStax Enterprise. Marc Selwan, Solutions Engineer, @MarcSelwan
  2. 2. Why Search? Confidential
  3. 3. Confidential
  4. 4. The bright blue butterfly hangs on the breeze. Terms: [the] [bright] [blue] [butterfly] [hangs] [on] [the] [breeze]
  5. 5. Credit: https://developer.apple.com/library/mac/documentation/userexperience/conceptual/SearchKitConcepts/searchKit_basics/searchKit_basics.html
  6. 6. What is Solr Missing?
      • Not a database
      • Doesn’t cluster
      • Not transparently sharded
      • Requires ETL to ingest application data
      • Doesn’t reindex
  7. 7. Open Source Search Reference Architecture (diagram). Labels: Your Application; DB API; Search API; OLTP DB; Search Cluster; Your ETL; Transactional Workloads; Search Workloads.
  8. 8. Confidential + =
  9. 9. DSE Search Reference Architecture (diagram). Labels: Your Application; CQL; Search + Cassandra.
      • Easy CQL API
      • All the goodness of DataStax driver
      • Distributed, Replicated, Always On
      Data locality and shared memory
      • Automatic indexing on db insert
      • Higher ingestion throughput
      • Distributed query optimization
      Compared to open source search
      • No separate search cluster to manage
      • Probably less total hardware required
      • No “Split Brain” data inconsistencies
      • No ETL or synch to build and maintain
      • No app level data management code
  10. 10. Data stored in Cassandra Indexes stored in Solr/Lucene
  11. 11. Disk Memory Solr Cassandra
  12. 12. Write path (diagram). Labels: Coordinator; Shard Router; Memory: Memtable, RAM Buffer; Disk: Commit Log, SSTables, Index Segments.
      UPDATE videos SET tags = {'cat tubes', 'Al Gore''s Internet', 'NoSQL Fairytales'} WHERE videoid = b3a76c6b-7c7f-4af6-964f-803a9283c401;
  13. 13. OSS Solr (diagram). Labels: Memory: RAM Buffer; Disk: Index Segments; regions marked Not Searchable and Searchable.
  14. 14. DSE Search (diagram). Labels: Memory: RAM Buffer; Disk: Index Segments; region marked Searchable.
  15. 15. Let’s see this in action!
  16. 16. Search in Retail
  17. 17. Filter queries: these are awesome because the result set gets cached in memory.
      SELECT * FROM amazon.metadata WHERE solr_query='{"q":"title:Noir~", "fq":"categories:Books", "sort":"title asc"}' limit 10;
      Faceting: get counts of fields.
      SELECT * FROM amazon.metadata WHERE solr_query='{"q":"title:Noir~", "facet":{"field":"categories"}}' limit 10;
      Geospatial searches: supports box and radius.
      SELECT * FROM amazon.clicks WHERE solr_query='{"q":"asin:*", "fq":"+{!geofilt pt=37.7484,-122.4156 sfield=location d=1}"}' limit 10;
      Joins: not your relational joins. These queries 'borrow' indexes from other tables to add filter logic. These are fast!
      SELECT * FROM amazon.metadata WHERE solr_query='{"q":"*:*", "fq":"{!join from=asin to=asin force=true fromIndex=amazon.clicks}area_code:415"}' limit 5;
      Fun all in one.
      SELECT * FROM amazon.metadata WHERE solr_query='{"q":"*:*", "facet":{"field":"categories"}, "fq":"{!join from=asin to=asin force=true fromIndex=amazon.clicks}area_code:415"}' limit 5;
  18. 18. How do you get started?
  19. 19. 1) Spin up a new C* cluster with search enabled using the DSE installer.
      $ sudo service dse cassandra -s
      2) Run your schema DDL to create the C* keyspace and tables.
      3) Run dsetool on the videos table*
      $ dsetool create_core keyspace.table generateResources=true reindex=true
      4) Write a CQL query with a Solr search in it.
      SELECT * FROM keyspace.table WHERE solr_query='column:*'
      *This will create Lucene indexes on ALL the columns in your table.
  20. 20. Behind the scenes… (dsetool, schema.xml, solrconfig.xml, CQL query)
      dsetool:
      $ dsetool create_core killrvideo.videos generateResources=true
      schema.xml:
      <?xml version="1.0" encoding="UTF-8" standalone="no"?>
      <schema name="autoSolrSchema" version="1.5">
        <types> …
        <fields>
          <field indexed="true" multiValued="false" name="added_date" stored="true" type="TrieDateField"/>
          <field indexed="true" multiValued="false" name="location" stored="true" type="TextField"/>
          <field indexed="true" multiValued="false" name="preview_image_location" stored="true" type="TextField"/>
          <field indexed="true" multiValued="false" name="name" termVectors="true" stored="true" type="TextField"/>
          <field indexed="true" multiValued="true" name="tags" termVectors="true" stored="true" type="TextField"/>
          <field indexed="true" multiValued="false" name="userid" stored="true" type="UUIDField"/>
          <field indexed="true" multiValued="false" name="videoid" stored="true" type="UUIDField"/>
          <field indexed="true" multiValued="false" name="location_type" stored="true" type="TrieIntField"/>
          <field indexed="true" multiValued="false" name="description" termVectors="true" stored="true" type="TextField"/>
        </fields>
        <uniqueKey>videoid</uniqueKey>
      </schema>
      solrconfig.xml:
      <!-- ======= Copyright DataStax, Inc. Please see the included license file for details. -->
      <!-- For more details about configurations options that may appear in this file, see http://wiki.apache.org/solr/SolrConfigXml. -->
      <config>
        <!-- In all configuration below, a prefix of "solr." for class names is an alias that causes solr to search appropriate packages, including org.apache.solr.(search|update|request|core|analysis) You may also specify a fully qualified Java classname if you have your own custom plugins. -->
        …
      CQL query:
      SELECT * FROM killrvideo.videos WHERE solr_query='name:*'
  21. 21. Thank you!
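
Slides 4 and 5 show a sentence being broken into terms, which is the first step toward a search index. The sketch below illustrates that idea in plain Python: it lowercases and splits text into terms, then builds a toy inverted index mapping each term to the documents that contain it. It is a teaching aid under simple assumptions, not how Lucene or DSE Search is implemented; the function names and the second document are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {
    1: "The bright blue butterfly hangs on the breeze.",  # sentence from slide 4
    2: "A blue sky and a light breeze.",  # hypothetical second document
}

terms = tokenize(docs[1])
index = build_inverted_index(docs)

print(terms)               # the terms listed on slide 4
print(index["blue"])       # ids of documents containing "blue"
print(index["butterfly"])  # ids of documents containing "butterfly"
```

Looking up a term is then a dictionary access rather than a scan of every document, which is the property a search index relies on.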

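The solr_query examples on slide 17 embed a JSON document inside a CQL string literal, which makes quoting easy to get wrong by hand. The sketch below shows one way to assemble such a statement in Python: the query is serialized with json.dumps and any single quote is doubled, as CQL string literals require. The helper names are hypothetical and the table and field names are taken from the slides; where the driver in use supports bound parameters, those would generally be a safer choice than string assembly.

```python
import json


def solr_query_literal(query):
    """Serialize a dict to JSON and wrap it as a CQL string literal."""
    payload = json.dumps(query)
    # CQL escapes a single quote inside a string literal by doubling it.
    return "'" + payload.replace("'", "''") + "'"


def search_statement(table, query, limit):
    """Build a SELECT that passes the JSON query through solr_query.

    Hypothetical helper: `table` is assumed to be a trusted identifier.
    """
    literal = solr_query_literal(query)
    return f"SELECT * FROM {table} WHERE solr_query={literal} LIMIT {int(limit)};"


# Filter query from slide 17, expressed as a dict instead of hand-written JSON.
stmt = search_statement(
    "amazon.metadata",
    {"q": "title:Noir~", "fq": "categories:Books", "sort": "title asc"},
    10,
)
print(stmt)
```

Because json.dumps escapes any double quotes inside a value, a filter such as a geofilt with a quoted pt argument still produces valid JSON.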