About Me
• Solr
• Cassandra
• Information retrieval

Paul Hostetler - phostetler.com

#CASSANDRAEU

CASSANDRASUMMITEU
How Do I Find What I’m Looking For?
Simple

Complex

Aristotle’s birthplace?

All ancient Greek philosophers?

Coordinates...
How Do I Find What I’m Looking For?
Simple

Complex

Aristotle’s birthplace?
select birthPlace
where name = “Aristotle”;

...
How Do I Find What I’m Looking For?
Simple
Aristotle’s birthplace?
select birthPlace
where name = “Aristotle”;

Complex
Al...
How Do I Find What I’m Looking For?
Simple
Aristotle’s birthplace?
q=Aristotle&fl=birthPlace

All ancient Greek philosophe...
Approaches to Search
• Google Site Search
• MySQL ‘like’ statements

Seth Casteel - littlefriendsphoto.com

#CASSANDRAEU

...
Approaches to Search

• Full-text search
• Ranking (Scoring)
• Tokenization
• Stemming
• Faceting

#CASSANDRAEU

CASSANDRA...
Approaches to Search

• Full-text search
• Ranking (Scoring)
• Tokenization
• Stemming
• Faceting

#CASSANDRAEU

and many ...
Inverted Index
[1] Pleasure in the job puts perfection in the work.
[2] Education is the best provision for the journey to...
Defining Field Types
<fieldType name="text_general" class="solr.TextField" >
<analyzer type="index">
<tokenizer class="sol...
Index-side Analysis

Punctuation
Stop Words
Lowercase
Stemming

#CASSANDRAEU

Hope is a waking dream.
Hope is a waking dre...
Query-side Analysis

Punctuation
Stop Words
Lowercase
Stemming
Synonyms
#CASSANDRAEU

Hope is a waking dream.
Hope is a wa...
Faceting
facet_fields: {
tags: [hunting: 1, education: 2, work: 2],
locations: [Stagira: 5, Chalcis: 3]
}

#CASSANDRAEU

C...
Approaches to Distribution
• Pay Google more $$
• MySQL ‘like’ shards
• Master/Slave replication
• SolrCloud

#CASSANDRAEU...
“Distributed search is hard.”

#CASSANDRAEU

CASSANDRASUMMITEU
Solr + Cassandra: Datastax Enterprise
• Full-text search
• Tokenization
• Stemming
• Date ranges
• Aggregation
• High Avai...
Examining DBpedia.org Datasets
Person
<http://xmlns.com/foaf/0.1/name> "Aristotle"@en .
<http://www.w3.org/1999/02/22-rdf-...
“Love is a single soul
inhabiting two bodies.”
#CASSANDRAEU

CASSANDRASUMMITEU
Querying for data
curl http://localhost:8983/solr/solr.person/q=Aristotle

#CASSANDRAEU

CASSANDRASUMMITEU
Filtering by location
curl http://localhost:8983/solr/solr.location/select?q=*:*
&spatial=true&fq={!geofilt pt=40.53027,23...
“There is no great genius without a
mixture of madness.”
#CASSANDRAEU

CASSANDRASUMMITEU
Schema.xml
Unified Schema
<fields>
<field name="id" type="string" />
<field name="name" type="text" />
<dynamicField name=...
Upload to DSE, Create Core
curl http://localhost:8983/solr/resource/solr.location/schema.xml 
--data-binary @solr/location...
Cassandra Schema
Location Schema

gc_grace_seconds=864000 AND

cqlsh:solr> DESC COLUMNFAMILY location;

read_repair_chance...
What Changes
Solr

Cassandra

• No multiValued fields

• No composite columns

• No JOIN*

• No counter columns

#CASSANDRA...
Bringing it All Together
•

Fault tolerant, available
search

#CASSANDRAEU

CASSANDRASUMMITEU
“Thank you.”

#CASSANDRAEU

CASSANDRASUMMITEU
THANK YOU

pgorla@opensourceconnections.com
@patriciagorla
@o19s
All information, including slides, are on http://github.c...
Adventures in Discovery with Cassandra and Solr
Upcoming SlideShare
Loading in...5
×

Adventures in Discovery with Cassandra and Solr

1,708

Published on

For any venture, storing your data is just the first step in making sense of it. How do you make your system discoverable? How do you tune your relevancy to accommodate real-time updates? In this session, we explore pairing Cassandra with Solr using Datastax Enterprise Search, and look at different search paradigms to help your users find patterns in your data.

Published in: Technology
0 Comments
4 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,708
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
19
Comments
0
Likes
4
Embeds 0
No embeds

No notes for slide

Adventures in Discovery with Cassandra and Solr

  1. 1. Adventures in Discoverability with C* and Solr Patricia Gorla, Systems Engineer @patriciagorla @o19s #CASSANDRAEU CASSANDRASUMMITEU
  2. 2. About Me • Solr • Cassandra • Information retrieval Paul Hostetler - phostetler.com #CASSANDRAEU CASSANDRASUMMITEU
  3. 3. How Do I Find What I’m Looking For? Simple Complex Aristotle’s birthplace? All ancient Greek philosophers? Coordinates of Stagira? All cities within 100km of Stagira #CASSANDRAEU CASSANDRASUMMITEU
  4. 4. How Do I Find What I’m Looking For? Simple Complex Aristotle’s birthplace? select birthPlace where name = “Aristotle”; All ancient Greek philosophers? Coordinates of Stagira? select coord where name = “Stagira”; All cities within 100km of Stagira #CASSANDRAEU CASSANDRASUMMITEU
  5. 5. How Do I Find What I’m Looking For? Simple Aristotle’s birthplace? select birthPlace where name = “Aristotle”; Complex All ancient Greek philosophers? create index on tag; select * where tag = “Greek philosophy”; Coordinates of Stagira? select coord where name = “Stagira”; #CASSANDRAEU All cities within 100km of Stagira ??? CASSANDRASUMMITEU
  6. 6. How Do I Find What I’m Looking For? Simple Aristotle’s birthplace? q=Aristotle&fl=birthPlace All ancient Greek philosophers? q=Greek philosophy Coordinates of Stagira? q=Stagira&fl=point All cities within 100km of Stagira q=*:*&fq={!geofilt pt=40.530, 23.752 sfield=point d=100} #CASSANDRAEU CASSANDRASUMMITEU
  7. 7. Approaches to Search • Google Site Search • MySQL ‘like’ statements Seth Casteel - littlefriendsphoto.com #CASSANDRAEU CASSANDRASUMMITEU
  8. 8. Approaches to Search • Full-text search • Ranking (Scoring) • Tokenization • Stemming • Faceting #CASSANDRAEU CASSANDRASUMMITEU
  9. 9. Approaches to Search • Full-text search • Ranking (Scoring) • Tokenization • Stemming • Faceting #CASSANDRAEU and many more! CASSANDRASUMMITEU
  10. 10. Inverted Index [1] Pleasure in the job puts perfection in the work. [2] Education is the best provision for the journey to old age. [3] If some animals are good at hunting and others are suitable for hunting, then the gods must clearly smile on hunting. [4] It is the mark of an educated mind to be able to entertain a thought without absorbing it. Term Freq Documents education [2] [4] hunting 3 [3] perfection #CASSANDRAEU 2 1 [1] CASSANDRASUMMITEU
  11. 11. Defining Field Types <fieldType name="text_general" class="solr.TextField" > <analyzer type="index"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true /> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.EnglishPossessiveFilterFactory"/> <filter class="solr.PorterStemFilterFactory"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/> <filter class="solr.StopFilterFactory” ignoreCase="true"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.PorterStemFilterFactory"/> </analyzer> </fieldType> #CASSANDRAEU CASSANDRASUMMITEU
  12. 12. Index-side Analysis Punctuation Stop Words Lowercase Stemming #CASSANDRAEU Hope is a waking dream. Hope is a waking dream Hope waking dream hope waking dream hope wake dream CASSANDRASUMMITEU
  13. 13. Query-side Analysis Punctuation Stop Words Lowercase Stemming Synonyms #CASSANDRAEU Hope is a waking dream. Hope is a waking dream Hope waking dream hope waking dream hope wake dream hope wake dream desire awake wish CASSANDRASUMMITEU
  14. 14. Faceting facet_fields: { tags: [hunting: 1, education: 2, work: 2], locations: [Stagira: 5, Chalcis: 3] } #CASSANDRAEU CASSANDRASUMMITEU
  15. 15. Approaches to Distribution • Pay Google more $$ • MySQL ‘like’ shards • Master/Slave replication • SolrCloud #CASSANDRAEU CASSANDRASUMMITEU
  16. 16. “Distributed search is hard.” #CASSANDRAEU CASSANDRASUMMITEU
  17. 17. Solr + Cassandra: Datastax Enterprise • Full-text search • Tokenization • Stemming • Date ranges • Aggregation • High Availability • Distributed Nature #CASSANDRAEU CASSANDRASUMMITEU
  18. 18. Examining DBpedia.org Datasets Person <http://xmlns.com/foaf/0.1/name> "Aristotle"@en . <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> Person . <http://purl.org/dc/elements/1.1/description> "Greek philosopher"@en . <http://dbpedia.org/ontology/birthPlace> Stagira <http://dbpedia.org/ontology/deathPlace> Chalcis . Place <http://dbpedia.org/resource/Stagira> <http://www.opengis.net/gml/_Feature> . <http://dbpedia.org/resource/Stagira#lat> "40.5916667" . <http://dbpedia.org/resource/Stagira#long> "23.7947222" . <http://dbpedia.org/resource/Stagira#point> "40.591667 23.7947222"@en . #CASSANDRAEU CASSANDRASUMMITEU
  19. 19. “Love is a single soul inhabiting two bodies.” #CASSANDRAEU CASSANDRASUMMITEU
  20. 20. Querying for data curl http://localhost:8983/solr/solr.person/q=Aristotle #CASSANDRAEU CASSANDRASUMMITEU
  21. 21. Filtering by location curl http://localhost:8983/solr/solr.location/select?q=*:* &spatial=true&fq={!geofilt pt=40.53027,23.7525 sfield=point d=100} #CASSANDRAEU CASSANDRASUMMITEU
  22. 22. “There is no great genius without a mixture of madness.” #CASSANDRAEU CASSANDRASUMMITEU
  23. 23. Schema.xml Unified Schema <fields> <field name="id" type="string" /> <field name="name" type="text" /> <dynamicField name="*Date" type="date" /> <dynamicField name="*Place" type="text" /> <dynamicField name="*Point" type="location" /> <dynamicField name="*_tag" type="text" /> </fields> #CASSANDRAEU CASSANDRASUMMITEU
  24. 24. Upload to DSE, Create Core curl http://localhost:8983/solr/resource/solr.location/schema.xml --data-binary @solr/location_schema.xml -H 'Content-type:text/xml; charset=utf-8 ' curl http://localhost:8983/solr/resource/solr.location/solrconfig.xml --data-binary @solr/location_solrconfig.xml -H 'Content-type:text/xml; charset=utf-8 ' curl http://localhost:8983/solr/admin/cores?action=CREATE&name=solr.location #CASSANDRAEU CASSANDRASUMMITEU
  25. 25. Cassandra Schema Location Schema gc_grace_seconds=864000 AND cqlsh:solr> DESC COLUMNFAMILY location; read_repair_chance=0.100000 AND CREATE TABLE location ( replicate_on_write='true' AND id text PRIMARY KEY, "_docBoost" text, "_dynFld" text, location text, populate_io_cache_on_flush='false' AND compaction={'class': 'SizeTieredCompactionStrategy'} AND compression={'sstable_compression': 'SnappyCompressor'}; name text, solr_query text, tags text ) WITH COMPACT STORAGE AND bloom_filter_fp_chance=0.010000 AND caching='KEYS_ONLY' AND comment='' AND CREATE INDEX solr_location__docBoost_index ON location ("_docBoost"); ... CREATE INDEX solr_location_solr_query_index ON location (solr_query); dclocal_read_repair_chance=0.000000 AND #CASSANDRAEU CASSANDRASUMMITEU
  26. 26. What Changes Solr Cassandra • No multiValued fields • No composite columns • No JOIN* • No counter columns #CASSANDRAEU CASSANDRASUMMITEU
  27. 27. Bringing it All Together • Fault tolerant, available search #CASSANDRAEU CASSANDRASUMMITEU
  28. 28. “Thank you.” #CASSANDRAEU CASSANDRASUMMITEU
  29. 29. THANK YOU pgorla@opensourceconnections.com @patriciagorla @o19s All information, including slides, are on http://github.com/pgorla/million-books #CASSANDRAEU CASSANDRASUMMITEU

×