Solr

Apache Solr Ratification

cdevecchi@gmail.com

Solr – What is it?

• Apache Project
• Open-source search engine based on Lucene
• XML/HTTP and JSON APIs

Features

• Lemmatization

• Hit Highlighting

• Dictionaries

• Geosearch

• Faceted Search

• Caching

• Index Replication and Database Integration

Characteristics

• Java, deployed on Tomcat / JBoss / Jetty

• Schema

• SolrJ client

• JMX statistics

Query

• Highlighting
– Activated by query (hl=true)

• Text Analysis
– Use dictionary and thesaurus
– Relevancy searches
– Spelling suggestions
– Search by similarity (“More like this”)
– Fuzzy (Damerau-Levenshtein distance)
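The fuzzy matching mentioned above relies on an edit distance. As a rough illustration only, the sketch below computes the optimal string alignment distance, a restricted variant of Damerau-Levenshtein that counts insertions, deletions, substitutions, and swaps of adjacent characters. The function name `osa_distance` is made up for this example, and this is not Lucene's or Solr's actual implementation.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    # dist[i][j] = number of edits needed to turn a[:i] into b[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        dist[0][j] = j  # insert every character of b[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Swap of two adjacent characters counts as a single edit.
            if (
                i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


if __name__ == "__main__":
    print(osa_distance("solr", "slor"))       # adjacent swap
    print(osa_distance("kitten", "sitting"))  # classic Levenshtein example
```

A fuzzy query would then accept index terms whose distance to the query term stays under some threshold; in Lucene query syntax a fuzzy term is written with a trailing tilde, as in `video~`.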

Query

• Querying data
– Words
– Words by field
– Ordered results (sort)

• Faceted Search
– Categories
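The query options above map onto request parameters of the select handler. The following is a minimal sketch using only the Python standard library; the helper `build_select_url` is hypothetical, and the field names `name` and `price` are assumptions that would have to exist in the schema. `q`, `rows`, and `sort` are standard Solr parameters.

```python
from urllib.parse import urlencode


def build_select_url(base, query, sort=None, rows=10):
    """Build a select URL; `query` may be a plain word or `field:word`."""
    params = [("q", query), ("rows", rows)]
    if sort:
        # Sort syntax is "<field> asc" or "<field> desc".
        params.append(("sort", sort))
    return base + "?" + urlencode(params)


# Search the (assumed) `name` field for "video", cheapest results first.
url = build_select_url(
    "http://localhost:8983/solr/select",
    "name:video",
    sort="price asc",
)

if __name__ == "__main__":
    print(url)
```

Leaving out the field prefix (`q=video`) searches the default field instead.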

Query

• Faceted Search: could the queries become a problem?
– Example

http://localhost:8983/solr/select?
q=video&rows=0&facet=true&facet.field=inStock
&facet.query=price:[*+TO+500]
&facet.query=price:[500+TO+*]
&facet.prefix=xx&facet.limit=5&facet.mincount=1
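Facet URLs like the one above get long quickly, so it can help to assemble them from a parameter list rather than by hand. The sketch below rebuilds the same request with the standard library and decodes it again to check it. The names `FACET_PARAMS` and `facet_url` are made up for this example; the parameter names and values are taken from the slide.

```python
from urllib.parse import parse_qs, urlencode

# A list of pairs (not a dict) so that facet.query can appear twice.
FACET_PARAMS = [
    ("q", "video"),
    ("rows", 0),                          # only facet counts, no documents
    ("facet", "true"),
    ("facet.field", "inStock"),
    ("facet.query", "price:[* TO 500]"),
    ("facet.query", "price:[500 TO *]"),
    ("facet.prefix", "xx"),
    ("facet.limit", 5),
    ("facet.mincount", 1),
]


def facet_url(base="http://localhost:8983/solr/select"):
    """Return the faceted select URL with all values percent-encoded."""
    return base + "?" + urlencode(FACET_PARAMS)


# Decode the query string again to inspect what would be sent.
decoded = parse_qs(facet_url().split("?", 1)[1])

if __name__ == "__main__":
    print(facet_url())
    print(decoded["facet.query"])
```

Each `facet.field` and `facet.query` asks for an additional set of counts in the same request, which is one reason faceted queries can become expensive.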

Data indexing

• Native Solr XML
• CSV
• Database (DIH, DataImportHandler)
• Rich Documents
• Crawler
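For the native XML route, documents are wrapped in an `<add>` element with one `<doc>` per document and one `<field name="...">` per value. A minimal sketch of producing such a payload follows; the helper `to_solr_add_xml` and the sample field names (`id`, `name`, `price`) are assumptions, and sending the payload to the update handler is left out.

```python
import xml.etree.ElementTree as ET


def to_solr_add_xml(docs):
    """Serialize a list of dicts into a Solr <add> XML message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, < and >
    return ET.tostring(add, encoding="unicode")


xml_payload = to_solr_add_xml(
    [{"id": "1", "name": "video card", "price": 499}]
)

if __name__ == "__main__":
    print(xml_payload)
```

The documents only become searchable after a commit is issued to the update handler.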

Index

• Is the index growing larger than you expected?

• It can be tuned through:
– Index segment size
– Merging of index segments

Collections

• It is possible to create a separate index for each kind of document

Data Replication

• Master / Slave
– Index
– Config files

Sharding

• ZooKeeper
– http://hadoop.apache.org/zookeeper/

Solr Caching

• Keeps searched docs in cache
• Two implementations
– solr.search.LRUCache (LRU = Least Recently Used, in memory)
– solr.search.FastLRUCache (available since version 1.4)
• How to use
– filterCache
– queryResultCache
– documentCache (loads everything into memory)
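To make the "least recently used" policy concrete, here is a small illustrative cache. It is not Solr's implementation: the class name `TinyLRUCache` and its methods are invented for this sketch, which only shows the eviction behaviour that an LRU cache is named after.

```python
from collections import OrderedDict


class TinyLRUCache:
    """Keeps at most `size` entries, evicting the least recently used."""

    def __init__(self, size):
        self.size = size
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # a hit makes the entry "recent"
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.size:
            self._items.popitem(last=False)  # drop the oldest entry


if __name__ == "__main__":
    cache = TinyLRUCache(size=2)
    cache.put("q=video", [1, 2, 3])
    cache.put("q=audio", [4])
    cache.get("q=video")           # touch: "q=audio" is now the oldest
    cache.put("q=camera", [5, 6])  # evicts "q=audio"
    print(cache.get("q=audio"))
```

In Solr the cache size and implementation class are chosen per cache (filterCache, queryResultCache, documentCache) in the configuration.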

Cluster – Carrot2

• Search Results Clustering Engine

• Search in many nodes

• Live Demo

– http://search.carrot2.org/stable/search

Crawling

• Apache Nutch
– Searching, parsing, and parallel or distributed indexing
– Many formats
• E.g. plain text, HTML, XML, ZIP, .doc, JavaScript, RSS, PDF, etc.
– Cluster
– MapReduce
– Distributed filesystem (via Hadoop)

Backup / Snapshot

• Activated by scripts (solr-tools)

• Index snapshots

• Differential backups

– $solr_data/yyyymmdd
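The date-stamped directory layout above can be sketched as follows. This only illustrates the naming scheme from the slide; the helper `snapshot_dir` and the example data path are hypothetical, and the real solr-tools scripts are not reproduced here.

```python
from datetime import date
from pathlib import PurePosixPath


def snapshot_dir(solr_data, day):
    """Return <solr_data>/<yyyymmdd> for the given day."""
    return str(PurePosixPath(solr_data) / day.strftime("%Y%m%d"))


if __name__ == "__main__":
    # "/var/solr/data" is an assumed location for $solr_data.
    print(snapshot_dir("/var/solr/data", date.today()))
```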

Architecture (Distributed Index)

Indexing Tests

• Indexing tests
• XML documents of 7k each, with 111 fields

• 1.2 million docs in the index

• VM: 2 GB RAM, 2.33 GHz processor

Indexing Tests

[Chart: QPS (queries per second); data labels not recoverable from the extracted slide]

References

• http://lucene.apache.org/solr/

• http://wiki.apache.org/solr/

• http://project.carrot2.org/

• http://download.carrot2.org/head/manual/index.html#chapter.introduction

• http://wiki.apache.org/solr/ZooKeeperIntegration
