Apache Solr Technical Document
Contents
Requirements
Solution - Solr
  Features
  Typical Solr Setup Diagram
  Basic Solr Concepts
    1. Indexing
    2. How Solr represents data
  Installing Solr
  Starting Solr
  Indexing Data
  Searching
    Faceting
    Highlighting
    Spell Checking
    Relevance
  Shutdown
  Screen Shots
Apache SolrCloud
  Features
  Simple two shard cluster
  Dealing with high volume of data
  Dealing with failure
  Synchronization of data (added/updated in DB) with Solr
  Limitations
  Screen Shots
  Integration with .Net using SolrNet
Requirements
a. Fast full-text search capabilities
b. Optimized for high-volume web traffic
c. Highly and linearly scalable on demand
d. Pluggable into any platform
e. Near real-time search and indexing
f. Flexible and adaptable, with support for XML, JSON and CSV
Solution - Solr
Solr is a standalone enterprise search server with a REST-like API. You put documents in it
(called "indexing") via XML, JSON, CSV or binary over HTTP. You query it via HTTP GET and
receive XML, JSON, CSV or binary results.
Features
• Advanced Full-Text Search Capabilities
• Optimized for High Volume Web Traffic
• Standards Based Open Interfaces - XML, JSON and HTTP
• Comprehensive HTML Administration Interfaces
• Linearly scalable, auto index replication, auto failover and recovery
• Near Real-time indexing
• Flexible and Adaptable with XML configuration
• Extensible Plugin Architecture
• Easy management of multilingual support
Typical Solr Setup Diagram
Figure 1 Typical Solr Setup Diagram
Basic Solr Concepts
In this document, we'll cover the basics of what you need to know about Solr in order to use it.
1. Indexing
Solr is able to achieve fast search responses because it searches an index rather than
searching the text directly.
This is like retrieving pages in a book related to a keyword by scanning the index at the back of
a book, as opposed to searching every word of every page of the book.
This type of index is called an inverted index, because it inverts a page-centric data structure
(page->words) to a keyword-centric data structure (word->pages).
Solr stores this index in a directory called index in the data directory.
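The inversion described above can be illustrated with a minimal Python sketch. It is only a toy model of the idea, not how Solr or Lucene store their index; the page contents and the function name build_inverted_index are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Invert page -> words into word -> sorted list of page numbers."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in index.items()}


# Hypothetical page-centric data: page number -> text on that page.
pages = {
    1: "solr is a search server",
    2: "an index makes search fast",
    3: "solr stores an index on disk",
}

index = build_inverted_index(pages)
print(index["search"])  # page numbers that contain "search"
print(index["index"])   # page numbers that contain "index"
```

Looking up a keyword is now a single dictionary access instead of a scan over every page.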
2. How Solr represents data
In Solr, a Document is the unit of search and index.
An index consists of one or more Documents, and a Document consists of one or more Fields.
Schema
Before adding documents to Solr, you need to specify the schema, represented in a file
called schema.xml. It is not advisable to change the schema after documents have been added
to the index.
The schema declares:
o what kinds of fields there are
o which field should be used as the unique/primary key
o which fields are required
o how to index and search each field
Field Types
In Solr, every field has a type.
Examples of basic field types available in Solr include:
o float
o long
o double
o date
o text
Defining a field
Here's what a field declaration looks like:
<field name="id" type="text" indexed="true" stored="true" multiValued="true"/>
o name: Name of the field
o type: Field type
o indexed: whether this field should be added to the inverted index
o stored: whether the original value of this field should be stored
o multiValued: whether this field can have multiple values
The indexed and stored attributes are important.
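To make the attributes concrete, here is a small Python sketch that reads a field declaration with the standard library XML parser. It only illustrates how the attributes map to a name, a type and three boolean flags; Solr reads schema.xml itself, and the helper name parse_field is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# A field declaration in the same shape as the one shown above.
FIELD_XML = (
    '<field name="id" type="text" indexed="true" '
    'stored="true" multiValued="true"/>'
)


def parse_field(xml_text):
    """Return the field's name, type and boolean flags as a dict.

    A flag that is absent from the declaration is reported as None,
    because this sketch does not model Solr's per-type defaults.
    """
    element = ET.fromstring(xml_text)
    result = {"name": element.get("name"), "type": element.get("type")}
    for flag in ("indexed", "stored", "multiValued"):
        raw = element.get(flag)
        result[flag] = None if raw is None else raw == "true"
    return result


print(parse_field(FIELD_XML))
```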
Analysis
When data is added to Solr, it goes through a series of transformations before being added to
the index. This is called the analysis phase. Examples of transformations include lower-casing
and stemming (reducing words to their stems). The end result of the analysis is a series of
tokens which are then added to the index. Tokens, not the original text, are what is searched
when you perform a search query.
Indexed fields are fields which undergo an analysis phase, and are added to the index.
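A rough Python sketch of such an analysis chain follows. The suffix rules are a deliberately naive stand-in for a real stemmer, and the function name analyze is hypothetical; Solr's actual analyzers are configured per field type in the schema.

```python
import re

# Toy suffix rules for illustration only; this is not Solr's stemmer.
SUFFIXES = ("ing", "ed", "s")


def analyze(text):
    """Lower-case the text, split it into tokens, and strip one simple suffix."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    result = []
    for token in tokens:
        for suffix in SUFFIXES:
            # Keep at least three characters so short words stay intact.
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                token = token[: -len(suffix)]
                break
        result.append(token)
    return result


print(analyze("Searching Indexed Documents"))
```

Because the same transformations are applied to the query text, a search for "documents" can match a document that contains "Document".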
Term Storage
When we display search results to users, they generally expect to see the original document,
not the machine-processed tokens.
That's the purpose of the stored attribute: it tells Solr to store the original text alongside
the index.
Sometimes, there are fields which aren't searched, but need to be displayed in the search results.
You accomplish that by setting the field attributes to stored=true and indexed=false.
So, why wouldn't you store all the fields all the time?
Because storing fields increases the size of the index, and the larger the index, the slower the
search. In terms of physical computing, we'd say that a larger index requires more disk seeks to
get to the same amount of data.
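The difference between indexed and stored can be sketched in Python as two separate structures: one that is searched and one that is only read back for display. This is a simplified model under assumed field names (thumbnail_url, body), not Solr's on-disk layout.

```python
def add_document(doc, schema, inverted_index, stored_fields):
    """Route each field into the searchable index, the stored values, or both."""
    doc_id = doc["id"]
    stored_fields[doc_id] = {}
    for name, value in doc.items():
        rules = schema[name]
        if rules["indexed"]:
            for token in str(value).lower().split():
                inverted_index.setdefault((name, token), set()).add(doc_id)
        if rules["stored"]:
            stored_fields[doc_id][name] = value


# Hypothetical schema: field name -> flags.
schema = {
    "id": {"indexed": True, "stored": True},
    "name": {"indexed": True, "stored": True},
    "thumbnail_url": {"indexed": False, "stored": True},  # display only
    "body": {"indexed": True, "stored": False},           # searchable only
}

inverted_index, stored_fields = {}, {}
add_document(
    {
        "id": "1",
        "name": "Dell Monitor",
        "thumbnail_url": "/img/1.png",
        "body": "widescreen lcd panel",
    },
    schema,
    inverted_index,
    stored_fields,
)

print(sorted(stored_fields["1"]))       # fields that can be shown in results
print(("body", "lcd") in inverted_index)  # body is searchable but not stored
```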
Installing Solr
Solr 4.1 requires Java 6 or above, so make sure a suitable JDK is installed.
Begin by unzipping the Solr release and changing your working directory to the "example"
directory.
unzip -q apache-solr-4.1.0.zip
cd apache-solr-4.1.0/example/
Starting Solr
Solr comes with an example directory which contains some sample files we can use.
If you are not already in that directory, change into it, then start the example server
with java -jar start.jar.
cd example
java -jar start.jar
You should see something like this in the terminal.
2011-10-02 05:20:27.120:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
2011-10-02 05:20:27.212:INFO::jetty-6.1-SNAPSHOT
....
2011-10-02 05:18:27.645:INFO::Started SocketConnector@0.0.0.0:8983
Solr is now running! You can now access the Solr Admin webapp by loading
http://localhost:8983/solr/admin/ in your web browser.
Indexing Data
We're now going to add some sample data to our Solr instance.
The exampledocs folder contains some XML files. We can post them to Solr from the command line:
cd exampledocs
java -jar post.jar solr.xml monitor.xml
That produces:
SimplePostTool: POSTing files to http://localhost:8983/solr/update.
SimplePostTool: POSTing file solr.xml
SimplePostTool: POSTing file monitor.xml
SimplePostTool: COMMITting Solr index changes.
This response tells us that the POST operation was successful.
You can also index all of the sample data, using the following command (assuming your
command line shell supports the *.xml notation):
cd exampledocs
java -jar post.jar *.xml
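The files in exampledocs use Solr's XML update format, in which an add element wraps one or more doc elements made of named field elements. As a hedged illustration, the Python sketch below builds such a message from a dictionary; the document values and the helper name build_add_xml are made up, and nothing is sent to a server here.

```python
import xml.etree.ElementTree as ET


def build_add_xml(documents):
    """Build an <add> message; list values become repeated <field> elements."""
    add = ET.Element("add")
    for document in documents:
        doc = ET.SubElement(add, "doc")
        for name, value in document.items():
            values = value if isinstance(value, list) else [value]
            for item in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document, loosely modelled on the monitor example.
xml_body = build_add_xml(
    [{"id": "demo-1", "name": "Demo monitor", "cat": ["electronics", "monitor"]}]
)
print(xml_body)
```

A message like this could be saved to a file and posted with post.jar in the same way as the sample files.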
Searching
Let's see if we can retrieve the document we just added.
Since Solr accepts HTTP requests, you can use your web browser to communicate with
Solr by opening the URL below:
http://localhost:8983/solr/select?q=*:*&wt=json
This returns the following JSON result:
{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "wt": "json",
      "q": "*:*"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "3007WFP",
        "name": "Dell Widescreen UltraSharp 3007WFP",
        "manu": "Dell, Inc.",
        "includes": "USB cable",
        "weight": 401.6,
        "price": 2199,
        "popularity": 6,
        "inStock": true,
        "store": "43.17614,-90.57341",
        "cat": [
          "electronics",
          "monitor"
        ],
        "features": [
          "30\" TFT active matrix LCD, 2560 x 1600, .25mm dot pitch, 700:1 contrast"
        ]
      }
    ]
  }
}
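The same request can be made from a program. The Python sketch below only shows the two steps around the HTTP call: building the select URL and reading the JSON response. The helper names and the shortened sample response are assumptions for illustration, and no request is actually sent.

```python
import json
from urllib.parse import urlencode


def build_select_url(base_url, query, **params):
    """Build a /select URL; q and wt=json come first, extra params follow."""
    return base_url + "/select?" + urlencode({"q": query, "wt": "json", **params})


def summarize(response_text):
    """Return (numFound, [(id, name), ...]) from a Solr JSON response."""
    body = json.loads(response_text)
    docs = body["response"]["docs"]
    return body["response"]["numFound"], [(d["id"], d.get("name")) for d in docs]


# Shortened stand-in for the response shown above.
SAMPLE = (
    '{"responseHeader": {"status": 0}, "response": {"numFound": 1, "start": 0, '
    '"docs": [{"id": "3007WFP", "name": "Dell Widescreen UltraSharp 3007WFP"}]}}'
)

print(build_select_url("http://localhost:8983/solr", "*:*"))
print(summarize(SAMPLE))
```

Note that urlencode percent-encodes the characters in *:*, which Solr decodes back to the same query.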
Faceting
Faceting is the arrangement of search results into categories based on indexed terms. Searchers
are presented with the indexed terms along with numerical counts of how many matching
documents were found for each term. Faceting makes it easy for users to explore search
results, narrowing in on exactly the results they are looking for.
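In Solr, facet counts are requested with the facet=true and facet.field parameters and are computed by the server. The Python sketch below only imitates the result on a handful of made-up documents, to show what "terms with counts" means; the function name facet_counts is hypothetical.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how many matching documents carry each value of the given field."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        counts.update(values)
    return counts.most_common()


# Hypothetical search results.
docs = [
    {"id": "1", "cat": ["electronics", "monitor"]},
    {"id": "2", "cat": ["electronics", "memory"]},
    {"id": "3", "cat": "software"},
]

for term, count in facet_counts(docs, "cat"):
    print(term, count)
```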
Highlighting
Highlighting in Solr allows fragments of documents that match the user's query to be included
with the query response. The fragments are included in a special section of the response
(the highlighting section), and the client uses the formatting clues also included to determine
how to present the snippets to users.
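As a rough picture of what a highlighted fragment looks like, the Python sketch below wraps query terms in marker tags. It is a toy stand-in, not Solr's highlighter; the function name highlight and the em markers used as defaults are choices made for this example.

```python
import re


def highlight(text, query_terms, pre="<em>", post="</em>"):
    """Wrap whole-word, case-insensitive matches of the query terms in markers."""
    if not query_terms:
        return text
    alternatives = "|".join(re.escape(term) for term in query_terms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda match: pre + match.group(0) + post, text)


print(highlight("Dell Widescreen UltraSharp 3007WFP", ["widescreen"]))
```

The client can then turn the markers into whatever presentation it prefers, such as bold text.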
Spell Checking
The Spellcheck component is designed to provide inline query suggestions based on other,
similar, terms.
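The idea of suggesting similar terms can be sketched with Python's standard difflib module. This is only an approximation for illustration: Solr's Spellcheck component uses its own dictionaries and distance measures, and the term list, cutoff value and function name suggest below are assumptions.

```python
import difflib


def suggest(term, indexed_terms, limit=3):
    """Return indexed terms that look similar to the (possibly misspelled) term."""
    return difflib.get_close_matches(term.lower(), indexed_terms, n=limit, cutoff=0.7)


# Hypothetical terms taken from an index.
terms = ["monitor", "memory", "electronics", "widescreen"]

print(suggest("moniter", terms))
```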
Relevance
Relevance is the degree to which a query response satisfies a user who is searching for
information.
The relevance of a query response depends on the context in which the query was performed.
A single search application may be used in different contexts by users with different needs and
expectations. For example, a search engine of climate data might be used by a university
researcher studying long-term climate trends, a farmer interested in calculating the likely date
of the last frost of spring, a civil engineer interested in rainfall patterns and the frequency of
floods, and a college student planning a vacation to a region and wondering what to pack.
Because the motivations of these users vary, the relevance of any particular response to a
query will vary as well.
Shutdown
To shut down Solr, from the terminal where you launched Solr, hit Ctrl+C. This will shut down
Solr cleanly.
Link: http://lucene.apache.org/solr/3_6_2/doc-files/tutorial.html
http://www.solrtutorial.com/
https://cwiki.apache.org/confluence/display/solr/
Screen Shots
Figure 2 Solr Admin UI-Dashboard Screen
Figure 3 Solr Admin UI-Collection Detail Screen
Figure 4 Solr Admin UI-Query Result Screen
Figure 5 Solr Admin UI-Fetching Data from Database Using DataImportHandler
Figure 6 Solr Admin UI-Schema.xml Screen
Figure 7 Solr Admin UI-SolrConfig.xml Screen
Figure 8 Solr Admin UI-Core Admin Detail Screen
Figure 9 Solr Admin UI-Java Properties Screen
Apache SolrCloud
SolrCloud is the name of a set of new distributed capabilities in Solr. Passing parameters to
enable these capabilities will enable you to set up a highly available, fault tolerant cluster of
Solr servers. Use SolrCloud when you want high scale, fault tolerant, distributed indexing and
search capabilities.
Solr embeds and uses Zookeeper as a repository for cluster configuration and coordination -
think of it as a distributed filesystem that contains information about all of the Solr servers.
Note: reset all configurations and remove documents from the tutorial before going through
the cloud features.
Features
• Centralized Apache ZooKeeper based configuration
• Automated distributed indexing/sharding - send documents to any node and they will be
forwarded to the correct shard
• Near Real-Time indexing
• Transaction log ensures no updates are lost even if the documents are not yet indexed to
disk
• Automated query failover, index leader election and recovery in case of failure
• No single point of failure
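The "forwarded to the correct shard" behaviour can be pictured as a deterministic function from document id to shard. The Python sketch below uses zlib.crc32 purely as a stand-in hash; SolrCloud's real routing uses its own hash function and per-shard hash ranges, so treat this only as an illustration of the principle.

```python
import zlib


def pick_shard(doc_id, num_shards):
    """Map a document id to a shard name; the same id always gets the same shard."""
    # crc32 is an assumption for this sketch, not the hash Solr uses.
    bucket = zlib.crc32(doc_id.encode("utf-8")) % num_shards
    return "shard%d" % (bucket + 1)


for doc_id in ("3007WFP", "IW-02", "VS1GB400C3"):
    print(doc_id, "->", pick_shard(doc_id, 2))
```

Because every node can compute the same mapping, a document may be sent to any node and still end up on the shard that owns it.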
Simple two shard cluster
Figure 10 Simple Two Shard Cluster Image
This example simply creates a cluster consisting of two Solr servers representing two different
shards of a collection.
Since we'll need two Solr servers for this example, simply make a copy of the example directory
for the second server, making sure you don't have any data already indexed.
rm -r example/solr/collection1/data/*
cp -r example example2
This command starts up a Solr server and bootstraps a new Solr cluster.
cd example
java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
• -DzkRun causes an embedded ZooKeeper server to be run as part of this Solr server.
• -Dbootstrap_confdir=./solr/collection1/conf causes the local configuration directory
./solr/collection1/conf to be uploaded as the "myconf" config. The name
"myconf" is taken from the "collection.configName" param below.
• -Dcollection.configName=myconf sets the config to use for the new collection.
• -DnumShards=2 sets the number of logical partitions we plan on splitting the index into.
Browse to http://localhost:8983/solr/#/~cloud to see the state of the cluster (the zookeeper
distributed filesystem).
You can see from the zookeeper browser that the Solr configuration files were uploaded under
"myconf", and that a new document collection called "collection1" was created. Under
collection1 is a list of shards, the pieces that make up the complete collection.
Now we want to start up our second server. Because we don't explicitly set the shard id,
it will automatically be assigned to shard2.
Start the second server, pointing it at the cluster:
cd example2
java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar
• -Djetty.port=7574 is just one way to tell the Jetty servlet container to use a different
port.
• -DzkHost=localhost:9983 points to the ZooKeeper ensemble containing the cluster
state. In this example we're running a single ZooKeeper server embedded in the first Solr
server. By default, an embedded ZooKeeper server runs at the Solr port plus 1000, so
9983.
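The port arithmetic above is easy to get wrong when several nodes are involved, so here is a small Python sketch that derives the zkHost value and assembles the command line for an additional node. It only builds the argument list shown in this section; the helper names are hypothetical and nothing is executed.

```python
def embedded_zk_port(solr_port):
    """An embedded ZooKeeper listens on the Solr port plus 1000."""
    return solr_port + 1000


def second_node_command(first_solr_port, jetty_port, host="localhost"):
    """Build the java command for a node that joins the first node's cluster."""
    zk_host = "%s:%d" % (host, embedded_zk_port(first_solr_port))
    return [
        "java",
        "-Djetty.port=%d" % jetty_port,
        "-DzkHost=" + zk_host,
        "-jar",
        "start.jar",
    ]


print(" ".join(second_node_command(8983, 7574)))
```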
If you refresh the zookeeper browser, you should now see both shard1 and shard2 in
collection1. View http://localhost:8983/solr/#/~cloud.
Next, index some documents.
cd exampledocs
java -Durl=http://localhost:7574/solr/collection1/update -jar post.jar ipod_video.xml
java -Durl=http://localhost:8983/solr/collection1/update -jar post.jar monitor.xml
java -Durl=http://localhost:7574/solr/collection1/update -jar post.jar mem.xml
And now, a request to either server results in a distributed search that covers the entire
collection:
http://localhost:8983/solr/collection1/select?q=*:*
If at any point you wish to start over fresh or experiment with different configurations, you can
delete all of the cloud state contained within zookeeper by simply deleting the solr/zoo_data
directory after shutting down the servers.
Dealing with high volume of data
Solution: If the data volume grows, create more shards or split existing shards, and add
physical memory and storage to the existing SolrCloud cluster.
Figure 11 Creating Shard and Replica when volume goes high
Link: http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-search-from-500000-volumes-5-million-volumes-and-beyond
Dealing with failure
Solution:
a. Failure of ZooKeeper: ZooKeeper maintains all the cluster state and configuration
information, so run it as an ensemble on separate servers. Use an odd number of
servers, at least three, because ZooKeeper stays available only while a majority of
the ensemble is running.
b. Failure of a Solr shard: Create a replica of each shard, so that if one copy of the
shard goes down, a replica can take over.
Figure 12 Diagram of the failure-handling scenario
Link:
https://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble
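Both failure cases can be reasoned about with a few lines of Python. The first function applies ZooKeeper's majority rule to show how many server failures an ensemble of a given size survives; the second is a toy model of falling back to a live replica. The replica list and function names are made up for illustration, and real SolrCloud handles replica selection itself.

```python
def tolerated_zookeeper_failures(ensemble_size):
    """ZooKeeper needs a majority of servers up; return how many may fail."""
    majority = ensemble_size // 2 + 1
    return ensemble_size - majority


def pick_live_replica(replicas):
    """Return the URL of the first live copy of a shard, or None if all are down."""
    for replica in replicas:
        if replica["live"]:
            return replica["url"]
    return None


for size in (1, 2, 3, 5):
    print(size, "servers tolerate", tolerated_zookeeper_failures(size), "failure(s)")

# Hypothetical state of one shard: the first copy is down, its replica is up.
shard1 = [
    {"url": "http://localhost:8983/solr", "live": False},
    {"url": "http://localhost:7574/solr", "live": True},
]
print(pick_live_replica(shard1))
```

The first loop shows why a two-server ensemble gives no more protection than a single server.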
Synchronization of data (added/updated in DB) with Solr
Solution:
a. We can create a cron job which fetches data from the database and updates the
index in Solr.
b. Another option: whenever data is added or updated in the frontend, the business
layer, after inserting or updating the data in the database, also calls Solr's update
APIs to add or update the same data in the index (as we integrate with .NET, we can
use the SolrNet library, which provides such add/update APIs).
Link: http://wiki.apache.org/solr/DataImportHandler#Scheduling
http://stackoverflow.com/questions/6463844/how-to-index-data-in-solr-from-database-automatically
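Option (a) usually comes down to selecting the rows changed since the last run and turning them into Solr documents. The Python sketch below shows that selection step with an in-memory SQLite database standing in for the real one; the products table, its updated_at column and the function name are all assumptions, and the step that sends the documents to Solr is left out.

```python
import sqlite3


def rows_changed_since(connection, last_sync):
    """Return Solr-style documents for rows modified after the last sync time."""
    cursor = connection.execute(
        "SELECT id, name FROM products WHERE updated_at > ? ORDER BY updated_at",
        (last_sync,),
    )
    return [{"id": str(row[0]), "name": row[1]} for row in cursor]


# Hypothetical table with ISO-8601 timestamps, which compare correctly as text.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)
connection.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        (1, "Old monitor", "2013-01-01T00:00:00"),
        (2, "New monitor", "2013-06-01T00:00:00"),
    ],
)

pending = rows_changed_since(connection, "2013-03-01T00:00:00")
print(pending)  # documents that still need to be sent to Solr
```

A scheduled job would run this query, post the resulting documents to Solr, and then record the new sync time.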
Limitations
1. No more than 50 to 100 million documents per node.
2. No more than 250 fields per document.
3. No more than 250K characters per document.
4. No more than 25 faceted fields.
5. No more than 32 nodes in your SolrCloud cluster.
6. Don't return more than 250 results on a query.
A major driving factor for Solr performance is RAM. Solr requires sufficient memory for two
separate things: One is the Java heap, the other is "free" memory for the OS disk cache.
It is strongly recommended that Solr runs on a 64-bit Java. A 64-bit Java requires a 64-bit
operating system, and a 64-bit operating system requires a 64-bit CPU. There's nothing wrong
with 32-bit software or hardware, but a 32-bit Java is limited to a 2GB heap, which can result in
artificial limitations that don't exist with a larger heap.
Link: http://lucene.472066.n3.nabble.com/Solr-limitations-td4076250.html
https://wiki.apache.org/solr/SolrPerformanceProblems
Screen Shots
Figure 13 Solr Admin UI-Cloud Screen
Figure 14 Solr Admin UI-Zookeeper maintains Cluster State Information that is shown in Tree Screen
Figure 15 Solr Admin UI-Cloud Graph Screen
Figure 16 Solr Admin UI-Cluster Information Screen
Integration with .Net using SolrNet
Solr exposes REST APIs which can be used for interacting with Solr; however, the documents
returned as search results must then be deserialized into the application's own objects. SolrNet is a
.NET library for interacting with Solr. It provides convenient, easy-to-use APIs to search, add and
update data in Solr. Further information on SolrNet is available at https://github.com/mausch/SolrNet
Figure 17 Integration with .Net
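The mapping step that a client library takes care of can be illustrated in Python, used here only for consistency with the other sketches in this document. The code below is not SolrNet and does not use its API; the Product class and to_product function are hypothetical and simply show a result document being turned into a typed object.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Product:
    """Hypothetical application object filled from a Solr result document."""
    id: str
    name: str = ""
    price: float = 0.0
    cat: List[str] = field(default_factory=list)


def to_product(doc):
    """Copy the known fields of a result document into a Product."""
    return Product(
        id=doc["id"],
        name=doc.get("name", ""),
        price=float(doc.get("price", 0.0)),
        cat=list(doc.get("cat", [])),
    )


doc = {
    "id": "3007WFP",
    "name": "Dell Widescreen UltraSharp 3007WFP",
    "price": 2199,
    "cat": ["electronics", "monitor"],
}
print(to_product(doc))
```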

Ajin Abraham
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 

Recently uploaded (20)

Artificial Intelligence and Electronic Warfare
Artificial Intelligence and Electronic WarfareArtificial Intelligence and Electronic Warfare
Artificial Intelligence and Electronic Warfare
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 

Apache solr tech doc

  • 2. Contents
    Requirements ... 3
    Solution - Solr ... 3
    Features ... 3
    Typical Solr Setup Diagram ... 4
    Basic Solr Concepts ... 4
    1. Indexing ... 4
    2. How Solr represents data ... 5
    Installing Solr ... 7
    Starting Solr ... 7
    Indexing Data ... 7
    Searching ... 8
    Faceting ... 9
    Highlighting ... 10
    Spell Checking ... 10
    Relevance ... 10
    Shutdown ... 10
    Screen Shots ... 11
    Apache SolrCloud ... 15
    Features ... 15
    Simple two shard cluster ... 15
    Dealing with high volume of data ... 18
    Dealing with failure ... 19
    Synchronization of data (added/updated in DB) with Solr ... 20
    Limitations ... 20
    Screen Shots ... 21
    Integration with .Net using SolrNet ... 23
  • 3. Requirements
    a. Fast, full-text search capabilities
    b. Optimized handling of large data volumes and heavy web traffic
    c. Highly and linearly scalable on demand
    d. Pluggable into any platform
    e. Near real-time search and indexing
    f. Flexible and adaptable, with XML, JSON, and CSV configuration

    Solution - Solr
    Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called "indexing") via XML, JSON, CSV, or binary over HTTP. You query it via HTTP GET and receive XML, JSON, CSV, or binary results.

    Features
    o Advanced full-text search capabilities
    o Optimized for high-volume web traffic
    o Standards-based open interfaces: XML, JSON, and HTTP
    o Comprehensive HTML administration interfaces
    o Linearly scalable, with automatic index replication, failover, and recovery
    o Near real-time indexing
    o Flexible and adaptable with XML configuration
    o Extensible plugin architecture
    o Easily managed multilingual support
  • 4. Typical Solr Setup Diagram
    Figure 1 Typical Solr Setup Diagram

    Basic Solr Concepts
    In this document, we'll cover the basics of what you need to know about Solr in order to use it.

    1. Indexing
    Solr achieves fast search responses because it searches an index rather than the text itself. This is like finding the pages of a book related to a keyword by scanning the index at the back of the book, as opposed to reading every word on every page. This type of index is called an inverted index, because it inverts a page-centric data structure (page->words) into a keyword-centric data structure (word->pages). Solr stores this index in a directory called index inside the data directory.
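    The page->words to word->pages inversion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Solr or Lucene actually store an index; the two-page "book" and the function name build_inverted_index are made up for the example.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each word to the sorted list of page numbers that contain it."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        # Split on whitespace and lowercase so lookups are case-insensitive.
        for word in text.lower().split():
            index[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in index.items()}


# Hypothetical page-centric data (page -> words).
pages = {
    1: "Solr is a search server",
    2: "Lucene powers Solr search",
}

# Keyword-centric view (word -> pages).
index = build_inverted_index(pages)
print(index["solr"])
print(index["lucene"])
```

    Looking up a word is now a single dictionary access, instead of a scan over every page.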
  • 5. 2. How Solr represents data
    In Solr, a Document is the unit of search and index. An index consists of one or more Documents, and a Document consists of one or more Fields.

    Schema
    Before adding documents to Solr, you need to specify the schema, represented in a file called schema.xml. It is not advisable to change the schema after documents have been added to the index. The schema declares:
    o what kinds of fields there are
    o which field should be used as the unique/primary key
    o which fields are required
    o how to index and search each field

    Field Types
    In Solr, every field has a type. Examples of basic field types available in Solr include:
    o float
    o long
    o double
    o date
    o text

    Defining a field
    Here's what a field declaration looks like:
        <field name="id" type="text" indexed="true" stored="true" multiValued="true"/>
    o name: the name of the field
    o type: the field type
    o indexed: whether this field should be added to the inverted index
  • 6. o stored: whether the original value of this field should be stored
    o multiValued: whether this field can have multiple values
    The indexed and stored attributes are important.

    Analysis
    When data is added to Solr, it goes through a series of transformations before being added to the index. This is called the analysis phase. Examples of transformations include lower-casing and reducing words to their stems (stemming). The end result of the analysis is a series of tokens, which are then added to the index. Tokens, not the original text, are what is searched when you perform a search query. Indexed fields are fields that undergo an analysis phase and are added to the index.

    Term Storage
    When we display search results to users, they generally expect to see the original document, not the machine-processed tokens. That is the purpose of the stored attribute: it tells Solr to store the original text in the index. Sometimes there are fields that aren't searched but need to be displayed in the search results. You accomplish that by setting the field attributes to stored=true and indexed=false.
    So, why wouldn't you store all the fields all the time? Because storing fields increases the size of the index, and the larger the index, the slower the search. In terms of physical computing, we'd say that a larger index requires more disk seeks to get to the same amount of data.
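    To make the difference between indexed tokens and stored values concrete, here is a toy analysis chain in Python. It is a sketch under simplifying assumptions: the analyze function and its crude plural-stripping rule are invented for illustration and stand in for the real tokenizers and stemmers that Solr configures per field type.

```python
def analyze(text):
    """Toy analysis chain: split on whitespace, lowercase, strip a plural 's'."""
    tokens = []
    for raw in text.split():
        token = raw.lower().strip(".,!?")
        if len(token) > 3 and token.endswith("s"):
            token = token[:-1]  # crude stand-in for a real stemmer
        if token:
            tokens.append(token)
    return tokens


original = "Widescreen Monitors"

# What an indexed field contributes to the index: tokens that queries match against.
indexed_tokens = analyze(original)

# What a stored field keeps: the untouched original text, for display in results.
stored_value = original

print(indexed_tokens)
print(stored_value)
```

    A query would be run through the same chain, so a search for "monitor" and a search for "Monitors" both match the same token, while the user still sees the original text in the results.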
  • 7. Installing Solr
    You need JDK 6 or above installed (Solr 4.x requires Java 1.6 or later). Begin by unzipping the Solr release and changing your working directory to the "example" directory.
        unzip -q apache-solr-4.1.0.zip
        cd apache-solr-4.1.0/example/

    Starting Solr
    Solr comes with an example directory that contains some sample files we can use. We start this example server with java -jar start.jar.
        cd example    # only needed if you are not already in the example directory
        java -jar start.jar
    You should see something like this in the terminal:
        2011-10-02 05:20:27.120:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
        2011-10-02 05:20:27.212:INFO::jetty-6.1-SNAPSHOT
        ....
        2011-10-02 05:18:27.645:INFO::Started SocketConnector@0.0.0.0:8983
    Solr is now running! You can access the Solr Admin webapp by loading http://localhost:8983/solr/admin/ in your web browser.

    Indexing Data
    We're now going to add some sample data to our Solr instance. The exampledocs folder contains some sample XML files that we can post from the command line:
        cd exampledocs
        java -jar post.jar solr.xml monitor.xml
  • 8. That produces:
        SimplePostTool: POSTing files to http://localhost:8983/solr/update.
        SimplePostTool: POSTing file solr.xml
        SimplePostTool: POSTing file monitor.xml
        SimplePostTool: COMMITting Solr index changes.
    This response tells us that the POST operation was successful. You can also index all of the sample data, using the following command (assuming your command line shell supports the *.xml notation):
        cd exampledocs
        java -jar post.jar *.xml

    Searching
    Let's see if we can retrieve the document we just added. Since Solr accepts HTTP requests, you can use your web browser to communicate with Solr by opening the following URL:
        http://localhost:8983/solr/select?q=*:*&wt=json
    This returns the following JSON result:
        {
          "responseHeader": {
            "status": 0,
            "QTime": 0,
            "params": {
              "wt": "json",
              "q": "*:*"
            }
          },
          "response": {
  • 9.
            "numFound": 1,
            "start": 0,
            "docs": [
              {
                "id": "3007WFP",
                "name": "Dell Widescreen UltraSharp 3007WFP",
                "manu": "Dell, Inc.",
                "includes": "USB cable",
                "weight": 401.6,
                "price": 2199,
                "popularity": 6,
                "inStock": true,
                "store": "43.17614,-90.57341",
                "cat": [
                  "electronics",
                  "monitor"
                ],
                "features": [
                  "30\" TFT active matrix LCD, 2560 x 1600, .25mm dot pitch, 700:1 contrast"
                ]
              }
            ]
          }
        }

    Faceting
    Faceting is the arrangement of search results into categories based on indexed terms. Searchers are presented with the indexed terms, along with numerical counts of how many matching documents were found for each term. Faceting makes it easy for users to explore search results, narrowing in on exactly the results they are looking for.
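    As a rough illustration of faceting, the Python sketch below does two things: it builds a select URL that asks Solr for facet counts on one field (using the facet and facet.field query parameters), and it computes the same kind of per-term counts locally over a small list of documents. The helper names and the second document are hypothetical; the first document's id and cat values are taken from the sample result above.

```python
from collections import Counter
from urllib.parse import parse_qs, urlencode, urlsplit


def facet_query_url(select_url, query, facet_field):
    """Build a select URL that requests facet counts on one field."""
    params = [
        ("q", query),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", facet_field),
    ]
    return select_url + "?" + urlencode(params)


def facet_counts(docs, field):
    """Count how many documents carry each value of a field."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        counts.update(set(values))  # count each document once per term
    return counts


url = facet_query_url("http://localhost:8983/solr/select", "*:*", "cat")
print(parse_qs(urlsplit(url).query)["facet.field"])

docs = [
    {"id": "3007WFP", "cat": ["electronics", "monitor"]},
    {"id": "hypothetical-2", "cat": ["electronics"]},  # made-up document
]
counts = facet_counts(docs, "cat")
print(counts["electronics"], counts["monitor"])
```

    In a real deployment the counting is done by Solr and returned alongside the search results; the local function only shows what those counts mean.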
  • 10. Highlighting
    Highlighting in Solr allows fragments of documents that match the user's query to be included with the query response. The fragments are included in a special section of the response (the highlighting section), and the client uses the formatting clues also included to determine how to present the snippets to users.

    Spell Checking
    The Spellcheck component is designed to provide inline query suggestions based on other, similar terms.

    Relevance
    Relevance is the degree to which a query response satisfies a user who is searching for information. The relevance of a query response depends on the context in which the query was performed. A single search application may be used in different contexts by users with different needs and expectations. For example, a search engine of climate data might be used by a university researcher studying long-term climate trends, a farmer interested in calculating the likely date of the last frost of spring, a civil engineer interested in rainfall patterns and the frequency of floods, and a college student planning a vacation to a region and wondering what to pack. Because the motivations of these users vary, the relevance of any particular response to a query will vary as well.

    Shutdown
    To shut down Solr, hit Ctrl+C in the terminal where you launched it. This will shut down Solr cleanly.

    Links:
    http://lucene.apache.org/solr/3_6_2/doc-files/tutorial.html
    http://www.solrtutorial.com/
    https://cwiki.apache.org/confluence/display/solr/
  • 11. Screen Shots
    Figure 2 Solr Admin UI - Dashboard Screen
    Figure 3 Solr Admin UI - Collection Detail Screen
  • 12. Figure 4 Solr Admin UI - Query Result Screen
    Figure 5 Solr Admin UI - Fetching Data from Database Using DataImportHandler
  • 13. Figure 6 Solr Admin UI - Schema.xml Screen
    Figure 7 Solr Admin UI - SolrConfig.xml Screen
  • 14. Figure 8 Solr Admin UI - Core Admin Detail Screen
    Figure 9 Solr Admin UI - Java Properties Screen
  • 15. Apache SolrCloud
    SolrCloud is the name of a set of new distributed capabilities in Solr. Passing parameters to enable these capabilities lets you set up a highly available, fault-tolerant cluster of Solr servers. Use SolrCloud when you want high-scale, fault-tolerant, distributed indexing and search capabilities.
    Solr embeds and uses ZooKeeper as a repository for cluster configuration and coordination. Think of it as a distributed filesystem that contains information about all of the Solr servers.
    Note: reset all configurations and remove the documents from the tutorial before going through the cloud features.

    Features
    o Centralized, Apache ZooKeeper based configuration
    o Automated distributed indexing/sharding: send documents to any node and they will be forwarded to the correct shard
    o Near real-time indexing
    o Transaction log ensures no updates are lost, even if the documents are not yet indexed to disk
    o Automated query failover, index leader election, and recovery in case of failure
    o No single point of failure

    Simple two shard cluster
    Figure 10 Simple Two Shard Cluster Image
  • 16. This example creates a cluster consisting of two Solr servers representing two different shards of a collection. Since we'll need two Solr servers for this example, make a copy of the example directory for the second server, making sure you don't have any data already indexed.
        rm -r example/solr/collection1/data/*
        cp -r example example2
    The following command starts up a Solr server and bootstraps a new Solr cluster.
        cd example
        java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
    o -DzkRun causes an embedded ZooKeeper server to be run as part of this Solr server.
    o -Dbootstrap_confdir=./solr/collection1/conf causes the local configuration directory ./solr/collection1/conf to be uploaded as the "myconf" config. The name "myconf" is taken from the "collection.configName" parameter below.
    o -Dcollection.configName=myconf sets the config to use for the new collection.
    o -DnumShards=2 sets the number of logical partitions we plan on splitting the index into.
    Browse to http://localhost:8983/solr/#/~cloud to see the state of the cluster (the ZooKeeper distributed filesystem).
    You can see from the ZooKeeper browser that the Solr configuration files were uploaded under "myconf", and that a new document collection called "collection1" was created. Under collection1 is a list of shards, the pieces that make up the complete collection.
    Next, start the second server, pointing it at the cluster. It will automatically be assigned to shard2 because we don't explicitly set the shard id.
        cd example2
        java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar
    o -Djetty.port=7574 is just one way to tell the Jetty servlet container to use a different port.
  • 17. o -DzkHost=localhost:9983 points to the ZooKeeper ensemble containing the cluster state. In this example we're running a single ZooKeeper server embedded in the first Solr server. By default, an embedded ZooKeeper server runs at the Solr port plus 1000, so 9983.
    If you refresh the ZooKeeper browser, you should now see both shard1 and shard2 in collection1. View http://localhost:8983/solr/#/~cloud.
    Next, index some documents.
        cd exampledocs
        java -Durl=http://localhost:7574/solr/collection1/update -jar post.jar ipod_video.xml
        java -Durl=http://localhost:8983/solr/collection1/update -jar post.jar monitor.xml
        java -Durl=http://localhost:7574/solr/collection1/update -jar post.jar mem.xml
    Now a request to either server results in a distributed search that covers the entire collection:
        http://localhost:8983/solr/collection1/select?q=*:*
    If at any point you wish to start over fresh or experiment with different configurations, you can delete all of the cloud state contained within ZooKeeper by deleting the solr/zoo_data directory after shutting down the servers.
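    The reason a document can be posted to either server and still end up in exactly one shard is that the shard is derived from the document itself, typically from a hash of its unique key. The Python sketch below illustrates that idea only. It is an assumption-laden simplification: the pick_shard function is hypothetical, and zlib.crc32 with a modulo is a stand-in, not the hashing or hash-range assignment that SolrCloud actually uses.

```python
import zlib


def pick_shard(doc_id, num_shards):
    """Map a document id to a shard number in the range [0, num_shards)."""
    # crc32 is deterministic, so every node computes the same shard for an id.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


# Hypothetical document ids; numShards=2 as in the example cluster.
doc_ids = ["3007WFP", "doc-a", "doc-b", "doc-c"]
assignment = {doc_id: pick_shard(doc_id, 2) for doc_id in doc_ids}

for doc_id, shard in assignment.items():
    print(doc_id, "-> shard", shard + 1)
```

    Because the mapping depends only on the id and the shard count, any node that receives a document can forward it to the right shard without consulting the others.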
  • 18. Dealing with high volume of data
    Solution: If the data volume grows, create more shards or split existing shards, and add physical memory and storage to the existing cluster.
    Figure 11 Creating Shard and Replica when volume goes high
    Link: http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-search-from-500000-volumes-5-million-volumes-and-beyond
  • 19. Dealing with failure
    Solution:
    a. Failure of ZooKeeper: ZooKeeper maintains all of the cluster state and configuration information, so run it as an ensemble on separate servers rather than as a single instance. An ensemble stays available only while a majority of its servers are up, so use an odd number of servers, at least three, so that one can go down while the others keep working.
    b. Failure of a Solr shard: Create a replica of each shard, so that if one copy of a shard goes down, its replica can take over.
    Figure 12 Diagram of the failure-handling scenario
    Link: https://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble
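    A ZooKeeper ensemble keeps serving requests only while a majority (a quorum) of its servers is reachable, which is why the number of servers matters when planning for failure. The arithmetic can be sketched in Python as follows; the function names are made up for this example.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for size in (1, 2, 3, 5):
    print(size, "servers:", "quorum", quorum_size(size),
          "tolerates", tolerated_failures(size), "failure(s)")
```

    The loop shows why even ensemble sizes add little: going from an odd size to the next even size raises the quorum without raising the number of failures that can be tolerated.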
  • 20. Synchronization of data (added/updated in DB) with Solr
    Solution:
    a. Create a cron job that fetches data from the database and updates the index in Solr.
    b. Alternatively, whenever data is added or updated in the frontend, have the business layer update Solr right after it inserts or updates the data in the database, by calling the Solr update APIs (since we integrate with .NET, we can use the SolrNet library, which provides such add/update APIs).
    Links:
    http://wiki.apache.org/solr/DataImportHandler#Scheduling
    http://stackoverflow.com/questions/6463844/how-to-index-data-in-solr-from-database-automatically

    Limitations
    The following are rule-of-thumb guidelines rather than hard limits:
    1. No more than 50 to 100 million documents per node.
    2. No more than 250 fields per document.
    3. No more than 250K characters per document.
    4. No more than 25 faceted fields.
    5. No more than 32 nodes in your SolrCloud cluster.
    6. Don't return more than 250 results on a query.
    A major driving factor for Solr performance is RAM. Solr requires sufficient memory for two separate things: one is the Java heap, the other is "free" memory for the OS disk cache.
    It is strongly recommended that Solr runs on a 64-bit Java. A 64-bit Java requires a 64-bit operating system, and a 64-bit operating system requires a 64-bit CPU. There's nothing wrong with 32-bit software or hardware, but a 32-bit Java is limited to a 2GB heap, which can result in artificial limitations that don't exist with a larger heap.
    Links:
    http://lucene.472066.n3.nabble.com/Solr-limitations-td4076250.html
    https://wiki.apache.org/solr/SolrPerformanceProblems
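    For the second synchronization option above (updating Solr from the business layer right after a database write), the Python sketch below shows roughly what such an update call could look like. It only builds the HTTP request and does not send it. Several things here are assumptions for illustration: the build_update_request helper, the sample row, and the premise that the update handler at the URL accepts a JSON array of documents and a commit=true parameter. Check these against the Solr version you run.

```python
import json
from urllib.request import Request


def build_update_request(solr_update_url, documents):
    """Build (but do not send) a POST request that adds or updates documents."""
    body = json.dumps(documents).encode("utf-8")
    return Request(
        solr_update_url + "?commit=true",  # assumed: commit right away
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical row that was just inserted or updated in the database.
row = {"id": "3007WFP", "name": "Dell Widescreen UltraSharp 3007WFP", "price": 2199}

request = build_update_request("http://localhost:8983/solr/collection1/update", [row])
print(request.get_method(), request.full_url)

# To actually send it, a caller could pass the request to urllib.request.urlopen.
```

    Re-sending a document with an existing unique key replaces the earlier version, so the same call can serve both inserts and updates.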
  • 21. Screen Shots
    Figure 13 Solr Admin UI - Cloud Screen
    Figure 14 Solr Admin UI - ZooKeeper maintains Cluster State Information that is shown in Tree Screen
  • 22. Figure 15 Solr Admin UI - Cloud Graph Screen
    Figure 16 Solr Admin UI - Cluster Information Screen
  • 23. Integration with .Net using SolrNet
    Solr exposes REST-like APIs that can be used to interact with it. However, using them directly means writing serialization code to convert the documents returned as search results into actual application objects.
    SolrNet is a .NET library for interacting with Solr. It provides convenient, easy-to-use APIs to search, add, and update data in Solr.
    Further information on SolrNet is available at https://github.com/mausch/SolrNet
    Figure 17 Integration with .Net