Apache Solr

How to leverage a search engine that is optimized to search large volumes of text-centric data.

By Kevin Wenger
Backend PHP Developer & Open Source Advocate @ Antistatique
Webmardi organizer (WM)
Open Source author & maintainer (OSS)
Articles writer (AW)
School Tech Expert (CSI)

Agenda
What is Apache Solr
Alternative
Installing and running Solr
Admin UI Tour
The Solr vocabulary
Steps to setup a Core
Useful resources

What?
What is Apache Solr and when should you use it?

Solr is a standalone search server with a REST-like API.
You put documents into it (called "indexing") via JSON, XML, CSV or binary over HTTP. You query it via HTTP GET and receive JSON, XML, CSV or binary results.
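Here is a minimal Python sketch of those two HTTP calls, using only the standard library. It assumes a local Solr on the default port 8983 and a hypothetical core named 'my-core'. The helpers only build the requests, so nothing is sent until you pass them to urllib.request.urlopen.

```python
import json
import urllib.parse
import urllib.request

# Assumptions (hypothetical): Solr runs locally on its default port 8983
# and a core named "my-core" already exists.
SOLR_URL = "http://localhost:8983/solr"
CORE = "my-core"


def build_index_request(docs):
    """Build a POST request that indexes `docs` as JSON and commits them."""
    url = f"{SOLR_URL}/{CORE}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_url(q, rows=10):
    """Build a GET URL for the standard /select handler, asking for JSON."""
    params = urllib.parse.urlencode({"q": q, "rows": rows, "wt": "json"})
    return f"{SOLR_URL}/{CORE}/select?{params}"
```

Against a running Solr, urllib.request.urlopen(build_index_request([{"id": "1", "title": "Hello Solr"}])) indexes one document. json.load(urllib.request.urlopen(build_query_url("title:solr"))) then returns the matching results.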

The fundamental premise of Solr is simple.
You give it a lot of information, then later you can ask it questions and find the piece of information you want.
The part where you feed in all the information is called indexing. When you ask a question, it’s called a query.

Indexing
1. A CMS pushes data into Solr.
2. The data is then indexed through analyzers into documents.
Querying
1. A client application queries Solr for results.
2. Solr passes that query through the analyzers to fetch documents.
3. The resulting documents are returned to the end client.
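The following toy in-memory model in Python makes the two flows concrete. It is not how Solr or Lucene are implemented. It only illustrates the key idea that indexing and querying pass text through the same analyzer, so 'SOLR' in a query matches 'Solr' in a document. The analyzer here (split on word characters, then lowercase) is a simplified, hypothetical stand-in.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: split on non-word characters and lowercase each token."""
    return [token.lower() for token in re.findall(r"\w+", text)]


class ToyIndex:
    """A tiny inverted index: each term maps to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, text):
        # Indexing: the text goes through the analyzer before it is stored.
        self.docs[doc_id] = text
        for term in analyze(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        # Querying: the query goes through the *same* analyzer, then every
        # query term must match (a simple AND).
        terms = analyze(query)
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(ids)


if __name__ == "__main__":
    index = ToyIndex()
    index.add("1", "Apache Solr is a search server")
    index.add("2", "Solr indexes documents")
    print(index.search("SOLR"))           # ['1', '2']
    print(index.search("search server"))  # ['1']
```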

2004: Solr was born
2006: Open sourced
2008: Solr 1.3
2009: Solr 1.4
2011: Solr 3.1
2012: Solr 4.0
2015: Solr 5.0
2016: Solr 6.0
2017: Solr 7.0
2019: Solr 8.0
2021: Solr 8.11

Performance
Chart: full-text searching on shared hosting with ~2000 entries of around 500 words each, comparing MyISAM, InnoDB and Solr.

When to use Solr?
Some real-life Solr use cases

Search Engine
Search engine capabilities that help people find the information they are looking for using natural language and facets.
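The helper below sketches faceted search over HTTP. It builds a /select URL that asks Solr for facet counts alongside the results, and it turns a selected facet value into a filter query (fq). The core 'my-core' and the fields 'category' and 'author' are hypothetical. 'edismax' is one of Solr's built-in query parsers and is suited to text typed by users.

```python
from urllib.parse import urlencode

# Hypothetical core "my-core" with hypothetical fields "category" and "author".
SELECT_URL = "http://localhost:8983/solr/my-core/select"


def build_faceted_query(user_text, facet_fields, filters=None):
    """Return a /select URL for a free-text query that also asks for facet counts."""
    params = [
        ("q", user_text),
        ("defType", "edismax"),  # query parser tolerant of user-entered text
        ("facet", "true"),
    ]
    # facet.field is repeated once per field to facet on
    params += [("facet.field", field) for field in facet_fields]
    # each selected facet value becomes a filter query (fq)
    params += [("fq", f'{field}:"{value}"') for field, value in (filters or {}).items()]
    return f"{SELECT_URL}?{urlencode(params)}"
```

For example, build_faceted_query("solr tutorial", ["category", "author"], {"category": "Books"}) searches for the user's words and returns counts per category and author. It also restricts the results to the "Books" category.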

Geospatial
Apache Solr supports geospatial search, for example filtering or sorting results by distance. It can bring rich capabilities by linking assets to locations.
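The following is a minimal sketch of a "nearby" search. It assumes documents carry a hypothetical 'location' field of a spatial type such as LatLonPointSpatialField. The query uses Solr's geofilt filter and geodist() sort, and 'd' is a distance in kilometres.

```python
from urllib.parse import urlencode

# Hypothetical: documents have a "location" field of a spatial type such as
# LatLonPointSpatialField, holding "lat,lon" values.
SELECT_URL = "http://localhost:8983/solr/my-core/select"


def build_nearby_query(lat, lon, radius_km, q="*:*"):
    """Return a /select URL keeping documents within radius_km of (lat, lon), nearest first."""
    params = {
        "q": q,
        "fq": "{!geofilt}",       # distance filter driven by the params below
        "sfield": "location",     # spatial field to test
        "pt": f"{lat},{lon}",     # centre point
        "d": radius_km,           # radius in kilometres
        "sort": "geodist() asc",  # closest results first
    }
    return f"{SELECT_URL}?{urlencode(params)}"
```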

Analytics
Solr can handle massive amounts of data and provide efficient ingestion and search capabilities in near real-time.
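For analytics-style queries, Solr's JSON Facet API can compute aggregations over the index. The sketch below only builds the request body, which would be POSTed to a core's /query endpoint. The 'category' and 'price' fields are hypothetical.

```python
import json


# Hypothetical fields: "category" (string) and "price" (numeric).
# The body would be POSTed to http://localhost:8983/solr/my-core/query
def build_analytics_request(top_n=5):
    """Return a JSON Facet API body: top categories with count and average price."""
    body = {
        "query": "*:*",
        "limit": 0,  # only the aggregations are wanted, not the documents
        "facet": {
            "categories": {
                "type": "terms",      # bucket documents by field value
                "field": "category",
                "limit": top_n,       # keep the top_n biggest buckets
                "facet": {"avg_price": "avg(price)"},  # metric per bucket
            }
        },
    }
    return json.dumps(body)
```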

Do you speak Solr?
Understanding the Basic Concepts Used in Solr

Index
Solr is able to achieve fast search responses because, instead of searching the text directly, it searches an index.
An index consists of one or more documents.
Field
A field stores data in a document as a key-value pair, where the key is the field name and the value is the actual field data. Solr supports different field types: float, long, double, date, text, integer, boolean, etc.
Core
The term core refers to a single index and its associated transaction log and configuration files.
Schema
A schema is a collection of constraints on the data record structure, together with data processing instructions associated with elements of that structure.
Document
A document is a basic unit of information in Solr that can be stored and indexed. Documents can be added, deleted, and updated, typically through indexing.
Query
A query can either be a request for data results or an action on the data.
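The toy Python model below ties these terms together. A document is a set of fields (key-value pairs), and a schema constrains which fields exist and what type each one holds. This is a simplification for illustration only. A real Solr schema is declared in managed-schema (or schema.xml) and is far richer, and the field names here are hypothetical.

```python
from datetime import datetime

# A toy "schema": field name -> expected Python type. Field names are
# hypothetical; a real Solr schema lives in managed-schema (or schema.xml).
SCHEMA = {
    "id": str,
    "title": str,
    "price": float,
    "in_stock": bool,
    "published": datetime,
}


def validate(document):
    """Return a list of problems found when checking a document against SCHEMA."""
    problems = []
    for name, value in document.items():
        expected = SCHEMA.get(name)
        if expected is None:
            problems.append(f"unknown field: {name}")
        elif not isinstance(value, expected):
            problems.append(f"{name}: expected {expected.__name__}")
    if "id" not in document:
        problems.append("missing field: id")
    return problems
```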

Analyzers
An analyzer examines the text of fields and
generates a token stream.
Filters
Filters examine a stream of tokens and keep
them, transform or discard them, or create
new ones.
Tokenizers
Tokenizers break field data into lexical units,
or tokens.
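The toy pipeline below shows how the three pieces fit together. An analyzer is one tokenizer followed by a chain of filters, and each filter keeps, transforms, discards or creates tokens. The function names, stopword list and synonym map are hypothetical. In Solr, classes such as StandardTokenizerFactory, LowerCaseFilterFactory, StopFilterFactory and SynonymGraphFilterFactory play the equivalent roles.

```python
import re

STOPWORDS = {"a", "an", "the", "of"}  # hypothetical stopword list
SYNONYMS = {"tv": ["television"]}     # hypothetical synonym map


def tokenizer(text):
    """Break field data into tokens (here: runs of word characters)."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    """Transform every token to lowercase."""
    return [t.lower() for t in tokens]


def stop_filter(tokens):
    """Discard tokens found in the stopword list, keep the rest."""
    return [t for t in tokens if t not in STOPWORDS]


def synonym_filter(tokens):
    """Keep each token and create new ones for its synonyms."""
    out = []
    for t in tokens:
        out.append(t)
        out.extend(SYNONYMS.get(t, []))
    return out


def analyzer(text, filters=(lowercase_filter, stop_filter, synonym_filter)):
    """An analyzer: one tokenizer followed by a chain of filters."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens
```

The order of the filters matters. The stop filter runs after lowercasing, so "The" is removed even though the stopword list only contains "the".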

Installing & running Solr
Step-by-step instructions to install Solr on Windows, on Linux and with Docker

Linux
1. Install Java
2. Download Solr
3. Install Solr
4. Start Solr

Windows
1. Make sure Java is installed
2. Download the Apache Solr zip file from https://archive.apache.org/dist/lucene/solr/
3. Run Solr
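The helpers below are a hedged sketch for Solr 8.x. They build the release download URL and the start command for each platform. They assume 8.x archives are named solr-<version>.tgz and solr-<version>.zip under the archive URL above; Solr 9 and later are published under a different path. Once started, Solr listens on port 8983 by default.

```python
import platform

# Assumption: Solr 8.x releases are published under this archive path as
# solr-<version>.tgz (Linux/macOS) and solr-<version>.zip (Windows).
# Later major versions moved elsewhere, so this sketch targets 8.x only.
ARCHIVE = "https://archive.apache.org/dist/lucene/solr"


def download_url(version, system=None):
    """Return the release archive URL for a Solr 8.x version on the given OS."""
    system = system or platform.system()
    ext = "zip" if system == "Windows" else "tgz"
    return f"{ARCHIVE}/{version}/solr-{version}.{ext}"


def start_command(system=None):
    """Return the command that starts Solr from inside the unpacked directory."""
    system = system or platform.system()
    if system == "Windows":
        return ["bin\\solr.cmd", "start"]
    return ["bin/solr", "start"]
```

Run the start command from inside the unpacked directory (for example solr-8.11.1).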

Docker
Running the latest Solr
Running Solr 8.11.1
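This small Python helper builds the docker run command for the official 'solr' image. The tag selects the version: 'latest', or a pinned one such as 8.11.1. The container name 'my_solr' is a hypothetical choice.

```python
# Builds `docker run` argument lists for the official "solr" image.
# The container name "my_solr" is a hypothetical choice.
def docker_run_command(tag="latest", name="my_solr", port=8983):
    """Return argv that starts Solr in the background, publishing its HTTP port."""
    return [
        "docker", "run",
        "-d",                  # run detached (in the background)
        "-p", f"{port}:8983",  # host port -> Solr's default port in the container
        "--name", name,
        f"solr:{tag}",
    ]
```

Pass the list to subprocess.run(..., check=True) to start the container. Then open http://localhost:8983/solr/ to reach the Admin UI.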

Configuring
A Solr Core is a running instance of a Lucene index that contains all the Solr configuration files required to use it.
We need to create a Solr Core to perform operations like indexing and analyzing.

Schemaless
Zero configuration
Field discovery
Limited data processing
Custom schema
Complete control
No surprises
Requires search engine skills

1. Create a Schemaless Core
2. Preset configuration
3. Add document(s)
4. Updated schema
5. Custom fields
6. (Optionally) Tweak analyzer(s)

Create a Schemaless Core
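A core can be created from the command line with bin/solr create -c my-core, or over HTTP with the Core Admin API. The sketch below builds that API call. It assumes a standalone (non-SolrCloud) Solr on localhost:8983 and a hypothetical core name 'my-core'. It also assumes the '_default' configset shipped with Solr 7 and later, which runs in schemaless mode.

```python
from urllib.parse import urlencode


# Assumptions: standalone Solr on localhost:8983, hypothetical core "my-core",
# and the "_default" configset (schemaless field guessing enabled).
def create_core_url(name, config_set="_default", base="http://localhost:8983/solr"):
    """Return the Core Admin API URL that creates a core from a configset."""
    params = urlencode({"action": "CREATE", "name": name, "configSet": config_set})
    return f"{base}/admin/cores?{params}"
```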

Preset configuration
The new core's directory contains several configuration files under a 'conf' directory. By default, this core runs in schemaless mode, and a managed-schema file and a solrconfig.xml file are created.

Add Documents
From the Admin UI, you can add some documents to the index.
After adding the documents, you should notice that the managed-schema file under the '/my-core/conf' directory was modified.

Updated schema
Among several modifications, we can find new fields, which correspond to the fields we used in a document. Solr determined each document's fields and their types, and updated the managed-schema file.
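To see which fields Solr added, you can read the managed-schema file and list its field declarations. The sketch parses a trimmed, hypothetical excerpt. On a real install you would pass in the content of the file under my-core/conf; the exact file name can vary between Solr versions.

```python
import xml.etree.ElementTree as ET

# A trimmed, hypothetical excerpt of managed-schema after indexing documents
# that had "title" and "price" fields.
SAMPLE = """
<schema name="default-config" version="1.6">
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="text_general"/>
  <field name="price" type="pdoubles"/>
</schema>
"""


def list_fields(schema_xml):
    """Return {field name: field type} for every <field> declared in the schema."""
    root = ET.fromstring(schema_xml.strip())
    return {f.get("name"): f.get("type") for f in root.iter("field")}
```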

Custom Fields
Schemaless mode does not require you to define your own fields manually, but you can. Create a core with the default schemaless mode, then manually add fields to the managed-schema file.
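Fields can also be added through the Schema API instead of editing the file by hand. The Solr documentation discourages hand-editing the managed schema while Solr is running. The sketch builds such a request for a hypothetical core 'my-core', and the field names are only examples.

```python
import json
import urllib.request

# Assumptions: Solr on localhost:8983 and a hypothetical core "my-core".
SCHEMA_URL = "http://localhost:8983/solr/my-core/schema"


def add_field_request(name, field_type, stored=True, multi_valued=False):
    """Build a Schema API request that declares one field explicitly."""
    command = {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "multiValued": multi_valued,
        }
    }
    return urllib.request.Request(
        SCHEMA_URL,
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with urllib.request.urlopen(add_field_request("title", "text_general")) would add a stored, single-valued 'title' field.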

Tweak Analyzers
An analyzer examines the text of fields and generates a token stream.
In normal usage, only fields of type solr.TextField will specify an analyzer.
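The sketch below defines a solr.TextField type with its own analyzer through the Schema API. The analyzer uses the standard tokenizer, then applies lowercasing and accent folding. The type name 'text_custom' is hypothetical. The command would be POSTed to the core's /schema endpoint, after which fields can use the new type.

```python
import json


# Hypothetical field type name "text_custom"; the factory classes are standard Solr ones.
def add_text_field_type(name="text_custom"):
    """Return a Schema API command defining a solr.TextField with its own analyzer."""
    command = {
        "add-field-type": {
            "name": name,
            "class": "solr.TextField",
            "analyzer": {
                "tokenizer": {"class": "solr.StandardTokenizerFactory"},
                "filters": [
                    {"class": "solr.LowerCaseFilterFactory"},
                    {"class": "solr.ASCIIFoldingFilterFactory"},  # e.g. "é" -> "e"
                ],
            },
        }
    }
    return json.dumps(command)
```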

Admin UI
The Solr web interface makes it easy to view configuration details, run queries and analyze documents.

Debug Analyzers
Debug Query
Check config
Add Document
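Some Admin UI screens have HTTP counterparts. The sketch below assumes a hypothetical core 'my-core'. Setting debugQuery=true makes a search explain how the query was parsed and scored. The /analysis/field handler shows the tokens that a given field type produces for a value.

```python
from urllib.parse import urlencode

# Assumptions: Solr on localhost:8983, hypothetical core "my-core".
CORE_URL = "http://localhost:8983/solr/my-core"


def debug_query_url(q):
    """/select URL with debugQuery on: the response explains parsing and scoring."""
    return f"{CORE_URL}/select?{urlencode({'q': q, 'debugQuery': 'true'})}"


def analysis_url(field_type, value):
    """Field analysis URL: shows the tokens each analysis stage produces for `value`."""
    params = {"analysis.fieldtype": field_type, "analysis.fieldvalue": value}
    return f"{CORE_URL}/analysis/field?{urlencode(params)}"
```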

Alternatives
What are some alternatives to Apache Solr?

Elasticsearch
More modern
Easier to manage
Easier to install
Query DSL
Less open source
Algolia
API based
Requires less knowledge
Proprietary

Useful Resources
Books
Apache Solr Essentials
Apache Solr for Indexing Data
Solr Cookbook
Talks
What is Apache Solr? | Apache Solr Tutorial for Beginners | Edureka
Berlin Buzzwords 2019: Erik Hatcher – Chatting with Solr
Apache Solr 8 - Getting Started Tutorial

Thank you!
Let's stay in touch
Email: kevin@antistatique.net, wenger.kev@gmail.com
LinkedIn: https://www.linkedin.com/in/kevinwenger/
Twitter: @wengerk
