1. Apache Solr
How to leverage a search engine that is optimized to search large volumes of
text-centric data.
By Kevin Wenger
2. Kevin Wenger
Backend PHP Developer & Open Source Advocate @ Antistatique
Webmardi organizer (WM)
Open Source author & maintainer (OSS)
Article writer (AW)
School Tech Expert (CSI)
3. Agenda
What is Apache Solr
Alternative
Installing and running Solr
Admin UI Tour
The Solr vocabulary
Steps to setup a Core
Useful resources
4. What?
What is Apache Solr and when
should you use it?
5. Solr is a standalone search
server with a REST API
You put documents in it (called "indexing") via JSON, XML, CSV or binary over HTTP. You query it
via HTTP GET and receive JSON, XML, CSV or binary results.
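A minimal Python sketch of both calls, assuming a local Solr on the default port 8983 and a core named `my-core`. The sample document and the helper names are made up for illustration. The code only builds the requests, so it runs without a Solr server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

SOLR_URL = "http://localhost:8983/solr"  # assumption: default local install
CORE = "my-core"  # assumption: name of an existing core


def build_index_request(docs):
    """Build a POST request that sends JSON documents to the update handler."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        f"{SOLR_URL}/{CORE}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_url(text, rows=10):
    """Build the HTTP GET URL for a query against the select handler."""
    params = urlencode({"q": text, "rows": rows, "wt": "json"})
    return f"{SOLR_URL}/{CORE}/select?{params}"


index_request = build_index_request([{"id": "1", "title": "Hello Solr"}])
query_url = build_query_url("title:hello")
```

With a running Solr, passing `index_request` or `query_url` to `urllib.request.urlopen` would send the documents or fetch the JSON results.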
6. The fundamental premise of
Solr is simple.
You give it a lot of information, then later you can ask it questions and find the piece of information
you want.
The part where you feed in all the information is called indexing. When you ask a question, it’s called
a query.
7. Indexing
1. The CMS pushes data into Solr.
2. The data is then indexed through Analyzers into Documents.
Querying
1. A client application queries Solr for results.
2. Solr passes that query to Analyzers to fetch Documents.
3. The resulting Documents are returned to the end client.
11. Search Engine
Search engine capabilities that help people
find the information they are looking for
using natural language and facets.
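A short sketch of a faceted query in Python. `facet` and `facet.field` are Solr request parameters. The `category` field and the sample response are made up for illustration. In the default JSON output, Solr returns facet counts as a flat list that alternates value and count, which the helper turns into a dictionary.

```python
from urllib.parse import urlencode


def facet_params(text, field):
    """Build query-string parameters that ask Solr for facet counts on one field."""
    return urlencode({"q": text, "facet": "true", "facet.field": field, "rows": 0})


def facet_counts(response, field):
    """Turn the flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[::2], flat[1::2]))


# Hypothetical response fragment, shaped like Solr's default JSON facet output.
sample_response = {
    "facet_counts": {"facet_fields": {"category": ["books", 3, "music", 1]}}
}
params = facet_params("solr", "category")
counts = facet_counts(sample_response, "category")
```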
12. Geospatial
Apache Solr supports geospatial search, which adds
the ability to link assets by location.
13. Analytics
Solr can handle massive amounts of data and
provide efficient ingestion and search capabilities
in near real-time.
14. Do you speak Solr?
Understanding the basic concepts used in Solr
15. Index
Solr is able to achieve fast search responses
because, instead of searching the text directly,
it searches an index.
An index consists of one or more Documents.
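To illustrate why an index lookup is faster than scanning text, here is a toy inverted index in Python. It is a simplification for illustration only, not the actual Lucene data structure, and the sample documents are made up.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, word):
    """Look the word up in the index instead of scanning every document."""
    return sorted(index.get(word.lower(), set()))


documents = {
    "1": "Solr is a search server",
    "2": "Lucene powers the Solr index",
}
index = build_inverted_index(documents)
```

Here `search(index, "solr")` finds both documents with a single dictionary lookup, without reading the document text again.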
Field
A field stores the data in a document as a
key-value pair, where the key is the field name
and the value is the actual field data. Solr
supports different field types: float, long,
double, date, text, integer, boolean, etc.
Core
The term core refers to a single index
and its associated transaction log and
configuration files.
Schema
A schema is a collection of constraints on data
record structure and data processing
instructions associated with elements of the
record structure.
Document
A document is a basic unit of information in
Solr that can be stored and indexed. Documents
can be added, deleted, and updated, typically
through indexing.
Query
A query can either be a request for data
results or an action on the data.
16. Analyzers
An analyzer examines the text of fields and
generates a token stream.
Filters
Filters examine a stream of tokens and keep,
transform, or discard them, or create
new ones.
Tokenizers
Tokenizers break field data into lexical units,
or tokens.
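The three concepts above can be sketched as a toy pipeline in Python: a tokenizer produces tokens, then filters transform or discard them. This only imitates the idea. Solr's own analyzers are configured in the schema, and the stop-word list here is a made-up example.

```python
import re

STOP_WORDS = {"a", "an", "the", "is"}  # hypothetical stop-word list


def tokenize(text):
    """Tokenizer: break field data into tokens on non-word characters."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    """Filter that transforms every token to lowercase."""
    return [token.lower() for token in tokens]


def stop_filter(tokens):
    """Filter that discards tokens found in the stop-word list."""
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text):
    """Analyzer: run the tokenizer, then each filter in order."""
    tokens = tokenize(text)
    for token_filter in (lowercase_filter, stop_filter):
        tokens = token_filter(tokens)
    return tokens
```

For example, `analyze("The Solr index is FAST")` yields the token stream `["solr", "index", "fast"]`.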
17. Installing & running Solr
Step-by-step instructions to install Solr on Windows,
Linux & Docker
21. Configuring
A Solr Core is a running instance of a Lucene index
that contains all the Solr configuration files required
to use it.
We need to create a Solr Core to perform
operations like indexing and analyzing.
24. Create a Schemaless Core
A Solr Core is a running instance of a Lucene index that
contains all the Solr configuration files required to use it.
We need to create a Solr Core to perform operations like
indexing and analyzing.
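One possible way to create such a core is Solr's CoreAdmin HTTP API. The Python sketch below only builds the URL. It assumes a local standalone Solr on port 8983, the `_default` configset shipped with recent Solr versions, and the core name `my-core`. Check these against your own install before relying on them.

```python
from urllib.parse import urlencode

SOLR_URL = "http://localhost:8983/solr"  # assumption: default local install


def create_core_url(name, config_set="_default"):
    """Build the CoreAdmin URL that asks Solr to create a new core."""
    params = urlencode({"action": "CREATE", "name": name, "configSet": config_set})
    return f"{SOLR_URL}/admin/cores?{params}"


url = create_core_url("my-core")
```

Opening `url` against a running Solr (for example with `urllib.request.urlopen`) would trigger the core creation.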
25. Preset configuration
The schemaless core directory contains several configuration
files under a 'conf' directory. By default, this core is in
schemaless mode, and a managed-schema file and a
solrconfig.xml file are created.
26. Add Documents
From the Admin UI, you can add documents to
the index.
After adding the documents, you should notice that the
managed-schema file under the '/my-core/conf' directory
was modified.
27. Updated schema
Among several modifications, we can find new fields,
which correspond to the fields we used in the documents.
Solr determined each document's fields and their types, and
updated the managed-schema file.
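The idea of field-type guessing can be sketched in Python as follows. This is a toy approximation, not Solr's actual rules. The type names and the sample document are simplified assumptions.

```python
from datetime import datetime


def guess_field_type(value):
    """Guess a simplified field type from a single Python value."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            # Strings shaped like 2021-03-02T10:00:00Z are treated as dates.
            datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
            return "date"
        except ValueError:
            return "text"
    return "text"


def guess_schema(document):
    """Guess a field type for every field of a document."""
    return {name: guess_field_type(value) for name, value in document.items()}


schema = guess_schema(
    {"id": "1", "price": 9.5, "stock": 3, "available": True,
     "created": "2021-03-02T10:00:00Z"}
)
```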
28. Custom Fields
Schemaless mode does not require you to define your own
fields manually, but you can. Create a core with the default
schemaless mode, then manually add fields to the
managed-schema file.
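As an alternative to editing the file by hand, Solr also exposes a Schema API over HTTP. The Python sketch below builds an `add-field` request for it. It assumes a local Solr on port 8983 and a core named `my-core`. The field name `summary` is hypothetical, and the `text_general` type is assumed to exist in the core's schema.

```python
import json
from urllib.request import Request


def add_field_request(core, name, field_type, stored=True, indexed=True):
    """Build a POST request with a Schema API add-field command."""
    payload = {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "indexed": indexed,
        }
    }
    return Request(
        f"http://localhost:8983/solr/{core}/schema",  # assumption: local install
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = add_field_request("my-core", "summary", "text_general")
```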
29. Tweak Analyzers
An analyzer examines the text of fields and generates a
token stream.
In normal usage, only fields of type solr.TextField will
specify an analyzer.
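A hedged sketch of what an analyzer definition can look like when expressed as a Schema API `add-field-type` command, built here as a Python dictionary. The field type name `text_lowercase` is hypothetical. The tokenizer and filter classes are standard Solr factories, but verify the exact command format against the documentation for your Solr version.

```python
import json


def text_field_type(name):
    """Build an add-field-type command for a solr.TextField with an analyzer."""
    return {
        "add-field-type": {
            "name": name,
            "class": "solr.TextField",
            "analyzer": {
                # One tokenizer, followed by a chain of filters.
                "tokenizer": {"class": "solr.StandardTokenizerFactory"},
                "filters": [{"class": "solr.LowerCaseFilterFactory"}],
            },
        }
    }


command = text_field_type("text_lowercase")  # hypothetical field type name
body = json.dumps(command)  # JSON body that would be POSTed to the core's /schema endpoint
```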
30. Admin UI
The Solr web interface makes it easy to
view configuration details, run queries
and analyze documents.
34. Elasticsearch
More modern
Easier to manage
Ease of installation
DSL Query
Less Open-Source
Algolia
API Based
Requires less knowledge
Proprietary
35. Useful Resources
Books
Apache Solr Essentials
Apache Solr for Indexing Data
Solr Cookbook
Talks
What is Apache Solr? | Apache Solr Tutorial for Beginners | Edureka
Berlin Buzzwords 2019: Erik Hatcher – Chatting with Solr
Apache Solr 8 - Getting Started Tutorial
36. Thank you!
Let's stay in touch
Email: kevin@antistatique.net, wenger.kev@gmail.com
LinkedIn: https://www.linkedin.com/in/kevinwenger/
Twitter: @wengerk