Apache Solr
How to leverage a search engine that is optimized to search large volumes of text-centric data.
By Kevin Wenger
Backend PHP Developer & Open Source Advocate @ Antistatique
WM: Webmardi organizer
OSS: Open Source author & maintainer
AW: Article writer
CSI: School Tech Expert
Agenda
What is Apache Solr
Alternatives
Installing and running Solr
Admin UI Tour
The Solr vocabulary
Steps to set up a Core
Useful resources
What?
What is Apache Solr and when should you use it?
Solr is a standalone search server with a REST API
You put documents in it (called "indexing") via JSON, XML, CSV or binary over HTTP. You query it via HTTP GET and receive JSON, XML, CSV or binary results.
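As a minimal sketch of that HTTP round trip, the Python below builds (without sending) one indexing request and one query URL. The host and port `localhost:8983` and the core name `my-core` are assumptions for illustration, as are the helper names; the `/update` and `/select` handlers and the `q`, `rows`, `wt` and `commit` parameters are standard Solr ones.

```python
import json
import urllib.parse
import urllib.request

SOLR_URL = "http://localhost:8983/solr"  # assumption: a local Solr on its default port
CORE = "my-core"                          # assumption: illustrative core name


def build_update_request(docs):
    """Build (but do not send) the HTTP POST that indexes a list of JSON documents."""
    url = f"{SOLR_URL}/{CORE}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_select_url(query, rows=10):
    """Build the HTTP GET URL that queries the core and asks for JSON results."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{SOLR_URL}/{CORE}/select?{params}"
```

Passing the request to `urllib.request.urlopen(...)` would actually send it; that step is left out so the sketch runs without a Solr server.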
The fundamental premise of Solr is simple.
You give it a lot of information, then later you can ask it questions and find the piece of information you want.
The part where you feed in all the information is called indexing. When you ask a question, it’s called a query.
Indexing
1. The CMS pushes data into Solr.
2. The data is indexed through Analyzers into Documents.
Querying
1. A client application queries Solr for results.
2. Solr passes that query through the Analyzers to fetch Documents.
3. The resulting Documents are returned to the end client.
2004 Solr was born
2006 Open Sourced
2008 Solr 1.3
2009 Solr 1.4
2011 Solr 3.1
2012 Solr 4.0
2015 Solr 5.0
2016 Solr 6.0
2017 Solr 7.0
2019 Solr 8.0
2021 Solr 8.11
Performance
[Chart: MyISAM vs. InnoDB vs. Solr, axis from 0 to 750. Full-text searching on shared hosting, ~2000 entries with around 500 words each.]
When to use Solr?
Some real-life Solr use cases
Search Engine
Search engine capabilities that help people find the information they are looking for using natural language and facets.
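A faceted search can be sketched as an ordinary query with a few extra parameters. The helper name, base URL and the field names `brand` and `category` below are hypothetical; `facet=true` and `facet.field` are Solr's standard faceting parameters.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, text, facet_fields):
    """Build a /select URL that returns matches plus value counts per facet field."""
    params = [("q", text), ("facet", "true")]
    # One facet.field parameter per field we want counts for.
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core URL and field names, for illustration only.
url = build_facet_query(
    "http://localhost:8983/solr/my-core", "laptop", ["brand", "category"]
)
```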
Geospatial
Apache Solr supports geospatial search, which adds rich capabilities by linking assets to locations.
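One way to picture a geospatial query is a distance filter around a point. This sketch assumes the schema has a spatial field, here hypothetically named `location`; the `{!geofilt}` filter with its `sfield`, `pt` and `d` (kilometres) parameters is part of Solr's spatial search, while the helper name and coordinates are illustrative.

```python
from urllib.parse import urlencode


def build_geofilt_query(base_url, field, lat, lon, radius_km):
    """Build a /select URL keeping only documents within radius_km of (lat, lon)."""
    # Filter query using Solr's geofilt parser; 'field' must be a spatial field.
    fq = f"{{!geofilt sfield={field} pt={lat},{lon} d={radius_km}}}"
    return f"{base_url}/select?{urlencode({'q': '*:*', 'fq': fq})}"


# Hypothetical field name and coordinates, for illustration only.
geo_url = build_geofilt_query(
    "http://localhost:8983/solr/my-core", "location", 46.52, 6.63, 5
)
```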
Analytics
Solr can handle massive amounts of data and provide efficient ingestion and search capabilities in near real time.
Do you speak Solr?
Understanding the Basic Concepts Used in Solr
Index
Solr is able to achieve fast search responses because it searches an index instead of searching the text directly.
An index consists of one or more Documents.
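To see why searching an index beats scanning the text, here is a toy inverted index in Python. It is a deliberately simplified illustration of the idea, not how Solr or Lucene store their index; the function names and sample documents are made up.

```python
from collections import defaultdict


def tokenize(text):
    """Very naive tokenization: lowercase, split on whitespace."""
    return text.lower().split()


def build_index(documents):
    """Map every token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, term):
    """Look the term up directly instead of scanning every document."""
    return sorted(index.get(term.lower(), set()))


docs = {1: "Solr is fast", 2: "Lucene powers Solr", 3: "MySQL fulltext"}
index = build_index(docs)
matches = search(index, "Solr")
```

The lookup cost depends on the number of distinct terms, not on the total amount of text, which is the property the slide describes.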
Field
A field stores the data of a document as a key-value pair, where the key is the field name and the value is the actual field data. Solr supports different field types: float, long, double, date, text, integer, boolean, etc.
Core
The term core is used to refer to a single index and its associated transaction log and configuration files.
Schema
A schema is a collection of constraints on data
record structure and data processing
instructions associated with elements of the
record structure.
Document
A document is the basic unit of information in Solr that can be stored and indexed. Documents can be added, deleted, and updated, typically through indexing.
Query
A query can either be a request for data
results or an action on the data.
Analyzers
An analyzer examines the text of fields and
generates a token stream.
Filters
Filters examine a stream of tokens and keep
them, transform or discard them, or create
new ones.
Tokenizers
Tokenizers break field data into lexical units,
or tokens.
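The relationship between the three terms can be sketched as a small pipeline: a tokenizer produces tokens, filters transform or drop them, and the analyzer is the whole chain. This is a toy model in plain Python with made-up function names and a made-up stop-word list, not Solr's implementation.

```python
import re

STOP_WORDS = {"a", "an", "the", "is", "of"}  # assumption: tiny illustrative list


def tokenizer(text):
    """Break field data into lexical units (tokens)."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    """Transform every token."""
    return [token.lower() for token in tokens]


def stop_filter(tokens):
    """Discard tokens that carry little meaning."""
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text, filters=(lowercase_filter, stop_filter)):
    """An analyzer: one tokenizer followed by a chain of filters."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze("The Power of Solr")
```

Running the same chain at index time and at query time is what lets a query for "power" match a document containing "Power".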
Installing & running Solr
Step-by-step instructions to install Solr on Linux, Windows & Docker
Linux
Install Java
Download Solr
Install Solr
Start Solr
Windows
Be sure Java is installed
Download the Apache Solr Zip file: https://archive.apache.org/dist/lucene/solr/
Run Solr
Docker
Running the latest Solr
Running Solr 8.11.1
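Whichever platform is used, a quick way to confirm that Solr started is to call one of its HTTP endpoints. The sketch below assumes the default local address and uses the system-info endpoint; the function name and the injectable `fetch` parameter are illustrative choices so the check can be exercised without a running server.

```python
import json
import urllib.request


def solr_is_up(base_url="http://localhost:8983/solr", fetch=None):
    """Return True when the system-info endpoint answers with status 0."""
    if fetch is None:
        # Default: perform a real HTTP GET against the running server.
        def fetch(url):
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.read().decode("utf-8")
    try:
        payload = json.loads(fetch(f"{base_url}/admin/info/system?wt=json"))
    except (OSError, ValueError):
        # Connection refused, timeout, or a non-JSON answer.
        return False
    return payload.get("responseHeader", {}).get("status") == 0
```

Calling `solr_is_up()` with no arguments performs the real request against the assumed default address.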
Configuring
A Solr Core is a running instance of a Lucene index that contains all the Solr configuration files required to use it.
We need to create a Solr Core to perform operations like indexing and analyzing.
Schemaless
Zero configuration
Field discovery
Limited data processing
Custom schema
Complete control
Zero surprises
Requires search engine skills
1. Create a Schemaless Core
2. Preset configuration
3. Add document(s)
4. Updated schema
5. Custom fields
6. (Optionally) Tweak analyzer(s)
Create a Schemaless Core
We need to create a Solr Core to perform operations like indexing and analyzing.
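The exact command used in the talk is not part of the slide text. As one possible route, the sketch below builds a CoreAdmin API call; the core name `my-core`, the local address and the helper name are assumptions, and `_default` is the configset shipped with recent Solr versions. From a shell, `bin/solr create -c my-core` is the more usual way to do the same thing.

```python
from urllib.parse import urlencode


def build_create_core_url(
    name, solr_url="http://localhost:8983/solr", config_set="_default"
):
    """Build the CoreAdmin URL that creates a core from a named configset."""
    params = urlencode({"action": "CREATE", "name": name, "configSet": config_set})
    return f"{solr_url}/admin/cores?{params}"


create_url = build_create_core_url("my-core")
```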
Preset configuration
The core's directory contains several configuration files under a 'conf' directory. By default, the core runs in schemaless mode, and a managed-schema file and a solrconfig.xml file are created.
Add Documents
From the Admin UI, you can add documents to the index.
After adding the documents, you should notice that the managed-schema file under the '/my-core/conf' directory was modified.
Updated schema
Among several modifications, we can find new fields, which correspond to the fields we used in a document. Solr determined each document's fields and their types, and updated the managed-schema file.
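Besides opening the managed-schema file, the fields Solr discovered can be listed through the Schema API. The sketch below builds the URL and reduces a response to a name-to-type mapping; the helper names are illustrative and the sample response is invented for this example, so real field names and types depend on the documents you indexed.

```python
def schema_fields_url(core, solr_url="http://localhost:8983/solr"):
    """URL of the Schema API endpoint listing a core's fields."""
    return f"{solr_url}/{core}/schema/fields"


def field_types(response):
    """Reduce a parsed Schema API response to {field name: field type}."""
    return {field["name"]: field["type"] for field in response.get("fields", [])}


# Hand-written sample shaped like a Schema API answer (not real server output).
sample_response = {
    "responseHeader": {"status": 0},
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "title", "type": "text_general"},
    ],
}
discovered = field_types(sample_response)
```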
Custom Fields
Schemaless mode does not require you to define your own fields manually, but you can. Create a core with the default schemaless mode, then manually add fields to the managed-schema file.
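An alternative to editing the managed-schema file by hand is Solr's Schema API, which accepts an `add-field` command over HTTP. This is a sketch of that alternative, not the method shown in the talk; the helper name, core name and the field `summary` are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request


def build_add_field_request(
    core, name, field_type, stored=True, indexed=True,
    solr_url="http://localhost:8983/solr",
):
    """Build (but do not send) a Schema API request that adds one field."""
    payload = {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "indexed": indexed,
        }
    }
    return urllib.request.Request(
        f"{solr_url}/{core}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical field, for illustration only.
add_field = build_add_field_request("my-core", "summary", "text_general")
```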
Tweak Analyzers
An analyzer examines the text of fields and generates a token stream.
In normal usage, only fields of type solr.TextField will specify an analyzer.
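To make the shape of such an analyzer concrete, the sketch below assembles a Schema API `add-field-type` command for a `solr.TextField` with one tokenizer and a list of filters. The helper name and the type name `text_custom` are made up; the factory classes are standard Solr ones, but which ones suit your data is a judgment call this sketch does not make.

```python
def text_field_type(name, tokenizer, filters):
    """Build a Schema API 'add-field-type' command for a TextField analyzer."""
    return {
        "add-field-type": {
            "name": name,
            "class": "solr.TextField",
            "analyzer": {
                # Exactly one tokenizer, then zero or more filters in order.
                "tokenizer": {"class": tokenizer},
                "filters": [{"class": filter_class} for filter_class in filters],
            },
        }
    }


# Hypothetical type name; the factory classes ship with Solr.
command = text_field_type(
    "text_custom",
    "solr.StandardTokenizerFactory",
    ["solr.LowerCaseFilterFactory", "solr.StopFilterFactory"],
)
```

The resulting dictionary would be sent as JSON in a POST to the core's `/schema` endpoint.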
Admin UI
The Solr Web interface makes it easy to view configuration details, run queries and analyze documents.
Core Selection
Debug Analyzers
Debug Query
Check config
Add Document
Alternatives
What are some alternatives to Apache Solr?
Elasticsearch
More modern
Easier to manage
Easier to install
Query DSL
Less open source
Algolia
API based
Requires less knowledge
Proprietary
Useful Resources
Books
Apache Solr Essentials
Apache Solr for Indexing Data
Solr Cookbook
Talks
What is Apache Solr? | Apache Solr Tutorial for Beginners | Edureka
Berlin Buzzwords 2019: Erik Hatcher – Chatting with Solr
Apache Solr 8 - Getting Started Tutorial
Thank you!
Let's stay in touch
Email: kevin@antistatique.net, wenger.kev@gmail.com
LinkedIn: https://www.linkedin.com/in/kevinwenger/
Twitter: @wengerk

