www.edureka.co/apache-solr 
Introduction to APACHE SOLR 
View Apache Solr course details at www.edureka.co/apache-solr 
For Queries during the session and class recording: 
Post on Twitter @edurekaIN: #askEdureka 
Post on Facebook /edurekaIN 
For more details please contact us: 
US : 1800 275 9730 (toll free) 
INDIA : +91 88808 62004 
Email Us : sales@edureka.co
Slide 2 
LIVE Online Class 
Class Recording in LMS 
24/7 Post Class Support 
Module Wise Quiz 
Project Work 
Verifiable Certificate 
www.edureka.co/apache-solr 
How it Works?
Objectives 
At the end of this module, you will be able to: 
➢ Understand the need for a search engine in enterprise-grade applications
➢ Understand the objectives and challenges of a search engine
➢ Explain what indexing and searching are, and why you need them
➢ Describe Lucene at a high level
➢ Explain how Lucene handles indexing and searching
➢ Describe Solr and its features
➢ Describe the Solr schema and its structure
➢ Understand how to meet Big Data/NoSQL needs using SolrCloud
➢ Explore job opportunities for Solr developers
Slide 3 www.edureka.co/apache-solr
Introduction to Apache Lucene
Slide 4 www.edureka.co/apache-solr
What is Lucene?
➢ Lucene is a powerful Java search library that lets you easily add search or Information Retrieval (IR) to applications
➢ Used by LinkedIn, Twitter, … and many more (see http://wiki.apache.org/lucene-java/PoweredBy)
➢ Scalable, high-performance indexing
➢ Powerful, accurate and efficient search algorithms
➢ Cross-platform solution
» Open source and 100% pure Java
» Index-compatible implementations are available in other programming languages
Created by Doug Cutting
Slide 5 www.edureka.co/apache-solr
Why Indexing?
➢ Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval
➢ The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query
➢ Without an index, the search engine would scan every document in the corpus, which would require considerable time and computing power
➢ For example, while an index of 10,000 documents can be queried within milliseconds, a sequential scan of every word in 10,000 large documents could take hours
Slide 6 www.edureka.co/apache-solr
Indexing: Flow 
Document → analysis → Tokens → indexing → Inverted Index
We can get a better idea of the flow of indexing from the following example:
Tokenization of “edureka hadoop”:
“edureka”: Position 0, Offset 0, Length 7
“hadoop”: Position 1, Offset 8, Length 6
(each token's entry is labelled “Term Vector” in the figure)
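The token attributes in the example above can be reproduced with a few lines of Python. This is only a sketch under simple assumptions: it splits on whitespace, and the Token type and tokenize function are hypothetical names for illustration, not Lucene classes.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    term: str
    position: int  # ordinal of the token in the stream
    offset: int    # character index where the token starts
    length: int    # number of characters in the token


def tokenize(text: str) -> List[Token]:
    """Split on whitespace and record where each token was found."""
    return [
        Token(m.group(), pos, m.start(), len(m.group()))
        for pos, m in enumerate(re.finditer(r"\S+", text))
    ]


tokens = tokenize("edureka hadoop")
for tok in tokens:
    print(tok)
```

For the input "edureka hadoop" this yields position 0, offset 0, length 7 for "edureka" and position 1, offset 8, length 6 for "hadoop", matching the slide.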
Slide 7 www.edureka.co/apache-solr
Lucene: Writing to Index 
Document (containing Fields) → Analyzer → IndexWriter → Directory
Classes used when indexing documents with Lucene
Slide 8 www.edureka.co/apache-solr
Lucene: Searching In Index
➢ QueryParser translates a textual expression from the end user into an arbitrarily complex query for searching
Expression → QueryParser → Query object → IndexSearcher
QueryParser uses an Analyzer to break the expression into text fragments
Slide 9 www.edureka.co/apache-solr
Lucene: Inverted Indexing Technique 
[Figure: segment merging. Small segments (size 1) are merged into larger segments (size 3), which are merged again (size 9)]
➢ Indexing uses the inverted index technique (for example, the index at the back of a book), because looking a term up in an index is faster than scanning the documents
➢ Write a new segment for each new document insertion
➢ Merge the segments when there are too many of them in the index (a merge-sort technique merges them into the store)
➢ Single updates are costly; bulk updates are preferred because of merging
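The segment-and-merge idea can be sketched in plain Python. This is a deliberately simplified model, not Lucene's implementation: build_segment and merge_segments are hypothetical helpers, each segment is just a dict from term to document ids, and the three sample documents are made up.

```python
from collections import defaultdict


def build_segment(doc_id, text):
    """One small inverted index (term -> list of doc ids) for a single document."""
    return {term: [doc_id] for term in set(text.lower().split())}


def merge_segments(segments):
    """Merge several segments into one, keeping each posting list sorted."""
    merged = defaultdict(set)
    for segment in segments:
        for term, doc_ids in segment.items():
            merged[term].update(doc_ids)
    return {term: sorted(ids) for term, ids in merged.items()}


docs = {
    1: "Solr extends Lucene",
    2: "Lucene is a search library",
    3: "Solr is a search server",
}
segments = [build_segment(doc_id, text) for doc_id, text in docs.items()]
index = merge_segments(segments)

print(index["lucene"])  # [1, 2]
print(index["search"])  # [2, 3]
```

A lookup is now a single dictionary access per term instead of a scan over every document, which is the speed-up the inverted index is meant to deliver.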
Slide 10 www.edureka.co/apache-solr
Lucene: Storage Schema
➢ Unlike databases, Lucene does not have a common global schema
➢ Lucene has indexes, which contain documents
➢ Each document can have multiple fields
➢ Different documents can have different fields
➢ A field can be indexed for searching, stored for retrieval, or both
➢ You can add new fields at any point in time
Index-1
Document-1: <Field1>, <Field2>, <Field3>
Document-2: <Field2>, <Field3>, <Field4>
Slide 11 www.edureka.co/apache-solr
Analyzers
➢ Analyzers handle the job of analyzing text into tokens or keywords to be searched / indexed
➢ An Analyzer builds TokenStreams, which analyze text; it represents a policy for extracting index terms from text
➢ Lucene provides several default Analyzers, which can be used at indexing time or at query time
➢ Analyzers are also provided to parse and analyze different languages (Chinese, Japanese, etc.)
Reader → Tokenizer → TokenFilter → TokenFilter → TokenFilter → Tokens
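The tokenizer-plus-filters chain can be illustrated with ordinary Python functions. This is a sketch of the pattern only; the function names and the stop-word list are assumptions made for the example and do not correspond to Lucene's Java classes.

```python
import re

STOP_WORDS = {"a", "an", "the", "is", "to"}  # hypothetical stop list


def whitespace_tokenizer(text):
    """Tokenizer: break raw text into tokens."""
    return re.findall(r"\S+", text)


def lowercase_filter(tokens):
    """Filter: transform each token."""
    return (t.lower() for t in tokens)


def stop_filter(tokens):
    """Filter: discard tokens that carry little meaning."""
    return (t for t in tokens if t not in STOP_WORDS)


def analyze(text, tokenizer, filters):
    """Run the tokenizer, then pass the token stream through each filter in order."""
    stream = tokenizer(text)
    for token_filter in filters:
        stream = token_filter(stream)
    return list(stream)


terms = analyze("Solr is THE Search Server", whitespace_tokenizer,
                [lowercase_filter, stop_filter])
print(terms)  # ['solr', 'search', 'server']
```

The order of the filters matters: the stop filter only matches "THE" because the lowercase filter ran first.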
Slide 12 www.edureka.co/apache-solr
Analyzers (Contd.) 
Core Class Examples (org.apache.lucene.analysis.Analyzer)
➢ SmartChineseAnalyzer
➢ SnowballAnalyzer
➢ SynonymAnalyzer
➢ StandardAnalyzer
➢ StopAnalyzer
➢ WhitespaceAnalyzer
➢ ChineseAnalyzer
➢ CzechAnalyzer
➢ ShingleAnalyzerWrapper
➢ SimpleAnalyzer
Token filters used inside analyzers:
➢ LowerCaseFilter
➢ PorterStemFilter
Slide 13 www.edureka.co/apache-solr
Querying: Key Types / Classes
Query (base class), with subclasses including:
➢ TermQuery
➢ BooleanQuery
➢ WildcardQuery
➢ PhraseQuery
➢ PrefixQuery
➢ MultiPhraseQuery
➢ FuzzyQuery
➢ RegexpQuery
➢ TermRangeQuery
➢ NumericRangeQuery
➢ ConstantScoreQuery
➢ DisjunctionMaxQuery
➢ MatchAllDocsQuery
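To give a feel for what three of these query types do, here is a small Python sketch over a hand-built inverted index. The functions term_query, prefix_query and boolean_query are hypothetical stand-ins that mimic the behaviour of TermQuery, PrefixQuery and BooleanQuery; they are not the Lucene API, and the index contents are made up.

```python
# term -> set of ids of the documents containing it (made-up data)
index = {
    "solr": {1, 3},
    "lucene": {1, 2},
    "search": {2, 3},
    "server": {3},
}


def term_query(index, term):
    """Documents containing exactly this term."""
    return set(index.get(term, ()))


def prefix_query(index, prefix):
    """Documents containing any term that starts with the prefix."""
    hits = set()
    for term, doc_ids in index.items():
        if term.startswith(prefix):
            hits |= doc_ids
    return hits


def boolean_query(must=(), should=(), must_not=()):
    """Combine result sets: all of must (or any of should), minus must_not."""
    if must:
        result = set.intersection(*must)
    else:
        result = set().union(*should)
    return result - set().union(*must_not)


both = boolean_query(
    must=[term_query(index, "solr"), term_query(index, "search")],
    must_not=[term_query(index, "lucene")],
)
print(sorted(both))                       # [3]
print(sorted(prefix_query(index, "se")))  # [2, 3]
```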
Slide 14 www.edureka.co/apache-solr
Scoring: Score Boosting
➢ A document's weight / score can be changed from the default; this is called boosting
➢ Lucene allows influencing search results by "boosting" at different times:
» Index time: call Field.setBoost() before a document is added to the index
» Query time: set a boost on a query clause by calling Query.setBoost()
(These setters belong to older Lucene releases: Query.setBoost() was removed in Lucene 6 in favor of BoostQuery, and index-time boosts were removed in Lucene 7.)
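The effect of a query-time boost can be shown with a toy scoring function. This is not Lucene's scoring formula; it is an assumed, much simpler model (term frequency multiplied by the clause boost) used only to show how a boost changes the ranking. The documents and boost values are made up.

```python
def score(doc_terms, clauses):
    """Toy score: sum of (term frequency x boost) over the query clauses."""
    return sum(doc_terms.count(term) * boost for term, boost in clauses)


docs = {
    1: "solr search server".split(),
    2: "lucene search library search".split(),
}

plain = [("solr", 1.0), ("search", 1.0)]    # no boosting
boosted = [("solr", 3.0), ("search", 1.0)]  # "solr" clause boosted 3x

plain_scores = {doc_id: score(terms, plain) for doc_id, terms in docs.items()}
boosted_scores = {doc_id: score(terms, boosted) for doc_id, terms in docs.items()}

print(plain_scores)    # {1: 2.0, 2: 2.0}
print(boosted_scores)  # {1: 4.0, 2: 2.0}
```

Without the boost the two documents tie; with the boost on the "solr" clause, document 1 ranks first.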
Slide 15 www.edureka.co/apache-solr
Key Features 
Faceting 
Highlighting 
Grouping 
Joins 
Spatial Search 
Apache Tika Support 
Slide 16 www.edureka.co/apache-solr
Introduction to Apache Solr
Slide 17 www.edureka.co/apache-solr
Search Engine: Why Do I Need One?
1. Text Based Search
2. Filter
3. Documents
Slide 18 www.edureka.co/apache-solr
Solr: Introduction
➢ Solr is an open source enterprise search server / web application
➢ Solr uses the Lucene search library and extends it
➢ Solr exposes Lucene's Java APIs as RESTful services
➢ You put documents in it (called "indexing") via XML, JSON, CSV or binary over HTTP
➢ You query it via HTTP GET and receive XML, JSON, CSV or binary results
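As a rough illustration of the two HTTP interactions described above, the sketch below builds (but does not send) an indexing request and a query URL using only the Python standard library. The host, port and core name in SOLR_BASE are assumptions for the example, as are the sample document and its field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical Solr location and core name; adjust to your own setup.
SOLR_BASE = "http://localhost:8983/solr/demo_core"


def build_update_request(docs):
    """Indexing: POST a JSON array of documents to the update handler."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        f"{SOLR_BASE}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_select_url(query, wt="json"):
    """Querying: an HTTP GET URL; wt picks the response format."""
    return f"{SOLR_BASE}/select?" + urlencode({"q": query, "wt": wt})


update = build_update_request([{"id": "1", "title": "Introduction to Apache Solr"}])
select_url = build_select_url("title:solr")

print(update.get_method(), update.full_url)
print(select_url)
```

Passing the Request object to urllib.request.urlopen would send it to a running Solr server; that step is left out so the sketch runs without one.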
Slide 19 www.edureka.co/apache-solr
Solr: History
➢ In 2004, Solr was created by Yonik Seeley at CNET Networks as an in-house project to add search capability to the company website
➢ In January 2006, CNET Networks decided to openly publish the source code by donating it to the Apache Software Foundation under the Lucene top-level project
➢ In September 2008, Solr 1.3 was released with many enhancements, including distributed search capabilities and performance improvements
➢ In October 2012, Solr 4.0 was released, including the new SolrCloud feature
Slide 20 www.edureka.co/apache-solr
Solr: Key Features
➢ Advanced full-text search capabilities
➢ Optimized for high-volume web traffic
➢ Standards-based open interfaces: XML, JSON and HTTP
➢ Comprehensive HTML administration interfaces
➢ Server statistics exposed over JMX for monitoring
➢ Near real-time indexing
➢ Adaptable with XML configuration
➢ Linearly scalable, with automatic index replication
➢ Extensible plugin architecture
Slide 21 www.edureka.co/apache-solr
Solr: Architecture 
Slide 22 www.edureka.co/apache-solr
Solr: Admin UI 
Slide 23 www.edureka.co/apache-solr
Solr: Schema Hierarchy
Solr Instance → Core/Index (one or more per instance) → Documents → Fields
Indexing & Querying: schema.xml
Slide 24 www.edureka.co/apache-solr
Solr: Core
➢ Solr Core: also referred to as just a "Core"
➢ A core is a running instance of a Lucene index along with all the Solr configuration (solrconfig.xml, schema.xml, etc.) required to use it
➢ A single Solr application can contain 0 or more cores
➢ Cores run largely in isolation but can communicate with each other if necessary via the CoreContainer
➢ Solr initially supported only one index, and the SolrCore class was a singleton for coordinating the low-level functionality at the "core" of Solr
Slide 25 www.edureka.co/apache-solr
Solr: Documents & Fields
➢ Solr's basic unit of information is a document, which is a set of data that describes something
➢ Documents are composed of fields, which are more specific pieces of information
➢ Fields can contain different kinds of data. A name field, for example, is text (character data)
➢ The field type tells Solr how to interpret the field and how it can be queried
Slide 26 www.edureka.co/apache-solr
Solr: Indexing Data
➢ A Solr index can accept data from many different sources, including XML files, comma-separated value (CSV) files, data extracted from tables in a database, and files in common file formats such as Microsoft Word or PDF
Here are the most common ways of loading data into a Solr index:
➢ Uploading XML files by sending HTTP requests to the Solr server
➢ Using index handlers to import from databases
➢ Using the Solr Cell framework
➢ Writing a custom Java application to ingest data through Solr's Java client
Slide 27 www.edureka.co/apache-solr
Solr: Analysis
➢ There are three main concepts in analysis: analyzers, tokenizers, and filters
➢ Analyzers are used both at index time, when a document is indexed, and at query time
» The same analysis process need not be used for both operations
» An analyzer examines the text of fields and generates a token stream
» Analyzers may be a single class, or they may be composed of a series of tokenizer and filter classes
➢ Tokenizers break field data into lexical units, or tokens
➢ Filters examine a stream of tokens and keep them, transform or discard them, or create new ones
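The point that index-time and query-time analysis need not be identical can be sketched as follows. Everything here is an assumption made for illustration: the synonym map, the two analyzer functions and the sample documents are hypothetical, and expanding synonyms only at query time is just one possible choice.

```python
from collections import defaultdict

SYNONYMS = {"tv": ["tv", "television"]}  # hypothetical synonym map


def index_analyzer(text):
    """Index-time chain: whitespace tokenizer + lowercase filter."""
    return [t.lower() for t in text.split()]


def query_analyzer(text):
    """Query-time chain: same as above, plus a synonym-expansion filter."""
    terms = []
    for t in text.lower().split():
        terms.extend(SYNONYMS.get(t, [t]))
    return terms


docs = {1: "Smart Television", 2: "TV stand"}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in index_analyzer(text):
        index[term].add(doc_id)


def search(query):
    """Return ids of documents matching any analyzed query term."""
    hits = set()
    for term in query_analyzer(query):
        hits |= index.get(term, set())
    return sorted(hits)


print(search("TV"))  # [1, 2]
```

The query "TV" also finds the document that only says "Television", although the index itself stores no synonyms.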
Slide 28 www.edureka.co/apache-solr
Solr: solrconfig.xml
➢ Lib directives indicate where Solr can find JAR files for extensions
➢ Activates version-dependent features in Lucene
➢ Index management settings
➢ Enable JMX instrumentation of Solr MBeans
➢ Update handler for indexing documents
➢ Cache-management settings
➢ Register event handlers for searcher events; for example, queries to execute to warm new searchers
Slide 29 www.edureka.co/apache-solr
Solr: Search Process
Components: Request Handler, Query Parser, Index, Response Writer
➢ qt: selects a Request Handler for a query using /select (by default, the DisMaxRequestHandler is used)
➢ defType: selects a query parser for the query (by default, uses whatever has been configured for the Request Handler)
➢ qf: selects which fields to query in the index (by default, all fields are required)
➢ fq: filters the query by applying an additional query to the initial query's results, and caches the results
➢ start: specifies an offset (by default 0) into the query results where the returned response should begin
➢ rows: specifies the number of rows to be displayed at one time
➢ wt: selects a response writer for formatting the query response
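The request parameters above can be assembled into a query string as in the following sketch. The helper select_params is hypothetical, and the field names, filter and the choice of the edismax parser are assumptions for the example; start is derived from a zero-based page number to show how start and rows give pagination.

```python
from urllib.parse import urlencode


def select_params(q, page, page_size, filters=(), fields=None,
                  def_type="edismax", wt="json"):
    """Build Solr query parameters; page is zero-based."""
    params = [
        ("q", q),
        ("defType", def_type),      # which query parser to use
        ("wt", wt),                 # which response writer to use
        ("start", page * page_size),
        ("rows", page_size),
    ]
    if fields:
        params.append(("qf", " ".join(fields)))  # fields to query
    params.extend(("fq", fq) for fq in filters)  # one fq per filter query
    return params


params = select_params("solr", page=2, page_size=10,
                       filters=["category:book"], fields=["title", "body"])
query_string = urlencode(params)
print(query_string)
```

Page 2 with 10 rows per page gives start=20, so the response begins at the 21st matching document.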
Slide 30 www.edureka.co/apache-solr
Solr Features
➢ Faceting
➢ Highlighting
➢ Spell Checking
➢ Query Re-ranking
➢ Transforming
➢ Suggesters
➢ More Like This
➢ Pagination
➢ Grouping & Clustering
➢ Spatial Search
➢ Components
➢ Real time (Get & Update)
➢ LABS
Slide 31 www.edureka.co/apache-solr
Configuring Solr Instances / Cores
Solr configuration files: solrconfig.xml, solr.xml, core.properties, schema.xml
Slide 32 www.edureka.co/apache-solr
SolrCloud Introduction
➢ Apache Solr can be set up as a cluster of Solr servers that combines fault tolerance and high availability; this is called SolrCloud
➢ SolrCloud provides flexible distributed search and indexing, without a master node to allocate nodes, shards and replicas
➢ Solr uses ZooKeeper to manage these locations, depending on configuration files and schemas
➢ Documents can be sent to any server and ZooKeeper will figure it out
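One way to picture how a document sent to any server can still end up on a predictable shard is hash-based routing on the document id. The sketch below is an illustration of that general idea only: shard_for is a hypothetical helper, MD5 is a stand-in hash chosen because it is in the Python standard library, and Solr's own router is not reproduced here.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard number in a deterministic way."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


# Every node computes the same shard for the same id, so any node
# can forward an incoming document to the right place.
for doc_id in ["doc-1", "doc-2", "doc-3"]:
    print(doc_id, "-> shard", shard_for(doc_id, num_shards=4))
```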
Slide 33 www.edureka.co/apache-solr
Features
➢ Horizontal Scaling (For Sharding & Replication)
➢ Elastic Scaling
➢ High Availability
➢ Distributed Indexing
➢ Distributed Searching
➢ Central Configuration For Entire Cluster
➢ Automatic Load Balancing
➢ Automatic Failover For Queries
➢ ZooKeeper Integration For Coordination & Configurations
Slide 34 www.edureka.co/apache-solr
Architecture 
Slide 35 www.edureka.co/apache-solr
Job trends for Apache Solr 
Slide 36 www.edureka.co/apache-solr
Demo 
Slide 37 www.edureka.co/apache-solr
Disclaimer
Criteria and guidelines mentioned in this presentation may change. Please visit our website for the latest and additional information on Apache Solr
Slide 38 www.edureka.co/apache-solr
Course Topics
➢ Module 1
» Introduction to Apache Lucene
➢ Module 2
» Exploring Lucene
➢ Module 3
» Introduction to Apache Solr
➢ Module 4
» Solr Indexing
➢ Module 5
» Solr Searching
➢ Module 6
» Solr Extended Features
➢ Module 7
» Solr Cloud & Administration
➢ Module 8
» Final Project
Slide 39 www.edureka.co/apache-solr
References
➢ http://www.indeed.com/jobtrends
➢ Office.com Clip Art
Slide 40 www.edureka.co/apache-solr
Apache Solr-Webinar

More Related Content

What's hot

RedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ TwitterRedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ Twitter
Redis Labs
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
Benjamin Leonhardi
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
Christos Manios
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
Cloudera, Inc.
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash courseTommaso Teofili
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted Malaska
Spark Summit
 
NiFi Developer Guide
NiFi Developer GuideNiFi Developer Guide
NiFi Developer Guide
Deon Huang
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Apache Solr
Apache SolrApache Solr
Apache Solr
Minh Tran
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
pmanvi
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
DataWorks Summit
 
Always on in sql server 2017
Always on in sql server 2017Always on in sql server 2017
Always on in sql server 2017
Gianluca Hotz
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
Tathastu.ai
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - Ranger
Isheeta Sanghi
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query ParsingErik Hatcher
 
Automate Your Kafka Cluster with Kubernetes Custom Resources
Automate Your Kafka Cluster with Kubernetes Custom Resources Automate Your Kafka Cluster with Kubernetes Custom Resources
Automate Your Kafka Cluster with Kubernetes Custom Resources
confluent
 
ApacheCon 2022: From Column-Level to Cell-Level_ Towards Finer-grained Encryp...
ApacheCon 2022: From Column-Level to Cell-Level_ Towards Finer-grained Encryp...ApacheCon 2022: From Column-Level to Cell-Level_ Towards Finer-grained Encryp...
ApacheCon 2022: From Column-Level to Cell-Level_ Towards Finer-grained Encryp...
XinliShang1
 
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Alex Levenson
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
SATOSHI TAGOMORI
 

What's hot (20)

RedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ TwitterRedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ Twitter
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted Malaska
 
NiFi Developer Guide
NiFi Developer GuideNiFi Developer Guide
NiFi Developer Guide
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Apache Solr
Apache SolrApache Solr
Apache Solr
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Always on in sql server 2017
Always on in sql server 2017Always on in sql server 2017
Always on in sql server 2017
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - Ranger
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query Parsing
 
Automate Your Kafka Cluster with Kubernetes Custom Resources
Automate Your Kafka Cluster with Kubernetes Custom Resources Automate Your Kafka Cluster with Kubernetes Custom Resources
Automate Your Kafka Cluster with Kubernetes Custom Resources
 
ApacheCon 2022: From Column-Level to Cell-Level_ Towards Finer-grained Encryp...
ApacheCon 2022: From Column-Level to Cell-Level_ Towards Finer-grained Encryp...ApacheCon 2022: From Column-Level to Cell-Level_ Towards Finer-grained Encryp...
ApacheCon 2022: From Column-Level to Cell-Level_ Towards Finer-grained Encryp...
 
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 

Similar to Apache Solr-Webinar

Apache Solr
Apache SolrApache Solr
Apache Solr
Kevin Wenger
 
Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1
YI-CHING WU
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
Edureka!
 
Apace Solr Web Development.pdf
Apace Solr Web Development.pdfApace Solr Web Development.pdf
Apace Solr Web Development.pdf
Abanti Aazmin
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
Edureka!
 
Search Engine Capabilities - Apache Solr(Lucene)
Search Engine Capabilities - Apache Solr(Lucene)Search Engine Capabilities - Apache Solr(Lucene)
Search Engine Capabilities - Apache Solr(Lucene)Manish kumar
 
Apache Lucene Searching The Web
Apache Lucene Searching The WebApache Lucene Searching The Web
Apache Lucene Searching The Web
Francisco Gonçalves
 
Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)
dnaber
 
Solr 101
Solr 101Solr 101
Solr 101
Findwise
 
Indexing Text and HTML Files with Solr
Indexing Text and HTML Files with SolrIndexing Text and HTML Files with Solr
Indexing Text and HTML Files with SolrLucidworks (Archived)
 
Indexing Text and HTML Files with Solr
Indexing Text and HTML Files with SolrIndexing Text and HTML Files with Solr
Indexing Text and HTML Files with Solr
Lucidworks (Archived)
 
Indexing Text and HTML Files with Solr
Indexing Text and HTML Files with SolrIndexing Text and HTML Files with Solr
Indexing Text and HTML Files with Solr
Lucidworks (Archived)
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5
israelekpo
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Lucene Bootcamp -1
Lucene Bootcamp -1 Lucene Bootcamp -1
Lucene Bootcamp -1 GokulD
 
Drupal and Apache Solr Search Go Together Like Pizza and Beer for Your Site
Drupal and Apache Solr Search Go Together Like Pizza and Beer for Your SiteDrupal and Apache Solr Search Go Together Like Pizza and Beer for Your Site
Drupal and Apache Solr Search Go Together Like Pizza and Beer for Your Site
nyccamp
 
Solr中国6月21日企业搜索
Solr中国6月21日企业搜索Solr中国6月21日企业搜索
Solr中国6月21日企业搜索longkeyy
 
Assamese search engine using SOLR by Moinuddin Ahmed ( moin )
Assamese search engine using SOLR by Moinuddin Ahmed ( moin )Assamese search engine using SOLR by Moinuddin Ahmed ( moin )
Assamese search engine using SOLR by Moinuddin Ahmed ( moin )
'Moinuddin Ahmed
 
Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialSourcesense
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development TutorialErik Hatcher
 

Similar to Apache Solr-Webinar (20)

Apache Solr
Apache SolrApache Solr
Apache Solr
 
Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
 
Apace Solr Web Development.pdf
Apace Solr Web Development.pdfApace Solr Web Development.pdf
Apace Solr Web Development.pdf
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
 
Search Engine Capabilities - Apache Solr(Lucene)
Search Engine Capabilities - Apache Solr(Lucene)Search Engine Capabilities - Apache Solr(Lucene)
Search Engine Capabilities - Apache Solr(Lucene)
 
Apache Lucene Searching The Web
Apache Lucene Searching The WebApache Lucene Searching The Web
Apache Lucene Searching The Web
 
Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)
 
Solr 101
Solr 101Solr 101
Solr 101
 
Indexing Text and HTML Files with Solr
Indexing Text and HTML Files with SolrIndexing Text and HTML Files with Solr
Indexing Text and HTML Files with Solr
 
Indexing Text and HTML Files with Solr
Indexing Text and HTML Files with SolrIndexing Text and HTML Files with Solr
Indexing Text and HTML Files with Solr
 
Indexing Text and HTML Files with Solr
Indexing Text and HTML Files with SolrIndexing Text and HTML Files with Solr
Indexing Text and HTML Files with Solr
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Lucene Bootcamp -1
Lucene Bootcamp -1 Lucene Bootcamp -1
Lucene Bootcamp -1
 
Drupal and Apache Solr Search Go Together Like Pizza and Beer for Your Site
Drupal and Apache Solr Search Go Together Like Pizza and Beer for Your SiteDrupal and Apache Solr Search Go Together Like Pizza and Beer for Your Site
Drupal and Apache Solr Search Go Together Like Pizza and Beer for Your Site
 
Solr中国6月21日企业搜索
Solr中国6月21日企业搜索Solr中国6月21日企业搜索
Solr中国6月21日企业搜索
 
Assamese search engine using SOLR by Moinuddin Ahmed ( moin )
Assamese search engine using SOLR by Moinuddin Ahmed ( moin )Assamese search engine using SOLR by Moinuddin Ahmed ( moin )
Assamese search engine using SOLR by Moinuddin Ahmed ( moin )
 
Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr Tutorial
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
 

More from Edureka!

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
Edureka!
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
Edureka!
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
Edureka!
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
Edureka!
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
Edureka!
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
Edureka!
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
Edureka!
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | Edureka
Edureka!
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
Edureka!
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
Edureka!
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
Edureka!
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
Edureka!
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
Edureka!
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
Edureka!
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
Edureka!
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
Edureka!
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
Edureka!
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
Edureka!
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
Edureka!
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
Edureka!
 

More from Edureka! (20)

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
 
Top 10 Dying Programming Languages in 2020 | Edureka

Apache Solr-Webinar

  • 1. www.edureka.co/apache-solr Introduction to APACHE SOLR View Apache Solr course details at www.edureka.co/apache-solr For Queries during the session and class recording: Post on Twitter @edurekaIN: #askEdureka Post on Facebook /edurekaIN For more details please contact us: US : 1800 275 9730 (toll free) INDIA : +91 88808 62004 Email Us : sales@edureka.co
  • 2. How it Works? LIVE Online Class; Class Recording in LMS; 24/7 Post Class Support; Module Wise Quiz; Project Work; Verifiable Certificate.
  • 3. Objectives. At the end of this module, you will be able to: understand the need for a search engine in enterprise-grade applications; understand the objectives and challenges of a search engine; explain what indexing and searching are and why you need them; give an overview of Lucene; describe how indexing and searching are handled in Lucene; describe Solr and its features; describe the Solr schema and its structure; understand how to meet Big Data/NoSQL needs using SolrCloud; explore job opportunities for Solr developers.
  • 4. Introduction Apache Lucene Slide 4 www.edureka.co/apache-solr
  • 5. What is Lucene? Lucene is a powerful Java search library that lets you easily add search, or Information Retrieval (IR), to applications. Used by LinkedIn, Twitter, … and many more (see http://wiki.apache.org/lucene-java/PoweredBy). Scalable, high-performance indexing. Powerful, accurate and efficient search algorithms. Cross-platform solution: open source and 100% pure Java; index-compatible implementations are available in other programming languages. Creator: Doug Cutting.
  • 6. Why Indexing? Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query. Without an index, the search engine would scan every document in the corpus, which would require considerable time and computing power. For example, while an index of 10,000 documents can be queried within milliseconds, a sequential scan of every word in 10,000 large documents could take hours.
  • 7. Indexing: Flow. Document → analysis → Tokens → indexing → Inverted Index. We can get a better idea of the flow of indexing from the following example: tokenization splits “edureka hadoop” into the tokens “edureka” (Position: 0, Offset: 0, Length: 7) and “hadoop” (Position: 1, Offset: 8, Length: 6); each token is labelled “Term Vector” in the diagram.
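
The tokenization step on slide 7 can be sketched in a few lines of Python. This is only an illustration, assuming plain whitespace splitting; it is not Lucene's tokenizer, and the function name `tokenize` and the dictionary keys are made up for this example.

```python
import re


def tokenize(text):
    """Split text on whitespace, recording each token's position,
    start offset and length (the values shown on the slide)."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\S+", text)):
        tokens.append({
            "term": match.group(),
            "position": position,          # ordinal of the token in the text
            "offset": match.start(),       # character offset where it starts
            "length": len(match.group()),  # number of characters
        })
    return tokens


for token in tokenize("edureka hadoop"):
    print(token)
```

For the input "edureka hadoop" this yields the same position, offset and length values as the slide's example.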
  • 8. Lucene: Writing to Index. Classes used when indexing documents with Lucene: Document, Field, Analyzer, IndexWriter, Directory.
  • 9. Lucene: Searching in Index. The QueryParser translates a textual expression from the end user into an arbitrarily complex query for searching. Flow: Expression → QueryParser (which passes text fragments to an Analyzer) → Query object → IndexSearcher.
  • 10. Lucene: Inverted Indexing Technique. Indexing uses the inverted index technique (for example, a book index), because looking terms up in an index is faster than reading through the documents. A new segment is written for each new document insertion. Segments are merged when there are too many of them in the index (a merge-sort technique merges the index into the store). Single updates are costly; bulk updates are preferred because of merging.
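
A toy inverted index makes the book-index analogy on slide 10 concrete. The sketch below is a simplified in-memory illustration, assuming lowercase whitespace tokens and AND semantics for multi-term queries; it does not model Lucene's segments or merging, and the names `build_inverted_index` and `search` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


docs = {
    1: "Edureka Hadoop course",
    2: "Edureka Solr course",
    3: "Lucene in action",
}
index = build_inverted_index(docs)
print(search(index, "edureka solr"))
```

A lookup touches only the posting lists of the query terms, which is why it avoids scanning every document.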
  • 11. Lucene: Storage Schema. Unlike databases, Lucene does not have a common global schema. Lucene has indexes, which contain documents. Each document can have multiple fields, and different documents can have different fields. Fields can be used only to index and search, or also stored for retrieval. You can add new fields at any point in time. Example: Index-1 contains Document-1 (<Field1>, <Field2>, <Field3>) and Document-2 (<Field2>, <Field3>, <Field4>).
  • 12. Analyzers. Analyzers handle the job of analyzing text into tokens or keywords to be searched or indexed. An Analyzer builds TokenStreams, which analyze text, and so represents a policy for extracting index terms from text. Lucene provides a few default Analyzers, which can be used at indexing or query time. Analyzers are also provided to parse and analyze different languages (Chinese, Japanese, etc.). Flow: Reader → Tokenizer → TokenFilter → TokenFilter → TokenFilter → Tokens.
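
The Tokenizer → TokenFilter chain on slide 12 can be mimicked with plain functions. This is a minimal sketch of the idea only, not Lucene's Analyzer API; the function names and the stop-word list are assumptions chosen for the example.

```python
import re

# Hypothetical stop-word list, chosen only for this example.
STOP_WORDS = {"a", "an", "the", "is", "of", "and"}


def whitespace_tokenizer(text):
    """Tokenizer: break the raw text into whitespace-separated tokens."""
    return re.findall(r"\S+", text)


def lowercase_filter(tokens):
    """Filter: transform every token to lowercase."""
    return [token.lower() for token in tokens]


def stop_filter(tokens):
    """Filter: discard tokens that are stop words."""
    return [token for token in tokens if token not in STOP_WORDS]


def make_analyzer(tokenizer, *filters):
    """Compose one tokenizer and any number of filters into an analyzer."""
    def analyze(text):
        tokens = tokenizer(text)
        for token_filter in filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze


analyzer = make_analyzer(whitespace_tokenizer, lowercase_filter, stop_filter)
print(analyzer("The Power of Lucene"))
```

Filter order matters here: the stop filter only matches lowercase words, so it has to run after the lowercase filter.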
  • 13. Analyzers (Contd.). Core class examples (org.apache.lucene.analysis.Analyzer): SmartChineseAnalyzer, SnowballAnalyzer, SynonymAnalyzer, StandardAnalyzer, StopAnalyzer, WhitespaceAnalyzer, LowerCaseFilter, PorterStemFilter, ChineseAnalyzer, CzechAnalyzer, ShingleAnalyzerWrapper, SimpleAnalyzer.
  • 14. Querying: Key Types / Classes. Query classes: TermQuery, BooleanQuery, WildcardQuery, PhraseQuery, PrefixQuery, MultiPhraseQuery, FuzzyQuery, RegexpQuery, TermRangeQuery, NumericRangeQuery, ConstantScoreQuery, DisjunctionMaxQuery, MatchAllDocsQuery.
  • 15. Scoring: Score Boosting. A document’s weight or score can be changed from its default, which is called boosting. Lucene allows influencing search results by "boosting" at different times. Index time: boost by calling Field.setBoost() before a document is added to the index. Query time: boost by setting a boost on a query clause, calling Query.setBoost().
  • 16. Key Features: Faceting, Highlighting, Grouping, Joins, Spatial Search, Apache Tika Support.
  • 17. Introduction Apache Solr Slide 17 www.edureka.co/apache-solr
  • 18. Search Engine: Why do I need them? 1. Text Based Search 2. Filter 3. Documents
  • 19. Solr: Introduction. Solr is an open source enterprise search server / web application. Solr uses the Lucene search library and extends it. Solr exposes Lucene's Java APIs as RESTful services. You put documents in it (called "indexing") via XML, JSON, CSV or binary over HTTP. You query it via HTTP GET and receive XML, JSON, CSV or binary results.
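
As a rough sketch of the "index over HTTP, query over HTTP GET" idea on slide 19, the Python below only builds the requests and does not send them. The base URL `http://localhost:8983/solr` and the core name `demo` are assumptions for illustration; substitute the values for your own installation.

```python
import json
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr address
CORE = "demo"                             # hypothetical core name


def build_update_request(docs):
    """Build (url, headers, body) for posting JSON documents to /update."""
    url = f"{SOLR_BASE}/{CORE}/update?" + urlencode({"commit": "true"})
    headers = {"Content-Type": "application/json"}
    body = json.dumps(docs)  # a JSON array of documents
    return url, headers, body


def build_query_url(query, response_format="json"):
    """Build the HTTP GET URL for a query against /select."""
    params = urlencode({"q": query, "wt": response_format})
    return f"{SOLR_BASE}/{CORE}/select?{params}"


url, headers, body = build_update_request([{"id": "1", "title": "Intro to Solr"}])
print(url)
print(build_query_url("title:solr"))
```

The resulting URL, headers and body can be handed to any HTTP client, for example `urllib.request` from the standard library.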
  • 20. Solr: History. In 2004, Solr was created by Yonik Seeley at CNET Networks as an in-house project to add search capability to the company website. In January 2006, CNET Networks decided to openly publish the source code by donating it to the Apache Software Foundation under the Lucene top-level project. In September 2008, Solr 1.3 was released with many enhancements, including distributed search capabilities and performance improvements. In October 2012, Solr 4.0 was released, including the new SolrCloud feature.
  • 21. Solr: Key Features. Advanced Full-Text Search Capabilities; Optimized for High Volume Web Traffic; Standards Based Open Interfaces (XML, JSON and HTTP); Comprehensive HTML Administration Interfaces; Server statistics exposed over JMX for monitoring; Near Real-time indexing; Adaptable with XML Configuration; Linearly scalable, auto index replication, auto …; Extensible Plugin Architecture.
  • 22. Solr: Architecture Slide 22 www.edureka.co/apache-solr
  • 23. Solr: Admin UI Slide 23 www.edureka.co/apache-solr
  • 24. Solr: Schema Hierarchy. Solr Instance → Core/Index (one or more) → Documents → Fields. Indexing & Querying: Schema.xml.
  • 25. Solr: Core. A Solr Core, also referred to as just a "Core", is a running instance of a Lucene index along with all the Solr configuration (solrconfig.xml, schema.xml, etc.) required to use it. A single Solr application can contain zero or more cores. Cores run largely in isolation but can communicate with each other if necessary via the CoreContainer. Solr initially supported only one index, and the SolrCore class was a singleton for coordinating the low-level functionality at the "core" of Solr.
  • 26. Solr: Documents & Fields. Solr's basic unit of information is a document, which is a set of data that describes something. Documents are composed of fields, which are more specific pieces of information. Fields can contain different kinds of data; a name field, for example, is text (character data). The field type tells Solr how to interpret the field and how it can be queried.
  • 27. Solr: Indexing Data. A Solr index can accept data from many different sources, including XML files, comma-separated value (CSV) files, data extracted from tables in a database, and files in common file formats such as Microsoft Word or PDF. The most common ways of loading data into a Solr index are: uploading XML files by sending HTTP requests to the Solr server; using index handlers to import from databases; using the Solr Cell framework; writing a custom Java application to ingest data through Solr's Java client.
  • 28. Solr: Analysis. There are three main concepts in analysis: analyzers, tokenizers, and filters. Analyzers are used both at index time, when a document is indexed, and at query time; the same analysis process need not be used for both operations. An analyzer examines the text of fields and generates a token stream. Analyzers may be a single class, or they may be composed of a series of tokenizer and filter classes. Tokenizers break field data into lexical units, or tokens. Filters examine a stream of tokens and keep them, transform or discard them, or create new ones.
  • 29. Solr: solrconfig.xml. Lib directives indicate where Solr can find JAR files for extensions. Activates version-dependent features in Lucene. Index management settings. Enables JMX instrumentation of Solr MBeans. Update handler for indexing documents. Cache-management settings. Registers event handlers for searcher events, for example queries to execute to warm new searchers.
  • 30. Solr: Search Process. Components: Request Handler, Query Parser, Index, Response Writer. qt: selects a RequestHandler for a query using /select (by default, the DisMaxRequestHandler is used). defType: selects a query parser for the query (by default, uses whatever has been configured for the RequestHandler). qf: selects which fields to query in the index (by default, all fields are required). wt: selects a response writer for formatting the query response. fq: filters the query by applying an additional query to the initial query’s results, and caches the results. rows: specifies the number of rows to be displayed at one time. start: specifies an offset (by default 0) into the query results where the returned response should begin.
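
To show how the parameters on slide 30 fit together in one request, here is a small sketch that assembles a query string. The helper name `build_search_params`, the field names and the choice of the `edismax` parser are assumptions made for the example; the parameter names themselves (q, defType, qf, fq, start, rows, wt) are the ones described on the slide plus the main query parameter q.

```python
from urllib.parse import urlencode


def build_search_params(text, fields, filters=(), page=0, page_size=10):
    """Assemble a Solr query string from the common search parameters."""
    params = [
        ("q", text),                 # the user's main query
        ("defType", "edismax"),      # query parser (assumed choice)
        ("qf", " ".join(fields)),    # fields to search in
    ]
    # Each filter query narrows the result set and is cached separately.
    params += [("fq", filter_query) for filter_query in filters]
    params += [
        ("start", page * page_size),  # offset of the first returned row
        ("rows", page_size),          # number of rows per page
        ("wt", "json"),               # response writer
    ]
    return urlencode(params)


print(build_search_params("solr", ["title", "body"], ["category:search"], page=2, page_size=5))
```

Pagination falls out of start and rows: page 2 with 5 rows per page starts at offset 10.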
  • 31. Solr Features: Faceting, Highlighting, Spell Checking, Query Re-ranking, Transforming, Suggesters, More Like This, Pagination, Grouping & Clustering, Spatial Search, Components, Real time (Get & Update), LABS.
  • 32. Configuring Solr Instances / Cores. Solr configurations: solrconfig.xml, solr.xml, core.properties, schema.xml.
  • 33. SolrCloud Introduction. Apache Solr includes the ability to set up a cluster of Solr servers that combines fault tolerance and high availability, called SolrCloud. SolrCloud provides flexible distributed search and indexing, without a master node to allocate nodes, shards and replicas. Instead, Solr uses ZooKeeper to manage these locations, depending on configuration files and schemas. Documents can be sent to any server, and ZooKeeper will figure it out.
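
The idea that a document can be sent to any server and still land on the right shard can be illustrated with a toy router. This sketch swaps in a simple CRC32-modulo scheme purely for illustration; SolrCloud's own document routing works differently (it assigns hash ranges to shards), and the names `pick_shard` and `route` are hypothetical.

```python
import zlib


def pick_shard(doc_id, num_shards):
    """Choose a shard deterministically from the document id.
    Illustrative only: CRC32 modulo, not SolrCloud's actual router."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def route(doc_ids, num_shards):
    """Group document ids by the shard they would be sent to."""
    shards = {shard: [] for shard in range(num_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_shards)].append(doc_id)
    return shards


for shard, ids in route(["doc-1", "doc-2", "doc-3", "doc-4"], 2).items():
    print(shard, ids)
```

Because the shard depends only on the document id, every node computes the same answer, so no single master has to make the decision.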
  • 34. Features: Horizontal Scaling (for Sharding & Replication); Elastic Scaling; High Availability; Distributed Indexing; Distributed Searching; Central Configuration for the Entire Cluster; Automatic Load Balancing; Automatic Failover for Queries; ZooKeeper Integration for Coordination & Configuration.
  • 35. Architecture Slide 35 www.edureka.co/apache-solr
  • 36. Job trends for Apache Solr Slide 36 www.edureka.co/apache-solr
  • 37. Demo Slide 37 www.edureka.co/apache-solr
  • 38. Disclaimer. Criteria and guidelines mentioned in this presentation may change. Please visit our website for the latest and additional information on Apache Solr.
  • 39. Course Topics. Module 1: Introduction to Apache Lucene; Module 2: Exploring Lucene; Module 3: Introduction to Apache Solr; Module 4: Solr Indexing; Module 5: Solr Searching; Module 6: Solr Extended Features; Module 7: Solr Cloud & Administration; Module 8: Final Project.
  • 40. References: http://www.indeed.com/jobtrends; Office.com Clip Art.