This document provides an introduction to Apache Lucene and Solr. It begins with an overview of information retrieval and some basic concepts like term frequency-inverse document frequency. It then describes Lucene as a fast, scalable search library and discusses its inverted index and indexing pipeline. Solr is introduced as an enterprise search platform built on Lucene that provides features like faceting, scalability and real-time indexing. The document concludes with examples of how Lucene and Solr are used in applications and websites for search, analytics, auto-suggestion and more.
The talk presents the sfSolrPlugin which transparently integrates the Solr search engine into symfony.
The talk explains :
* the features of the solr search engine
* how to integrate the search engine into symfony
* complex search : faceted and geolocalized search
* usage example : http://www.menugourmet.com and http://resolutionfinder.org
Introduction to Solr, presented at Bangkok meetup in April 2014:
http://www.meetup.com/bkk-web/events/172090992/
Covers high-level use-cases for Solr. Demos include support for Thai language (with GitHub link for source).
Has slides showcasing Solr-ecosystem as well as couple of ideas for possible Solr-specific learning projects.
How Solr Search Works - A tech Talk at Atlogys Delhi Office by our Senior Technologist Rajat Jain. The lecture takes a deep dive into Solr - what it is, how it works, what it does and its inbuilt architecture. A wonderful technical session with many live examples, a sneak peak into solr code and config files and a live demo. Part of Atlogys Academy Series.
The talk presents the sfSolrPlugin which transparently integrates the Solr search engine into symfony.
The talk explains :
* the features of the solr search engine
* how to integrate the search engine into symfony
* complex search : faceted and geolocalized search
* usage example : http://www.menugourmet.com and http://resolutionfinder.org
Introduction to Solr, presented at Bangkok meetup in April 2014:
http://www.meetup.com/bkk-web/events/172090992/
Covers high-level use-cases for Solr. Demos include support for Thai language (with GitHub link for source).
Has slides showcasing Solr-ecosystem as well as couple of ideas for possible Solr-specific learning projects.
How Solr Search Works - A tech Talk at Atlogys Delhi Office by our Senior Technologist Rajat Jain. The lecture takes a deep dive into Solr - what it is, how it works, what it does and its inbuilt architecture. A wonderful technical session with many live examples, a sneak peak into solr code and config files and a live demo. Part of Atlogys Academy Series.
Got data? Let's make it searchable! This presentation will demonstrate getting documents into Solr quickly, will provide some tips in adjusting Solr's schema to match your needs better, and finally will discuss how to showcase your data in a flexible search user interface. We'll see how to rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.
After a thorough overview of the main features and benefits of Apache Solr (an open source search server), the architecture of Solr and strategies to adopt it for your PHP application and data model will be presented. The main lessons learned around dealing with a mix of structured and non-structured content, multilingual aspects, tuning and the various state-of-the-art features of Solr will be shared as well
Intro talk for UNC School of Information and Library Science. Covers basics of Lucene and Solr as well as info on Lucene/Solr jobs, opportunities, etc.
Apache solr is an enterprise search engine. It facilitates indexing of large number of documents of any size and provides very robust search techniques. This ppt provides brief introduction of it.
Building your own search engine with Apache SolrBiogeeks
Andrew Clegg : Building your own search engine with Apache Solr
Apache Solr (http://lucene.apache.org/solr/) is an open-source search
engine based on the popular Lucene library with a huge variety of
features. In this talk, Andrew describes how he used it to build a
high-performance search tool for protein and domain structures at
CATH, and talks about some of the suprisingly cool things you can do
with it beyond simple searching.
SolrTM is the popular, blazing fast open Source Enterprise search platform from the Apache LuceneTM project. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Solr powers the search and navigation features of many of the world's largest internet sites like (Aol, Yahoo, Buy.com, Cnet, CitySearch, Netflix, Zappos, Stubhub!, digg, eTrade, Disney, Apple, NASA and MTV).
Lucene powers the search capabilities of practically all library discovery platforms, by way of Solr, etc. The Lucene project evolves rapidly, and it's a full-time job to keep up with the ever improving features and scalability. This talk will distill and showcase the most relevant(!) advancements to date.
Solr Recipes provides quick and easy steps for common use cases with Apache Solr. Bite-sized recipes will be presented for data ingestion, textual analysis, client integration, and each of Solr’s features including faceting, more-like-this, spell checking/suggest, and others.
Building Intelligent Search Applications with Apache Solr and PHP5israelekpo
ZendCon 2010 - Building Intelligent Search Applications with Apache Solr and PHP5. This is a presentation on how to create intelligent web-based search applications using PHP 5 and the out-of-the-box features available in Solr 1.4.1 After we finish we finish the illustration of adding, updating and removing data from the Solr index, we will discuss how to add features such as auto-completion, hit highlighting, faceted navigation, spelling suggestions etc
Battle of the giants: Apache Solr vs ElasticSearchRafał Kuć
Slides from my talk during ApacheCon EU 2012 - "Battle of the giants: Apache Solr vs ElasticSearch". Video available at http://player.vimeo.com/video/55645629
Apache Solr serves search requests at the enterprises and the largest companies around the world. Built on top of the top-notch Apache Lucene library, Solr makes indexing and searching integration into your applications straightforward.
Solr provides faceted navigation, spell checking, highlighting, clustering, grouping, and other search features. Solr also scales query volume with replication and collection size with distributed capabilities. Solr can index rich documents such as PDF, Word, HTML, and other file types.
Got data? Let's make it searchable! This presentation will demonstrate getting documents into Solr quickly, will provide some tips in adjusting Solr's schema to match your needs better, and finally will discuss how to showcase your data in a flexible search user interface. We'll see how to rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.
After a thorough overview of the main features and benefits of Apache Solr (an open source search server), the architecture of Solr and strategies to adopt it for your PHP application and data model will be presented. The main lessons learned around dealing with a mix of structured and non-structured content, multilingual aspects, tuning and the various state-of-the-art features of Solr will be shared as well
Intro talk for UNC School of Information and Library Science. Covers basics of Lucene and Solr as well as info on Lucene/Solr jobs, opportunities, etc.
Apache solr is an enterprise search engine. It facilitates indexing of large number of documents of any size and provides very robust search techniques. This ppt provides brief introduction of it.
Building your own search engine with Apache SolrBiogeeks
Andrew Clegg : Building your own search engine with Apache Solr
Apache Solr (http://lucene.apache.org/solr/) is an open-source search
engine based on the popular Lucene library with a huge variety of
features. In this talk, Andrew describes how he used it to build a
high-performance search tool for protein and domain structures at
CATH, and talks about some of the suprisingly cool things you can do
with it beyond simple searching.
SolrTM is the popular, blazing fast open Source Enterprise search platform from the Apache LuceneTM project. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Solr powers the search and navigation features of many of the world's largest internet sites like (Aol, Yahoo, Buy.com, Cnet, CitySearch, Netflix, Zappos, Stubhub!, digg, eTrade, Disney, Apple, NASA and MTV).
Lucene powers the search capabilities of practically all library discovery platforms, by way of Solr, etc. The Lucene project evolves rapidly, and it's a full-time job to keep up with the ever improving features and scalability. This talk will distill and showcase the most relevant(!) advancements to date.
Solr Recipes provides quick and easy steps for common use cases with Apache Solr. Bite-sized recipes will be presented for data ingestion, textual analysis, client integration, and each of Solr’s features including faceting, more-like-this, spell checking/suggest, and others.
Building Intelligent Search Applications with Apache Solr and PHP5israelekpo
ZendCon 2010 - Building Intelligent Search Applications with Apache Solr and PHP5. This is a presentation on how to create intelligent web-based search applications using PHP 5 and the out-of-the-box features available in Solr 1.4.1 After we finish we finish the illustration of adding, updating and removing data from the Solr index, we will discuss how to add features such as auto-completion, hit highlighting, faceted navigation, spelling suggestions etc
Battle of the giants: Apache Solr vs ElasticSearchRafał Kuć
Slides from my talk during ApacheCon EU 2012 - "Battle of the giants: Apache Solr vs ElasticSearch". Video available at http://player.vimeo.com/video/55645629
Apache Solr serves search requests at the enterprises and the largest companies around the world. Built on top of the top-notch Apache Lucene library, Solr makes indexing and searching integration into your applications straightforward.
Solr provides faceted navigation, spell checking, highlighting, clustering, grouping, and other search features. Solr also scales query volume with replication and collection size with distributed capabilities. Solr can index rich documents such as PDF, Word, HTML, and other file types.
All you need to start with Apache Solr (elastic search). This presentation includes all the information of Solr i.e. what it is, installation, indexing & searching for beginners.
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrSease
The University Seminar series aim to provide a basic understanding of Open Source Information Retrieval and its application in the real world through the Apache Lucene/Solr technologies.
Solr search engine with multiple table relationJay Bharat
Here you can learn how to use solr search engine and implement in your application like in PHP/MYSQL.
I am introducing how to handle multiple table data handling in SOLR.
Scaling Recommendations, Semantic Search, & Data Analytics with solrTrey Grainger
This presentation is from the inaugural Atlanta Solr Meetup held on 2014/10/21 at Atlanta Tech Village.
Description: CareerBuilder uses Solr to power their recommendation engine, semantic search, and data analytics products. They maintain an infrastructure of hundreds of Solr servers, holding over a billion documents and serving over a million queries an hour across thousands of unique search indexes. Come learn how CareerBuilder has integrated Solr into their technology platform (with assistance from Hadoop, Cassandra, and RabbitMQ) and walk through api and code examples to see how you can use Solr to implement your own real-time recommendation engine, semantic search, and data analytics solutions.
Speaker: Trey Grainger is the Director of Engineering for Search & Analytics at CareerBuilder.com and is the co-author of Solr in Action (2014, Manning Publications), the comprehensive example-driven guide to Apache Solr. His search experience includes handling multi-lingual content across dozens of markets/languages, machine learning, semantic search, big data analytics, customized Lucene/Solr scoring models, data mining and recommendation systems. Trey is also the Founder of Celiaccess.com, a gluten-free search engine, and is a frequent speaker at Lucene and Solr-related conferences.
An overview of building and serving Lucene indexes on a Hadoop cluster with Solr for text and parametric searching, as presented at Cleveland Hadoop User Group on 13 January 2014.
Solr at zvents 6 years later & still going stronglucenerevolution
Presented by Amit Nithianandan, Lead Engineer Search/Analytics New Platforms, Zvents/Stubhub
Zvents has been a user of Apache Solr since 2007 when it was very early. Since then, the team has made extensive use of the various features and most recently completed an overhaul of the search engine to Solr 4.0. We'll touch on a variety of development/operational topics including how we manage the build lifecycle of the search application using Maven, release the deployment package using Capistrano and monitor using NewRelic as well as the extensive use of virtual machines to simplify node management. Also, we’ll talk about application level details such as our unique federated search product, and the integration of technologies such as Hypertable, RabbitMQ, and EHCache to power more real-time ranking and filtering based on traffic statistics and ticket inventory.
The Enterprise Data Lake has become the defacto repository of both structured and unstructured data within an enterprise. Being able to discover information across both structured and unstructured data using search is a key capability of enterprise data lake. In this workshop, we will provide an in-depth overview of HDP Search with focus on configuration, sizing and tuning. We will also deliver a working example to showcase the usage of HDP Search along with the rest of platform capabilities to deliver real world solution.
We got a huge collection of (un-labelled) documents and we would like to explore the knowledge inside. Imagine that we could run an unsupervised, automated pipeline to generate connections between them state-of-the art techniques to programmatically generating annotations for each of the texts inside big collections of documents in a way that is computationally affordable
Similar to Introduction to Apache Lucene/Solr (20)
Emerging technologies /frameworks in Big DataRahul Jain
A short overview presentation on Emerging technologies /frameworks in Big Data covering Apache Parquet, Apache Flink, Apache Drill with basic concepts of Columnar Storage and Dremel.
Building a Large Scale SEO/SEM Application with Apache SolrRahul Jain
Slides from my talk on "Building a Large Scale SEO/SEM Application with Apache Solr" in Lucene/Solr Revolution 2014 where I talk how we handle Indexing/Search of 40 billion records (documents)/month in Apache Solr with 4.6 TB compressed index data.
Abstract: We are working on building a SEO/SEM application where an end user search for a "keyword" or a "domain" and gets all the insights about these including Search engine ranking, CPC/CPM, search volume, No. of Ads, competitors details etc. in a couple of seconds. To have this intelligence, we get huge web data from various sources and after intensive processing it is 40 billion records/month in MySQL database with 4.6 TB compressed index data in Apache Solr.
Due to large volume, we faced several challenges while improving indexing performance, search latency and scaling the overall system. In this session, I will talk about our several design approaches to import data faster from MySQL, tricks & techniques to improve the indexing performance, Distributed Search, DocValues(life saver), Redis and the overall system architecture.
Real time Analytics with Apache Kafka and Apache SparkRahul Jain
A presentation cum workshop on Real time Analytics with Apache Kafka and Apache Spark. Apache Kafka is a distributed publish-subscribe messaging while other side Spark Streaming brings Spark's language-integrated API to stream processing, allows to write streaming applications very quickly and easily. It supports both Java and Scala. In this workshop we are going to explore Apache Kafka, Zookeeper and Spark with a Web click streaming example using Spark Streaming. A clickstream is the recording of the parts of the screen a computer user clicks on while web browsing.
Apache Spark is a In Memory Data Processing Solution that can work with existing data source like HDFS and can make use of your existing computation infrastructure like YARN/Mesos etc. This talk will cover a basic introduction of Apache Spark with its various components like MLib, Shark, GrpahX and with few examples.
A short presentation for beginners on Introduction of Machine Learning, What it is, how it works, what all are the popular Machine Learning techniques and learning models (supervised, unsupervised, semi-supervised, reinforcement learning) and how they works with various Industry use-cases and popular examples.
A hibernate tutorial for beginners. It describe the hibernate concepts in a lucid manner and and test project(User application with database) to get hands on over the same.
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the used of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
Forklift Classes Overview by Intella PartsIntella Parts
Discover the different forklift classes and their specific applications. Learn how to choose the right forklift for your needs to ensure safety, efficiency, and compliance in your operations.
For more technical information, visit our website https://intellaparts.com
About
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Technical Specifications
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
Key Features
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface
• Compatible with MAFI CCR system
• Copatiable with IDM8000 CCR
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
Application
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)MdTanvirMahtab2
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL). A Govt. owned Company of Bangladesh Chemical Industries Corporation under Ministry of Industries.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
2. Who am I?
Software Engineer @ IVY Comptech, Hyderabad
7 years of programming learning experience
Built a platform to search logs in Near real time with
volume of 1TB/day#
Worked on a Solr search based SEO/SEM software with
40 billion records/month (Topic of next talk?)
Areas of expertise/interest
High traffic web applications
JAVA/J2EE
Big data, NoSQL
Information-Retrieval, Machine learning
2# http://www.slideshare.net/lucenerevolution/building-a-near-real-time-search-engine-analytics-for-logs-using-solr
4. Information Retrieval (IR)
”Information retrieval is the activity of
obtaining information resources (in the
form of documents) relevant to an
information need from a collection of
information resources. Searches can
be based on metadata or on full-text
(or other content-based) indexing”
- Wikipedia
4
5. Basic Concepts
• tf (t in d) : term frequency in a document
• measure of how often a term appears in the document
• the number of times term t appears in the currently scored
document d
• idf (t) : inverse document frequency
• measure of whether the term is common or rare across all
documents, i.e. how often the term appears across the index
• obtained by dividing the total number of documents by the
number of documents containing the term, and then taking
the logarithm of that quotient.
• boost (index) : boost of the field at index-time
• boost (query) : boost of the field at query-time
5
6. Basic Concepts
TF - IDF
TF - IDF = Term Frequency X Inverse Document Frequency
Credit: http://http://whatisgraphsearch.com/
8. Apache Lucene
• Fast, high performance, scalable search/IR library
• Open source
• Initially developed by Doug Cutting (Also author
of Hadoop)
• Indexing and Searching
• Inverted Index of documents
• Provides advanced Search options like
synonyms, stopwords, based on
similarity, proximity.
• http://lucene.apache.org/ 8
9. Lucene Internals - Inverted Index
Credit: https://developer.apple.com/library/mac/documentation/userexperience/conceptual/SearchKitConcepts/searchKit_basics/searchKit_basics.html
9
10. Lucene Internals (Contd.)
• Defines documents Model
• Index contains documents.
• Each document consist of fields.
• Each Field has attributes.
– What is the data type (FieldType)
– How to handle the content (Analyzers, Filters)
– Is it a stored field (stored="true") or Index field (indexed="true")
10
11. Indexing Pipeline
• Analyzer : create tokens using a Tokenizer and/or applying
Filters (Token Filters)
• Each field can define an Analyzer at index time/query time or
the both at same time.
Credit : http://www.slideshare.net/otisg/lucene-introduction 11
12. Analysis Process - Tokenizer
WhitespaceAnalyzer
Simplest built-in analyzer
The quick brown fox jumps over the lazy dog.
[The] [quick] [brown] [fox] [jumps] [over] [the] [lazy] [dog.]
Tokens
13. Analysis Process - Tokenizer
SimpleAnalyzer
Lowercases, split at non-letter boundaries
The quick brown fox jumps over the lazy dog.
[the] [quick] [brown] [fox] [jumps] [over] [the] [lazy] [dog]
Tokens
20. Solr – schema.xml
• Types with index and query Analyzers - similar to data
type
• Fields with name, type and options
• Unique Key : Unique Identifier of a document. For e.g. “id”
• Dynamic Fields : Dynamic fields allow Solr to index fields that you did not
explicitly define in your schema. For e.g. fieldName: *_i or *_txts
• Copy Fields : Solr has a mechanism for making copies of fields so that you can apply
several distinct field types to a single piece of incoming information. field ‘a‘ populates field ‘b’ with
its value before tokenizing (having different analyzer/filter).
20
21. Solr – Content Analysis
• Field Attributes
Name : Name of the field
Type : Data-type (FieldType) of the field
Indexed : Should it be indexed (indexed="true/false")
Stored : Should it be stored (stored="true/false")
Required : is it a mandatory field
(required="true/false")
Multi-Valued : Would it will contains multiple values
e.g. text: pizza, food (multiValued="true/false")
e.g. <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
21
22. Solr – solrconfig.xml
• Data dir: where all index data will be stored
• Index configuration
• Cache configurations
• Request Handler configuration
• Search components, response writers, query
parsers
22
23. Query Types
• Single and multi term queries
• ex fieldname:value or title: software engineer
• +, -, AND, OR NOT operators.
• ex. title: (software AND engineer)
• Range queries on date or numeric fields,
• ex: timestamp: [ * TO NOW ] or price: [ 1 TO 100 ]
• Boost queries:
• e.g. title:Engineer ^1.5 OR text:Engineer
• Fuzzy search : is a search for words that are similar in
spelling
• e.g. roam~0.8 => noam
• Proximity Search : with a sloppy phrase query. The
close together the two terms appear, higher the score.
• ex “apache lucene”~20 : will look for all documents where
“apache” word occurs within 20 words of “lucene”
23
24. Solr/Lucene Use-cases
• Search
• Analytics
• NoSQL datastore
• Auto-suggestion / Auto-correction
• Recommendation Engine (MoreLikeThis)
• Relevancy Engine (Feedback to other applications)
• Solr as a White-List
• GeoSpatial based Search
24
27. Faceting
Source: www.career9.com, www.indeed.com 27
• Grouping results based on field
value
• Facet on: field
terms, queries, date ranges
• &facet=on
&facet.field=job_title
&facet.query=salary:[30000 TO
100000]
• http://wiki.apache.org/solr/Sim
pleFacetParameters
28. Analytics
Analytics source : Kibana.org based on ElasticSearch and Logstash
Image Source : http://semicomplete.com/presentations/logstash-monitorama-2013/#/8
28
32. Solr/Lucene Meetup
• Building Big Data Analytics Platforms using Elasticsearch
(Kibana)
• Saturday, April 19, 2014 10:00 AM
• IIIT Hyderabad
• URL: http://www.meetup.com/Hyderabad-Apache-Solr-Lucene-Group/events/150134392/
OR
• Search on Google …
33. Thanks!
@rahuldausa on twitter and slideshare
http://www.linkedin.com/in/rahuldausa
Find Interesting ?
Join us @ http://www.meetup.com/Hyderabad-Apache-Solr-Lucene-Group/
33