This document provides an overview of Lucene and Solr for developers. It discusses Lucene core components like IndexWriter and analyzers. It also covers Solr architecture and customizing Solr through extension points like query parsers, update processors, and search components. The document provides examples and recommendations for testing and exploring the Lucene/Solr code and documentation.
This session will introduce and demonstrate several techniques for enhancing the search experience by augmenting documents during indexing. First we'll survey the analysis components available in Solr, and then we'll delve into using Solr's update processing pipeline to modify documents on the way in. The session will build on Erik's "Poor Man's Entity Extraction" blog at http://www.searchhub.org/2013/06/27/poor-mans-entity-extraction-with-solr/
Think *inside* the box. Inside the *search* box, that is.
The "best"* search results incorporate many more factors than (just) textual matching and relevancy. Search experience owners manage query context rules, signals automatically feed back machine learned factors, users implicit and explicit behaviors filter and weight future interactions. Synergy emerges with several cooperating (just) searches.
This talk will showcase and detail several (just) search examples including rules, typeahead/suggest, signals, and location awareness, bringing them all together into a cohesive search experience.
You're Solr-powered and need to customize its capabilities. Apache Solr is flexibly architected, with practically everything pluggable. Under the hood, Solr is driven by the well-known Apache Lucene. Lucene for Solr Developers will guide you through the various ways in which Solr can be extended, customized, and enhanced with a bit of Lucene API know-how. We'll delve into improving analysis with custom character mapping, tokenizing, and token filtering extensions; show why and how to implement specialized query parsing; and show how to add your own search and update request handling.
Overview of the Solr 6.2 examples, including the features they have and the challenges they present. A contrasting demonstration of a minimal viable example. A step-by-step deconstruction of the "films" example to show which parts of the shipped examples are not actually needed.
Apache Solr! Enterprise Search Solutions at your Fingertips! - Murshed Ahmmad Khan
Get an overview of Apache Solr as an enterprise search server. Get to know the available alternatives and why Solr is cool. Get excited! Enterprise search solutions are ready for the picking.
Ingesting and Manipulating Data with JavaScript - Lucidworks
Data in the wild isn’t always in the right format we need for search or even mere usability. Lucidworks Fusion offers powerful pipelines, parsers, and stages to wrangle your data into the right format to make it more findable and friendly. However, there are some cases where more obscure data will require the power of scripting.
Your data may need a complex transformation, a custom decryption algorithm, or you may already have existing code for handling a piece of data. Even in these more complex cases, Fusion’s JavaScript capabilities have got you covered.
Lucene powers the search capabilities of practically all library discovery platforms, by way of Solr, etc. The Lucene project evolves rapidly, and it's a full-time job to keep up with the ever improving features and scalability. This talk will distill and showcase the most relevant(!) advancements to date.
Solr Recipes provides quick and easy steps for common use cases with Apache Solr. Bite-sized recipes will be presented for data ingestion, textual analysis, client integration, and each of Solr’s features including faceting, more-like-this, spell checking/suggest, and others.
Apache Solr is a search engine that can scale from a personal project to a multi-terabyte cloud-hosted cluster. At the same time, this ability to scale, tune, and adjust to a client's needs can make it hard to understand which aspects of Solr to bring to the problem.
In this session, Alexandre Rafalovitch (an Apache Solr committer) will do a speed run demonstrating how to create and tune a Solr 7.3 instance for a hypothetical Corporate Phone Directory application. It will cover:
*) The smallest learning schema/configuration required
*) Rapid schema evolution workflow
*) Dealing with multiple languages
*) Dealing with misspellings in search
*) Searching phone numbers
Presented at Solr meetup in Montreal, in May 2018.
Backing GitHub repository is: https://github.com/arafalov/solr-presentation-2018-may
code4lib 2011 preconference: What's New in Solr (since 1.4.1) - Erik Hatcher
code4lib 2011 preconference, presented by Erik Hatcher of Lucid Imagination.
Abstract: The library world is fired up about Solr. Practically every next-gen catalog is using it (via Blacklight, VuFind, or other technologies). Solr has continued improving in some dramatic ways, including geospatial support, field collapsing/grouping, extended dismax query parsing, pivot/grid/matrix/tree faceting, autosuggest, and more. This session will cover all of these new features, showcasing live examples of them all, including anything new that is implemented prior to the conference.
Building your own search engine with Apache Solr - Biogeeks
Andrew Clegg: Building your own search engine with Apache Solr
Apache Solr (http://lucene.apache.org/solr/) is an open-source search engine based on the popular Lucene library, with a huge variety of features. In this talk, Andrew describes how he used it to build a high-performance search tool for protein and domain structures at CATH, and talks about some of the surprisingly cool things you can do with it beyond simple searching.
From content to search: speed-dating Apache Solr (ApacheCON 2018) - Alexandre Rafalovitch
While a fully nuanced search implementation takes time, getting basic data ingestion, schema design, and critical-path insights does not have to be a painful experience.
This talk uses several real-life datasets (from the Data is Plural mailing list) and shows a "Rapid Application Development"-style workflow to get the data into Solr and shape it ready for initial searchability and relevancy analysis.
Different stages of content ingestion, pre-processing, analysis, and querying are explained, together with trade-offs of different built-in approaches. Relevant document links are also included for more in-depth research.
This talk is for everybody who wants Search in their development stack but is not sure exactly where to start.
Apache Solr is the popular, blazing-fast open source enterprise search platform; it uses Lucene as its core search engine. Solr's major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and complex queries. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.
In this on-demand webinar, Erik Hatcher, co-founder of Lucid Imagination, co-author of Lucene in Action, and Lucene/Solr PMC member and committer, presents and discusses key features and innovations of Apache Solr 1.4.
Introduction to Solr. A brief introduction to Solr for those who want to get trained on Solr.
1. Introduction to Solr
2. Solr Terminologies
3. Installation and Configuration
4. Configuration files: schema.xml and solrconfig.xml
5. Features of Solr
a. Hit Highlighting
b. Auto Complete / Suggester
c. Stop Words
d. Synonyms
e. SpellCheck
f. Geospatial Search
g. Result Grouping
h. Query Syntax
i. Query Boosting
j. Content Spotlighting / Merchandising / Banner / Elevate
k. Block Record / Remove URL Feature
6. Indexing the Data
7. Search Queries
8. DataImportHandler (DIH)
9. Plugins to index various types of data (XML, CSV, DB, filesystem)
10. Solr Client APIs
11. Overview of the SolrJ API
12. Running Solr on Tomcat
13. Enabling SSL on Solr
14. ZooKeeper Configuration
15. SolrCloud Deployment
16. Production Indexing Architecture
17. Production Serving Architecture
18. Upgrading Solr
19. References
Tutorial on developing a Solr search component plugin - searchbox-com
In this set of slides we give a step-by-step tutorial on how to develop a fully functional Solr search component plugin. Additionally, we provide links to the full source code, which can be used as a template to rapidly start creating your own search components.
Apache Solr serves search requests at enterprises and some of the largest companies around the world. Built on top of the top-notch Apache Lucene library, Solr makes integrating indexing and searching into your applications straightforward.
Solr provides faceted navigation, spell checking, highlighting, clustering, grouping, and other search features. Solr also scales query volume with replication and collection size with distributed capabilities. Solr can index rich documents such as PDF, Word, HTML, and other file types.
ERRest - Designing a good REST service - WO Community
Learn how to design your REST service with the correct HTTP verbs and codes. We also talk about how to manage optimistic locking and caching for your REST services.
Spoon is an open-source library that enables you to transform and analyze Java source code. Due to a complete and fine-grained Java metamodel, you can read and write the AST built by Spoon. In this talk, you'll see all strong concepts and API with an example. Then, you'll see how you can integrate this project in yours.
Solr now smoothly integrates with Lucene-level payloads.
Payloads provide optional per-term metadata, numeric or otherwise. Payloads help solve challenging use cases such as per-store product pricing and per-term confidence/weighting.
This session will present the payload feature from the Lucene layer up to the Solr integration, including per-store pricing, per-term weighting, and more.
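As a sketch of what the indexing side can look like, the DelimitedPayloadTokenFilterFactory that ships with Solr encodes the value after a delimiter as a per-term payload, so a field value like "store1|4.99 store2|5.99" carries a float payload per store term. The field type and names below are illustrative; check your Solr version's default schema for the delimited_payloads_* types it ships with.

```xml
<!-- Sketch: analyzer chain that encodes the value after "|" as a float payload.
     Names are illustrative, not taken from this talk. -->
<fieldType name="delimited_payloads_float" class="solr.TextField"
           indexed="true" stored="false">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
  </analyzer>
</fieldType>
```

On the query side, recent Solr versions expose payloads through the payload() function and the payload_score query parser, e.g. something like q={!payload_score f=store_prices func=max v=store1} (with store_prices as a hypothetical field name; see the payload_score parser documentation for the exact parameters in your version).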
Using Apache Lucene and Solr search technologies, information and knowledge have become vastly more searchable, findable, and accessible. Because scholars and researchers are some of the most demanding users of search systems, the problems encountered by the implementers are complex. For example, many of the applications built on these technologies also thrive on intentionally designed-in serendipitous discovery capabilities, bringing to light previously unknown, yet related and potentially interesting, content.
Libraries and other public knowledge-sharing environments, such as Wikipedia, generally embrace "open source" and community improving contributions as core principles, making a lovely synergy with the power, features, and community-driven ecosystem provided by Lucene and Solr.
This talk will introduce you to several Solr powered library-related systems, detail how they work, and leave you with lessons learned that can be applied to your applications.
"Solr Update" at code4lib '13 - ChicagoErik Hatcher
Solr is continually improving. Solr 4 was recently released, bringing dramatic changes in the underlying Lucene library and Solr-level features. It's tough for us all to keep up with the various versions and capabilities.
This talk will blaze through the highlights of new features and improvements in Solr 4 (and up). Topics will include: SolrCloud, direct spell checking, surround query parser, and many other features. We will focus on the features library coders really need to know about.
In this talk, Solr's built-in query parsers will be detailed, including when and how to use them. Solr has a nested query parsing capability, allowing multiple query parsers to be used to generate a single query. The nested query parsing feature will be described and demonstrated. In many domains, e-commerce in particular, parsing queries often means interpreting which entities (e.g. products, categories, vehicles) the user likely means; this talk will conclude with techniques to achieve richer query interpretation.
Solr 4.0 dramatically improves scalability, performance, and flexibility. An overhauled Lucene underneath sports near real-time (NRT) capabilities allowing indexed documents to be rapidly visible and searchable. Lucene’s improvements also include pluggable scoring, much faster fuzzy and wildcard querying, and vastly improved memory usage. These Lucene improvements automatically make Solr much better, and Solr magnifies these advances with “SolrCloud.” SolrCloud enables highly available and fault tolerant clusters for large scale distributed indexing and searching. There are many other changes that will be surveyed as well. This talk will cover these improvements in detail, comparing and contrasting to previous versions of Solr.
Apache Solr serves search requests at enterprises and some of the largest companies around the world. Built on top of the top-notch Apache Lucene library, Solr makes integrating indexing and searching into your applications straightforward. Solr provides faceted navigation, spell checking, highlighting, clustering, grouping, and other search features. Solr also scales query volume with replication and collection size with distributed capabilities. Solr can index rich documents such as PDF, Word, HTML, and other file types.
Come learn how you can get your content into Solr and integrate it into your applications!
Got data? Let's make it searchable! This presentation will demonstrate getting documents into Solr quickly, will provide some tips in adjusting Solr's schema to match your needs better, and finally will discuss how to showcase your data in a flexible search user interface. We'll see how to rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.
Got data? Let's make it searchable! This interactive presentation will demonstrate getting documents into Solr quickly, provide some tips in adjusting Solr's schema to match your needs better, and finally showcase your data in a flexible search user interface. We'll see how to rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.
Solr Flair: Search User Interfaces Powered by Apache Solr (ApacheCon US 2009,... - Erik Hatcher
Solr powers library, government, and enterprise search systems in thousands of applications. This talk showcases various technologies and techniques used to build effective user search, browse, and find interfaces on top of Solr.
Solr Flair: Search User Interfaces Powered by Apache Solr - Erik Hatcher
Solr powers library, government, and enterprise search systems in thousands of applications. This talk will showcase the various technologies and techniques used to build effective user search, browse, and find interfaces on top of Solr. Several of the full featured open-source library Solr front-ends will be shown, including Blacklight and VuFind. We’ll also demonstrate several front-end frameworks including:
• SolrJS - a JavaScript widget library
• Solr Flare - a Ruby on Rails plugin featuring Simile Timeline integration, Ajax suggest, and more
• Solritas - a built-in lightweight UI templating framework
Additionally, we’ll take a look under the covers of http://search.lucidimagination.com and see what makes it shine.
1. Lucene for Solr
Developers
uberconf - July 14, 2011
Presented by Erik Hatcher
erik.hatcher@lucidimagination.com
Lucid Imagination
http://www.lucidimagination.com
4. Customizing - Don't do it!
• Unless you need to.
• In other words... ensure you've given the built-in
capabilities a try, asked on the e-mail list, and
spelunked into at least Solr's code a bit to make
some sense of the situation.
• But we're here to roll up our sleeves, because we
need to...
5. But first...
• Look at Lucene and/or Solr source code as
appropriate
• Carefully read javadocs and wiki pages - lots of tips
there
• And, hey, search for what you're trying to do...
• Google, of course
• But try out LucidFind and other Lucene ecosystem
specific search systems -
http://www.lucidimagination.com/search/
7. Factories
• FooFactory (most) everywhere.
Sometimes there's BarPlugin style
• for sake of discussion... let's just skip the
"factory" part
• In Solr, Factories and Plugins are used by
configuration loading to parameterize and
construct
8. "Installing" plugins
• Compile .java to .class, JAR it up
• Put JAR files in either:
• <solr-home>/lib
• a shared lib when using multicore
• anywhere, and register location in
solrconfig.xml
• Hook in plugins as appropriate
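As a sketch of the registration step (all paths below are examples, not from this deck), solrconfig.xml's lib directive is one way to point Solr at plugin JARs outside of solr-home/lib:

```xml
<!-- solrconfig.xml: example <lib> directives; paths are illustrative -->
<lib dir="../../lib" />
<lib path="/opt/solr/plugins/my-plugin.jar" />
<lib dir="/opt/solr/plugins" regex=".*\.jar" />
```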
14. CharFilter
• extend BaseCharFilter
• enables pre-tokenization filtering/morphing
of incoming field value
• only affects tokenization, not stored value
• Built-in CharFilters: HTMLStripCharFilter,
PatternReplaceCharFilter, and
MappingCharFilter
15. Tokenizer
• common to extend CharTokenizer
• implement -
• protected abstract boolean isTokenChar(int c);
• optionally override -
• protected int normalize(int c)
• extend Tokenizer directly for finer control
• Popular built-in Tokenizers include: WhitespaceTokenizer,
StandardTokenizer, PatternTokenizer, KeywordTokenizer,
ICUTokenizer
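To make the isTokenChar(int) contract concrete, here is a plain-Java sketch of the loop CharTokenizer runs internally (it does not use the actual Lucene classes, and the comma-splitting rule is a hypothetical example): a token is a maximal run of characters for which isTokenChar returns true; everything else is discarded.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of CharTokenizer's core behavior: group maximal
// runs of "token characters" into tokens, dropping all other characters.
public class CharTokenizerSketch {
    // Mirrors: protected abstract boolean isTokenChar(int c);
    // Hypothetical rule: everything except a comma is part of a token.
    public static boolean isTokenChar(int c) {
        return c != ',';
    }

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);              // extend the current token
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // emit token at a separator
                current.setLength(0);
            }
        }
        if (current.length() > 0) {             // emit any trailing token
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("red,green,,blue")); // prints [red, green, blue]
    }
}
```

The real CharTokenizer additionally tracks character offsets and exposes normalize(int) for per-character mapping, but the grouping logic is the same idea.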
16. TokenFilter
• a TokenStream whose input is another
TokenStream
• Popular TokenFilters include:
LowerCaseFilter, CommonGramsFilter,
SnowballFilter, StopFilter,
WordDelimiterFilter
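The "a TokenStream whose input is another TokenStream" idea is just composition. Here is a plain-Java sketch (again not the Lucene API, which is incremental rather than list-based) of a LowerCaseFilter-then-StopFilter chain:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration of TokenFilter chaining: each stage consumes the
// output of the previous one, mirroring tokenizer -> LowerCaseFilter -> StopFilter.
public class TokenFilterSketch {
    // Analogue of LowerCaseFilter: rewrite each token.
    public static List<String> lowerCaseFilter(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String t : in) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    // Analogue of StopFilter: drop tokens found in the stop set.
    public static List<String> stopFilter(List<String> in, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String t : in) if (!stopWords.contains(t)) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("The", "Quick", "Fox");
        // The chain: lowercase first, then remove stop words.
        List<String> filtered = stopFilter(lowerCaseFilter(tokens), Set.of("the"));
        System.out.println(filtered); // prints [quick, fox]
    }
}
```

Note the order matters, just as in a real analyzer chain: stop-word removal after lowercasing catches "The" as well as "the".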
17. Lucene's analysis APIs
• tricky business, what with Attributes
(Source/Factory's), State, characters, code
points,Version, etc...
• Test!!!
• BaseTokenStreamTestCase
• Look at Lucene and Solr's test cases
22. Built-in QParsers
from QParserPlugin.java
/** internal use - name to class mappings of builtin parsers */
public static final Object[] standardPlugins = {
LuceneQParserPlugin.NAME, LuceneQParserPlugin.class,
OldLuceneQParserPlugin.NAME, OldLuceneQParserPlugin.class,
FunctionQParserPlugin.NAME, FunctionQParserPlugin.class,
PrefixQParserPlugin.NAME, PrefixQParserPlugin.class,
BoostQParserPlugin.NAME, BoostQParserPlugin.class,
DisMaxQParserPlugin.NAME, DisMaxQParserPlugin.class,
ExtendedDismaxQParserPlugin.NAME, ExtendedDismaxQParserPlugin.class,
FieldQParserPlugin.NAME, FieldQParserPlugin.class,
RawQParserPlugin.NAME, RawQParserPlugin.class,
TermQParserPlugin.NAME, TermQParserPlugin.class,
NestedQParserPlugin.NAME, NestedQParserPlugin.class,
FunctionRangeQParserPlugin.NAME, FunctionRangeQParserPlugin.class,
SpatialFilterQParserPlugin.NAME, SpatialFilterQParserPlugin.class,
SpatialBoxQParserPlugin.NAME, SpatialBoxQParserPlugin.class,
JoinQParserPlugin.NAME, JoinQParserPlugin.class,
};
23. Local Parameters
• {!qparser_name param=value}expression
• or
• {!qparser_name param=value v=expression}
• Can substitute $references from request
parameters
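For instance (the parser choice and field names below are illustrative), the inline form and the v/$reference form of the same request look like:

```
q={!dismax qf="title^2 description"}ipod

q={!dismax qf="title^2 description" v=$qq}&qq=ipod
```

The second form keeps the user's raw query in its own request parameter, which is handy when the local-params expression is fixed in a request handler's defaults.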
25. Custom QParser
• Implement a QParserPlugin that creates your
custom QParser
• Register in solrconfig.xml
• <queryParser name="myparser"
class="com.mycompany.MyQParserPlugin"/>
27. Built-in Update
Processors
• RunUpdateProcessor
• Actually performs the operations, such as
adding the documents to the index
• LogUpdateProcessor
• Logs each operation
• SignatureUpdateProcessor
• duplicate detection and optionally rejection
29. Update Processor
Chain
• UpdateProcessors sequence into a chain
• Each processor can abort the entire update
or hand processing to next processor in
the chain
• Chains, of update processor factories, are
specified in solrconfig.xml
• Update requests can specify an
update.processor parameter
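A sketch of a chain definition in solrconfig.xml, wiring a custom processor ahead of the defaults. The class and parameter names echo the FieldsUsedUpdateProcessorFactory example later in this deck; the chain name and field values are illustrative.

```xml
<updateRequestProcessorChain name="fieldsUsed">
  <processor class="com.mycompany.FieldsUsedUpdateProcessorFactory">
    <str name="fieldsUsedFieldName">attr_fields_used</str>
    <str name="fieldNameRegex">attr_.*</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

An update request can then select this chain via the update.processor request parameter mentioned above.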
30. Default update
processor chain
From SolrCore.java
// construct the default chain
UpdateRequestProcessorFactory[] factories =
new UpdateRequestProcessorFactory[]{
new RunUpdateProcessorFactory(),
new LogUpdateProcessorFactory()
};
Note: these steps have been swapped on trunk recently
31. Example Update
Processor
• What are the best facets to show for a particular
query? Wouldn't it be nice to see the distribution of
document "attributes" represented across a result
set?
• Learned this trick from the Smithsonian, who were
doing it manually - add an indexed field containing the
field names of the interesting other fields on the
document.
• Facet on that field "of field names" initially, then
request facets on the top values returned.
33. FieldsUsedUpdateProcessorFactory
public class FieldsUsedUpdateProcessorFactory extends UpdateRequestProcessorFactory {
// package-private so the processor (same package) can read the configuration
String fieldsUsedFieldName;
Pattern fieldNamePattern;
public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
UpdateRequestProcessor next) {
return new FieldsUsedUpdateProcessor(req, rsp, this, next);
}
// ... next slide ...
}
34. FieldsUsedUpdateProcessorFactory
@Override
public void init(NamedList args) {
if (args == null) return;
SolrParams params = SolrParams.toSolrParams(args);
fieldsUsedFieldName = params.get("fieldsUsedFieldName");
if (fieldsUsedFieldName == null) {
throw new SolrException
(SolrException.ErrorCode.SERVER_ERROR,
"fieldsUsedFieldName must be specified");
}
// TODO check that fieldsUsedFieldName is a valid field name and multiValued
String fieldNameRegex = params.get("fieldNameRegex");
if (fieldNameRegex == null) {
throw new SolrException
(SolrException.ErrorCode.SERVER_ERROR,
"fieldNameRegex must be specified");
}
fieldNamePattern = Pattern.compile(fieldNameRegex);
super.init(args);
}
35. class FieldsUsedUpdateProcessor extends UpdateRequestProcessor {
private final FieldsUsedUpdateProcessorFactory factory;
public FieldsUsedUpdateProcessor(SolrQueryRequest req,
SolrQueryResponse rsp,
FieldsUsedUpdateProcessorFactory factory,
UpdateRequestProcessor next) {
super(next);
this.factory = factory; // keep a handle on the configured field name and pattern
}
@Override
public void processAdd(AddUpdateCommand cmd) throws IOException {
SolrInputDocument doc = cmd.getSolrInputDocument();
ArrayList<String> usedFields = new ArrayList<String>();
for (String f : doc.getFieldNames()) {
if (factory.fieldNamePattern.matcher(f).matches()) {
usedFields.add(f);
}
}
doc.addField(factory.fieldsUsedFieldName, usedFields.toArray());
super.processAdd(cmd);
}
}
38. Example - auto facet
select
• It sure would be nice if you could have Solr automatically
select field(s) for faceting, based dynamically on the
profile of the results. For example, you're indexing
disparate types of products, all with varying attributes
(color, size - like for apparel, memory_size - for
electronics, subject - for books, etc.), and a user searches
for "ipod", where most matching products have
color and memory_size attributes... let's automatically
facet on those fields.
• https://issues.apache.org/jira/browse/SOLR-2641
39. AutoFacetSelection
Component
• Too much code for a slide, let's take a look in
an IDE...
• Basically -
• process() gets autofacet.field and autofacet.n
request params, facets on field, takes top N
values, sets those as facet.field's
• Gotcha - need to call rb.setNeedDocSet(true)
in prepare() as faceting needs it