Erik Hatcher presented on Solr and Lucene. He discussed what Solr is, how it is built on Apache Lucene, and how it provides a search server with features like scalability, fast performance, and extensibility. He provided examples of starting Solr, indexing and searching documents, and the various configuration files and components used.
Solr Flair: Search User Interfaces Powered by Apache Solr (ApacheCon US 2009,...Erik Hatcher
Solr powers library, government, and enterprise search systems in thousands of applications. This talk showcases various technologies and techniques used to build effective user search, browse, and find interfaces on top of Solr.
This session will introduce and demonstrate several techniques for enhancing the search experience by augmenting documents during indexing. First we'll survey the analysis components available in Solr, and then we'll delve into using Solr's update processing pipeline to modify documents on the way in. The session will build on Erik's "Poor Man's Entity Extraction" blog at http://www.searchhub.org/2013/06/27/poor-mans-entity-extraction-with-solr/
Apache Solr! Enterprise Search Solutions at your Fingertips! - Murshed Ahmmad Khan
Get an overview of Apache Solr as an enterprise search server. Get to know the available alternatives and why Solr is cool! Get excited! Enterprise search solutions are ready to pick.
An overview of the Solr 6.2 examples, including the features they have and the challenges they present, followed by a contrasting demonstration of a minimal viable example and a step-by-step deconstruction of the "films" example to show which parts of the shipped examples are not actually needed.
The talk presents the sfSolrPlugin which transparently integrates the Solr search engine into symfony.
The talk explains :
* the features of the solr search engine
* how to integrate the search engine into symfony
* complex search : faceted and geolocalized search
* usage example : http://www.menugourmet.com and http://resolutionfinder.org
Lucene powers the search capabilities of practically all library discovery platforms, by way of Solr, etc. The Lucene project evolves rapidly, and it's a full-time job to keep up with the ever improving features and scalability. This talk will distill and showcase the most relevant(!) advancements to date.
You’re Solr powered, and needing to customize its capabilities. Apache Solr is flexibly architected, with practically everything pluggable. Under the hood, Solr is driven by the well-known Apache Lucene. Lucene for Solr Developers will guide you through the various ways in which Solr can be extended, customized, and enhanced with a bit of Lucene API know-how. We’ll delve into improving analysis with custom character mapping, tokenizing, and token filtering extensions; show why and how to implement specialized query parsing, and how to add your own search and update request handling.
Solr Flair: Search User Interfaces Powered by Apache Solr - Erik Hatcher
Solr powers library, government, and enterprise search systems in thousands of applications. This talk will showcase the various technologies and techniques used to build effective user search, browse, and find interfaces on top of Solr. Several of the full featured open-source library Solr front-ends will be shown, including Blacklight and VuFind. We’ll also demonstrate several front-end frameworks including:
• SolrJS - a JavaScript widget library
• Solr Flare - a Ruby on Rails plugin featuring Simile Timeline integration, Ajax suggest, and more
• Solritas - a built-in lightweight UI templating framework
Additionally, we’ll take a look under the covers of http://search.lucidimagination.com and see what makes it shine.
Solr Recipes provides quick and easy steps for common use cases with Apache Solr. Bite-sized recipes will be presented for data ingestion, textual analysis, client integration, and each of Solr’s features including faceting, more-like-this, spell checking/suggest, and others.
Solr 4.0 dramatically improves scalability, performance, and flexibility. An overhauled Lucene underneath sports near real-time (NRT) capabilities allowing indexed documents to be rapidly visible and searchable. Lucene’s improvements also include pluggable scoring, much faster fuzzy and wildcard querying, and vastly improved memory usage. These Lucene improvements automatically make Solr much better, and Solr magnifies these advances with “SolrCloud.” SolrCloud enables highly available and fault tolerant clusters for large scale distributed indexing and searching. There are many other changes that will be surveyed as well. This talk will cover these improvements in detail, comparing and contrasting to previous versions of Solr.
Apache Solr serves search requests at the enterprises and the largest companies around the world. Built on top of the top-notch Apache Lucene library, Solr makes indexing and searching integration into your applications straightforward.
Solr provides faceted navigation, spell checking, highlighting, clustering, grouping, and other search features. Solr also scales query volume with replication and collection size with distributed capabilities. Solr can index rich documents such as PDF, Word, HTML, and other file types.
Introduction to Solr, presented at Bangkok meetup in April 2014:
http://www.meetup.com/bkk-web/events/172090992/
Covers high-level use-cases for Solr. Demos include support for Thai language (with GitHub link for source).
Has slides showcasing Solr-ecosystem as well as couple of ideas for possible Solr-specific learning projects.
A brief introduction to using Apache Solr for implementing search for your website.
Download the ppt to see comments which add more detail.
Presented at eBig Java SIG, Oakland, CA. June 2008
These slides belong to a presentation I gave to my colleagues in Göttingen as an introduction to the Apache Solr open source search engine. For the structure I followed Trey Grainger and Timothy Potter's excellent book Solr in Action (Manning, 2014), and I took some of the examples from there. Others come from the examples bundled with Solr and from projects I have had the opportunity to work on in the past (eXtensible Catalog and Europeana).
These slides don't go too deep. If you want to know more about the topic, just drop me an email or consult the references on the last slide.
Happy searching!
Webinar: Solr's example/files: From bin/post to /browse and Beyond - Lucidworks
Join Lucidworks cofounder, Sr. Solutions Architect, and Lucene/Solr committer, Erik Hatcher for a webinar to explore how to build a personal document search app with the ease and power of Solr.
Scaling search to a million pages with Solr, Python, and Django - tow21
A talk given to DJUGL on the 26th July 2010, describing and introducing Solr, and discussing how we use it at Timetric to drive navigation across over a million dataseries.
Presentation for an Apache Solr course. Through hands-on exercises, attendees gain knowledge of deploying Solr technologies, configuration, indexing, analysis, and resolving common problems.
Apache Solr seminar, organized by Paradigma Tecnologico and Javahispano, presented by Marco Martinez and Alejandro Marques on June 8, 2010
More info:
Advanced Solr training covering many aspects of Solr, from architecture, clustering, and sharding to indexing policies and distributed search, as well as the components and handlers for advanced search (Faceting, Grouping, Sorting, Highlighting, Spellchecking, More like this, etc.)
Solr search engine with multiple table relation - Jay Bharat
Here you can learn how to use the Solr search engine and implement it in your application, for example with PHP/MySQL.
I introduce how to handle data from multiple tables in Solr.
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014 - Shalin Shekhar Mangar
The traditional and typical search use case is the one large search collection distributed among many nodes and shared by all users. However, there is a class of applications which need a large number of small or medium collections which can be used, managed and scaled separately. This talk will cover our effort in helping a client set up a large scale SolrCloud setup with thousands of collections running on hundreds of nodes. I will describe the bottlenecks that we found in SolrCloud when running a large number of collections. I will also take you through the multiple features and optimizations that we contributed to Apache Solr to reduce or remove the choke points in the system. Finally, I will talk about the benchmarking process and the lessons learned from supporting such an installation in production.
In this on-demand webinar, Erik Hatcher, co-founder of Lucid Imagination, co-author of Lucene in Action, and Lucene/Solr PMC member and committer, presents and discusses key features and innovations of Apache Solr 1.4.
* Open source search with Solr/Lucene gives you the power to turn a wide range of information into fast, useful, relevant results!
* LucidWorks for Solr gives you a tested, release-stable certified distribution of open source search with enhanced tools and installation for building search apps quickly and reliably.
http://www.lucidimagination.com/How-We-Can-Help/webinar-from-search-to-found
The enterprise data lake has become the de facto repository of both structured and unstructured data within an enterprise. Being able to discover information across both kinds of data using search is a key capability of an enterprise data lake. In this workshop, we will provide an in-depth overview of HDP Search with a focus on configuration, sizing, and tuning. We will also deliver a working example that showcases HDP Search alongside the rest of the platform's capabilities to deliver a real-world solution.
Battle of the giants: Apache Solr vs ElasticSearch - Rafał Kuć
Slides from my talk during ApacheCon EU 2012 - "Battle of the giants: Apache Solr vs ElasticSearch". Video available at http://player.vimeo.com/video/55645629
BigData Faceted Search Comparison between Apache Solr vs. ElasticSearch - NetConstructor, Inc.
Search faceting presentation and comparison of BigData textual content indexing/search analytics solutions. Presentation focuses on the comparison of the open source solutions provided by Apache Solr to those of ElasticSearch.
Anyone who has tried integrating search in their application knows how good and powerful Solr is but always wished it was simpler to get started and simpler to take it to production.
I will talk about the recent features added to Solr making it easier for users and some of the changes we plan on adding soon to make the experience even better.
OpenCms 8.5 integrates Apache Solr. And not only for full text search, but as a powerful query engine as well.
Imagine you want to show a list of "all resources of type news, that have changed since yesterday, where property X has the value Y" on your web page. Sure, there are API methods in OpenCms to load resources based on the type, on the date of change, or on the value of a specific property. But for many common use case combinations, there is no single API call. This means if you create a collector, you often end up sorting out the results of the initial API query in code.
In this session, Rüdiger will show how Apache Solr has been integrated in OpenCms 8.5. He will explain how to create improved front-end full text search functions with advanced options like faceting and spell check suggestions. And he will explain how to use Solr to directly read resources from the OpenCms VFS, allowing query combinations that combine resource attributes, properties and content in a powerful new way.
Got data? Let's make it searchable! This interactive presentation will demonstrate getting documents into Solr quickly, will provide some tips in adjusting Solr's schema to match your needs better, and finally will discuss how to showcase your data in a flexible search user interface. We'll see how to rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.
Solr now smoothly integrates with Lucene-level payloads.
Payloads provide optional per-term metadata, numeric or otherwise. Payloads help solve challenging use cases such as per-store product pricing and per-term confidence/weighting.
This session will present the payload feature from the Lucene layer up to the Solr integration, including per-store pricing, per-term weighting, and more.
Think *inside* the box. Inside the *search* box, that is.
The "best"* search results incorporate many more factors than (just) textual matching and relevancy. Search experience owners manage query context rules, signals automatically feed back machine learned factors, users implicit and explicit behaviors filter and weight future interactions. Synergy emerges with several cooperating (just) searches.
This talk will showcase and detail several (just) search examples including rules, typeahead/suggest, signals, and location awareness, bringing them all together into a cohesive search experience.
Using Apache Lucene and Solr search technologies, information and knowledge have become vastly more searchable, findable, and accessible. Because scholars and researchers are some of the most demanding users of search systems, the problems encountered by the implementers are complex. For example, many of the applications built on these technologies also thrive on intentionally designed-in serendipitous discovery capabilities, bringing to light previously unknown, yet related and potentially interesting, content.
Libraries and other public knowledge-sharing environments, such as Wikipedia, generally embrace "open source" and community improving contributions as core principles, making a lovely synergy with the power, features, and community-driven ecosystem provided by Lucene and Solr.
This talk will introduce you to several Solr powered library-related systems, detail how they work, and leave you with lessons learned that can be applied to your applications.
"Solr Update" at code4lib '13 - Chicago - Erik Hatcher
Solr is continually improving. Solr 4 was recently released, bringing dramatic changes in the underlying Lucene library and Solr-level features. It's tough for us all to keep up with the various versions and capabilities.
This talk will blaze through the highlights of new features and improvements in Solr 4 (and up). Topics will include: SolrCloud, direct spell checking, surround query parser, and many other features. We will focus on the features library coders really need to know about.
In this talk, Solr's built-in query parsers will be detailed, including when and how to use them. Solr has nested query parsing capability, allowing for multiple query parsers to be used to generate a single query. The nested query parsing feature will be described and demonstrated. In many domains, e-commerce in particular, parsing queries often means interpreting which entities (e.g. products, categories, vehicles) the user likely means; this talk will conclude with techniques to achieve richer query interpretation.
Apache Solr serves search requests at the enterprises and the largest companies around the world. Built on top of the top-notch Apache Lucene library, Solr makes indexing and searching integration into your applications straightforward. Solr provides faceted navigation, spell checking, highlighting, clustering, grouping, and other search features. Solr also scales query volume with replication and collection size with distributed capabilities. Solr can index rich documents such as PDF, Word, HTML, and other file types.
Come learn how you can get your content into Solr and integrate it into your applications!
code4lib 2011 preconference: What's New in Solr (since 1.4.1) - Erik Hatcher
code4lib 2011 preconference, presented by Erik Hatcher of Lucid Imagination.
Abstract: The library world is fired up about Solr. Practically every next-gen catalog is using it (via Blacklight, VuFind, or other technologies). Solr has continued improving in some dramatic ways, including geospatial support, field collapsing/grouping, extended dismax query parsing, pivot/grid/matrix/tree faceting, autosuggest, and more. This session will cover all of these new features, showcasing live examples of them all, including anything new that is implemented prior to the conference.
Got data? Let's make it searchable! This presentation will demonstrate getting documents into Solr quickly, will provide some tips in adjusting Solr's schema to match your needs better, and finally will discuss how to showcase your data in a flexible search user interface. We'll see how to rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.
Got data? Let's make it searchable! This interactive presentation will demonstrate getting documents into Solr quickly, provide some tips in adjusting Solr's schema to match your needs better, and finally showcase your data in a flexible search user interface. We'll see how to rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.
Solr Powered Lucene
1. Solr Powered Lucene
Erik Hatcher
Lucid Imagination
erik.hatcher@lucidimagination.com
Northern Virginia Java Users Group
December 16, 2009
2. Erik Hatcher
• Member of Technical Staff, Lucid Imagination
• Apache Lucene/Solr Committer
• Member, Apache Software Foundation
• Co-author, Lucene in Action and Java Development with Ant (Manning)
3. A word from our sponsor...
• Pizza!
• Lucid Imagination
  • commercial entity exclusively dedicated to Apache Lucene/Solr open source search technology
  • Services: Technical Support, Expert Link, Training, Consulting
  • Tools: LucidGaze for Lucene and Solr
  • Free certified distributions of Solr and Lucene
  • more to come...
4. What is Solr?
• Search server
• Built upon Apache Lucene (Java)
• Fast, very
• Scalable, query load and collection size
• Interoperable
• Extensible
7. Solr History
• Created by Yonik Seeley for CNET
• Contributed to Apache in January 2006
• December 2006: Version 1.
• June 2007: Version 1.2
• September 2008: Version 1.3
• November 2009: Version 1.4
8. Features
• Lucene power exposed over HTTP
• Spell checking, highlighting, more-like-this
• Caching
• Replication
• Faceting
• Distributed search
9. Solr APIs
• HTTP GET/POST (curl or any other HTTP client)
• JSON
• SolrJ (embedded or HTTP)
• Ruby: solr-ruby, RSolr, etc
• Many others: python, PHP, solrsharp, XSLT
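As a minimal sketch of the plain HTTP GET route with a JSON response, the snippet below only builds a query URL; it does not contact a server. The class and method names are hypothetical, and the host, port, and field name are assumptions taken from the example URLs elsewhere in this deck.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SolrQueryUrlSketch {
    // Builds a GET URL for Solr's /select handler and asks for JSON via wt=json.
    // The query text is URL-encoded so that characters like ':' and spaces survive.
    static String selectUrl(String baseUrl, String query, int rows) {
        try {
            return baseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&wt=json&rows=" + rows;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed local Solr instance; any HTTP client (curl, a browser) can fetch this URL.
        System.out.println(selectUrl("http://localhost:8983/solr", "title:ipod video", 10));
    }
}
```

The printed URL can be pasted into curl or a browser against a running Solr instance.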
10. Deployment Architecture
• Scales from:
  • single Solr server
  • master/replicants (slaves)
  • distributed shards
• Each Solr instance can also have multiple cores
13. Inverted Index
From "Taming Text" by Grant Ingersoll and Tom Morton
• Commonly used search engine data structure
• Efficient lookup of terms across large number of documents
• Usually stores positional information to enable phrase/proximity queries
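The idea can be illustrated with a toy in-memory structure. This is a simplified sketch, not how Lucene stores its index: the class name, the whitespace tokenizer, and the two-word phrase lookup are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class InvertedIndexSketch {
    // term -> (document id -> positions of the term within that document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    // Tokenizes on whitespace (an assumption; real analyzers do much more) and records positions.
    void add(int docId, String text) {
        String[] tokens = text.trim().toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Term lookup: ids of the documents that contain the term.
    Set<Integer> docs(String term) {
        return postings.getOrDefault(term, new TreeMap<>()).keySet();
    }

    // Two-word phrase lookup: documents where `second` directly follows `first`.
    List<Integer> phrase(String first, String second) {
        List<Integer> hits = new ArrayList<>();
        Map<Integer, List<Integer>> a = postings.getOrDefault(first, new TreeMap<>());
        Map<Integer, List<Integer>> b = postings.getOrDefault(second, new TreeMap<>());
        for (Map.Entry<Integer, List<Integer>> e : a.entrySet()) {
            List<Integer> next = b.get(e.getKey());
            if (next == null) continue;
            for (int pos : e.getValue()) {
                if (next.contains(pos + 1)) { hits.add(e.getKey()); break; }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.add(1, "taming text with lucene");
        idx.add(2, "lucene in action");
        idx.add(3, "text taming");
        System.out.println(idx.docs("lucene"));
        System.out.println(idx.phrase("taming", "text"));
    }
}
```

Without the stored positions, documents 1 and 3 would be indistinguishable for the phrase "taming text", since both contain both terms.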
17. Relevance
• Term frequency (TF): number of times a term
appears in a document
• Inverse document frequency (IDF): one over the
number of documents in the index that contain the term (1/df)
• Field length normalization: controls the effect that field
length, in number of terms, has on score
• Boost factors: terms, fields, or documents
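How the four factors above combine can be sketched in a few lines. This is a simplified illustration, not Lucene's actual scoring code: the class ToyRelevance is hypothetical, and the specific curves (square root for TF, a dampening logarithm for IDF, inverse square root for length normalization) are assumptions modeled on Lucene's classic default similarity. Real Lucene scoring includes further factors that are omitted here.

```java
// Simplified per-term scoring sketch. The curves used here are assumptions
// modeled on Lucene's classic default similarity; this is not Lucene's code.
final class ToyRelevance {

    // Term frequency: more occurrences help, with diminishing returns.
    static double tf(int termCountInDoc) {
        return Math.sqrt(termCountInDoc);
    }

    // Inverse document frequency: terms found in fewer documents weigh more.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Field length normalization: a match in a short field counts for more.
    static double lengthNorm(int numTermsInField) {
        return 1.0 / Math.sqrt(numTermsInField);
    }

    // Combined score for one term in one field of one document.
    static double score(int termCountInDoc, int docFreq, int numDocs,
                        int numTermsInField, double boost) {
        return tf(termCountInDoc) * idf(docFreq, numDocs)
                * lengthNorm(numTermsInField) * boost;
    }

    public static void main(String[] args) {
        // Same document shape, rare term versus common term.
        System.out.println(score(2, 5, 1000, 10, 1.0));
        System.out.println(score(2, 500, 1000, 10, 1.0));
    }
}
```

With everything else equal, the rarer term and the shorter field both produce the higher score, which matches the intuition behind the bullets above.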
18. Solr Core
• single primary index
• schema.xml / solrconfig.xml
• (optionally) multiple cores per Solr
instance, configured in solr.xml
• other configuration and data files
19. schema.xml
• Field types
• Fields
• Unique key (optional*)
* I've never seen a case that didn't require a
unique identifier per document
• copy fields
• similarity and Solr query parser configuration
26. Content Streams
• Allows Solr server to fetch local or remote data
itself. Must enable remote streaming in
solrconfig.xml
• http://localhost:8983/solr/update
• ?stream.file=<local Solr path to exampledocs>/ipod_video.xml
• ?stream.url=<url to content>
• Security warning: allows Solr to fetch arbitrary
server-side file or network URL content
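Because the file path or URL travels as a request parameter, it needs to be URL-encoded. A minimal sketch, using only the JDK, of building the two request forms shown above; the class ContentStreamUrls is hypothetical, and the /tmp path and example.com URL in main are placeholders rather than real locations.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds content-stream update URLs. Only string construction is shown;
// nothing is sent to a server.
final class ContentStreamUrls {

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Ask Solr to read a file from its own local file system.
    static String streamFile(String updateUrl, String serverSidePath) {
        return updateUrl + "?stream.file=" + encode(serverSidePath);
    }

    // Ask Solr to fetch the content from a URL.
    static String streamUrl(String updateUrl, String contentUrl) {
        return updateUrl + "?stream.url=" + encode(contentUrl);
    }

    public static void main(String[] args) {
        String update = "http://localhost:8983/solr/update";
        // Placeholder path and URL, for illustration only.
        System.out.println(streamFile(update, "/tmp/ipod_video.xml"));
        System.out.println(streamUrl(update, "http://example.com/docs.xml"));
    }
}
```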
27. Indexing with SolrJ
SolrServer solr = new CommonsHttpSolrServer(
    new URL("http://localhost:8983/solr"));
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "EXAMPLEDOC01");
doc.addField("title", "NOVAJUG SolrJ Example");
solr.add(doc);
solr.commit();   // after a batch, not per document
solr.optimize(); // periodically, if/when needed
28. Indexing with solr-ruby
require 'solr'
solr = Solr::Connection.new(
  'http://localhost:8983/solr',
  :autocommit => :on)
solr.add(:id => 123,
         :title => 'Solr in Action')
solr.optimize # periodically, as needed
29. delete, update, etc
• Delete:
• <delete><id>05991</id></delete>
• <delete>
<query>category:Unused</query>
</delete>
• java -Ddata=args -jar post.jar "<delete><query>*:*</query></delete>"
• Update: simply <add> doc with same unique key
• <commit/> pending documents
• <optimize/> index, squeezes out deleted documents, collapses segments
• <rollback/> to last commit point
Update commands via GET: http://localhost:8983/solr/update?stream.body=<commit/>
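The XML messages above are plain strings, so a client can assemble them itself as long as user-supplied values are XML-escaped and, for the GET form, URL-encoded. A minimal JDK-only sketch; the class UpdateMessages and its method names are hypothetical and not part of any Solr client library.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds Solr XML update messages as strings. Nothing is sent to a server.
final class UpdateMessages {

    // Escape the characters that would otherwise break the XML element body.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    static String deleteById(String id) {
        return "<delete><id>" + escapeXml(id) + "</id></delete>";
    }

    static String deleteByQuery(String query) {
        return "<delete><query>" + escapeXml(query) + "</query></delete>";
    }

    // Wrap any update message in a GET-style URL using stream.body.
    static String viaGet(String updateUrl, String message) {
        return updateUrl + "?stream.body="
                + URLEncoder.encode(message, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String update = "http://localhost:8983/solr/update";
        System.out.println(deleteById("05991"));
        System.out.println(deleteByQuery("category:Unused"));
        System.out.println(viaGet(update, "<commit/>"));
    }
}
```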
30. Data Import Handler
• Indexes relational database, XML data, and e-mail sources
• Supports full and incremental/delta indexing
• Highly extensible with custom data sources,
transformers, etc
• http://wiki.apache.org/solr/DataImportHandler
31. DIH details
• Put JDBC driver JAR in <solr-home>/lib,
configure dataimport request handler
• http://localhost:8983/solr/db/admin/dataimport.jsp - debugging console
• http://localhost:8983/solr/db/dataimport?command=full-import - removes all
documents and imports from scratch
32. Solr Cell
• aka ExtractingRequestHandler
• leveraging Tika, extracts and indexes rich
documents such as Word, PDF, HTML, and
many other types
• curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true' -F "myfile=@tutorial.html"
• http://wiki.apache.org/solr/ExtractingRequestHandler
33. Standard Search Request
• http://localhost:8983/solr/select?q=query
34. Debug Query
• &debugQuery=true is your friend
• Includes parsed query, explanations, and
search component timings in response
35. Searching
• Send GET HTTP requests
• http://localhost:8983/solr/select?q=solr&start=0&rows=10&fl=id,name
• start: zero-based starting result
• rows: number of hits to return
• fl: list of stored fields to return
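Paging through results comes down to computing start from a page number. A small JDK-only sketch of assembling such a request URL; the class SelectUrl and the zero-based page convention are assumptions for the example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds a /select request URL with paging and a field list.
final class SelectUrl {

    // page is zero-based here (an assumption), so page 0 starts at result 0.
    static String build(String baseUrl, String query, int page, int pageSize,
                        String... fields) {
        int start = page * pageSize;
        return baseUrl + "/select"
                + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&start=" + start
                + "&rows=" + pageSize
                + "&fl=" + String.join(",", fields);
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr";
        System.out.println(build(base, "solr", 0, 10, "id", "name")); // first page
        System.out.println(build(base, "solr", 2, 10, "id", "name")); // third page
    }
}
```

The first call reproduces the URL from the slide; the second asks for results 20 through 29.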
36. Query Parser
• Controlled by defType parameter
• &defType=lucene (actually a Solr
extension of Lucene’s QueryParser)
• &defType=dismax
• Local {!...} override syntax
37. Solr Query Parser
• http://lucene.apache.org/java/2_9_1/queryparsersyntax.html + Solr extensions
• Kitchen sink parser, includes advanced user-unfriendly syntax
• Syntax errors throw parse exceptions back to
client
• Example: title:ipod* AND price:[0 TO 100]
• http://wiki.apache.org/solr/SolrQuerySyntax
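Since syntax errors come back to the client as parse exceptions, one common defense is to backslash-escape query syntax characters in raw user input before embedding it in a query. A minimal sketch of that idea; the class QueryEscaper is hypothetical, and the character list is an assumption based on the special characters documented for the Lucene query syntax.

```java
// Backslash-escapes characters that carry meaning in the Lucene query syntax,
// so raw user input is treated as literal text rather than as operators.
final class QueryEscaper {

    // Assumed set of special characters: \ + - ! ( ) { } [ ] ^ " ~ * ? : & |
    private static final String SPECIAL = "\\+-!(){}[]^\"~*?:&|";

    static String escape(String userInput) {
        StringBuilder escaped = new StringBuilder(userInput.length() + 8);
        for (char c : userInput.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                escaped.append('\\');
            }
            escaped.append(c);
        }
        return escaped.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("ipod (video)"));
        System.out.println("title:" + escape("what is 1+1?"));
    }
}
```

Escaping applies only to the user-supplied part: field names and operators that the application adds itself, such as title: in the second call, stay unescaped.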
38. Dismax Query Parser
• Simplified syntax:
loose text “quote phrases” -prohibited +required
• Spreads query terms across query fields
(qf) with dynamic boosting per field, phrase
construction (pf), and boosting query and
function capabilities (bq and bf)
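A dismax request is an ordinary select request with a few extra parameters. The sketch below assembles one with per-field boosts in qf and pf, using only the JDK. The class DismaxRequest is hypothetical, and the field names name and features with their boost values are placeholders, not a recommendation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds a dismax request URL. Only string construction is shown.
final class DismaxRequest {

    // Renders boosts in the space-separated "field^boost" form.
    static String fieldBoosts(Map<String, Double> boosts) {
        StringJoiner joiner = new StringJoiner(" ");
        boosts.forEach((field, boost) -> joiner.add(field + "^" + boost));
        return joiner.toString();
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    static String build(String baseUrl, String userQuery,
                        Map<String, Double> queryFields,
                        Map<String, Double> phraseFields) {
        return baseUrl + "/select?defType=dismax"
                + "&q=" + encode(userQuery)
                + "&qf=" + encode(fieldBoosts(queryFields))
                + "&pf=" + encode(fieldBoosts(phraseFields));
    }

    public static void main(String[] args) {
        // Placeholder fields and boosts; LinkedHashMap keeps insertion order.
        Map<String, Double> qf = new LinkedHashMap<>();
        qf.put("name", 2.0);
        qf.put("features", 0.5);
        Map<String, Double> pf = new LinkedHashMap<>();
        pf.put("name", 5.0);
        System.out.println(build("http://localhost:8983/solr", "ipod video", qf, pf));
    }
}
```

The user's text goes into q untouched, while the field and boost decisions live in parameters the application controls.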
39. Searching with SolrJ
SolrServer server = new CommonsHttpSolrServer(
    "http://localhost:8983/solr");
SolrQuery params = new SolrQuery("author:John");
params.setFields("*,score");
params.setRows(3);
QueryResponse response = server.query(params);
for (SolrDocument document : response.getResults()) {
    System.out.println("Doc: " + document);
}
40. Searching with Ruby
conn = Solr::Connection.new(
  'http://localhost:8983/solr')
conn.query('my query') do |hit|
  puts hit.inspect
end
45. More Like This
• http://localhost:8983/solr/select?q=ipod&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score,name
• http://wiki.apache.org/solr/MoreLikeThis
46. Query Elevation
• http://localhost:8983/solr/elevate?q=ipod&debugQuery=true&enableElevation=true
• Configure an “elevate.xml” to boost/exclude specific documents
• http://wiki.apache.org/solr/QueryElevationComponent
47. Clustering
• Dynamic grouping of documents into labeled
sets
• http://localhost:8983/solr/clustering?q=*:*&rows=10
• http://wiki.apache.org/solr/ClusteringComponent
• Requires additional steps to install (see
documentation) with Apache Solr distro
49. Term Vectors
• Details term vector information: term
frequency, document frequency, position
and offset information
• http://localhost:8983/solr/select/?q=*%3A*&qt=tvrh&tv=true&tv.all=true
50. stats.jsp
• Not technically a “request handler”, outputs
only XML
• http://localhost:8983/solr/admin/stats.jsp
• Index stats such as number of documents,
searcher open time
• Request handler details, number of
requests and errors, average request time,
51. Replication
• Master is polled
• Replicant pulls Lucene index and optionally
also Solr configuration files
• Query throughput scaling: replicate and
load balance
• http://wiki.apache.org/solr/SolrReplication
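The "replicate and load balance" step can be as simple as rotating queries across the replicants. A minimal round-robin sketch, assuming the client keeps its own list of query servers; the class RoundRobinReplicas and the host names are hypothetical, and in practice a hardware or software load balancer often fills this role.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hands out replicant base URLs in rotation so query load is spread evenly.
final class RoundRobinReplicas {
    private final List<String> replicaUrls;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinReplicas(List<String> replicaUrls) {
        if (replicaUrls.isEmpty()) {
            throw new IllegalArgumentException("need at least one replica");
        }
        this.replicaUrls = List.copyOf(replicaUrls);
    }

    // floorMod keeps the index valid even after the counter overflows.
    String next() {
        int index = Math.floorMod(counter.getAndIncrement(), replicaUrls.size());
        return replicaUrls.get(index);
    }

    public static void main(String[] args) {
        // Hypothetical replicant hosts.
        RoundRobinReplicas replicas = new RoundRobinReplicas(List.of(
                "http://replica1:8983/solr",
                "http://replica2:8983/solr"));
        for (int i = 0; i < 4; i++) {
            System.out.println(replicas.next() + "/select?q=solr");
        }
    }
}
```

Updates still go to the master only; the rotation applies to queries.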
52. Distributed Search
• Distribute documents to same-schema
shards
• Scaling for when single index becomes too
large, or a single query becomes too slow
• http://wiki.apache.org/solr/DistributedSearch
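Two pieces are involved here: deciding which shard each document is indexed into, and telling Solr at query time which shards to search. The sketch below shows one possible approach, assuming the indexing client picks a shard by hashing the document's unique key; the class ShardRouter and the second host and port are hypothetical. The shards request parameter is a comma-separated list of shard addresses.

```java
import java.util.List;

// Client-side shard selection plus the query-time shards parameter.
final class ShardRouter {

    // Hash-modulo assignment (one possible scheme): the same id always maps
    // to the same shard, so re-indexing a document replaces it in place.
    static int shardFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    // Builds the shards parameter from host:port/path addresses.
    static String shardsParam(List<String> shardAddresses) {
        return "shards=" + String.join(",", shardAddresses);
    }

    public static void main(String[] args) {
        // Hypothetical two-shard layout on one machine.
        List<String> shards = List.of("localhost:8983/solr", "localhost:7574/solr");
        String id = "EXAMPLEDOC01";
        System.out.println(id + " -> " + shards.get(shardFor(id, shards.size())));
        System.out.println("http://localhost:8983/solr/select?q=solr&"
                + shardsParam(shards));
    }
}
```

A fixed modulo scheme is easy to reason about, but changing the number of shards moves most documents, so the shard count is worth choosing with growth in mind.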
53. What’s new in Solr 1.4?
• Java-based replication
• VelocityResponseWriter (Solritas)
• AJAX-Solr
• Logging switched to SLF4J
• Rollback, since last commit
• StatsComponent
• TermVectorComponent
• Configurable Directory provider
54. Lucene 2.9
• IndexReader#reopen()
• Faster filter performance, by 300% in some cases
• Per-segment FieldCache
• Reusable token streams
• Faster numeric/date range queries, thanks to trie
• and tons more, see Lucene 2.9's CHANGES.txt