Apache Solr is an open-source enterprise search platform built on Apache Lucene. It started as an in-house project at CNET for adding search functionality to their website and was donated to the Apache Software Foundation in 2006. Key features of Solr include faceted search, filtering, hit highlighting, dynamic clustering, database integration, and replication to support scalability.
Presentation at FOSSETCON 2015
http://www.fossetcon.org/2015/sessions/getting-started-solr-open-source-search-platform-0
Solr is a very popular open source search engine which builds upon the capabilities of Lucene. It's the perfect tool to index loads of text and make it easily searchable. And it's very fast!
Powerful features such as facets, typeahead, and "did you mean" help your users to quickly navigate through a very large dataset and find what they're looking for.
A REST-style JSON interface makes it language-agnostic; you can even work with it straight from the command line using curl!
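Because that interface is just JSON over HTTP, a query is nothing more than a URL. As a minimal sketch (the host, core name, and field names here are assumptions for illustration):

```python
from urllib.parse import urlencode

# Build a Solr select URL by hand; curl, a browser, or any HTTP client
# can then fetch it. Host, core name, and fields are assumed examples.
base = "http://localhost:8983/solr/techproducts/select"
params = {
    "q": "title:solr",       # search the title field
    "wt": "json",            # request a JSON response
    "rows": 10,              # number of results to return
    "fl": "id,title,price",  # fields to include in each result
}
url = base + "?" + urlencode(params)
print(url)
```

Passing the resulting URL to `curl` returns the same JSON any client library would see.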
A flexible plugin mechanism lets you augment your searches with complementary tools such as rich document parsing, text analysis, or your own custom code.
In this session, learn the basics of making your content searchable with Solr.
A year and a half ago, we rolled out a new integrated full-text search engine for our intranet based on Apache Solr. The search engine integrates various data sources such as file systems, wikis, internal websites and web applications, shared calendars, our corporate database, CRM system, email archive, task management and defect tracking. This talk is an experience report on some of the good, the bad and the surprising things we have encountered over two years of developing, operating and using an intranet search engine based on Apache Solr.
After setting the scene, we will discuss some interesting requirements that we have for our search engine and how we solved them with Apache Solr (or at least tried to). Using these concrete examples, we will discuss some interesting features and limitations of Apache Solr.
In the second part of the talk, we will tell a couple of "war stories" and walk through some interesting, annoying and surprising problems that we faced, how we analyzed the issues, identified the cause of the problems and eventually solved them.
The talk is aimed at software developers and architects with some basic knowledge of Apache Solr, the Apache Lucene project family or similar full-text search engines. It is not an introduction to Apache Solr, and we will dive right into the interesting and juicy bits.
Presented by Andrzej Bialecki, LucidWorks
This session presents a set of Solr components for easy management of "sidecar indexes" - indexes that extend the main index with additional stored and/or indexed fields. Conceptually, this can be viewed as an extension of the ExternalFileField or as a static join between documents from two collections. This functionality is useful in applications that require very different update regimes for the two parts of the index (e.g. main catalogue items combined with clickthroughs).
Apache Solr is a popular, open source enterprise search platform built on the Java-based search engine library Apache Lucene. It powers the search and navigation features of many of the world's largest sites, including Netflix, Instagram, LinkedIn, Twitter and eBay.
Andrew Clegg : Building your own search engine with Apache Solr
Apache Solr (http://lucene.apache.org/solr/) is an open-source search engine based on the popular Lucene library with a huge variety of features. In this talk, Andrew describes how he used it to build a high-performance search tool for protein and domain structures at CATH, and talks about some of the surprisingly cool things you can do with it beyond simple searching.
Building Intelligent Search Applications with Apache Solr and PHP5 (israelekpo)
ZendCon 2010 - Building Intelligent Search Applications with Apache Solr and PHP5. This is a presentation on how to create intelligent web-based search applications using PHP 5 and the out-of-the-box features available in Solr 1.4.1. After we finish the illustration of adding, updating and removing data from the Solr index, we will discuss how to add features such as auto-completion, hit highlighting, faceted navigation, spelling suggestions, etc.
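The add/update/remove operations the abstract mentions go through Solr's JSON update handler. A rough sketch of the request bodies, shown in Python for brevity (the talk itself uses PHP, but the JSON is language-agnostic, and the "book-1" document is an invented example):

```python
import json

# Payloads accepted by Solr's /update handler. The document fields
# here are made up for illustration.
add_body = {"add": {"doc": {"id": "book-1", "title": "Intro to Solr"}}}
delete_body = {"delete": {"id": "book-1"}}
commit_body = {"commit": {}}  # make the changes visible to searchers

for body in (add_body, delete_body, commit_body):
    print(json.dumps(body))
```

Each body would be POSTed to the core's `/update` endpoint with a `Content-Type: application/json` header.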
Solr Recipes provides quick and easy steps for common use cases with Apache Solr. Bite-sized recipes will be presented for data ingestion, textual analysis, client integration, and each of Solr’s features including faceting, more-like-this, spell checking/suggest, and others.
Solr is a highly scalable and fast open source enterprise search platform from the Apache Lucene project. Let's explore why some of the largest internet sites in the world prefer its many exciting features.
You’re Solr powered, and needing to customize its capabilities. Apache Solr is flexibly architected, with practically everything pluggable. Under the hood, Solr is driven by the well-known Apache Lucene. Lucene for Solr Developers will guide you through the various ways in which Solr can be extended, customized, and enhanced with a bit of Lucene API know-how. We’ll delve into improving analysis with custom character mapping, tokenizing, and token filtering extensions; show why and how to implement specialized query parsing, and how to add your own search and update request handling.
Apache Solr! Enterprise Search Solutions at your Fingertips! (Murshed Ahmmad Khan)
Get an overview of Apache Solr as an enterprise search server. Get to know the available alternatives and why Solr is cool! Get excited! Enterprise search solutions are ready to pick.
Lucene powers the search capabilities of practically all library discovery platforms, by way of Solr, etc. The Lucene project evolves rapidly, and it's a full-time job to keep up with the ever improving features and scalability. This talk will distill and showcase the most relevant(!) advancements to date.
The talk presents the sfSolrPlugin which transparently integrates the Solr search engine into symfony.
The talk explains:
* the features of the Solr search engine
* how to integrate the search engine into symfony
* complex search: faceted and geolocated search
* usage examples: http://www.menugourmet.com and http://resolutionfinder.org
Solr™ is the popular, blazing-fast, open source enterprise search platform from the Apache Lucene™ project. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Solr powers the search and navigation features of many of the world's largest internet sites, including AOL, Yahoo, Buy.com, CNET, CitySearch, Netflix, Zappos, StubHub, Digg, E*Trade, Disney, Apple, NASA and MTV.
After a thorough overview of the main features and benefits of Apache Solr (an open source search server), the architecture of Solr and strategies for adopting it in your PHP application and data model will be presented. The main lessons learned around dealing with a mix of structured and unstructured content, multilingual aspects, tuning, and the various state-of-the-art features of Solr will be shared as well.
Got hundreds of millions of documents to search? DataImportHandler blowing up while indexing? Random thread errors thrown by Solr Cell during document extraction? Query performance collapsing? Then you're searching at Big Data scale. This talk will focus on the underlying principles of Big Data and how to apply them to Solr. This talk isn't a deep dive into SolrCloud, though we'll talk about it. It also isn't meant to be a talk on traditional scaling of Solr.
Rails and the Apache SOLR Search Engine (David Keener)
What good is content if nobody can find it? Many information sites are like icebergs, with only a limited amount of content directly accessible to users and the rest, the "underwater" portion, only available through searches. This talk shows how Rails web sites can take advantage of the world-class Apache SOLR search engine to provide sophisticated and customizable search features. We'll cover how to get started with SOLR, integrating with SOLR using the Sunspot gem, indexing, hit highlighting and other topics.
Anyone who has tried integrating search in their application knows how good and powerful Solr is but always wished it was simpler to get started and simpler to take it to production.
I will talk about the recent features added to Solr making it easier for users and some of the changes we plan on adding soon to make the experience even better.
[LDSP] Search Engine Back End API Solution for Fast Prototyping (Jimmy Lai)
In these slides, I propose a solution for fast prototyping of a search engine back-end API. It consists of Linux + Django + Solr + Python (LDSP), all open source software. The solution also provides a code repository with automation scripts. Everyone can build a search engine back-end API in seconds using LDSP.
Got data? Let's make it searchable! This interactive presentation will demonstrate getting documents into Solr quickly, will provide some tips on adjusting Solr's schema to better match your needs, and finally will discuss how to showcase your data in a flexible search user interface. We'll see how to rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.
Attendees will learn how eBay Germany has implemented Solr, why Solr was selected, which Solr features are utilized, and how Solr is configured and used in production. Recommended best practices will be profiled along with eBay Kleinanzeigen's plans for future deployment of Solr.
There are 900 tickets currently in the Rails Lighthouse, some big, many small, and some relevant for Rails 3. Many of those issues could be fixed by people like us without too much effort.
Using online resources and a short demo, we want to find out how to use the Rails Lighthouse, how to clone edge Rails and run the test suites, and how to create patches from the fixes to make them available to the developers.
IronRuby has been around for a while. This presentation is about the practical uses of IronRuby. It contains several different use cases that you can immediately go and use to enhance your everyday work.
Apache Solr for TYPO3 at TYPO3 Usergroup Day Netherlands (Ingo Renner)
Presentation of an extension to integrate Apache Solr with TYPO3. Apache Solr is an enterprise search server; TYPO3 is a mid- to large-size enterprise content management system; combining both results in a great user search experience.
Solr search engine with multiple table relation (Jay Bharat)
Here you can learn how to use the Solr search engine and implement it in your application, for example with PHP/MySQL.
I introduce how to handle data from multiple tables in Solr.
The Guardian's Open Platform initiative enables partners to build applications with The Guardian. As part of this initiative, The Guardian provides the Content API - a rich interface to all The Guardian's content and metadata back to 1991 - over 1 million documents. This talk starts with a brief overview of the latest iteration of the content API. It will then cover how we implemented this in Scala using Solr, addressing real-world problems in creating an index of content:
* how we represented a complex relational database model in Solr
* how we keep the index up to date, meeting a sub-5-minute end-to-end update requirement
* how we update the schema as the API evolves, with zero downtime
* how we scale in response to unpredictable demand, using cloud services
Apache Solr - search for everyone!
1. Apache Solr
- search for everyone!
http://www.flickr.com/photos/malikdhadha/
2. • Co-founder and R&D Director at Integrasco
• Founder and developer of Notpod
• Leader of javaBin Sørlandet
• Programmer and Open Source enthusiast
Jaran Nilsen
twitter.com/jarannilsen
10. • Started out as an in-house CNET project for adding search functionality to the CNET website in 2004
11. • Donated to the Apache Software Foundation in 2006
12. • Graduated from incubation status in 2007
13. • Since version 3.1 (March 2011), Solr and Lucene
share the same codebase.
• This means features and fixes are shared between the
projects at a much higher rate
+
42. Facet queries
&facet=true
&facet.query=price:[* TO 100]
&facet.query=price:[100 TO 200]
&facet.query=price:[200 TO 300]
&facet.query=price:[300 TO 400]
&facet.query=price:[400 TO 500]
&facet.query=price:[500 TO *]
43. Now you want to drill down!
http://www.flickr.com/photos/kk/4712925031/
Welcome! The purpose of this talk is to show you how easy it is to get started, and to give you an idea of some of the cool things you can do with Solr without too much effort. I love quick, easy-to-understand introductions to tools that help me get started fast; hopefully that's what I'll be able to provide you with today. Solr is a big project and it would be silly to attempt to cover everything in one evening, so I am going to focus on some of the features that I believe are the easiest to get started with, and which I am also well familiar with and have had good experience with. So let's get cracking...
Most of you already know me, but I see there are some new faces...
Since 2004, Integrasco has been providing social media methodologies and technologies to corporations, agencies, government regulators and other institutions. We have a dedicated team of vertical experts, and a technology platform for the analysis of social media where Apache Solr is a vital component.
But enough about me and Integrasco – I bet that's not why most of you came to this meetup. Let's talk search. What is search?
I bet many of you get this image in your head when you're thinking about search. For most people today, the term search is equal to the name Google. Because of this, «to google» has even entered our dictionaries as an officially accepted verb. Search equals input box and search button! In many cases this is true. But as Google has become expert at, and as hopefully we will discover Solr can help us do as well – it's not about searching... ->
It's all about finding – helping your users find the information they are seeking, not having them search for it! You may think I am just playing with words here, but in my opinion there is a big difference between «searching» and «finding». Let's not fall into a discussion about semantics; let's just say that our job as engineers is to help people find the information they are looking for, and to spend our time efficiently doing that instead of designing search boxes and buttons!
You don't want your users to spend hours searching on your site (or perhaps you do, if your revenue is driven by advertisement, but let's put that discussion on the shelf as well). We want our users to find what they're looking for with little effort and give them a good experience. The great thing is that Solr can help you with just that: finding stuff. It has a lot of features that can improve your user experience and bring value to your data without too much effort. And as we will see, it does not have to be linked to search engines at all! That's exactly what I hope to show you some of today.
Apache Solr is an open source...... But before we get into the juicy details...
... let's learn a little history.
Common codebase since March 2011. Which means...
... they are now sharing features and fixes at a much higher rate than was the case before. Many wonder what the difference between Solr and Lucene is, and up until this merge Solr was often considered to be an additional layer on top of Lucene, providing functionality that was not available in Lucene itself.
What I often find frustrating when looking at new frameworks and technologies is, in many cases, the amount of time and resources you have to invest in order to try them out. I am not talking about reading documentation to get a deeper understanding – which you eventually have to – but the time you have to invest in order to just get started. I love those 3-point, very quick «getting started» guides that just work. That's what I hope to leave behind here today. Getting started with Solr is actually very, very simple. Even though my examples have been taken from a Unix environment, I've tried to make them as platform- and language-independent as possible. I myself work in a Java environment, but Solr is fully usable from many other environments, not just Java.
I’ve tried to shave the process of getting Solr up and running down as much as possible, and I actually came down to these four steps. This is actually all you need to be up and running with a working instance of Solr. There is obviously a lot of configuration and customization you can do to tailor Solr to your specific needs, but to get started playing this is actually all you need to do.
Solr is served by Jetty on port 8983 by default, and opening the Solr admin application in a browser yields this view. It's by no means eye candy, but then again – that is YOUR part of the job: to create a good-looking application that surfaces the valuable information Solr can serve you. As you see in the middle here, there's a search box and a search button – who would have thought? Let's click it!
Voila – there's not much here yet. Obviously because we haven't actually indexed anything yet. So let's add some data.
Solr comes with a good set of example data that you can easily import and index to play around with and see some of Solr's capabilities. These example documents come with the downloaded package and can be imported using the bash scripts available in the exampledocs folder. Let's import a couple of iPod-related documents and see what happens!
We refresh our search from before and... behold! We have search results. Don't be scared by the XML output here; there are several tools available for working with the response! So, now that we have some data, it takes us to the obvious part of Solr...
Full-text searching! This is what Solr was made for, and whenever you have a set of documents that you want to query, you should consider adding a Solr instance to your system and querying the documents there, rather than against a database. You will quickly see the value it adds! Now, as with most other features, the querying is done via a parameter in the URL...
Very simple! Now let's go back to our admin user interface for another example – just to make sure we don't scare anyone off with URL parameters in addition to the XML!
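A minimal command-line sketch of such a query, assuming a stock Solr instance on the default Jetty port (the curl line is left commented, since it needs a live instance):

```shell
# Base URL of a local Solr instance (default example setup)
SOLR="http://localhost:8983/solr/select"

# Match all documents in the index; q is the query parameter
QUERY="$SOLR?q=*:*"
echo "$QUERY"
# curl "$QUERY"   # run this against a running instance to see the XML response
```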
If you remember the query from before, it was «asterisk, colon, asterisk». Solr INDEXES are composed of FIELDS, and you can specify which field you want to search by querying «field, colon, value». So when we searched for «asterisk, colon, asterisk», we were searching for all values in all fields – hence getting all documents in the index. Let's see what happens when we search for documents where the PRICE field contains the value 19.95.
... Unsurprisingly, we get documents matching the price 19.95. OK, but what if you want to add your own test data to play around with? Let's look quickly at the input format that was used.
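The same field-scoped query, sketched as a URL (the price field is from the stock example schema):

```shell
SOLR="http://localhost:8983/solr/select"

# Restrict the search to a single field: field, colon, value
QUERY="$SOLR?q=price:19.95"
echo "$QUERY"
# curl "$QUERY"   # against a live instance with the example docs indexed
```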
It looks like this: a very simple XML structure that you can easily modify to fit your needs.
Here is the full iPod example we just imported and tested with.
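For reference, a representative sketch of Solr's update XML format – the field names come from the stock example schema, but the values below are illustrative rather than copied from the shipped files:

```xml
<add>
  <doc>
    <field name="id">IPOD-EXAMPLE-1</field>
    <field name="name">60 GB iPod with Video Playback</field>
    <field name="manu">Apple Computer Inc.</field>
    <field name="cat">electronics</field>
    <field name="cat">music</field>
    <field name="price">399.00</field>
    <field name="inStock">true</field>
  </doc>
</add>
```

Repeating a field name (like cat above) is how multi-valued fields are expressed.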
As I said earlier – do not be scared by the XML if it feels overwhelming. There are several different options both for consuming the response and for indexing data into Solr. And you do not need to specify all your input data in files; there are various means for importing from databases and other commonly used data sources. The examples we have looked at so far have been centered around products from a shop. Obviously this is not likely to fit all your needs. As with any other technology for handling data, you need to define a DOMAIN, or a SCHEMA...
In Solr, the Schema is one of the core configurations you need to master. It’s at the core of your Solr solution and it’s important to spend time designing this schema. Let’s see what it looks like...
Of the key elements, we find....
We define the different types we want to have available in our schema...
... And finally we define all the fields that we want in our schema. One neat thing with Solr is the elements defined towards the bottom here – the DYNAMIC FIELDS. These allow us to specify any field we want on a per-document basis, after the index is up and running. This gives us good flexibility in cases where only some documents contain some special data. An EXAMPLE from one of our projects at INTEGRASCO was when we received a batch of data we needed to make available for analysis together with our social data. This data contained a lot of META data that did not make sense for social data, but we needed to be able to perform queries across everything. In this situation our dynamic fields became very handy, as we could simply define them when indexing the new Frankenstein data, and it fit nicely in with the rest of our social data.
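In schema.xml, a dynamic field is declared with a wildcard pattern; this is the standard syntax, and the suffix conventions below follow the stock example schema:

```xml
<!-- Any field name ending in _s is indexed and stored as a string -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<!-- Any field name ending in _i becomes an integer field -->
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
```

At index time you can then send a field like author_s or year_i on any document without touching the schema again.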
The second important part of Solr.
Solrconfig.xml is a complex XML document, but just as important to spend time with as schema.xml. We will not go into detail on it today, as we don't have time, but here are some of the key elements that you configure in this file: settings for how Solr should handle your index files, and the update chain, where you can define your own components to plug into the indexing process.
Now that we've got Solr configured and up and running, let's look closer at what it can DO beyond simple searching!
The first thing I want to show is FACETS. Facets are category counts for search results, meaning you can provide details about which categories your search results fall within. And as we will see, these categories can be a lot of things. Facets are in many situations also referred to as FACETED NAVIGATION. They are a very powerful feature of Solr when it comes to navigating your search results, and as we said in the beginning – it's not just about searching, it's about finding. Facets are a good way to provide your users with neat ways to navigate your data.
Here is another good example of how facets CAN be used to enable navigation. This is taken from Finn.no. Now, I do not know whether Finn.no is USING Solr and facets for this information, but it’s a good example of how facets can be used to enable powerful and user friendly navigation.
Yet another example of search facets, or faceted navigation. This time from the webshop Komplett.no, where – after you have done a search – you will get details about CATEGORIES, PRODUCERS and PRICE RANGES.
And this is how you do this with Solr. You simply add a few parameters to your QUERY URL, specifying which field or fields you wish to receive facet information for – and you're good to go! To illustrate the output, I've done a search in our system using the example parameters here, and this would yield...
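As a sketch, the faceting parameters look like this (the language field is from our own schema, so treat it as an illustrative field name):

```shell
SOLR="http://localhost:8983/solr/select"

# Ask for facet counts on the language field, top 10 values
QUERY="$SOLR?q=*:*&facet=true&facet.field=language&facet.limit=10"
echo "$QUERY"
# curl "$QUERY"   # against a live instance
```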
... This output. As you see here, we get the top 10 languages for my search. You’re not limited to only getting top 10, you choose how many facet counts you want.
Another cool thing is that you are not limited to getting facets for predefined fields; you can also use FACET QUERIES to provide unique value for your data. Here's one example from our system, where we use facet queries on date fields to generate statistics for given queries and display trends over time. The way we do this is to add a facet query for each of the timespans we want in our chart, and then simply render those facet query counts in a chart. Another very common way to use facet queries is to generate price range information in online stores – like we saw in the screenshot from Komplett.no (which we have an example query string for on the next slide).
So, once we have our FACETs, we may want to drill further down into our data. This is done via FILTER QUERIES. These are constraints you apply to your query to filter the results you get back from Solr.
Here we are going to filter on our facets as we do it in our solution. We have performed a search, and we wish to drill down into this by only looking at Facebook data....
So we choose the facebook Facet...
... and this adds a filter to our existing query – limiting our search to the MEDIA type FACEBOOK. Very easy. And the way we actually do this, in Solr...
… is by adding “fq” parameters to our query. In this case, we added a filter for Facebook. And you can add as many of these as you want, so you're not limited to filtering on only one thing at a time! Another important thing about filter queries is that they are a big help in optimizing your queries. The reason is that they are applied BEFORE other queries are run, which means they limit the number of documents Solr has to query.
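A sketch of such a filtered request (the media field and facebook value mirror our own schema, so adapt them to yours):

```shell
SOLR="http://localhost:8983/solr/select"

# fq narrows the result set without affecting relevance scoring;
# several fq parameters can be stacked in one request
QUERY="$SOLR?q=*:*&fq=media:facebook"
echo "$QUERY"
# curl "$QUERY"
```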
The next thing I want to talk about – which in some cases is a bit more tricky to get working, but is definitely worth it for your users – is WORD CLOUDS (or tag clouds, term clouds, whatever you like to call them). We like to call them BUZZ CLOUDS, because they tell us what's buzzing around in the social media space.
The way you do this is via SPECIAL SEARCH COMPONENTS in Solr. These are components that you enable in your Solr config, and they provide access to additional information about the terms and term vectors Solr is using for your index and individual documents. The TERMS COMPONENT is a way to get information from Solr about all the terms available in your index and how many documents they appear in – so you get the DOCUMENT FREQUENCY for all your terms. This can be used for creating an auto-suggest feature for your search box, although newer versions of Solr contain a special component for doing just that, so it's recommended to use that instead. The reason I mention it here is that it is absolutely possible to use it to create such a word cloud for your whole index. What we decided was a better – although a bit more processing-heavy – solution was to use the TermVectorComponent. This component gives you term vector information for individual documents rather than the entire index, which means we can get information about the terms included in our result set and provide a word cloud for that, rather than for the entire index.
The way we do this is by performing a query and then aggregating the term vector information for the documents in our search result. This means a bit more processing, since we have to traverse the documents we're interested in and aggregate the term frequencies for each document – we are counting how often the terms appear in our result set. We then use this information to render the terms based on how often they appear...
... Which in the end enables us to display clouds like this
Now that we've looked at some of the powerful, easy-to-use features of Solr – how do we scale it? The question is: do we continue to build towards the sky, adding more and more memory and processing power, or do we spread it out across smaller instances and distribute across them? How do we ensure that our indexes keep a decent size and that we can distribute our search? There are two key features which are useful for this, and which are easily available in Solr...
... That’s SHARDING and REPLICATION.
From an INDEXING perspective, sharding is about determining where to index documents. You do this using a SHARDING STRATEGY. This can be anything from an ID-BASED strategy, where you place documents in different Solr instances based on a unique ID, to strategies based on GEOGRAPHY, USERNAMES – or, as we do at Integrasco, DATES. The drawback here is that Solr does not support sharded indexing out of the box, so you need to develop the framework for this yourself. The way we have done it is by creating an index writer for each shard and connecting these index writers to our sharding strategy, so it selects the right one to dispatch documents to.
From the SEARCH perspective, sharding is very easy. You simply provide your COORDINATING INSTANCE (which can be any of your available instances) with a list of the URLs of the shards you wish to distribute the search across. The coordinating instance will then handle DISTRIBUTION of the query to all the specified shards and CONSOLIDATE the results before they are sent back to you.
And you do not have to query all shards, it’s a fairly easy job to get your client to only specify relevant shards when performing the query. For instance in the case of Integrasco, where we have sharded based on date – we really don’t have to query the entire cluster when we know we only want to search data for 2011. Then we can select the shards for 2011 and only specify those as the shards we wish the query sent off to.
And how do we do this? Again, by adding some parameters to our query URL containing the addresses of the shards we want to distribute the query across.
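Sketched as a URL – the host names here are purely illustrative, following the date-based sharding example:

```shell
SOLR="http://localhost:8983/solr/select"

# The coordinating instance fans the query out to every listed shard
# and merges the results before responding
SHARDS="solr-2010.example.com:8983/solr,solr-2011.example.com:8983/solr"
QUERY="$SOLR?q=*:*&shards=$SHARDS"
echo "$QUERY"
# curl "$QUERY"
```

Dropping a host from the shards list is all it takes to leave that shard out of the search.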
When it comes to replication, we use a common MASTER-SLAVE setup, where we have a set of masters where we do all our WRITING and a set of slaves where we do all our SEARCHING. This way we can keep the masters fairly cheap on resources: since they do not handle any queries, they do not require that we keep caches up to date in memory. You can also have multiple slaves, and repeaters.
And the configuration for this is just a couple of lines of XML in your SOLRCONFIG.XML file, as you see examples of here. The only bad thing we have seen with the replication feature in Solr is that it is very resource-hungry while replicating. It sucks as much bandwidth and writes to disk as fast as it can, and in cases where there are large amounts of data in each replication batch, this can quickly lead to unwanted starvation. So make sure you properly test your replication setup before going into production.
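For reference, a minimal sketch of what those solrconfig.xml fragments look like, master first and slave below (the master host name is an assumption):

```xml
<!-- On the master: replicate after every commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On each slave: poll the master every 60 seconds -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://solr-master.example.com:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```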
When creating slide decks, I love to go to CreativeCommons.org, search for the main topic of the section, and use one of the first pictures as the background. That sometimes leads to interesting slides like this one... Anyway, it's about integration of Solr – how do you integrate it into your project? Not surprisingly, Solr comes with a lot of client libraries for different languages...
As you can see, Solr has good support for many different languages, and most of these are easy-to-use client libraries offering METHODS FOR ACCESSING all of the features we've covered in this talk. At Integrasco we're mostly using Java, so the SolrJ library is what we use, and it provides a very easy-to-use interface for accessing most of the Solr features we need – more or less right out of the box. I think the only two things we have had to do MORE EXTENSIVE DEVELOPMENT of on top of Solr and SolrJ are SHARDED INDEXING and the WORD CLOUDS using the TermVectorComponent.
A very quick example of the use of SolrJ in Java.
Use sufficient time to analyze and figure out what data your clients need and are interested in. This should be at the core of your planning and help shape the design of your schema and configuration.
Similarly, figure out what kind of searches you will be doing. And make sure your schema allows for these queries to happen. Also, make sure you configure the appropriate warm-up queries to ensure your cache is performing optimally for the type of queries you are doing.
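Warm-up queries live in solrconfig.xml as listeners on searcher events; a sketch of the standard syntax, where the query itself is a placeholder for whatever your typical traffic looks like:

```xml
<!-- Fired whenever a new searcher is opened, so its caches are warm
     before it starts serving live traffic -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">ipod</str>
      <str name="facet">true</str>
      <str name="facet.field">cat</str>
    </lst>
  </arr>
</listener>
```

The closer these mirror your real queries, the more useful the warmed caches are.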
Once you have the previous two covered, make sure you spend significant time designing your schema.xml. As I said this is at the core of your Solr solution and can be difficult to change once you have a lot of data indexed.
This is a very good practice to follow. You never know when you might need a new field for a document, make sure you’re prepared with dynamic fields.
Solr is not for storing data! The documentation even says you should not do it because you should expect the index to become corrupted at one time or another.
As discovered by Twitter (and perhaps others), 20 million documents is the max size you should aim for when it comes to your shard size.
There are a lot of levers and switches in Solr for optimizing and shaping your search solution to fit your needs exactly. However, don't worry about this in the beginning. Mix and match from the default schema to get familiar with how it works. Start out very simple and build on it. You'll learn it very quickly and see that search functionality is not just for the large players!
Looking a little further into the future, there are a lot of exciting things going on with Solr. One of the most exciting developments from my point of view is SolrCloud, which is built on ZooKeeper and will offer a much better setup for large search clusters. There is also work going on to build a Solr distribution based on Hadoop for large-scale distributed indexing and search. So it will be very interesting to see where Solr takes us over the next couple of years!
This has been a little taste of what Solr has to offer, and hopefully it has shown you that Solr is not just for large enterprises or people with huge amounts of data. Hopefully you have seen that Solr can fit very well into situations where you normally would not think of placing a search server, and that you will start thinking of new ways to help your users FIND what they're looking for – or even better, help them find what they did not know they needed to look for.