Apache Solr is an open-source enterprise search platform built on Apache Lucene. It started as an in-house project at CNET for adding search functionality to their website and was donated to the Apache Software Foundation in 2006. Key features of Solr include faceted search, filtering, hit highlighting, dynamic clustering, database integration, and replication to support scalability.
Presentation at FOSSETCON 2015
http://www.fossetcon.org/2015/sessions/getting-started-solr-open-source-search-platform-0
Solr is a very popular open source search engine which builds upon the capabilities of Lucene. It's the perfect tool to index loads of text and make it easily searchable. And it's very fast!
Powerful features such as facets, typeahead, and "did you mean" help your users to quickly navigate through a very large dataset and find what they're looking for.
A REST-style JSON interface makes it language-agnostic; you can even work with it straight from the command line using curl!
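Because that interface is just JSON over HTTP, a query is nothing more than a URL. As a minimal sketch (the host, core name, and field names here are assumptions for illustration):

```python
from urllib.parse import urlencode

# Build a Solr select URL by hand; curl, a browser, or any HTTP client
# can then fetch it. Host, core name, and fields are assumed examples.
base = "http://localhost:8983/solr/techproducts/select"
params = {
    "q": "title:solr",       # search the title field
    "wt": "json",            # request a JSON response
    "rows": 10,              # number of results to return
    "fl": "id,title,price",  # fields to include in each result
}
url = base + "?" + urlencode(params)
print(url)
```

Passing the resulting URL to `curl` returns the same JSON any client library would see.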
A flexible plugin mechanism lets you augment your searches with complementary tools such as rich document parsing, text analysis, or your own custom code.
In this session, learn the basics of making your content searchable with Solr.
A year and a half ago, we rolled out a new integrated full-text search engine for our intranet based on Apache Solr. The search engine integrates various data sources such as file systems, wikis, internal websites and web applications, shared calendars, our corporate database, CRM system, email archive, task management and defect tracking. This talk is an experience report on some of the good, the bad and the surprising things we have encountered over two years of developing, operating and using an intranet search engine based on Apache Solr.
After setting the scene, we will discuss some interesting requirements that we have for our search engine and how we solved them with Apache Solr (or at least tried to). Using these concrete examples, we will discuss some interesting features and limitations of Apache Solr.
In the second part of the talk, we will tell a couple of "war stories" and walk through some interesting, annoying and surprising problems that we faced, how we analyzed the issues, identified the cause of the problems and eventually solved them.
The talk is aimed at software developers and architects with some basic knowledge of Apache Solr, the Apache Lucene project family or similar full-text search engines. It is not an introduction to Apache Solr, and we will dive right into the interesting and juicy bits.
Presented by Andrzej Bialecki, LucidWorks
This session presents a set of Solr components for easy management of "sidecar indexes" - indexes that extend the main index with additional stored and/or indexed fields. Conceptually, this can be viewed as an extension of the ExternalFileField or as a static join between documents from two collections. This functionality is useful in applications that require very different update regimes for the two parts of the index (e.g. main catalogue items combined with clickthroughs).
Apache Solr is a popular, open source enterprise search platform built on the Java-based search engine library Apache Lucene. It powers the search and navigation features of many of the world's largest sites, including Netflix, Instagram, LinkedIn, Twitter and eBay.
Andrew Clegg : Building your own search engine with Apache Solr
Apache Solr (http://lucene.apache.org/solr/) is an open-source search engine based on the popular Lucene library with a huge variety of features. In this talk, Andrew describes how he used it to build a high-performance search tool for protein and domain structures at CATH, and talks about some of the surprisingly cool things you can do with it beyond simple searching.
Building Intelligent Search Applications with Apache Solr and PHP5 (israelekpo)
ZendCon 2010 - Building Intelligent Search Applications with Apache Solr and PHP5. This is a presentation on how to create intelligent web-based search applications using PHP 5 and the out-of-the-box features available in Solr 1.4.1. After we finish the illustration of adding, updating and removing data from the Solr index, we will discuss how to add features such as auto-completion, hit highlighting, faceted navigation, spelling suggestions, etc.
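The add/update/remove operations the abstract mentions go through Solr's JSON update handler. A rough sketch of the request bodies, shown in Python for brevity (the talk itself uses PHP, but the JSON is language-agnostic, and the "book-1" document is an invented example):

```python
import json

# Payloads accepted by Solr's /update handler. The document fields
# here are made up for illustration.
add_body = {"add": {"doc": {"id": "book-1", "title": "Intro to Solr"}}}
delete_body = {"delete": {"id": "book-1"}}
commit_body = {"commit": {}}  # make the changes visible to searchers

for body in (add_body, delete_body, commit_body):
    print(json.dumps(body))
```

Each body would be POSTed to the core's `/update` endpoint with a `Content-Type: application/json` header.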
Solr Recipes provides quick and easy steps for common use cases with Apache Solr. Bite-sized recipes will be presented for data ingestion, textual analysis, client integration, and each of Solr’s features including faceting, more-like-this, spell checking/suggest, and others.
Solr is a highly scalable and fast open source enterprise search platform from the Apache Lucene project. Let's explore why some of the largest internet sites in the world prefer its many exciting features.
You’re Solr powered, and needing to customize its capabilities. Apache Solr is flexibly architected, with practically everything pluggable. Under the hood, Solr is driven by the well-known Apache Lucene. Lucene for Solr Developers will guide you through the various ways in which Solr can be extended, customized, and enhanced with a bit of Lucene API know-how. We’ll delve into improving analysis with custom character mapping, tokenizing, and token filtering extensions; show why and how to implement specialized query parsing, and how to add your own search and update request handling.
Apache Solr! Enterprise Search Solutions at your Fingertips! (Murshed Ahmmad Khan)
Get an overview of Apache Solr as an enterprise search server. Get to know the available alternatives and why Solr is cool! Get excited! Enterprise search solutions are ready to pick.
Lucene powers the search capabilities of practically all library discovery platforms, by way of Solr, etc. The Lucene project evolves rapidly, and it's a full-time job to keep up with the ever improving features and scalability. This talk will distill and showcase the most relevant(!) advancements to date.
The talk presents the sfSolrPlugin which transparently integrates the Solr search engine into symfony.
The talk explains:
* the features of the Solr search engine
* how to integrate the search engine into symfony
* complex search: faceted and geolocated search
* usage examples: http://www.menugourmet.com and http://resolutionfinder.org
Solr™ is the popular, blazing-fast, open source enterprise search platform from the Apache Lucene™ project. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Solr powers the search and navigation features of many of the world's largest internet sites, including AOL, Yahoo, Buy.com, CNET, CitySearch, Netflix, Zappos, StubHub, Digg, E*Trade, Disney, Apple, NASA and MTV.
After a thorough overview of the main features and benefits of Apache Solr (an open source search server), the architecture of Solr and strategies for adopting it in your PHP application and data model will be presented. The main lessons learned around dealing with a mix of structured and unstructured content, multilingual aspects, tuning, and the various state-of-the-art features of Solr will be shared as well.
Got hundreds of millions of documents to search? DataImportHandler blowing up while indexing? Random thread errors thrown by Solr Cell during document extraction? Query performance collapsing? Then you're searching at Big Data scale. This talk will focus on the underlying principles of Big Data and how to apply them to Solr. This talk isn't a deep dive into SolrCloud, though we'll talk about it. It also isn't meant to be a talk on traditional scaling of Solr.
Rails and the Apache SOLR Search Engine (David Keener)
What good is content if nobody can find it? Many information sites are like icebergs, with only a limited amount of content directly accessible to users and the rest, the "underwater" portion, only available through searches. This talk shows how Rails web sites can take advantage of the world-class Apache SOLR search engine to provide sophisticated and customizable search features. We'll cover how to get started with SOLR, integrating with SOLR using the Sunspot gem, indexing, hit highlighting and other topics.
Anyone who has tried integrating search in their application knows how good and powerful Solr is but always wished it was simpler to get started and simpler to take it to production.
I will talk about the recent features added to Solr making it easier for users and some of the changes we plan on adding soon to make the experience even better.
[LDSP] Search Engine Back End API Solution for Fast Prototyping (Jimmy Lai)
In these slides, I propose a solution for fast prototyping of a search engine back-end API. It consists of Linux + Django + Solr + Python (LDSP), all open source software. The solution also provides a code repository with automation scripts. Everyone can build a search engine back-end API in seconds using LDSP.
Got data? Let's make it searchable! This interactive presentation will demonstrate getting documents into Solr quickly, will provide some tips on adjusting Solr's schema to better match your needs, and finally will discuss how to showcase your data in a flexible search user interface. We'll see how to rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.
Attendees will learn how eBay Germany has implemented Solr, why Solr was selected, which Solr features are utilized, and how Solr is configured and used in production. Recommended best practices will be profiled along with eBay Kleinanzeigen's plans for future deployment of Solr.
There are 900 tickets currently in the Rails Lighthouse, some big, many small, and some relevant for Rails 3. Many of those issues could be fixed by people like us without too much effort.
Using online resources and a short demo, we want to find out how to use the Rails Lighthouse, how to clone edge Rails and run the test suites, and how to create patches from the fixes to make them available to the developers.
IronRuby has been around for a while. This presentation is about the practical uses of IronRuby. It contains several different use cases that you can immediately go and use to enhance your everyday work.
Apache Solr for TYPO3 at TYPO3 Usergroup Day Netherlands (Ingo Renner)
Presentation of an extension to integrate Apache Solr with TYPO3. Apache Solr is an enterprise search server; TYPO3 is a mid- to large-size enterprise content management system; combining both results in a great user search experience.
Solr search engine with multiple table relation (Jay Bharat)
Here you can learn how to use the Solr search engine and implement it in your application, for example with PHP/MySQL.
I introduce how to handle data from multiple tables in Solr.
The Guardian's Open Platform initiative enables partners to build applications with The Guardian. As part of this initiative, The Guardian provides the Content API - a rich interface to all The Guardian's content and metadata back to 1991 - over 1 million documents. This talk starts with a brief overview of the latest iteration of the content API. It will then cover how we implemented this in Scala using Solr, addressing real-world problems in creating an index of content:
* how we represented a complex relational database model in Solr
* how we keep the index up to date, meeting a sub-5-minute end-to-end update requirement
* how we update the schema as the API evolves, with zero downtime
* how we scale in response to unpredictable demand, using cloud services
Apache Solr - search for everyone!
1. Apache Solr
- search for everyone!
http://www.flickr.com/photos/malikdhadha/
2. • Co-founder and R&D Director at Integrasco
• Founder and developer of Notpod
• Leader of javaBin Sørlandet
• Programmer and Open Source enthusiast
Jaran Nilsen
twitter.com/jarannilsen
10. • Started out as an in-house CNET project for adding search functionality to the CNET website in 2004
11. • Donated to the Apache Software Foundation in 2006
12. • Graduated from incubation status in 2007
13. • Since version 3.1 (March 2011), Solr and Lucene
share the same codebase.
• This means features and fixes are shared between the
projects at a much higher rate
+
42. Facet queries
&facet=true
&facet.query=price:[* TO 100]
&facet.query=price:[100 TO 200]
&facet.query=price:[200 TO 300]
&facet.query=price:[300 TO 400]
&facet.query=price:[400 TO 500]
&facet.query=price:[500 TO *]
43. Now you want to drill down!
http://www.flickr.com/photos/kk/4712925031/
Welcome! The purpose of this talk is to show you how easy it is to get started, and to give you an idea of some of the cool things you can do with Solr without too much effort. I love quick, easy-to-understand introductions to tools that help me get started fast; hopefully that's what I'll be able to provide you with today. Solr is a big project and it would be silly to attempt to cover everything in one evening, so I am going to focus on some of the features that I believe are the easiest to get started with, and which I am also well familiar with and have had good experience with. So let's get cracking...
Most of you already know me, but I see there are some new faces...
Since 2004, Integrasco has been providing social media methodologies and technologies to corporations, agencies, government regulators and other institutions. We have a dedicated team of vertical experts, and a technology platform for the analysis of social media where Apache Solr is a vital component.
But enough about me and Integrasco – I bet that's not why most of you came to this meetup. Let's talk search. What is search?
I bet many of you get this image in your head when you're thinking about search. For most people today, the term search is equal to the name Google. Because of this, «to google» has even entered our dictionaries as an officially accepted verb. Search equals input box and search button! In many cases this is true. But as Google has become expert at, and as hopefully we will discover Solr can help us do as well – it's not about searching... ->
It's all about finding – helping your users find the information they are seeking, not having them search for it! You may think I am just playing with words here, but in my opinion there is a big difference between «searching» and «finding». Let's not fall into a discussion about semantics; let's just say that our job as engineers is to help people find the information they are looking for, and to spend our time efficiently doing that instead of designing search boxes and buttons!
You don't want your users to spend hours searching on your site (or perhaps you do, if your revenue is driven by advertisement, but let's put that discussion on the shelf as well). We want our users to find what they're looking for with little effort and give them a good experience. The great thing is that Solr can help you with just that: finding stuff. It has a lot of features that can improve your user experience and bring value to your data without too much effort. And as we will see, it does not have to be linked to search engines at all! That's exactly what I hope to show you some of today.
Apache Solr is an open source...... But before we get into the juicy details...
... let's learn a little history.
Common codebase since March 2011. Which means...
... they are now sharing features and fixes at a much higher rate than was the case before. Many wonder what the difference between Solr and Lucene is, and up until this merge Solr was often considered to be an additional layer on top of Lucene, providing functionality that was not available in Lucene itself.
What I often find frustrating when looking at new frameworks and technologies is, in many cases, the amount of time and resources you have to invest in order to try them out. I am not talking about reading documentation to get a deeper understanding – which you eventually have to – but the time you have to invest in order to just get started. I love those 3-point, very quick «getting started» guides that just work. That's what I hope to leave behind here today. Getting started with Solr is actually very, very simple. Even though my examples have been taken from a Unix environment, I've tried to make them as platform- and language-independent as possible. I myself work in a Java environment, but Solr is fully usable from many other environments, not just Java.
I’ve tried to shave the process of getting Solr up and running down as much as possible, and I actually came down to these four steps. This is actually all you need to be up and running with a working instance of Solr. There is obviously a lot of configuration and customization you can do to tailor Solr to your specific needs, but to get started playing this is actually all you need to do.
Solr is served by Jetty on port 8983 by default, and opening the Solr admin application in a browser yields this view. It's by no means eye candy, but then again – that is YOUR part of the job: to create a good-looking application that surfaces the valuable information Solr can serve you. As you see in the middle here, there's a search box and a search button – who would have thought? Let's click it!
Voila – there's not much here yet. Obviously because we haven't actually indexed anything yet. So let's add some data.
Solr comes with a good set of example data that you can easily import and index to play around with and see some of Solr's capabilities. These example documents come with the downloaded package and can be imported using the bash scripts available in the exampledocs folder. Let's import a couple of iPod-related documents and see what happens!
We refresh our search from before and... behold! We have search results. Don't be scared by the XML output here; there are several tools available for working with the response! So, now that we have some data, it takes us to the obvious part of Solr...
Full-text searching! This is what Solr was made for, and whenever you have a set of documents that you want to query, you should consider adding a Solr instance to your system and querying the documents there, rather than against a database. You will quickly see the value it adds! Now, as with most other features, the querying is done via a parameter in the URL...
Very simple! Now let's go back to our admin user interface for another example – just to make sure we don't scare anyone off with URL parameters in addition to the XML!
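A minimal command-line sketch of such a query, assuming a stock Solr instance on the default Jetty port (the curl line is left commented, since it needs a live instance):

```shell
# Base URL of a local Solr instance (default example setup)
SOLR="http://localhost:8983/solr/select"

# Match all documents in the index; q is the query parameter
QUERY="$SOLR?q=*:*"
echo "$QUERY"
# curl "$QUERY"   # run this against a running instance to see the XML response
```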
If you remember the query from before, it was «asterisk, colon, asterisk». Solr INDEXES are composed of FIELDS, and you can specify which field you want to search by querying «field, colon, value». So when we searched for «asterisk, colon, asterisk», we were searching for all values in all fields – hence getting all documents in the index. Let's see what happens when we search for documents where the PRICE field contains the value 19.95.
... Unsurprisingly, we get documents matching the price 19.95. OK, but what if you want to add your own test data to play around with? Let's look quickly at the input format that was used.
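The same field-scoped query, sketched as a URL (the price field is from the stock example schema):

```shell
SOLR="http://localhost:8983/solr/select"

# Restrict the search to a single field: field, colon, value
QUERY="$SOLR?q=price:19.95"
echo "$QUERY"
# curl "$QUERY"   # against a live instance with the example docs indexed
```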
It looks like this: a very simple XML structure that you can easily modify to fit your needs.
Here is the full iPod example we just imported and tested with.
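For reference, a representative sketch of Solr's update XML format – the field names come from the stock example schema, but the values below are illustrative rather than copied from the shipped files:

```xml
<add>
  <doc>
    <field name="id">IPOD-EXAMPLE-1</field>
    <field name="name">60 GB iPod with Video Playback</field>
    <field name="manu">Apple Computer Inc.</field>
    <field name="cat">electronics</field>
    <field name="cat">music</field>
    <field name="price">399.00</field>
    <field name="inStock">true</field>
  </doc>
</add>
```

Repeating a field name (like cat above) is how multi-valued fields are expressed.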
As I said earlier – do not be scared by the XML if it feels overwhelming. There are several different options both for consuming the response and for indexing data into Solr. And you do not need to specify all your input data in files; there are various means for importing from databases and other commonly used data sources. The examples we have looked at so far have been centered around products from a shop. Obviously this is not likely to fit all your needs. As with any other technology for handling data, you need to define a DOMAIN, or a SCHEMA...
In Solr, the Schema is one of the core configurations you need to master. It’s at the core of your Solr solution and it’s important to spend time designing this schema. Let’s see what it looks like...
Of the key elements, we find....
We define the different types we want to have available in our schema...
... And finally we define all the fields that we want in our schema. One neat thing with Solr is the elements defined towards the bottom here – the DYNAMIC FIELDS. These allow us to specify any field we want on a per-document basis, after the index is up and running. This gives us good flexibility in cases where only some documents contain some special data. An EXAMPLE from one of our projects at INTEGRASCO was when we received a batch of data we needed to make available for analysis together with our social data. This data contained a lot of META data that did not make sense for social data, but we needed to be able to perform queries across everything. In this situation our dynamic fields became very handy, as we could simply define them when indexing the new Frankenstein data, and it fit nicely in with the rest of our social data.
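In schema.xml, a dynamic field is declared with a wildcard pattern; this is the standard syntax, and the suffix conventions below follow the stock example schema:

```xml
<!-- Any field name ending in _s is indexed and stored as a string -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<!-- Any field name ending in _i becomes an integer field -->
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
```

At index time you can then send a field like author_s or year_i on any document without touching the schema again.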
The second important part of Solr.
Solrconfig.xml is a complex XML document, but just as important to spend time with as schema.xml. We will not go into detail on it today, as we don't have time, but here are some of the key elements that you configure in this file: settings for how Solr should handle your index files, and the update chain, where you can define your own components to plug into the indexing process.
Now that we've got Solr configured and up and running, let's look closer at what it can DO beyond simple searching!
The first thing I want to show is FACETS. Facets are category counts for search results, meaning you can provide details about which categories your search results fall within. And as we will see, these categories can be a lot of things. Facets are in many situations also referred to as FACETED NAVIGATION. They are a very powerful feature of Solr when it comes to navigating your search results, and as we said in the beginning – it's not just about searching, it's about finding. Facets are a good way to provide your users with neat ways to navigate your data.
Here is another good example of how facets CAN be used to enable navigation. This is taken from Finn.no. Now, I do not know whether Finn.no is USING Solr and facets for this information, but it’s a good example of how facets can be used to enable powerful and user friendly navigation.
Yet another example of search facets, or faceted navigation. This time from the webshop Komplett.no, where – after you have done a search – you will get details about CATEGORIES, PRODUCERS and PRICE RANGES.
And this is how you do this with Solr. You simply add a few parameters to your QUERY URL, specifying which field or fields you wish to receive facet information for – and you're good to go! To illustrate the output, I've done a search in our system using the example parameters here, and this would yield...
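As a sketch, the faceting parameters look like this (the language field is from our own schema, so treat it as an illustrative field name):

```shell
SOLR="http://localhost:8983/solr/select"

# Ask for facet counts on the language field, top 10 values
QUERY="$SOLR?q=*:*&facet=true&facet.field=language&facet.limit=10"
echo "$QUERY"
# curl "$QUERY"   # against a live instance
```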
... This output. As you see here, we get the top 10 languages for my search. You’re not limited to only getting top 10, you choose how many facet counts you want.
Another cool thing is that you are not limited to getting facets for predefined fields; you can also use FACET QUERIES to provide unique value for your data. Here's one example from our system, where we use facet queries on date fields to generate statistics for given queries and display trends over time. The way we do this is to add a facet query for each of the timespans we want in our chart, and then simply render those facet query counts in a chart. Another very common way to use facet queries is to generate price range information in online stores – like we saw in the screenshot from Komplett.no (which we have an example query string for on the next slide).
So, once we have our FACETs, we may want to drill further down into our data. This is done via FILTER QUERIES. These are constraints you apply to your query to filter the results you get back from Solr.
Here we are going to filter on our facets as we do it in our solution. We have performed a search, and we wish to drill down into this by only looking at Facebook data....
So we choose the facebook Facet...
... and this adds a filter to our existing query – limiting our search to the MEDIA type FACEBOOK. Very easy. And the way we actually do this, in Solr...
… is by adding “fq” parameters to our query. In this case, we added a filter for Facebook. And you can add as many of these as you want, so you're not limited to filtering on only one thing at a time! Another important thing about filter queries is that they are a big help in optimizing your queries. The reason is that they are applied BEFORE other queries are run, which means they limit the number of documents Solr has to query.
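A sketch of such a filtered request (the media field and facebook value mirror our own schema, so adapt them to yours):

```shell
SOLR="http://localhost:8983/solr/select"

# fq narrows the result set without affecting relevance scoring;
# several fq parameters can be stacked in one request
QUERY="$SOLR?q=*:*&fq=media:facebook"
echo "$QUERY"
# curl "$QUERY"
```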
The next thing I want to talk about – which in some cases is a bit more tricky to get working, but is definitely worth it for your users – is WORD CLOUDS (or tag clouds, term clouds, whatever you like to call them). We like to call them BUZZ CLOUDS, because they tell us what's buzzing around in the social media space.
The way you do this is via SPECIAL SEARCH COMPONENTS in Solr. These are components that you enable in your Solr config, and they provide access to additional information about the terms and term vectors Solr is using for your index and individual documents. The TERMS COMPONENT is a way to get information from Solr about all the terms available in your index and how many documents they appear in – so you get the DOCUMENT FREQUENCY for all your terms. This can be used for creating an auto-suggest feature for your search box, although newer versions of Solr contain a special component for doing just that, so it's recommended to use that instead. The reason I mention it here is that it is absolutely possible to use it to create such a word cloud for your whole index. What we decided was a better – although a bit more processing-heavy – solution was to use the TermVectorComponent. This component gives you term vector information for individual documents rather than the entire index, which means we can get information about the terms included in our result set and provide a word cloud for that, rather than for the entire index.
The way we do this is by performing a query and then aggregating the term vector information for the documents in our search result. This means a bit more processing, since we have to traverse the documents we're interested in and aggregate the term frequencies for each document – we are counting how often the terms appear in our result set. We then use this information to render the terms based on how often they appear...
... Which in the end enables us to display clouds like this
Now that we've looked at some of the powerful, easy-to-use features of Solr – how do we scale it? The question is: do we continue to build towards the sky, adding more and more memory and processing power, or do we spread it out across smaller instances and distribute across them? How do we ensure that our indexes keep a decent size and that we can distribute our search? There are two key features which are useful for this, and which are easily available in Solr...
... That’s SHARDING and REPLICATION.
From an INDEXING perspective, sharding is about determining where to index documents. You do this using a SHARDING STRATEGY. This can be anything from an ID-BASED strategy, where you place documents in different Solr instances based on a unique ID, to strategies based on GEOGRAPHY, USERNAMES – or, as we do at Integrasco, DATES. The drawback here is that Solr does not support sharded indexing out of the box, so you need to develop the framework for this yourself. The way we have done it is by creating an index writer for each shard and connecting these index writers to our sharding strategy, so it selects the right one to dispatch documents to.
From the SEARCH perspective, sharding is very easy. You simply provide your COORDINATING INSTANCE (which can be any of your available instances) with a list of the URLs of the shards you wish to distribute the search across. The coordinating instance will then handle DISTRIBUTION of the query to all the specified shards and CONSOLIDATE the results before they are sent back to you.
And you do not have to query all shards, it’s a fairly easy job to get your client to only specify relevant shards when performing the query. For instance in the case of Integrasco, where we have sharded based on date – we really don’t have to query the entire cluster when we know we only want to search data for 2011. Then we can select the shards for 2011 and only specify those as the shards we wish the query sent off to.
And how do we do this? Again, by adding some parameters to our query URL containing the addresses of the shards we want to distribute the query across.
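Sketched as a URL – the host names here are purely illustrative, following the date-based sharding example:

```shell
SOLR="http://localhost:8983/solr/select"

# The coordinating instance fans the query out to every listed shard
# and merges the results before responding
SHARDS="solr-2010.example.com:8983/solr,solr-2011.example.com:8983/solr"
QUERY="$SOLR?q=*:*&shards=$SHARDS"
echo "$QUERY"
# curl "$QUERY"
```

Dropping a host from the shards list is all it takes to leave that shard out of the search.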
When it comes to replication, we use a common MASTER-SLAVE setup, where we have a set of masters where we do all our WRITING and a set of slaves where we do all our SEARCHING. This way we can keep the masters fairly cheap on resources: since they do not handle any queries, they do not require that we keep caches up to date in memory. You can also have multiple slaves, and repeaters.
And the configuration for this is just a couple of lines of XML in your SOLRCONFIG.XML file, as you see examples of here. The only bad thing we have seen with the replication feature in Solr is that it is very resource-hungry while replicating. It sucks as much bandwidth and writes to disk as fast as it can, and in cases where there are large amounts of data in each replication batch, this can quickly lead to unwanted starvation. So make sure you properly test your replication setup before going into production.
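For reference, a minimal sketch of what those solrconfig.xml fragments look like, master first and slave below (the master host name is an assumption):

```xml
<!-- On the master: replicate after every commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On each slave: poll the master every 60 seconds -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://solr-master.example.com:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```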
When creating slide decks, I love to go to CreativeCommons.org, search for the main topic of the section, and use one of the first pictures as the background. That sometimes leads to interesting slides like this one... Anyway, it's about integration of Solr – how do you integrate it into your project? Not surprisingly, Solr comes with a lot of client libraries for different languages...
As you can see, Solr has good support for many different languages, and most of these are easy-to-use client libraries offering METHODS FOR ACCESSING all of the features we've covered in this talk. At Integrasco we're mostly using Java, so the SolrJ library is what we use, and it provides a very easy-to-use interface for accessing most of the Solr features we need – more or less right out of the box. I think the only two things we have had to do MORE EXTENSIVE DEVELOPMENT of on top of Solr and SolrJ are SHARDED INDEXING and the WORD CLOUDS using the TermVectorComponent.
A very quick example of the use of SolrJ in Java.
Use sufficient time to analyze and figure out what data your clients need and are interested in. This should be at the core of your planning and help shape the design of your schema and configuration.
Similarly, figure out what kind of searches you will be doing. And make sure your schema allows for these queries to happen. Also, make sure you configure the appropriate warm-up queries to ensure your cache is performing optimally for the type of queries you are doing.
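Warm-up queries live in solrconfig.xml as listeners on searcher events; a sketch of the standard syntax, where the query itself is a placeholder for whatever your typical traffic looks like:

```xml
<!-- Fired whenever a new searcher is opened, so its caches are warm
     before it starts serving live traffic -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">ipod</str>
      <str name="facet">true</str>
      <str name="facet.field">cat</str>
    </lst>
  </arr>
</listener>
```

The closer these mirror your real queries, the more useful the warmed caches are.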
Once you have the previous two covered, make sure you spend significant time designing your schema.xml. As I said this is at the core of your Solr solution and can be difficult to change once you have a lot of data indexed.
This is a very good practice to follow. You never know when you might need a new field for a document, make sure you’re prepared with dynamic fields.
Solr is not for storing data! The documentation even says you should not do it because you should expect the index to become corrupted at one time or another.
As discovered by Twitter (and perhaps others), 20 million documents is the max size you should aim for when it comes to your shard size.
There are a lot of levers and switches in Solr for optimizing and shaping your search solution to fit your needs exactly. However, don't worry about this in the beginning. Mix and match from the default schema to get familiar with how it works. Start out very simple and build on it. You'll learn it very quickly and see that search functionality is not just for the large players!
Looking a little further into the future, there are a lot of exciting things going on with Solr. One of the most exciting developments from my point of view is SolrCloud, which is built on ZooKeeper and will offer a much better setup for large search clusters. There is also work going on to build a Solr distribution based on Hadoop for large-scale distributed indexing and search. So it will be very interesting to see where Solr takes us over the next couple of years!
This has been a little taste of what Solr has to offer, and hopefully it has shown you that Solr is not just for large enterprises or people with huge amounts of data. Hopefully you have seen that Solr can fit very well into situations where you normally would not think of placing a search server, and that you will start thinking of new ways to help your users FIND what they're looking for – or even better, help them find what they did not know they needed to look for.