After a thorough overview of the main features and benefits of Apache Solr (an open source search server), the talk presents the architecture of Solr and strategies for adopting it in your PHP application and data model. It also shares the main lessons learned in dealing with a mix of structured and unstructured content, multilingual aspects, tuning, and the various state-of-the-art features of Solr.
Get the most out of Solr search with PHP
1. Get the most out of Solr search with PHP
Paul Borgermans
2. About me
● Active in open source community for a while
● Squid Proxy server (about 15y ago)
● PHP based CMS solutions (mostly eZ Publish)
● Currently fancying :
● PHP as the master glue language for almost everything
● Apache Lucene family of projects (mainly Solr)
● NoSQL (Not only SQL) and scalable architectures
● CMS systems & all kinds of challenges in information management
3. Outline
● Overview of Apache Solr
● How to use it with PHP (1)
● Concepts & internals
● How to use it with PHP (2)
● Miscellaneous tips
● Resources
5. Solr Curriculum Vitae
● Open source Apache Lucene subproject
● Standalone, enterprise grade search server built on top of Lucene
● Lives in a Java servlet container
● Access through a REST-ful API
● HTTP
● Primary payload in requests: XML
● Other response formats: PHP, JSON, …
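The HTTP access described above can be sketched in PHP as follows. This is a minimal illustration, not a complete client: the function name is hypothetical, and the base URL assumes a local Solr on its default example port 8983 with the standard `/select` handler.

```php
<?php
// Build a Solr search URL. The base URL and function name are assumptions
// for illustration; adjust them to your own installation.
function solrSelectUrl(string $baseUrl, string $query, string $format = 'json'): string
{
    $params = [
        'q'  => $query,   // the user's search terms
        'wt' => $format,  // response writer: 'json', 'php', 'xml', ...
    ];
    // PHP_QUERY_RFC3986 encodes spaces as %20 instead of '+'
    return rtrim($baseUrl, '/') . '/select?'
        . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

echo solrSelectUrl('http://localhost:8983/solr', 'green bottle'), "\n";
// prints: http://localhost:8983/solr/select?q=green%20bottle&wt=json
```

Switching the `wt` parameter is all it takes to ask for a different response format.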
6. Solr in a nutshell
● State of the art, advanced full text search and information retrieval
● Fast, scalable with native replication features
● Flexible configuration
● Document oriented storage
● Extensible (if you know a bit of Java), but usually not needed
7. Full text search main features
● Tunable relevancy ranking on top of internal similarity algorithms
● Highlighting
● Sorting
● Filtering
● “Drill-down” navigation (facets)
● Automatic related content
● Spell checking
● Multilingual text analysis
9. Tunable relevancy ranking
● “Boosting” at index and query time
● certain types of content
● certain parts of content (“fields”)
● page-rank like if the content has relations
● Elevate request component
● predefined “pages/documents” to the top when certain keywords are entered
● With customised functions
● more recent articles
● proximity (geolocations)
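Query-time boosting of fields and of recent content might look like the sketch below. It assumes the dismax query parser and hypothetical field names (`title`, `body`, `published`); the `published` field is assumed to be a trie-based date field so that the `ms()` function can be applied to it.

```php
<?php
// Query-time boosting sketch using the dismax query parser.
// Field names (title, body, published) are hypothetical.
function boostedSearchParams(string $userQuery): array
{
    return [
        'defType' => 'dismax',
        'q'       => $userQuery,
        // Matches in the title weigh three times as much as matches in the body
        'qf'      => 'title^3 body',
        // Function boost favouring recent documents: 3.16e-11 is roughly
        // 1 / (milliseconds in a year), so a year-old document gets about
        // half the boost of a brand-new one
        'bf'      => 'recip(ms(NOW,published),3.16e-11,1,1)',
    ];
}

echo http_build_query(boostedSearchParams('solr php'), '', '&', PHP_QUERY_RFC3986), "\n";
```

The weights are starting points only; relevancy tuning usually means trying values against real queries.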
10. Filtering
● Does not influence the relevancy
● Narrows down the scope
● Very powerful: full boolean, wildcards, fuzzy, and unlimited combinations
● Ranges (dates, numbers, alphanumeric, ...)
Also for implementing security!
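Filters are passed as `fq` parameters, and several of them can be sent in one request. The sketch below shows one way to build such a request in PHP. The field names and the group values are hypothetical, including the idea of storing an `allowed_group` field for access control.

```php
<?php
// Build a query string in which a parameter may repeat (fq=...&fq=...).
// http_build_query() would emit fq[0]=..., which Solr does not interpret
// as a repeated parameter, so the pairs are encoded by hand here.
function solrQueryString(array $params): string
{
    $pairs = [];
    foreach ($params as $name => $values) {
        foreach ((array) $values as $value) {
            $pairs[] = rawurlencode($name) . '=' . rawurlencode((string) $value);
        }
    }
    return implode('&', $pairs);
}

// Field names and values below are hypothetical.
$params = [
    'q'  => 'annual report',
    'fq' => [
        'type:article',                             // narrow down to one content type
        'published:[2009-01-01T00:00:00Z TO NOW]',  // date range
        'allowed_group:(staff OR public)',          // security: only what this user may see
    ],
];

echo solrQueryString($params), "\n";
```

Because filters do not influence relevancy, the ranking of the remaining documents is still driven by `q` alone.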
11. Facets
● Alongside the main query, “facet fields” may be defined, usually operating on meta-data:
● Type of content
● Publication year
● Keywords
● Author ....
● The result set is returned with the number of hits within each “facet”
● You can use the selected facet as a subsequent filter
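A facet round trip can be sketched as follows. With the JSON response writer's default settings, Solr returns each facet field as a flat list of alternating values and counts; the helper below (a hypothetical name) pairs them up. The sample response is made up and abbreviated.

```php
<?php
// Turn Solr's flat facet list [value, count, value, count, ...]
// into a value => count map.
function facetCounts(array $flat): array
{
    $counts = [];
    for ($i = 0; $i + 1 < count($flat); $i += 2) {
        $counts[(string) $flat[$i]] = (int) $flat[$i + 1];
    }
    return $counts;
}

// Request side: facet=true&facet.field=type&facet.field=author
// Response side (decoded JSON, abbreviated, made-up data):
$response = [
    'facet_counts' => [
        'facet_fields' => [
            'type'   => ['article', 12, 'news', 5],
            'author' => ['paul', 9, 'erik', 8],
        ],
    ],
];

foreach ($response['facet_counts']['facet_fields'] as $field => $flat) {
    foreach (facetCounts($flat) as $value => $count) {
        // A selected facet becomes a filter query on the next request
        printf("%s: %s (%d) -> fq=%s:\"%s\"\n", $field, $value, $count, $field, $value);
    }
}
```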
13. Automatic related content (“More Like This”)
● The search engine itself determines which terms of a page are important and performs a query with them
● All other normal features can be used
● Filtering
● Sorting
● Facets
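A "More Like This" request might be parameterised as in the sketch below. It assumes a MoreLikeThisHandler has been registered in solrconfig.xml (for example at `/mlt`, which is an assumption about your setup), and the field names and document id are hypothetical.

```php
<?php
// Parameters asking Solr for documents similar to one known document.
// Assumes a MoreLikeThisHandler is configured; field names are hypothetical.
function moreLikeThisParams(string $documentId, int $howMany = 5): array
{
    return [
        // Select the source document by its unique id (quoted and escaped)
        'q'         => 'id:"' . addcslashes($documentId, '"\\') . '"',
        'mlt.fl'    => 'title,body',   // fields to mine for "interesting" terms
        'mlt.mintf' => 1,              // minimum term frequency in the source document
        'mlt.mindf' => 2,              // ignore terms found in fewer than 2 documents
        'rows'      => $howMany,       // number of similar documents to return
        'fq'        => 'type:article', // normal features such as filtering still apply
    ];
}

print_r(moreLikeThisParams('article-42'));
```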
15. Spell checking
● Two possible strategies
● Dictionary look-up
● Using the indexed words themselves (recommended)
● Possible “Google” approach using the “best guess”
● Search for “Grein botle” => suggests “Green bottle”
● Let Solr return individual keyword suggestions => more client side processing required
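The client-side processing mentioned in the last bullet could look like the sketch below. The `$suggestions` map is a simplified, hypothetical structure (misspelt word => best suggestion); Solr's actual spellcheck response is more verbose and would have to be reduced to this form first. For the "best guess" approach, adding `spellcheck=true&spellcheck.collate=true` to the request asks Solr to assemble the corrected query itself.

```php
<?php
// Rewrite a query word by word from individual keyword suggestions.
// $suggestions is a simplified, hypothetical map, not Solr's response format.
function applySuggestions(string $query, array $suggestions): string
{
    $words = preg_split('/\s+/', trim($query));
    foreach ($words as $i => $word) {
        $key = strtolower($word);
        if (isset($suggestions[$key])) {
            $words[$i] = $suggestions[$key];
        }
    }
    return implode(' ', $words);
}

echo applySuggestions('Grein botle', ['grein' => 'green', 'botle' => 'bottle']), "\n";
// prints: green bottle
```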
16. Multilingual features
● Adapted tokenizers
● Stemming (reducing words to common form)
● Reduces some spelling errors too!
● May decrease accuracy
● Different algorithms per language
● Normalisation (“latin 1 characters”)
● élève = eleve, Spaß = spass, ...
17. Performance
● Solr employs intelligent caches
● filters
● queries
● internal indexes
● Optimized for search/retrieval
● Possible autowarming on start up
● When updates are done, caches are reconstructed on the fly in the background
18. Performance (2)
● Replication
● master-slave for now
● works across platforms with same configuration
● no native OS features needed (or rsync)
● more cloud features under development
● Sharding (client driven)
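Client-driven sharding means the client tells Solr which shards to query, using the `shards` request parameter. A minimal sketch, with hypothetical host names:

```php
<?php
// Client-driven sharding: the client lists every shard in the "shards"
// parameter and sends the request to any one of them.
function shardedParams(string $query, array $shards): array
{
    return [
        'q'      => $query,
        // Shard addresses are given without the http:// scheme
        'shards' => implode(',', $shards),
    ];
}

print_r(shardedParams('solr', [
    'search1.example.com:8983/solr',  // hypothetical hosts
    'search2.example.com:8983/solr',
]));
```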
22. PHP: the client side
● Roll your own classes
● Not difficult, it's REST after all
● Some Curl, XML, Json or native PHP array parsing
● Use existing libraries
● PECL: http://pecl.php.net/package/solr
● http://code.google.com/p/solr-php-client/ (follows ZF coding standards)
● eZ Components: ezcSearch
● PHP CMS's usually come with their own
● eZ Publish, Drupal, Symfony ...
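Rolling your own client can be as small as the sketch below: one function fetches a URL with cURL, another decodes the JSON response. The function names and the sample URL are hypothetical, and the error handling is deliberately minimal.

```php
<?php
// Fetch a Solr URL with cURL and return the raw response body.
function fetchSolr(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);           // seconds
    $body  = curl_exec($ch);
    $error = curl_error($ch);
    if ($body === false) {
        throw new RuntimeException('Solr request failed: ' . $error);
    }
    return $body;
}

// Decode a JSON search response and return its list of documents.
function decodeDocs(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['response']['docs'])) {
        throw new UnexpectedValueException('Not a Solr search response');
    }
    return $data['response']['docs'];
}

// Usage (requires a running Solr, so it is left commented out):
// $docs = decodeDocs(fetchSolr('http://localhost:8983/solr/select?q=solr&wt=json'));
```

An existing library adds conveniences on top of this, but the underlying exchange stays the same.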
23. What's next?
● Getting data into Solr
● Basic searches
● Advanced requests
● But first something on the concepts and internals
25. The Solr/Lucene index
● Inverted index
● Holds a collection of “documents”
● Document
● Collection of fields
● Flexible schema!
● Unique ID (user defined)
● Solr uses an XML-based config file: schema.xml
26. Fields
● Various field types, derived from base classes
● Indexed
● contains the inverted index
● usually analyzed & tokenized
● makes it searchable and sortable
● Stored
● contains also the original content
● content can be part of the request response
● Can be multi-valued!
● opens possibilities beyond full text search
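To make the document and field model concrete, the sketch below builds the XML `<add>` message that Solr accepts for indexing one document. Array values become repeated `<field>` elements, which is how multi-valued fields are sent. The function name and field names are hypothetical; the fields must exist in your schema.xml.

```php
<?php
// Build the XML <add> message for indexing one document.
// Array values become repeated <field> elements (multi-valued fields).
function solrAddXml(array $document): string
{
    $xml = '<add><doc>';
    foreach ($document as $name => $values) {
        foreach ((array) $values as $value) {
            $xml .= sprintf(
                '<field name="%s">%s</field>',
                htmlspecialchars((string) $name, ENT_QUOTES | ENT_XML1, 'UTF-8'),
                htmlspecialchars((string) $value, ENT_QUOTES | ENT_XML1, 'UTF-8')
            );
        }
    }
    return $xml . '</doc></add>';
}

echo solrAddXml([
    'id'      => 'article-1',        // the user-defined unique ID
    'title'   => 'Solr & PHP',
    'keyword' => ['search', 'php'],  // multi-valued
]), "\n";
```

The resulting message is sent with an HTTP POST to the update handler, followed by a `<commit/>` message to make the document searchable.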
27. Field definitions: schema.xml
● Field types
● text
● numerical
● dates
● location
● … (about 25 in total)
● Actual fields (name, definition, properties)
● Dynamic fields
● Copy fields (as aggregators)
28. schema.xml: simple field type examples
<fieldType name="string" class="solr.StrField"
           sortMissingLast="true" omitNorms="true"/>
<!-- boolean type: "true" or "false" -->
<fieldType name="boolean" class="solr.BoolField"
           sortMissingLast="true" omitNorms="true"/>
<!-- A Trie based date field for faster date range
     queries and date faceting. -->
<fieldType name="tdate" class="solr.TrieDateField"
           omitNorms="true" precisionStep="6"
           positionIncrementGap="0"/>
<!-- A text field that only splits on whitespace for exact
     matching of words -->
<fieldType name="text_ws" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
29. schema.xml: more complex field type
<!-- A general unstemmed text field - good if one does not know the language of the field -->
<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="false"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"
            expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
31. Analysis
● Solr does not really search your text, but rather
the terms that result from the analysis of text
● Typically a chain of
● Character filter(s)
● Tokenisation
● Filter A
● Filter B
● …
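The chain above can be sketched in a few lines of Python. This is only an illustration of the order of the steps, not Solr code; the function names, the "&" mapping and the tiny stop word list are assumptions.

```python
def map_chars(text):
    # Character filter (assumption): rewrite "&" before tokenizing.
    return text.replace("&", " and ")

def whitespace_tokenize(text):
    # Tokenisation: exactly one tokenizer per chain.
    return text.split()

def lowercase(tokens):
    # Filter A
    return [t.lower() for t in tokens]

def remove_stopwords(tokens, stopwords=frozenset({"the", "and", "a", "of"})):
    # Filter B
    return [t for t in tokens if t not in stopwords]

def analyze(text):
    """Run character filter -> tokenizer -> token filters, in that order."""
    tokens = whitespace_tokenize(map_chars(text))
    for token_filter in (lowercase, remove_stopwords):
        tokens = token_filter(tokens)
    return tokens

terms = analyze("The Solr & Lucene index")
print(terms)  # ['solr', 'lucene', 'index']
```

The terms that come out of the chain, not the original string, are what ends up in the index and what a query is matched against.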
32. Solr comes with many tokenizers and filters
● Some are language specific
● Others are very specialised
● It is very important to get this right
otherwise, you may not get what you
expect!
33. Text analysis examples
String        Field type “text”
              term position 1     term position 2
iPad       => i                   pad, ipad
élève.     => elev
PowerShot  => power               shot, powershot
Let's have a look: http://localhost:8983/solr/admin
34. Character filters
● Used to clean up text before tokenizing
● HTMLStripCharFilter (strips html, xml, js,
css)
● MappingCharFilter (normalisation of
characters, removing accents)
● Regular expression filter
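As a rough illustration of what such character filters do, the Python sketch below strips markup and removes accents using only the standard library. It approximates the effect, not the Solr implementations; `strip_markup` and `fold_accents` are hypothetical names, and the tag regex is far cruder than HTMLStripCharFilter.

```python
import html
import re
import unicodedata

def strip_markup(text):
    # Drop anything that looks like a tag, then decode entities.
    return html.unescape(re.sub(r"<[^>]+>", "", text))

def fold_accents(text):
    # Decompose characters (NFD), then drop the combining accent marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

cleaned = fold_accents(strip_markup("<p>&eacute;l&egrave;ve</p>"))
print(cleaned)  # eleve
```

Only the accents are removed here; any further reduction of the term would be the job of a later filter in the chain.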
35. Tokenizers
● Convert text to tokens (terms)
● You can define only one per field/analyzer
● Examples
● WhitespaceTokenizer (splits on white
space)
● StandardTokenizer
● CJK variants
36. Additional filters
● Many possible per field/analyzer
● Many delivered with Solr out of the box
● If not enough, write a tiny bit of Java or
look for contributions
● Examples ...
38. Reversing Filter
● Reverses the order of characters
● Use: allow “leading wildcards”
● *thing => gniht*
● A lot faster: the leading wildcard becomes a prefix query
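The trick can be sketched in Python: index every term reversed, then answer a leading wildcard with a prefix scan over the sorted reversed terms. The term list and the `leading_wildcard` helper are assumptions for illustration, not Solr's implementation.

```python
from bisect import bisect_left

terms = ["nothing", "something", "think", "thing"]
# Index time: store each term reversed, sorted so prefixes are contiguous.
reversed_terms = sorted(t[::-1] for t in terms)

def leading_wildcard(pattern):
    """Answer '*suffix' with a prefix scan over the reversed terms."""
    if not pattern.startswith("*"):
        raise ValueError("expected a pattern of the form '*suffix'")
    prefix = pattern[1:][::-1]  # "*thing" -> "gniht"
    start = bisect_left(reversed_terms, prefix)
    matches = []
    for candidate in reversed_terms[start:]:
        if not candidate.startswith(prefix):
            break  # sorted order: no later term can match
        matches.append(candidate[::-1])
    return sorted(matches)

matches = leading_wildcard("*thing")
print(matches)  # ['nothing', 'something', 'thing']
```

The binary search jumps straight to the first candidate, so the cost depends on the number of matches instead of the size of the term dictionary.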
39. Synonyms
● Inject synonyms for certain terms
● Language specific
● Best used for query time analysis
● at index time they may inflate the search index too much
● can decrease relevancy
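A minimal Python sketch of query time synonym expansion, assuming a hand-written synonym table (the entries and the `expand_query_terms` name are hypothetical). Each query term becomes a group of alternatives, while the index itself is left untouched.

```python
# Hypothetical synonym table; Solr would read this from synonyms.txt.
SYNONYMS = {
    "tv": ["television"],
    "notebook": ["laptop"],
}

def expand_query_terms(terms, synonyms=SYNONYMS):
    """Return, per query term, the term plus its synonyms (an OR group)."""
    return [[t] + synonyms.get(t, []) for t in terms]

expanded = expand_query_terms(["cheap", "notebook"])
print(expanded)  # [['cheap'], ['notebook', 'laptop']]
```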
40. Stemming
● Reduce terms to their root form
● Language specific (or not relevant, CJK)
● Many specialised stemmers available
● Most European languages
41. Copy fields
● Analysis is done differently for
● searching/filtering
● faceting/sorting
● Stemming and not stemming in different fields
can increase relevance of results
● Use copy fields in schema.xml or do it client side
43. Get the data and feed it
● Most *AMP applications have databases
● Map your data to a “document model”
● denormalization, flattening
● most DB fields can be fed unaltered, Solr
takes care of the rest
● One constraint: it must be UTF-8!
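The mapping step might look like the Python sketch below: a joined database result is flattened into one document, and the join table becomes a multi-valued field. The row layout, the field names and both helper functions are assumptions; the output follows Solr's XML update message format (`<add><doc><field name="...">`) and is encoded as UTF-8.

```python
import xml.etree.ElementTree as ET

def row_to_doc(article, tags):
    """Flatten a joined DB result into one field -> values mapping."""
    return {
        "id": [str(article["id"])],
        "title": [article["title"]],
        "tag": list(tags),  # multi-valued field replaces the join table
    }

def docs_to_add_xml(docs):
    """Serialise documents as a Solr <add> update message (UTF-8 bytes)."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, values in doc.items():
            for value in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = value
    return ET.tostring(add, encoding="utf-8")

doc = row_to_doc({"id": 42, "title": "Élève"}, ["school", "france"])
payload = docs_to_add_xml([doc])
print(payload.decode("utf-8"))
```

The resulting bytes would be sent as the body of a POST to the update handler; that HTTP call is left out here.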
44. Snippets (1)
class eZSolrDoc
{
function eZSolrDoc( $boost = false )
public function setBoost ( $boost = false )
public function addField ( $name, $content, $boost = false )
public function docToXML()
}
class eZSolr
{
public function addDocs ( $docs = array(), $commit = true,
$optimize = false, $commitWithin = 0 )
.....
45. Searching
● Construct a GET/POST query
● Base parameters
● “q” for query text
● “start” for offset
● “rows” for max number of results to
return
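Building such a request needs nothing more than URL encoding. The Python sketch below assumes a local Solr instance with its select handler at http://localhost:8983/solr/select; the `build_search_url` helper and the sample query are made up for illustration.

```python
from urllib.parse import urlencode

def build_search_url(query, start=0, rows=10,
                     base="http://localhost:8983/solr/select"):
    """Build a GET URL from the base parameters q, start and rows."""
    params = {"q": query, "start": start, "rows": rows}
    return base + "?" + urlencode(params)

# Third page of results, ten per page.
url = build_search_url("title:solr", start=20, rows=10)
print(url)
# http://localhost:8983/solr/select?q=title%3Asolr&start=20&rows=10
```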
49. Indexing binary files
● Solr 1.4 includes the Apache Tika libraries
● converts just about any format to plain text
● you can activate a dedicated
request handler for it
OR
● Use it standalone (command line) for
integration into existing code
See: http://lucene.apache.org/tika/
50. Integrate legacy data
● Use the Solr Data Import Handler
● Able to index DBs directly
● define the schema to use (including
possible joins)
● fire simple requests to Solr to actually
index/update
● Also XML feeds, files (csv), ...
51. Have multilingual content?
● Multi-core configuration
● Set up a dedicated Solr core per language
● Each has its own schema definitions, while
you can still use common field names
● If using one index
● Use dynamic fields and create language
specific analyzers for dedicated language
suffixes/prefixes
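On the client side, the single-index approach comes down to naming fields by language before feeding them. A small Python sketch, assuming the schema declares dynamic fields such as `*_fr` and `*_en` bound to language specific field types (the suffix convention and the `localized_fields` helper are assumptions):

```python
def localized_fields(fields, lang):
    """Append a language suffix so a dynamic field rule such as '*_fr'
    can route each value to a language specific analyzer."""
    return {f"{name}_{lang}": value for name, value in fields.items()}

doc = {"id": "42"}
doc.update(localized_fields({"title": "Bonjour", "body": "Le texte"}, "fr"))
print(doc)  # {'id': '42', 'title_fr': 'Bonjour', 'body_fr': 'Le texte'}
```

Queries then target the suffixed fields for the language of the current user, for example `title_fr` instead of `title`.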