6. What is Lucene.Net?
Lucene.Net is a port of the Lucene search engine library, written in C# and targeted at .NET runtime users.
7. What is Lucene?
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java.
Apache Lucene is an open source project available for free download.
8. History
1997 – Lucene project begun by Doug Cutting
2000 – First open source release
2002 – First Apache Jakarta release
2005 – Lucene becomes a top-level project
2006 – Lucene.Net gets Apache incubation status
2010 – Lucene.Net orphaned by original committers
2011 – Lucene.Net reaccepted into Apache Incubator
2012 – Lucene.Net graduates from the Incubator
9. Why you should care
• You want to provide customers with a "Google-like" search experience
• You want to tune incoming queries or results ranking
• You want better performance than SQL "like" searches
• You want to avoid deploying a separate search tool with your website or application
10. What does it do?
• Allows you to index and search vast amounts of text quickly
• Provides a powerful query syntax
• Integrates into applications easily
11. How it works
• Lucene uses an inverted index
– Maps terms to the documents that contain them
• Lucene manages its index
– Stores the index in memory or on disk
– Allows documents to be added or removed
• Makes an index for each document
• Merges the index with a set of other indices
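To make the inverted-index idea concrete, here is a toy sketch in plain C#. It is purely illustrative, not the Lucene API: Lucene's real index also stores term frequencies, positions, and compressed on-disk structures.

using System;
using System.Collections.Generic;

// Toy inverted index: maps each term to the IDs of the documents containing it.
class ToyInvertedIndex
{
    private readonly Dictionary<string, List<int>> _postings =
        new Dictionary<string, List<int>>();

    public void Add(int docId, string text)
    {
        foreach (var term in text.ToLowerInvariant().Split(' '))
        {
            List<int> docs;
            if (!_postings.TryGetValue(term, out docs))
                _postings[term] = docs = new List<int>();
            if (!docs.Contains(docId))
                docs.Add(docId);          // record the document under this term
        }
    }

    // Lookup is a single dictionary probe, not a scan of every document.
    public IEnumerable<int> Search(string term)
    {
        List<int> docs;
        return _postings.TryGetValue(term.ToLowerInvariant(), out docs)
            ? docs
            : (IEnumerable<int>)new int[0];
    }
}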
14. Differences between Java and .NET
The Lucene.Net API:
• Lags a few steps behind the Java version of Lucene
• Takes advantage of advanced .NET features not found in Java
But it:
• Preserves the core Lucene concepts
• Maintains indexes that are compatible with the Java version
15. Logical Index Storage
• Field – a name/value pair
• Document – a sequence of fields
• Index – a collection of documents
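A minimal sketch of those three concepts against the Lucene.Net 3.0.x API (constructor overloads differ slightly across versions, and the index path here is made up):

using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

// Field = a name/value pair; Document = a sequence of fields;
// Index = the collection of documents managed through an IndexWriter.
var dir = FSDirectory.Open(new DirectoryInfo(@"C:\indexes\books"));  // hypothetical path
var analyzer = new StandardAnalyzer(Version.LUCENE_30);
using (var writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
{
    var doc = new Document();
    doc.Add(new Field("id", "gone-with-the-wind", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("content", "Frankly, my dear...", Field.Store.NO, Field.Index.ANALYZED));
    writer.AddDocument(doc);   // the document now belongs to the index
}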
16. Physical Index Storage
• Lucene generates a series of files within a single directory
• Moving an index is a copy-and-paste operation
• You can compress or zip an index to archive it
17. Luke
• Lucene Index Toolbox
• Built in Java, but can read Lucene.Net indexes
• http://code.google.com/p/luke/
18. Analyzers and Tokens
• Analyzers take strings of text and break them into tokens
• Tokens are chunks of text and associated metadata
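A sketch of watching an analyzer at work (3.0.x API; the token-attribute names changed between Lucene versions, so treat the attribute call as an assumption):

using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Tokenattributes;
using Version = Lucene.Net.Util.Version;

// Feed a string through StandardAnalyzer and print each token it emits.
var analyzer = new StandardAnalyzer(Version.LUCENE_30);
TokenStream stream = analyzer.TokenStream("content", new StringReader("The Quick Brown Fox"));
var term = stream.AddAttribute<ITermAttribute>();
while (stream.IncrementToken())
    Console.WriteLine(term.Term);   // prints: quick, brown, fox ("the" is a stop word)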
19. Terms, Queries and Hits
• Terms – the basic unit for searching. A field name and a value to seek.
• Queries – combine terms to form search criteria
• Hits – a ranked list of pointers to documents
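Putting the three together, a minimal search sketch (reusing the dir variable from the indexing example above; 3.0.x API):

using System;
using Lucene.Net.Index;
using Lucene.Net.Search;

// A Term names a field and a value; a Query wraps terms into criteria;
// hits come back as ScoreDocs ranked by relevance.
using (var searcher = new IndexSearcher(dir, true))   // true = read-only
{
    var query = new TermQuery(new Term("content", "scarlett"));
    TopDocs hits = searcher.Search(query, 10);        // top 10 matches
    foreach (ScoreDoc hit in hits.ScoreDocs)
    {
        var doc = searcher.Doc(hit.Doc);              // resolve the pointer to a document
        Console.WriteLine("{0} (score {1})", doc.Get("id"), hit.Score);
    }
}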
27. Transactional Lucene
• Lucene supports ACID commits to its indexes
• Lucene uses the Commit and Rollback syntax, much like relational databases.
• Source: http://blog.mikemccandless.com/2012/03/transactional-lucene.html
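A sketch of the transactional pattern, assuming the writer from the earlier example (firstDoc and secondDoc are placeholders; note that Rollback() also closes the writer):

try
{
    writer.AddDocument(firstDoc);
    writer.AddDocument(secondDoc);
    writer.Commit();      // both documents become visible to readers atomically
}
catch (Exception)
{
    writer.Rollback();    // discard everything since the last commit
    throw;
}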
28. Lucene index types
FSDirectory
• Stores indexed documents on disk
• Persists data across sessions
• Best choice for most applications
Your first choice
RAMDirectory
• Stores indexed documents in memory
• Entire index must fit into available memory
• Does not persist data
• Faster than FSDirectory
Useful for unit testing
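Both are implementations of the same abstract Directory class, so only the construction line changes (path is hypothetical):

using Lucene.Net.Store;

// On disk: survives restarts; the usual production choice.
Directory diskIndex = FSDirectory.Open(new System.IO.DirectoryInfo(@"C:\indexes\books"));

// In memory: fast, throwaway, must fit in RAM; handy in unit tests.
Directory ramIndex = new RAMDirectory();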
29. Precalculation
• How you store things in Lucene matters – choose field options and analyzers carefully
• The way you retrieve information determines how it should be stored
• Smaller indexes give you better performance
31. Field.Index
• No – the field is not indexed, so it is not searchable
• Not analyzed – the text is treated as a single unit and indexed whole
• Analyzed – the text is broken down into tokens and indexed
32. Field.TermVector
• No – Does not store term vectors
• Yes – Stores the term vectors of each document (terms and number of occurrences)
• With Positions Offsets – Term vector, token position and offset information
33. Field types indexing options
Field    | Stored | Analyzed     | Vectored
Id       | Yes    | Not analyzed | No
Modified | Yes    | Not analyzed | No
Path     | Yes    | Analyzed     | No
Content  | No     | Analyzed     | With Positions Offsets
An example of storing fields related to files on your computer.
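The table maps directly onto Field constructor flags; a sketch (fileText stands in for the file's contents):

using Lucene.Net.Documents;

var doc = new Document();
doc.Add(new Field("id", "42", Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.Add(new Field("modified", "2013-03-18", Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.Add(new Field("path", @"C:\docs\readme.txt", Field.Store.YES, Field.Index.ANALYZED));
doc.Add(new Field("content", fileText, Field.Store.NO,
                  Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));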
34. Analyzers
• Break apart text into tokens; each token gets indexed separately
• Remove stop words
• Decide how to handle punctuation
• Handle languages and case sensitivity
• You can create your own by building from scratch or chaining existing analyzers
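As the last bullet says, chaining existing pieces is the easy route. A sketch against the 3.0.x API (filter constructor signatures vary by version):

using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Version = Lucene.Net.Util.Version;

// Tokenize, lowercase, then drop English stop words.
public class ChainedAnalyzer : Analyzer
{
    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        TokenStream stream = new StandardTokenizer(Version.LUCENE_30, reader);
        stream = new LowerCaseFilter(stream);
        stream = new StopFilter(true, stream, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        return stream;
    }
}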
35. Types of Queries
• TermQuery
• PhraseQuery
• RangeQuery
• PrefixQuery, WildcardQuery
• FuzzyQuery
• Use BooleanQuery to combine them
36. Query syntax
Query Type    | Purpose                                         | Sample
TermQuery     | Single word query                               | scarlett
PhraseQuery   | Matches terms in order                          | “frankly my dear”
RangeQuery    | Matches documents between the terms             | [1861 TO 1865], {1861 TO 1865}
WildcardQuery | Lightweight regex-like term matching            | Atl*, D?m?
PrefixQuery   | Matches terms that begin with the string        | War*
FuzzyQuery    | Closeness matching                              | cry~
BooleanQuery  | Combines other queries into complex expressions | Scarlett AND “frankly my dear” -voldemort
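The last row of the table can be produced two ways: parsed from the query syntax, or built programmatically. A sketch reusing the analyzer from the earlier examples (3.0.x API; older versions spell Occur as BooleanClause.Occur):

using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Version = Lucene.Net.Util.Version;

// Parsed from user input:
var parser = new QueryParser(Version.LUCENE_30, "content", analyzer);
Query parsed = parser.Parse("scarlett AND \"frankly my dear\" -voldemort");

// The same query built by hand:
var phrase = new PhraseQuery();
phrase.Add(new Term("content", "frankly"));
phrase.Add(new Term("content", "my"));
phrase.Add(new Term("content", "dear"));

var built = new BooleanQuery();
built.Add(new TermQuery(new Term("content", "scarlett")), Occur.MUST);
built.Add(phrase, Occur.MUST);
built.Add(new TermQuery(new Term("content", "voldemort")), Occur.MUST_NOT);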
37. Query, Filter, and Sort
• Lucene.Net can handle all three
• Default sort is by relevance
• Prefer queries to filters – they perform better
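A sketch of overriding the default relevance sort, reusing the searcher and query from the search example (the "modified" field must be indexed but not analyzed, as in the slide-33 table):

using Lucene.Net.Search;

// Sort newest-first on the modified field; null means no filter.
var sort = new Sort(new SortField("modified", SortField.STRING, true));
TopDocs byDate = searcher.Search(query, null, 10, sort);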
40. Recap
• Why would I use a search engine?
• Why would I use Lucene.Net?
• How would I add Lucene.Net to my project?
– Web
– Desktop
• Where could I go to learn more?
• When can I buy Dean a beer?
43. Books
• Lucene in Action, Second Edition
• Michael McCandless, Erik Hatcher, Otis Gospodnetić
• Manning Publications
• July 2010
• http://www.manning.com/hatcher3/
44. Books
• Taming Text
• Grant S. Ingersoll, Thomas S. Morton, Andrew L. Farris
• Manning Publications
• January 2013
• http://www.manning.com/ingersoll/
45. Books
• Introduction to Information Retrieval
• Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze
• Cambridge University Press
• 2008
• http://www-nlp.stanford.edu/IR-book/
48. Sample Files
All the literature shown in the code samples comes from Project Gutenberg.
http://www.gutenberg.org/
Editor's Notes
Egad, the PUNishment! Well, at least I didn’t have a boring “Introduction to Lucene.NET” title.
Oooh, an agenda. Aren’t I organized?
Please send me an email to get in touch with me. Keep up with what I’m doing on the Infovark website or on my LinkedIn profile. I’ve listed my twitter handles – personal and work – but I rarely log into Twitter for any length of time. Send me a private message if you want to get my attention on Twitter.
Doug Cutting had written search engines in other languages, but he wanted to teach himself Java. So the Lucene project began. Although he started building a commercial venture around the project, he decided that he preferred writing code to running a business. He open sourced the code in 2000. Lucene was adopted by the Apache Software Foundation in 2001. Lucene.Net, which began as an independent port of Lucene, was accepted by the ASF in 2006. In 2010, Lucene.Net hit a rough patch, but thanks to the efforts of the Alt.Net community, it was reintroduced to the Apache Incubator. In 2012, it graduated from the Incubator and became a full-fledged Apache project.
Inverted index: maps terms to the documents that contain them. Terms may include metadata to improve ranking. Terms may include position data for proximity searches.
These are a few examples of websites, applications, and platforms that use Lucene.Net. If I included those that use Lucene, the Java version, the list would be huge. Even if you don’t use Lucene.Net directly, chances are good that you use something that does. Lucene has become a foundational technology for many of the tools and sites we use today, but not many folks working on the Microsoft side are familiar with it. Some prominent Java examples include: LinkedIn, Twitter, IBM’s OmniFind, and many more.
The .NET version is catching up with the Java version, but it remains nearly a full version behind. The .NET API is much nicer to work with, having good collections and generics support. Tools that interact with a Lucene index will work regardless of the Lucene library that created it.
Although we’ll be working with the Lucene.NET API tonight, many of the concepts you’ll hear will apply to any search engine, though the specific terminology may differ a little. Let’s review some basic definitions we’ll use throughout the rest of the presentation. Index – a collection of documents. Document – a sequence of fields. Field – a string name/value pair.
Luke is one of the ugliest applications I’ve ever seen, but it’s extremely useful. It exposes just about every aspect of the Lucene API, so it makes a great test-bed for trying out different ideas.
Analyzer – breaks field values into tokens. Token – a tuple consisting of a chunk of text and its associated metadata. Tokens are the raw bits that get indexed. (Tokens and terms are closely related.)
Query – a way to ask a question of an index. Term – a tuple containing a field and a value to seek.
Here are some of the key classes used to add documents to the index. I really ought to add some details to the slide for folks who can’t see the code sample.
Updating is a fairly new operation in the Lucene.Net API. Under the hood, it’s doing a Delete operation then an Add operation.
Did you know that you can use an IndexReader to update and delete documents, too? Yes, but I don’t recommend it. This is one of the parts of the API that’s getting revised in the near future.
Unlike a relational database, there’s no “normal form” to guide you when structuring a Lucene index. The key thing to remember is that the way you plan to retrieve information should determine how you store it.
Keeping the original text within the Lucene index is convenient, but can vastly increase the size of your indexes.
Just an example of how you might combine the flags when adding fields to a document.
TermQuery – retrieve documents by a key. PrefixQuery – matches the start of a string value. RangeQuery – searches starting at one term and ending at another (useful for date searches). BooleanQuery – lets you combine other queries using AND, OR, NOT operations. PhraseQuery – finds terms a specified distance from one another. FuzzyQuery – matches terms similar to a specified term.
Examples of query syntax.
Some odds and ends on Queries, filters and sorting.
We can finally dispose of our Lucene objects in versions 2.9.4 and later. If you’re using older versions, you must remember to try/finally the FSDirectory and IndexWriter. Remember that it’s much more efficient to add a bunch of documents within a single using statement than to open a new IndexWriter each time.