
Search enabled applications with lucene.net

Notes
  • Give a demo of building a search index, plus a walkthrough of the RecipeBrowser application.
  • Demo: Parse and explain queries in a unit test.
  • Demo: Show SpanQuery.
  • Demo: IndexSearcher + ResultsCollector.
  • Transcript of "Search enabled applications with lucene.net"

    1. d35xp W.Meints Search enabled applications with Lucene.NET
    2. Agenda: Introduction, Technical bits, Inspiration #ISKALUCENE
    3. Google has ruined search for everyone
    4. This is what you often build as a developer. Because the user wants it.
    5. Three reasons why search sometimes sucks
        • Can I even search?
            • The number one reason, because sometimes it’s not there, or it is there but you cannot see that it is there. Confusing stuff!
        • The search form is too complicated
            • I need to be an expert to find something I don’t know is there… Good thinking!
        • The search engine is too slow
            • They sometimes warn you about this (why?!)
    6. Three reasons why search sometimes sucks
        • I am not going to address all of these issues today.
        • The focus of this talk is on the technical stuff, which solves:
            • Having to use complex search forms to find something
            • Having to wait a long time before you find something (hopefully).
        • Usability of search engines is something I could talk about for a very long time too… but not today.
    7. This is what we expect to see today
    8. Implementing proper search functionality
        • Simplicity is key
        • Gives the right answers
        • Allows me to refine
    9. What search is today
        Search is hard on the developer. It involves a lot of things:
        • Linguistics
        • Psychology
        • Information analysis
        • Computer science
        • Complex math
    10. Lucene.NET as a possible solution
        • Lucene.NET is derived from its Java cousin Lucene.
        • Compact search engine that offers a solution to most of your search problems.
        • Best of all: it is free.
    11. Getting started with Lucene.NET
    12. Overview of Lucene
        • Lucene provides the core things you need to build a search system
        • It does not:
            • Contain a search results page.
            • Parse HTML, Word, Excel, etc.
    13. This is what is in the box
        • Text analyzer
            • Splits text into searchable terms
            • Filters out stop words (if you want)
        • QueryParser
            • Common syntax without needing to learn anything
        • IndexSearcher
            • The goods, THE thing to have.
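To make the analyzer's job concrete, here is a minimal sketch in plain Python of what "splits text into terms and filters out stop words" could look like. It is not Lucene.NET's analyzer: the `analyze` function and the `STOP_WORDS` list are assumptions made up for illustration.

```python
import re

# Hypothetical stop word list for illustration; Lucene.NET ships its own.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def analyze(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in stop_words]


terms = analyze("The quick brown fox jumps over the lazy dog")
print(terms)
```

Passing an empty `stop_words` set keeps every token, which mirrors the "if you want" on the slide.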
    14. This is what is in the box
        • IndexReader
            • Reads everything from the index
        • IndexWriter
            • Stores documents and fields
        • Directory
            • The index itself, comes in many sizes and shapes
    15. A standard recipe for building search
        1. Build an index with content you want to search through.
        2. Build a query from the question the user asked.
        3. Get results and present them to the user.
    16. A standard recipe for building search
        1. Build an index with content you want to search through.
        2. Build a query from the question the user asked.
        3. Get results and present them to the user.
    17. Step 1: Building an index
        • The Lucene search index is nothing like your average database!
        • Storage happens in key/value pairs
        • Most of the time nothing is stored and you can still search for it
            • The engine indexes the terms of the content (an inverted index), not the content itself
            • Only when you ask it to store, it stores something
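The difference between "indexed" and "stored" is easier to see in a toy model. The sketch below is plain Python, not the Lucene.NET API: `TinyIndex` and its `(value, store, index)` field tuples are hypothetical names chosen for illustration. It keeps a term-to-document map for searching and only keeps the original value when asked to.

```python
from collections import defaultdict


class TinyIndex:
    """Toy index: searchable terms and stored values are kept separately."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, term) -> set of doc ids
        self.stored = {}                  # doc id -> {field: original value}
        self.next_id = 0

    def add(self, fields):
        """fields maps a field name to a (value, store, index) tuple."""
        doc_id = self.next_id
        self.next_id += 1
        self.stored[doc_id] = {}
        for name, (value, store, index) in fields.items():
            if index:
                # Indexed: the terms become searchable.
                for term in value.lower().split():
                    self.postings[(name, term)].add(doc_id)
            if store:
                # Stored: the original value can be read back later.
                self.stored[doc_id][name] = value
        return doc_id

    def search(self, field, term):
        return sorted(self.postings.get((field, term.lower()), set()))


index = TinyIndex()
doc_id = index.add({
    "title": ("Pancakes", True, True),                  # stored and indexed
    "body": ("Mix flour eggs and milk", False, True),   # indexed only
})
print(index.search("body", "milk"))  # the document is found via its body...
print(index.stored[doc_id])          # ...but only the title can be read back
```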
    18. Step 1: Building an index
        • Lucene indexing uses a tree-like index structure
            Doc #1, Doc #2, Doc #3: each document gets its own segment initially
            Merged #1 + #2: segments get merged during optimization cycles
            Full index: finally everything is merged back into one big pile.
    19. Step 1: Building an index
        • Reasons for going in this direction:
            • Segments are small and update very fast.
            • Searching many segments is slower than searching one bigger segment
        • Overall, a merging-segments index is more scalable and easier to implement than the B-tree index that is used elsewhere.
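The segment idea can be sketched in a few lines of plain Python. This is a simplified model under my own assumptions (one dictionary per segment, no deletes, no on-disk format), and the helper names are hypothetical; it only shows why many small segments are cheap to write but cost an extra lookup each at search time, and why merging fixes that.

```python
def make_segment(doc_id, text):
    """Build a one-document segment: term -> sorted list of doc ids."""
    return {term: [doc_id] for term in set(text.lower().split())}


def merge_segments(segments):
    """Merge several segments into one by combining their posting lists."""
    merged = {}
    for segment in segments:
        for term, doc_ids in segment.items():
            merged.setdefault(term, set()).update(doc_ids)
    return {term: sorted(ids) for term, ids in merged.items()}


def search(segments, term):
    """Look the term up in every segment; more segments means more lookups."""
    hits = []
    for segment in segments:
        hits.extend(segment.get(term, []))
    return sorted(hits)


segments = [
    make_segment(1, "lucene search"),
    make_segment(2, "search engine"),
    make_segment(3, "lucene index"),
]
full_index = merge_segments(segments)

print(search(segments, "lucene"))      # three lookups, one per segment
print(search([full_index], "lucene"))  # one lookup after merging
```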
    20. Step 1: Building an index
        Diagram: Your Parser → Document (Field, Field) → IndexWriter (with Analyzer) → Directory
    21. A standard recipe for building search
        1. Build an index with content you want to search through.
        2. Build a query from the question the user asked.
        3. Get results and present them to the user.
    22. Step 2: Building queries
        • Querying Lucene.NET is done through the IndexSearcher for almost every scenario you can think of.
        • There are a number of options for queries:
            • Hand-build a query using BooleanQuery, TermQuery or another query type
            • Let Lucene decide which would fit best by parsing the query.
    23. Step 2: Building queries
        Diagram: “Some text” → QueryParser (with Analyzer) → Query → IndexSearcher
    24. Step 2: Building queries
        • There’s a standard QueryParser, but you can also use the MultiFieldQueryParser
        • The MultiFieldQueryParser allows you to build a query across multiple fields at once.
    25. Step 2: Building queries
        • Using the QueryParser and analyzer to get a good query for the search engine is one way of going at it.
        • Other query types include:
            • BooleanQuery: terms must, should or must not appear in the document
            • TermQuery: look for a single term
            • SpanQuery: find terms that are close together in the text
        Please note: You can combine!
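The must / should / must not semantics can be illustrated without Lucene at all. The following is a rough Python sketch under my own assumptions: the `boolean_query` function is hypothetical, and its "score" is just a count of matched clauses, which is far simpler than what Lucene computes.

```python
def boolean_query(docs, must=(), should=(), must_not=()):
    """Return (doc_id, score) pairs, best match first.

    A document matches when it contains every `must` term, none of the
    `must_not` terms, and, if there are no `must` terms, at least one
    `should` term. The score is simply the number of matched clauses.
    """
    results = []
    for doc_id, text in docs.items():
        terms = set(text.lower().split())
        if any(term in terms for term in must_not):
            continue
        if not all(term in terms for term in must):
            continue
        matched_should = sum(1 for term in should if term in terms)
        if not must and should and matched_should == 0:
            continue
        results.append((doc_id, len(must) + matched_should))
    return sorted(results, key=lambda hit: (-hit[1], hit[0]))


docs = {
    1: "pancakes with milk and eggs",
    2: "pancakes with bananas",
    3: "omelette with eggs and milk",
}
print(boolean_query(docs, must=["pancakes"], should=["milk"], must_not=["bananas"]))
```

Each clause here is effectively a single-term lookup, which is the "you can combine" point from the slide: small queries are the building blocks of bigger ones.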
    26. Step 2: Building queries
        • SpanQuery is a little weird: it allows you to find terms close together in a piece of text. For example:
            “The lazy fox jumps over the quick brown dog”
            “The quick brown fox jumps over the lazy dog”
          The second sentence is the one you want. The first one is sort of correct, but a little funky. Since when did the dog become brown and quick??
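A proximity check on the two example sentences can be sketched with term positions. This is an illustration in plain Python, not the SpanQuery implementation; `span_near` is a hypothetical helper, and `slop` is assumed here to mean the number of other terms allowed between the two matches.

```python
def positions(text, term):
    """Return the word positions at which `term` occurs in `text`."""
    return [i for i, token in enumerate(text.lower().split()) if token == term]


def span_near(text, first, second, slop, in_order=True):
    """True if `second` occurs near `first`, with at most `slop` terms between."""
    for p1 in positions(text, first):
        for p2 in positions(text, second):
            gap = p2 - p1 if in_order else abs(p2 - p1)
            # gap - 1 is the number of terms sitting between the two matches.
            if gap > 0 and gap - 1 <= slop:
                return True
    return False


funky = "The lazy fox jumps over the quick brown dog"
wanted = "The quick brown fox jumps over the lazy dog"

print(span_near(wanted, "quick", "fox", slop=1))  # "quick brown fox" matches
print(span_near(funky, "quick", "fox", slop=1))   # fox is far away from quick
```

A plain boolean query for "quick" and "fox" would match both sentences; the position check is what tells them apart.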
    27. A standard recipe for building search
        1. Build an index with content you want to search through.
        2. Build a query from the question the user asked.
        3. Get results and present them to the user.
    28. Step 3: Getting results
        • With indexed content and the right query, you can get the answer to everything (which, by the way, might not be 42…)
        • The IndexSearcher is used to find the answer to your query.
    29. Step 3: Getting results
        Diagram: Query → IndexSearcher → IndexReader → Directory
    30. Step 3: Getting results
        • Documents are matched against your query using complex math.
        • A TF-IDF algorithm is used to determine how well the document matches the query.
        • You have been warned! This is complex stuff.
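The core of TF-IDF is less scary than it sounds: a term counts for more when it is frequent in the document (TF) and rare in the collection (IDF). The sketch below uses the textbook formula in plain Python. It is an assumption-laden simplification and not Lucene's actual scoring, which adds normalization and boosting factors on top; `tf_idf_scores` is a hypothetical name.

```python
import math


def tf_idf_scores(docs, query_terms):
    """Score every document against the query with textbook TF-IDF."""
    n_docs = len(docs)
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    scores = {}
    for doc_id, tokens in tokenized.items():
        score = 0.0
        for term in query_terms:
            # Term frequency: how much of this document is the term?
            tf = tokens.count(term) / len(tokens) if tokens else 0.0
            # Document frequency: in how many documents does the term occur?
            df = sum(1 for other in tokenized.values() if term in other)
            if df:
                score += tf * math.log(n_docs / df)
        scores[doc_id] = score
    return scores


docs = {
    1: "lucene lucene search",
    2: "search engine",
    3: "database engine",
}
print(tf_idf_scores(docs, ["lucene"]))
```

Note that a term occurring in every document gets an IDF of zero in this version: it says nothing about which document is the better match.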
    31. Step 3: Getting results
        • In the demo I showed you the basic form of finding documents.
        • There’s more to the Search method than meets the eye!
        • Depending on your needs, you may have to use a collector.
            • A collector optimizes the way you retrieve documents from the index
    32. Step 3: Getting results
        • Need to find documents in ranked order?
            • Use the default method or use a TopDocsCollector
        • Need to sort the documents in a particular order?
            • Use the TopFieldCollector instead.
            • This collector is optimized for sorting on fields
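Why a collector helps can be shown with a small heap. The sketch below is plain Python and only mimics the idea behind a top-N collector: keep the best k hits while streaming through matches, instead of sorting everything. `TopScoreCollector` and `sort_by_field` are hypothetical names, not Lucene.NET classes.

```python
import heapq


class TopScoreCollector:
    """Keep only the k best-scoring hits using a min-heap of size k."""

    def __init__(self, k):
        self.k = k
        self.heap = []  # (score, -doc_id); the weakest hit sits at heap[0]

    def collect(self, doc_id, score):
        if self.k <= 0:
            return
        item = (score, -doc_id)
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, item)
        elif item > self.heap[0]:
            # The new hit beats the weakest hit we kept, so swap them.
            heapq.heapreplace(self.heap, item)

    def top_docs(self):
        """Return (doc_id, score) pairs, best score first."""
        return [(-neg_id, score) for score, neg_id in sorted(self.heap, reverse=True)]


def sort_by_field(doc_ids, stored_fields, field):
    """Order hits by a stored field value instead of by score."""
    return sorted(doc_ids, key=lambda doc_id: stored_fields[doc_id][field])


collector = TopScoreCollector(k=2)
for doc_id, score in [(1, 0.2), (2, 0.9), (3, 0.5), (4, 0.1)]:
    collector.collect(doc_id, score)
print(collector.top_docs())
```

The heap never grows beyond k entries, so memory stays constant no matter how many documents match.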
    33. Step 3: Getting results
        • Don’t want documents that have nothing to do with what you asked for in the first place?
            • Use a PositiveScoresOnlyCollector
            • Matches documents with score > 0
        Use this only when you have a smaller index.
    34. A standard recipe for building search
        1. Build an index with content
            • IndexWriter
            • Document
            • Think about Store / Index settings on your fields!
        2. Build a query
            • QueryParser
            • MultiFieldQueryParser
            • Choose the right query type!
        3. Get results
            • IndexSearcher
            • Collector
            • Choose the right collector for better performance!
    35. Good to go
        • Now that you know how Lucene.NET works I think it is time to show you a few other things…
    36. Categorize content based on previous content
        Diagram: ? → IndexSearcher (Body) → Probably a good candidate!
            Label          Occurrences
            Search         180
            Requirements   40
            Other label    12
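The categorization trick boils down to: search for existing documents that look like the new one, then count which label shows up most among the hits. Here is a minimal Python sketch of that idea. The term-overlap "search" stands in for a real IndexSearcher query on the body field, and `suggest_label` and the sample data are made up for illustration.

```python
from collections import Counter


def suggest_label(new_body, labeled_docs, top_n=3):
    """Suggest the label that occurs most among the most similar documents."""
    new_terms = set(new_body.lower().split())
    scored = []
    for body, label in labeled_docs:
        # Stand-in for a real search: similarity is the number of shared terms.
        overlap = len(new_terms & set(body.lower().split()))
        if overlap:
            scored.append((overlap, label))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    counts = Counter(label for _, label in scored[:top_n])
    return counts.most_common(1)[0][0] if counts else None


labeled_docs = [
    ("lucene index search query", "Search"),
    ("search results ranking", "Search"),
    ("user story acceptance criteria", "Requirements"),
    ("holiday schedule", "Other label"),
]
print(suggest_label("how to query a lucene search index", labeled_docs))
```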
    37. Detecting plagiarized content
        Diagram: ? → IndexSearcher → ? (Lucene in action, Lucene in Orchard)
        Potential problematic document:
            Field   Value
            Title   Lucene.NET in action
            Body    Lorem ipsum stuff and more about that Lucene thingie.
            Tags    Search, Lucene, .NET, C#
    38. Spell check content
        • You can spell check a document based on what others wrote.
            • Very similar to categorization, but instead of checking the highest hit for a single field, check which word matches best for the term at hand.
            • Uses an n-gram structure and the Levenshtein distance algorithm (sounds good, doesn’t it?)
            • Do NOT build this yourself, but download here: https://nuget.org/packages/Lucene.Net.Contrib/3.0.3
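The slide's advice stands: use the contrib package rather than building this yourself. Purely to show how the two ingredients fit together, here is a toy Python sketch: n-grams cheaply narrow down the candidate words, and Levenshtein distance ranks them. The function names, the bigram size and the `max_distance` cutoff are my own assumptions, not the contrib spell checker's behaviour.

```python
def ngrams(word, n=2):
    """Return the set of character n-grams of a word, with boundary markers."""
    padded = f"^{word}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def levenshtein(a, b):
    """Minimum number of single-character edits that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def suggest(word, dictionary, max_distance=2):
    """Suggest dictionary words that share an n-gram and are few edits away."""
    word_grams = ngrams(word)
    candidates = [w for w in dictionary if word_grams & ngrams(w)]
    scored = sorted((levenshtein(word, w), w) for w in candidates)
    return [w for distance, w in scored if distance <= max_distance]


dictionary = ["lucene", "search", "index", "query"]
print(suggest("lucnee", dictionary))
```

In the setup the slide describes, the dictionary would come from the terms other people already wrote into the index.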
    39. Play Jeopardy?
        • The IBM Watson supercomputer uses Lucene
    40. By the way…
        • Endeavour knowNow uses Lucene.NET
        • And there are more devs using it.
            • Twitter uses Lucene for real-time search
            • StackOverflow uses Lucene for searching questions
            • RavenDB uses Lucene for its indexes and queries
        • Give it a try, you might be surprised!
    41. http://www.fizzylogic.nl/ @wmeints