7.1 Search and Lucene.Net
Ash Prasad

Don’t forget to include #DNNCon in your tweets!

@DNNCon
Search features and architecture in DNN 7.1

7.1 Search and Lucene.Net
Lucene.Net was the obvious choice of technology for Search in 7.1. Lucene is a general-purpose search engine, and integrating it with the intricacies of DNN wasn't trivial. Ash was instrumental in the design and development of the new Search in 7.1. Join Ash to hear all about DNN Search and Lucene.Net and what the future looks like.

Published in: Technology

Search features and architecture in DNN 7.1

  1. 7.1 Search and Lucene.Net
     Ash Prasad
  2. Agenda
     • History and New Objectives
     • Architecture
     • Lucene / Lucene.Net
     • Crawlers, Entities, Controllers
     • Ranking, Synonyms, Ignore Words, Stemming
     • Security Trimming
     • Module Integration, New Crawler
  3. History of Search
     • Platform Edition
         • SQL Server
         • ISearchable
         • [Diagram labels: ISearchable, Scheduler, Module, Module, SQL]
     • Commercial Edition
         • Lucene 2.9.2
         • URL and Files
         • [Diagram labels: Scheduler, Lucene]
  4. Objectives of New Search
     • Handle diverse Content (CMS, Social, Localized, 3rd Party Modules)
     • Consistent User Experience
     • Simple for Module Developers
     • Uniform Architecture
     • Feature based differentiation
  5. Architecture
     [Architecture diagram]
  6. What's Lucene
     • Java-based indexing and search technology
     • Managed by Apache
     • NoSQL database
     • Near real-time, Spellchecking, Highlighting, Ranking, Synonyms
     • Many companies use Lucene directly or customize
     • Facebook's Graph search uses similar 'Inverted Index'
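
The 'Inverted Index' mentioned on this slide can be illustrated with a minimal sketch. It is written in Python for brevity and does not use Lucene; the function names and sample documents are assumptions made up for illustration.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample content.
docs = {
    1: "DNN search uses Lucene",
    2: "Lucene is an indexing and search library",
    3: "Modules provide content",
}
index = build_inverted_index(docs)
```

A query is answered by looking terms up in the index rather than by scanning every document, which is what makes this structure suitable for search.
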
  7. What's Lucene.Net
     • Line-by-line port from Java to C#
     • Maintains high-performance requirements
     • A bit behind Java releases
     • Who Uses Lucene.Net
         • Products: RavenDB, Orchard, Umbraco, SubText
         • Commercial Sites: BBC UK Top Gear, AutoDesk, Koders.Com
  8. Lucene – A Document Store
     • Flexible Schema
     • Consists of Documents
         • Which are collections of Fields
     • Documents can have different sets of Fields
         • Field("ID", "xxx-yyy-999"), Field("Title", "My best doc")
         • Field("Owner", "Ash"), Field("Locale", "en-US")
  9. Lucene – A Document Store (Contd.)
     • Denormalized (No Referential Integrity)
     • Deletion: Done through a flag
     • Compact reclaims deleted space
     • Update is Delete + Insert
     • Boost = Ranking
     • Unicode compliant
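
The document-store behaviour described on slides 8 and 9 (flexible fields, deletion by flag, update as delete plus insert, compaction) can be sketched as a toy in-memory store. This is not how Lucene is implemented; the class, its method names, and the "abc" document are hypothetical.

```python
class DocumentStore:
    """Toy append-only store: each document is a dict of field name -> value."""

    def __init__(self):
        self._docs = []      # every document ever added, in insertion order
        self._deleted = []   # parallel list of deletion flags

    def add(self, fields):
        self._docs.append(dict(fields))
        self._deleted.append(False)

    def delete(self, key, value):
        """Flag matching documents as deleted; nothing is physically removed."""
        for i, doc in enumerate(self._docs):
            if doc.get(key) == value:
                self._deleted[i] = True

    def update(self, key, fields):
        """Update is delete + insert, matched on one key field."""
        self.delete(key, fields[key])
        self.add(fields)

    def compact(self):
        """Drop flagged documents to reclaim their space."""
        live = self.live_docs()
        self._docs, self._deleted = live, [False] * len(live)

    def live_docs(self):
        return [d for d, dead in zip(self._docs, self._deleted) if not dead]

    def physical_size(self):
        return len(self._docs)

store = DocumentStore()
store.add({"ID": "xxx-yyy-999", "Title": "My best doc"})
store.add({"ID": "abc", "Owner": "Ash", "Locale": "en-US"})  # different field set
store.update("ID", {"ID": "abc", "Owner": "Ash", "Locale": "en-GB"})
```

After the update the store physically holds three documents, one of them flagged as deleted, until `compact()` is called.
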
  10. Book consulted for Search
     • Book on version 3.0
     • ~500 pages
     • Very useful
  11. Search Phases
     • Content Acquisition
         • Crawling
         • ISearchable
         • ModuleSearchBase
         • URL
         • Doc / PDF
     • Content Indexing
         • Text Analysis
         • Ranking
         • Synonyms
         • Ignore Words
         • Stemming
     • Content Search
         • Querying
         • Sorting
         • Security Trimming
         • Boolean Search
         • Highlighting
  12. Crawlers
     • Platform
         • Site Crawler
             • Module and Tab Metadata
             • Module Content (ModuleSearchBase/ISearchable)
     • Commercial Edition
         • File Crawler
             • Uses IFilter to extract text from PDF/Office files
         • URL Crawler
             • Internal and External URLs
  13. Search Entities
     • SearchType
         • Distinguishes Crawlers
     • SearchDocument
         • Properties for a Content
         • Stored in the Index
     • SearchQuery
         • Parameters to execute a Query
     • SearchResult
         • Derived from SearchDocument
  14. Search Entities – Indexing vs. Querying
     [Diagram]
  15. Controllers
     • SearchController
         • For Querying
     • InternalSearchController
         • For Adding / Updating / Deleting
     • LuceneController
         • Interacts with Lucene
  16. Ranking = Boosting
     • Doc and/or Field can be boosted in Lucene
     • DNN does Field boosts (Default: 10)
         • Title (50)
         • Tag (40)
         • Keyword (35)
         • Description (20)
         • Author (15)
     • Configured manually by HostSettings
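
To show how field boosts influence ordering, here is a simplified sketch that uses the boost values from this slide. The scoring rule (sum the boost of each field containing the term) is an assumption chosen for clarity; Lucene's real scoring also weighs term statistics. The sample documents are hypothetical.

```python
# Boost values as listed on the slide; any other field gets the default.
FIELD_BOOSTS = {"Title": 50, "Tag": 40, "Keyword": 35, "Description": 20, "Author": 15}
DEFAULT_BOOST = 10

def score(doc, term):
    """Sum the boost of every field whose text contains the term."""
    term = term.lower()
    total = 0
    for field, text in doc.items():
        if term in text.lower().split():
            total += FIELD_BOOSTS.get(field, DEFAULT_BOOST)
    return total

# Hypothetical documents.
docs = [
    {"Title": "Lucene basics", "Body": "An introduction"},
    {"Title": "Search overview", "Description": "How lucene ranks", "Body": "lucene internals"},
    {"Title": "Other", "Body": "mentions lucene once"},
]
ranked = sorted(docs, key=lambda d: score(d, "lucene"), reverse=True)
```

A hit in the Title outranks hits in both the Description and the body combined, which is the practical effect of giving Title the largest boost.
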
  17. Synonyms and Ignore Words
     • Synonyms are injected into Index
     • Ignore Words are removed from Index
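
The two rules on this slide can be pictured as one index-time analysis step. The sketch below is a plain-Python illustration, not DNN or Lucene.Net code; the word lists and the `analyze` function are hypothetical.

```python
IGNORE_WORDS = {"the", "a", "of"}       # hypothetical ignore-word list
SYNONYMS = {"car": ["automobile"]}      # hypothetical synonym table

def analyze(text):
    """Tokenize, drop ignore words, and inject synonyms after each kept term."""
    terms = []
    for token in text.lower().split():
        if token in IGNORE_WORDS:
            continue                    # ignore words never reach the index
        terms.append(token)
        terms.extend(SYNONYMS.get(token, []))
    return terms
```

Because synonyms are written into the index, a later query for "automobile" would match content that only contained "car" without any query-time expansion.
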
  18. Stemming
     • Converts words to their root
     • PorterStemFilter is used
     • Country and Countries = countri
     • breathe, breathes, breathing, breathed = breath
     • fishing, fished, fisher = fish
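
The idea of reducing related words to a shared root can be shown with a deliberately tiny suffix stripper. This is not the Porter algorithm and not what PorterStemFilter does internally; the rules below are assumptions chosen only so that the slide's example words collapse to the stems listed.

```python
def toy_stem(word):
    """Strip a few common English suffixes (toy rules, not Porter)."""
    w = word.lower()
    if w.endswith("ies"):
        w = w[:-3] + "i"                      # countries -> countri
    elif w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]                            # breathes -> breathe
    for suffix in ("ing", "ed", "er"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]             # fishing -> fish
            break
    if w.endswith("e"):
        w = w[:-1]                            # breathe -> breath
    if w.endswith("y"):
        w = w[:-1] + "i"                      # country -> countri
    return w
```

Both indexed text and query text would pass through the same stemmer, so a search for one form of a word can match the others.
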
  19. Security Trimming
     • Done through Collectors (Callback)
     • Each Doc found is sent to Collector
     • Collector accepts or rejects each Doc based on Permission
         • Site Crawler: Module / Tab Permission
         • File Crawler: Folder Permission
         • User Crawler: Profile Permission
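
The collector callback pattern described on this slide can be sketched as follows. The class, the role-based permission rule, and the sample documents are hypothetical stand-ins; they do not reproduce the Lucene.Net Collector API or DNN's permission checks.

```python
class PermissionCollector:
    """Receives every matching doc and keeps only those the user may view."""

    def __init__(self, user_roles, has_view_permission):
        self.user_roles = user_roles
        self.has_view_permission = has_view_permission
        self.accepted = []

    def collect(self, doc):
        # Called once per hit; rejected docs are simply not recorded.
        if self.has_view_permission(doc, self.user_roles):
            self.accepted.append(doc)

def can_view(doc, roles):
    # Hypothetical rule: visible if the user holds any allowed role.
    return bool(set(doc["allowed_roles"]) & set(roles))

def run_query(docs, term, collector):
    """Send each doc that matches the term to the collector."""
    for doc in docs:
        if term in doc["text"].lower():
            collector.collect(doc)
    return collector.accepted

DOCS = [
    {"text": "Lucene admin notes", "allowed_roles": ["Administrators"]},
    {"text": "Lucene public page", "allowed_roles": ["All Users"]},
]
results = run_query(DOCS, "lucene", PermissionCollector(["All Users"], can_view))
```

Trimming inside the callback means results a user cannot see never enter the result set, so counts and paging stay consistent with what is displayed.
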
  20. Module Integration
     • ModuleSearchBase
         • New abstract class with just one method
         • Defined in BusinessControllerClass
     • GetModifiedSearchDocuments
         • Returns New, Changed and Deleted content
         • Delta based
         • Granular Permission, Localization, etc.
     • ISearchable continues to work (no delta)
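
The delta-based contract on this slide can be sketched in Python. The real ModuleSearchBase is a .NET class; the parameter names, the dictionary shape of a search document, and the `AnnouncementsController` module below are assumptions for illustration only.

```python
from abc import ABC, abstractmethod
from datetime import datetime

class ModuleSearchBase(ABC):
    """Stand-in for the abstract class with a single method."""

    @abstractmethod
    def get_modified_search_documents(self, module_id, begin_date):
        """Return documents created, changed or deleted since begin_date."""

class AnnouncementsController(ModuleSearchBase):  # hypothetical module
    def __init__(self, items):
        self.items = items

    def get_modified_search_documents(self, module_id, begin_date):
        # Only the delta is returned; deleted items are flagged, not omitted.
        return [
            {"unique_key": it["id"], "title": it["title"], "is_deleted": it["deleted"]}
            for it in self.items
            if it["modified"] > begin_date
        ]

controller = AnnouncementsController([
    {"id": 1, "title": "Old", "modified": datetime(2013, 1, 1), "deleted": False},
    {"id": 2, "title": "New", "modified": datetime(2013, 7, 1), "deleted": False},
    {"id": 3, "title": "Removed", "modified": datetime(2013, 7, 2), "deleted": True},
])
delta = controller.get_modified_search_documents(42, datetime(2013, 6, 1))
```

Returning only what changed since the last run is what distinguishes this approach from ISearchable, which the slide notes has no delta.
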
  21. New Crawler (How to)
     • Define a new SearchType
         • Optionally use IsPrivate to hide from site search
     • Implement BaseResultController (2 methods)
         • HasViewPermission
         • GetDocUrl
     • Create Scheduled Task
     • Call AddSearchDocuments to inject content
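
The four steps above can be sketched end to end. Everything here is a Python stand-in: the ticket crawler, the method signatures, the URL scheme, and the list used as an index are hypothetical and only mirror the names on the slide.

```python
from abc import ABC, abstractmethod

class BaseResultController(ABC):
    """Stand-in for the base class with two methods."""

    @abstractmethod
    def has_view_permission(self, result, user_roles): ...

    @abstractmethod
    def get_doc_url(self, result): ...

class TicketResultController(BaseResultController):  # hypothetical crawler
    def has_view_permission(self, result, user_roles):
        return "Support" in user_roles

    def get_doc_url(self, result):
        return "/tickets/" + str(result["unique_key"])

# Step 1: a new search type; is_private hides it from the site search.
TICKET_SEARCH_TYPE = {"name": "ticket", "is_private": True}

INDEX = []  # stand-in for the search index

def add_search_documents(docs):
    """Stand-in for the AddSearchDocuments call."""
    INDEX.extend(docs)

def run_scheduled_task(tickets):
    """Steps 3 and 4: a scheduled task converts content and injects it."""
    add_search_documents(
        {"search_type": TICKET_SEARCH_TYPE["name"], "unique_key": t["id"], "title": t["subject"]}
        for t in tickets
    )

run_scheduled_task([{"id": 7, "subject": "Printer on fire"}])
controller = TicketResultController()
```

The result controller is consulted at query time, once per hit, to decide visibility and to build the link shown to the user.
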
  22. Demo
  23. Recap
     • New Search uses Lucene.Net
     • Platform has Site Crawler
     • Commercial has URL and File Crawlers
     • Modules to implement ModuleSearchBase
     • New Crawler implements BaseResultController
  24. THANKS TO ALL OF OUR GENEROUS SPONSORS!
