7.1 Search and Lucene.Net
Ash Prasad

Don’t forget to include #DNNCon in your tweets!

@DNNCon
Agenda
• History and New Objectives
• Architecture
• Lucene / Lucene.Net
• Crawlers, Entities, Controllers
• Ranking, Synonyms, Ignore Words, Stemming
• Security Trimming
• Module Integration, New Crawler

History of Search
• Platform Edition
  • SQL Server
  • ISearchable
  • [Diagram: Scheduler, Modules (ISearchable), SQL]
• Commercial Edition
  • Lucene 2.9.2
  • URL and Files
  • [Diagram: Scheduler, Lucene]

Objectives of New Search
• Handle diverse Content
  • CMS, Social, Localized, 3rd Party Modules
• Consistent User Experience
• Simple for Module Developers
• Uniform Architecture
• Feature-based differentiation

Architecture

What’s Lucene
• Java-based indexing and search technology
• Managed by Apache
• NoSQL database
• Near real-time, Spellchecking, Highlighting, Ranking, Synonyms
• Many companies use Lucene directly or customize it
• Facebook’s Graph Search uses a similar ‘Inverted Index’
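
The "inverted index" named on this slide can be illustrated with a minimal sketch. This is a toy model in Python, not Lucene's actual data structure or API; the function names and sample documents are invented for illustration. The idea is that the index maps each term to the set of documents containing it, so a query looks up terms instead of scanning every document.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

# Hypothetical sample content, keyed by document ID.
docs = {
    1: "DNN search uses Lucene",
    2: "Lucene is an indexing library",
    3: "DNN modules can be indexed",
}
index = build_inverted_index(docs)
print(search(index, "dnn lucene"))
```

A real index also stores term positions and frequencies, which is what makes highlighting and ranking possible.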

What’s Lucene.Net
• Line-by-line port from Java to C#
• Maintains high-performance requirements
• A bit behind Java releases
• Who Uses Lucene.Net
  • Products: RavenDB, Orchard, Umbraco, SubText
  • Commercial Sites: BBC UK Top Gear, AutoDesk, Koders.Com

Lucene – A Document Store
• Flexible Schema
• Consists of Documents
  • Which are collections of Fields
• Documents can have different sets of Fields
  • Field("ID", "xxx-yyy-999"), Field("Title", "My best doc")
  • Field("Owner", "Ash"), Field("Locale", "en-US")

Lucene – A Document Store (Contd.)

• Denormalized (No Referential Integrity)
• Deletion – Done through a flag
  • Compact reclaims deleted space
• Update is Delete + Insert
• Boost = Ranking
• Unicode compliant
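
The storage behaviour listed above (flexible schema, flagged deletion, update as delete plus insert, compaction) can be sketched with a small in-memory model. This is an illustrative Python toy, not Lucene.Net code; the class and method names are hypothetical.

```python
class ToyDocumentStore:
    """Toy model of a schema-free store with flagged deletes."""

    def __init__(self):
        self._entries = []  # each entry is [fields_dict, deleted_flag]

    def add(self, fields):
        # Documents are plain dicts, so each one may carry different fields.
        self._entries.append([dict(fields), False])

    def delete(self, key, value):
        # Deletion only sets a flag; the entry still occupies space.
        for entry in self._entries:
            if not entry[1] and entry[0].get(key) == value:
                entry[1] = True

    def update(self, key, value, fields):
        # An update is a delete followed by an insert.
        self.delete(key, value)
        self.add(fields)

    def compact(self):
        # Compaction physically drops the flagged entries.
        self._entries = [e for e in self._entries if not e[1]]

    def live_docs(self):
        return [e[0] for e in self._entries if not e[1]]

    def physical_size(self):
        return len(self._entries)

store = ToyDocumentStore()
store.add({"ID": "xxx-yyy-999", "Title": "My best doc"})
store.add({"ID": "aaa-bbb-111", "Owner": "Ash", "Locale": "en-US"})
store.update("ID", "xxx-yyy-999", {"ID": "xxx-yyy-999", "Title": "My better doc"})
size_before_compact = store.physical_size()  # flagged entry still counted
store.compact()
size_after_compact = store.physical_size()
```

After the update, the old version of the document is still stored but flagged, which is why the physical size shrinks only once `compact()` runs.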

Book consulted for Search
• Book on Lucene version 3.0
• ~ 500 pages
• Very useful

Search Phases

• Content Acquisition
  • Crawling
  • ISearchable
  • ModuleSearchBase
  • URL
  • Doc / PDF
• Content Indexing
  • Text Analysis
  • Ranking
  • Synonyms
  • Ignore Words
  • Stemming
• Content Search
  • Querying
  • Sorting
  • Security Trimming
  • Boolean Search
  • Highlighting

Crawlers
• Platform
  • Site Crawler
    • Module and Tab Metadata
    • Module Content (ModuleSearchBase/ISearchable)
• Commercial Edition
  • File Crawler
    • Uses IFilter to extract text from PDF/Office files
  • URL Crawler
    • Internal and External URLs

Search Entities
• SearchType
  • Distinguishes Crawlers
• SearchDocument
  • Properties of a content item
  • Stored in the Index
• SearchQuery
  • Parameters to execute a Query
• SearchResult
  • Derived from SearchDocument

Search Entities – Indexing vs. Querying

Controllers
• SearchController
  • For Querying
• InternalSearchController
  • For Adding / Updating / Deleting
• LuceneController
  • Interacts with Lucene

Ranking = Boosting
• Doc and/or Field can be boosted in Lucene
• DNN does Field boosts (Default - 10)
  • Title (50)
  • Tag (40)
  • Keyword (35)
  • Description (20)
  • Author (15)
• Configured manually through HostSettings
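
To see how field boosts change ranking, here is a deliberately simplified sketch that uses the boost values from this slide. The scoring rule (term count multiplied by field boost) is an assumption made for illustration only; Lucene's real scoring formula has more factors, and the function and sample documents below are hypothetical.

```python
# Boost values taken from the slide; any other field gets the default.
FIELD_BOOSTS = {"Title": 50, "Tag": 40, "Keyword": 35, "Description": 20, "Author": 15}
DEFAULT_BOOST = 10

def toy_score(doc, term):
    """Sum, over all fields, of (occurrences of term) x (field boost)."""
    term = term.lower()
    score = 0
    for field, text in doc.items():
        hits = text.lower().split().count(term)
        score += hits * FIELD_BOOSTS.get(field, DEFAULT_BOOST)
    return score

# Hypothetical documents: one mentions the term in the body, one in the title.
docs = [
    {"Title": "Release notes", "Body": "Search was rewritten around Lucene"},
    {"Title": "Lucene search", "Description": "How search works"},
]
ranked = sorted(docs, key=lambda d: toy_score(d, "search"), reverse=True)
```

Under this toy rule, a single match in the Title outweighs a match in an unboosted field, so the second document ranks first.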

Synonyms and Ignore Words
• Synonyms are injected into Index
• Ignore Words are removed from Index
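
A minimal sketch of what "injected into" and "removed from" the index can mean at analysis time. The word lists and the `analyze` function are hypothetical, and this Python toy stands in for the analyzer chain rather than reproducing DNN's or Lucene.Net's implementation.

```python
# Hypothetical configuration: words to drop and synonyms to add.
IGNORE_WORDS = {"the", "a", "of"}
SYNONYMS = {"car": ["automobile"], "fast": ["quick"]}

def analyze(text):
    """Return the tokens that would be written to the index for this text."""
    tokens = []
    for word in text.lower().split():
        if word in IGNORE_WORDS:
            continue  # ignore words never reach the index
        tokens.append(word)
        tokens.extend(SYNONYMS.get(word, []))  # synonyms are indexed alongside
    return tokens

tokens = analyze("The fast car")
```

Because the synonyms are stored in the index itself, a later query for "automobile" can match a document that only ever said "car".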

Stemming
• Convert words to their root form
• PorterStemFilter is used
  • country, countries = countri
  • breathe, breathes, breathing, breathed = breath
  • fishing, fished = fish (the Porter algorithm leaves "fisher" unchanged)
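
The examples on this slide can be reproduced with a few suffix rules. The sketch below is a heavily reduced toy that borrows a handful of rules in the spirit of the Porter algorithm; it is not the full algorithm and not the PorterStemFilter implementation, so it should only be trusted for simple words like the ones shown here.

```python
VOWELS = set("aeiou")

def has_vowel(stem):
    return any(ch in VOWELS for ch in stem)

def toy_stem(word):
    """Strip a few common English suffixes (a small subset of Porter's rules)."""
    w = word.lower()
    # Plural endings.
    if w.endswith("sses"):
        w = w[:-2]
    elif w.endswith("ies"):
        w = w[:-2]
    elif w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]
    # -ing / -ed, only when the remaining stem contains a vowel.
    for suffix in ("ing", "ed"):
        if w.endswith(suffix) and has_vowel(w[:-len(suffix)]):
            w = w[:-len(suffix)]
            break
    # Final y becomes i when the stem contains a vowel.
    if w.endswith("y") and has_vowel(w[:-1]):
        w = w[:-1] + "i"
    # Drop a trailing e on longer words (simplified; Porter uses a measure test).
    if w.endswith("e") and len(w) > 4:
        w = w[:-1]
    return w

stems = {w: toy_stem(w) for w in
         ["country", "countries", "breathe", "breathes", "breathing", "breathed"]}
```

Note that the stem does not have to be a dictionary word ("countri"); it only has to be the same for every variant, both at index time and at query time.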

Security Trimming
• Done through Collectors (Callback)
  • Each Doc found is sent to Collector
  • Collector rejects/accepts per Permission
    • Site Crawler - Module / Tab Permission
    • File Crawler - Folder Permission
    • User Crawler - Profile Permission
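
The collector callback pattern described above can be sketched as follows. This is a Python toy with hypothetical names (`PermissionCollector`, `run_query`, the `view_roles` field); it models the idea that every hit is handed to a collector, which keeps or drops it based on the caller's permissions, and it is not the Lucene.Net Collector API.

```python
class PermissionCollector:
    """Keeps only the hits the current user is allowed to see."""

    def __init__(self, user_roles):
        self.user_roles = set(user_roles)
        self.accepted = []

    def collect(self, doc):
        # Accept the hit if the user holds at least one role that may view it.
        if self.user_roles & set(doc["view_roles"]):
            self.accepted.append(doc)

def run_query(index, term, collector):
    """Send every matching document to the collector, then return what it kept."""
    for doc in index:
        if term.lower() in doc["text"].lower():
            collector.collect(doc)
    return collector.accepted

# Hypothetical index entries with the roles allowed to view each one.
index = [
    {"title": "Public page", "text": "Search overview", "view_roles": ["All Users"]},
    {"title": "Admin notes", "text": "Search tuning", "view_roles": ["Administrators"]},
]
visitor_hits = run_query(index, "search", PermissionCollector(["All Users"]))
admin_hits = run_query(index, "search", PermissionCollector(["All Users", "Administrators"]))
```

Trimming inside the callback means restricted documents never reach the result list, so counts and paging are computed only over what the user may see.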

Module Integration
• ModuleSearchBase
  • New abstract class with just one method
  • Defined in BusinessControllerClass
  • GetModifiedSearchDocuments
    • Returns New, Changed and Deleted content
    • Delta based
    • Granular Permission, Localization, etc.
• ISearchable continues to work (no delta)
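
"Delta based" means the module only reports what changed since the previous indexing run. The sketch below models that idea in Python; the function name, the item fields and the dates are all hypothetical, and the real GetModifiedSearchDocuments signature in DNN is not reproduced here.

```python
from datetime import datetime, timezone

def get_modified_documents(items, since_utc):
    """Return only the items created, changed or deleted after the last run."""
    return [item for item in items if item["modified_utc"] > since_utc]

def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)

# Hypothetical module content; deleted items stay visible as flagged records.
items = [
    {"id": 1, "title": "Old post", "modified_utc": utc(2013, 6, 20), "is_deleted": False},
    {"id": 2, "title": "Edited post", "modified_utc": utc(2013, 7, 3), "is_deleted": False},
    {"id": 3, "title": "Removed post", "modified_utc": utc(2013, 7, 4), "is_deleted": True},
]
last_run = utc(2013, 7, 1)
delta = get_modified_documents(items, last_run)
to_remove = [item["id"] for item in delta if item["is_deleted"]]
```

The unchanged item is skipped entirely, which is the saving over a full re-index on every run.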

New Crawler (How to)
• Define a new SearchType
  • Optionally use IsPrivate to hide from site search
• Implement BaseResultController (2 methods)
  • HasViewPermission
  • GetDocUrl
• Create Scheduled Task
  • Call AddSearchDocuments to inject content

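
The three steps above can be pictured with a small Python analogue. Everything here is hypothetical (the "ticket" search type, the class names, the in-memory index); it mirrors the shape of the steps, namely a registered type, a controller with a permission check and a URL builder, and a scheduled job that adds documents, but it does not use DNN's classes or signatures.

```python
from abc import ABC, abstractmethod

class ResultController(ABC):
    """Analogue of a result controller: one permission check, one URL builder."""

    @abstractmethod
    def has_view_permission(self, doc, user): ...

    @abstractmethod
    def get_doc_url(self, doc): ...

class TicketResultController(ResultController):
    def has_view_permission(self, doc, user):
        return doc["owner"] == user

    def get_doc_url(self, doc):
        return "/tickets/" + str(doc["id"])

# Step 1: register the search type; private types are hidden from site search.
SEARCH_TYPES = {"ticket": {"is_private": True, "controller": TicketResultController()}}
INDEX = []

# Step 3: the scheduled task pushes new documents into the index.
def scheduled_crawl(tickets):
    for ticket in tickets:
        INDEX.append({"search_type": "ticket", **ticket})

def visible_urls(user, include_private=False):
    """Step 2 in use: ask each type's controller about permission and URL."""
    urls = []
    for doc in INDEX:
        search_type = SEARCH_TYPES[doc["search_type"]]
        if search_type["is_private"] and not include_private:
            continue
        controller = search_type["controller"]
        if controller.has_view_permission(doc, user):
            urls.append(controller.get_doc_url(doc))
    return urls

scheduled_crawl([{"id": 1, "owner": "ash"}, {"id": 2, "owner": "bob"}])
```

In this toy, a general site search passes `include_private=False` and sees nothing from the private type, while a dedicated ticket search passes `include_private=True` and still has each hit checked by the controller.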
Demo

Recap
• New Search uses Lucene.Net
• Platform has Site Crawler
• Commercial has URL and File Crawlers
• Modules to implement ModuleSearchBase
• New Crawler implements BaseResultController

THANKS TO ALL OF OUR GENEROUS
SPONSORS!


Search features and architecture in DNN 7.1
