Faceted Search

   New York CTO Club
   December 9, 2009



 Daniel Tunkelang, Google
Otis Gospodnetić, Sematext
Agenda
Daniel:
•   What is faceted search?
•   Why use faceted search?
•   Thoughts about design and user experience.

Otis:
•   What are Lucene and Solr?
•   Why use an open-source search library?
•   Thoughts about implementation.
“Regular” Search
Interface:
•   User expresses information need as a short query.
•   Search engine returns a ranked, pageable result set.

User happy when...
•   Top-ranked result satisfies information need.
•   At least one result on the first page is relevant.

User unhappy when...
•   No result on the first page satisfies information need.
•   Results misleadingly appear relevant (bait and switch).
Relevance Is Subjective

Relevance is defined as a measure of
information conveyed by a document relative to
a query.

It is shown that the relationship between the
document and the query, though necessary, is
not sufficient to determine relevance.


William Goffman, On relevance as a measure, 1964.
Regular Search Experience
Assumptions Are Dangerous
tf-idf, PageRank
•   self-awareness
•   self-expression
•   model knows best
•   answer is a document
•   one-shot query
What is Faceted Search?
•   Best understood through examples.
       -   See the following slides.
       -   Or shop on almost any ecommerce site.
•   Facets = multiple ways to organize information.
       -   Often based on available structured information.
       -   But not always, e.g., facets obtained via text mining.
•   Typical interaction:
       -   User starts with a full-text search.
       -   Facets guide query refinement process.
Faceted Search for News
Faceted Search for People
Faceted Search for Breakfast
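
The typical interaction described earlier (full-text search first, then facets guiding refinement) can be sketched in a few lines of Python. This is a toy in-memory illustration, not how any particular engine implements faceting. The catalog, the field names, and the helpers `search`, `facet_counts`, and `refine` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy catalog; field names and values are made up for illustration.
DOCS = [
    {"title": "red running shoes", "category": "shoes", "brand": "Acme"},
    {"title": "blue running shorts", "category": "apparel", "brand": "Acme"},
    {"title": "trail running shoes", "category": "shoes", "brand": "Bolt"},
    {"title": "leather dress shoes", "category": "shoes", "brand": "Cobb"},
]


def search(docs, query):
    """Naive full-text match: keep docs whose title contains every query term."""
    terms = query.lower().split()
    return [d for d in docs if all(t in d["title"].lower().split() for t in terms)]


def facet_counts(results, field):
    """Count how many results carry each value of the given facet field."""
    return Counter(d[field] for d in results)


def refine(results, field, value):
    """Narrow the result set to docs with the selected facet value."""
    return [d for d in results if d[field] == value]


results = search(DOCS, "running")                # step 1: full-text search
categories = facet_counts(results, "category")   # step 2: counts guide refinement
refined = refine(results, "category", "shoes")   # step 3: user picks a value
brands = facet_counts(refined, "brand")          # counts follow the narrowed set
```

Note that the facet counts are always computed over the current result set, so they change as the user refines.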
But Facets are Not a Silver Bullet...
•   Screen real estate is finite.
       -   Choose facets wisely.
       -   Choose facet values wisely for monster facets.
•   Multiple selection within a facet is powerful, but...
       -   Has to be intuitive, especially AND vs. OR.
       -   Even trickier for hierarchical facets.
•   Search relevance still matters!
       -   Most faceted search applications rank results.
       -   Irrelevant results ==> irrelevant facet refinements.
Exploring Information Science
Deliver Precision and Recall




Easier said than done!

Ranking of facet values is an open research topic.
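
For reference, precision is the fraction of returned results that are relevant, and recall is the fraction of relevant documents that were returned. A minimal sketch with made-up document IDs follows; the function name and both sets are illustrative only.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two sets of document IDs."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up IDs: 4 documents returned, 5 relevant overall, 3 in common.
retrieved = {1, 2, 3, 4}
relevant = {2, 3, 4, 7, 9}
precision, recall = precision_recall(retrieved, relevant)
```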
Be Careful with Faceted Search!



     Cameras have artists?!
Clarify, Then Refine
Take-Aways
•   Faceted search addresses the subjectivity of relevance and information overload.
•   But deploying faceted search effectively requires that you think about user experience.
•   Recommended reading:
       -   My thin book entitled Faceted Search
       -   Marti Hearst's book on Search User Interfaces
       -   Peter Morville's upcoming book on Search Patterns
Faceted Search with Lucene & Solr




         Otis Gospodnetić, Sematext
What is / isn't Lucene
•   Free, ASL-licensed Java IR library, shipped as a JAR
•   Created by Doug Cutting; an ASF project since 2001
•   Application agnostic: indexing & searching
•   High performance, scalable
•   No dependencies
•   Heavily ported
•   No: crawler, rich doc parser, turn-key solution
•   No: out-of-the-box faceted search capability... but...
What is/isn't Solr
•   Indexing/search server with an HTTP API, built on top of Lucene
•   Fast & scalable (distributed search, index replication)
•   XML, JSON, Ruby, Perl, PHP, javabin
•   No: crawler (but Nutch ==> Solr works)
•   Yes: rich text parser
•   Yes: faceted search out of the box!
Solr and Faceted Search
•   3 types of facets: field values (text), dates, queries.
•   “Text”: return counts for all/top terms in a field for a result set, e.g. categories à la Amazon
•   Dates: return counts for docs in specified date ranges
•   Queries: return counts for docs that also match a given query; handy for number ranges (think prices!)
Facet Field Requirements
•   Must be indexed
•   Often not tokenized
•   Often not altered (lowercase, punctuation)
•   Storing not required
•   Multivalued fields OK
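
To see why facet fields are often left untokenized and unaltered, compare counting whole field values against counting analyzed tokens. This is a plain-Python illustration with made-up category values, not Solr's analysis chain; `naive_analyze` is a hypothetical stand-in for a tokenizing, lowercasing analyzer.

```python
from collections import Counter

# Made-up multivalued category field for three documents.
doc_categories = [
    ["Digital Cameras", "Electronics"],
    ["Digital Cameras"],
    ["Film Cameras", "Electronics"],
]


def naive_analyze(value):
    """Hypothetical analyzer: lowercase and split on whitespace."""
    return value.lower().split()


# Faceting on the untouched values yields labels a user can click on.
exact = Counter(v for cats in doc_categories for v in cats)

# Faceting on analyzed tokens fragments the labels. Each token is counted
# once per document, as a facet count would be.
tokens = Counter(
    t
    for cats in doc_categories
    for t in {tok for v in cats for tok in naive_analyze(v)}
)
```

Here `exact` keeps "Digital Cameras" as a single facet value, while `tokens` splits it into "digital" and "cameras" and merges the latter with "Film Cameras".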
Turn It On
•   0 facets:
       -   http://host:80/solr/select?q=foo
•   1 facet:
       -   http://host:80/solr/select?q=foo&facet=true&facet.field=category
•   N facets:
       -   http://host:80/solr/select?q=foo&facet=true&facet.field=category&facet.field=inStock
•   facet=true or facet=on
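
A minimal sketch of building these request URLs from Python, using only the standard library. The base URL is the placeholder from the slide, and `facet_url` is a hypothetical helper name, not part of any Solr client.

```python
from urllib.parse import urlencode

# Placeholder base URL taken from the slide; adjust host, port, and path.
BASE_URL = "http://host:80/solr/select"


def facet_url(query, facet_fields=()):
    """Build a select URL, enabling faceting only when fields are requested."""
    params = [("q", query)]
    if facet_fields:
        params.append(("facet", "true"))
        # facet.field may be repeated, once per field to facet on.
        params.extend(("facet.field", f) for f in facet_fields)
    return BASE_URL + "?" + urlencode(params)


no_facets = facet_url("foo")
two_facets = facet_url("foo", ["category", "inStock"])
```

Passing a list of pairs rather than a dict is what allows the same parameter name to appear more than once.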
Text Facet Response
<result numFound="4" start="0"/>
<lst name="facet_counts">
<lst name="facet_fields">
 <lst name="category">
     <int name="electronics">3</int>
     <int name="copier">0</int>
 </lst>
 <lst name="inStock">
     <int name="false">3</int>
     <int name="true">1</int>
 </lst>
</lst>
</lst>

•   facet.mincount=1 to avoid 0-count facet values
•   facet.limit=N to limit to top N facet values
•   facet.missing=true to catch uncategorized
•   lots of other options!
Date Facets
•   http://.../solr/select/?q=*:*&rows=0&facet=true&facet.date=timestamp&facet.date.start=NOW/DAY-5DAYS&facet.date.end=NOW/DAY%2B1DAY&facet.date.gap=%2B1DAY
•   (%2B1 ==> +1)
•   Solr Date Math Parser syntax: /HOUR, +2YEARS, -1DAY, /DAY+6MONTHS+3DAYS, +6MONTHS+3DAYS/DAY
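
The %2B escaping matters because a bare + in a query string is decoded as a space. A small sketch with Python's standard library shows both directions; the parameter values are the ones from the slide, and the `safe` argument is only there so the output matches the slide's URL character for character.

```python
from urllib.parse import urlencode, parse_qs

params = [
    ("q", "*:*"),
    ("rows", "0"),
    ("facet", "true"),
    ("facet.date", "timestamp"),
    ("facet.date.start", "NOW/DAY-5DAYS"),
    ("facet.date.end", "NOW/DAY+1DAY"),
    ("facet.date.gap", "+1DAY"),
]

# Leave *, :, and / unescaped as on the slide; + is still encoded as %2B.
query_string = urlencode(params, safe="*:/")

# Round trip: the encoded gap decodes back to "+1DAY" ...
decoded_gap = parse_qs(query_string)["facet.date.gap"]

# ... whereas an unescaped + would be read as a space.
broken_gap = parse_qs("facet.date.gap=+1DAY")["facet.date.gap"]
```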
Date Facet Response
<result name="response" numFound="42" start="0"/>
<lst name="facet_counts">
<lst name="facet_dates">
 <lst name="timestamp">
     <int name="2007-08-11T00:00:00.000Z">1</int>
     <int name="2007-08-12T00:00:00.000Z">5</int>
     <int name="2007-08-13T00:00:00.000Z">3</int>
     <int name="2007-08-14T00:00:00.000Z">7</int>
     <int name="2007-08-15T00:00:00.000Z">2</int>
     <int name="2007-08-16T00:00:00.000Z">16</int>
     <str name="gap">+1DAY</str>
     <date name="end">2007-08-17T00:00:00Z</date>
 </lst>
</lst>
</lst>
Query Facets
•   http://.../solr/select?q=shoes&rows=0&facet=true&facet.field=inStock&facet.query=price:[*+TO+500]&facet.query=price:[500+TO+*]
•   Avoids the bucket-at-index-time work-around
•   Keep queries disjoint (the two inclusive ranges above both match a price of exactly 500)
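
One way to keep price buckets disjoint is to generate them from a list of boundaries. The sketch below assumes the price field holds whole numbers, so each bucket can end one below the next boundary; `price_bucket_queries` is a hypothetical helper, and the boundaries are made up.

```python
from urllib.parse import urlencode


def price_bucket_queries(boundaries, field="price"):
    """Return disjoint inclusive range queries for integer-valued boundaries.

    boundaries=[500] yields two buckets: up to 499, and 500 and above.
    Assumes whole-number values, so "boundary - 1" closes each bucket.
    """
    queries = []
    lower = "*"
    for b in sorted(boundaries):
        queries.append(f"{field}:[{lower} TO {b - 1}]")
        lower = b
    queries.append(f"{field}:[{lower} TO *]")
    return queries


buckets = price_bucket_queries([100, 500])

# One facet.query parameter per bucket; urlencode handles spaces and brackets.
params = [("q", "shoes"), ("rows", "0"), ("facet", "true")]
params += [("facet.query", q) for q in buckets]
query_string = urlencode(params)
```

For fractional prices this integer trick does not apply, and the boundaries would need exclusive bounds or another scheme.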
Query Facet Response
<result numFound="3" start="0"/>
<lst name="facet_counts">
<lst name="facet_queries">
 <int name="price:[* TO 500]">3</int>
 <int name="price:[500 TO *]">1</int>
</lst>
<lst name="facet_fields">
 <lst name="inStock">
     <int name="false">3</int>
     <int name="true">1</int>
 </lst>
</lst>
</lst>
UI Integration
•   Use Filter Queries via fq
•   http://.../solr/select?q=shoes&facet=true&facet.field=category&fq=price:[0+TO+300]
•   http://.../solr/select?q=shoes&facet=true&facet.field=category&fq=price:[0+TO+300]&fq=inStock:true
•   Important: single request does it all
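
A rough sketch of the UI round trip: each facet value the user clicks becomes one more fq parameter, and a single response body carries both the hit count and the refreshed facet counts. The helper names are hypothetical, and the sample response is hand-written in the shape shown on the Text Facet Response slide (wrapped in a `<response>` root element), not captured from a live server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def refined_query_string(query, facet_field, selections):
    """Turn the user's facet selections into repeated fq parameters."""
    params = [("q", query), ("facet", "true"), ("facet.field", facet_field)]
    params += [("fq", fq) for fq in selections]
    return urlencode(params)


def parse_response(xml_text):
    """Return (numFound, {field: {value: count}}) from one response body."""
    root = ET.fromstring(xml_text)
    num_found = int(root.find("result").get("numFound"))
    path = "lst[@name='facet_counts']/lst[@name='facet_fields']/lst"
    fields = {}
    for field in root.findall(path):
        fields[field.get("name")] = {
            item.get("name"): int(item.text) for item in field.findall("int")
        }
    return num_found, fields


# Hand-written sample body; values copied from the Text Facet Response slide.
SAMPLE = """<response>
<result numFound="4" start="0"/>
<lst name="facet_counts">
 <lst name="facet_fields">
  <lst name="category">
   <int name="electronics">3</int>
   <int name="copier">0</int>
  </lst>
 </lst>
</lst>
</response>"""

qs = refined_query_string("shoes", "category", ["price:[0 TO 300]", "inStock:true"])
num_found, facets = parse_response(SAMPLE)
```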
State of Lucene & Solr
•   Super healthy community, exploding development
•   Lucene 3.0 – 2009-11-25:
       -   Performance, faster range queries, clean API, better Unicode support, more non-English support
•   Solr 1.4 – 2009-11-10:
       -   Performance, new replication, DB indexing, rich-doc indexing, results clustering, faster response protocol, deduplication...
Lucene, Solr, Enterprise
•   Free: Community
       -   Lucene ~600 emails/month (dev: 2000/month)
       -   Solr ~1300 emails/month (dev: 800/month)

•   Commercial: Support Subscriptions
       -   Sematext
       -   Lucid Imagination

More Related Content

Viewers also liked

Automatically mining facets for queries from their search results
Automatically mining facets for queries from their search resultsAutomatically mining facets for queries from their search results
Automatically mining facets for queries from their search resultsShakas Technologies
 
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...Earley Information Science
 
Designing For Discovery With Faceted Navigation
Designing For Discovery With Faceted NavigationDesigning For Discovery With Faceted Navigation
Designing For Discovery With Faceted NavigationJim Kalbach
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
The Four Pillars of Search Engine Optimization (SEO)
The Four Pillars of Search Engine Optimization (SEO)The Four Pillars of Search Engine Optimization (SEO)
The Four Pillars of Search Engine Optimization (SEO)Jonathon Colman
 
Faceted Classification System in Libraries
Faceted Classification System in LibrariesFaceted Classification System in Libraries
Faceted Classification System in LibrariesLaura Loveday Maury
 
Working Of Search Engine
Working Of Search EngineWorking Of Search Engine
Working Of Search EngineNIKHIL NAIR
 
Ecommerce SEO: Boosting visibility with faceted navigation | Slides from Brig...
Ecommerce SEO: Boosting visibility with faceted navigation | Slides from Brig...Ecommerce SEO: Boosting visibility with faceted navigation | Slides from Brig...
Ecommerce SEO: Boosting visibility with faceted navigation | Slides from Brig...Allotment Digital Marketing
 
Comparative study of major classification schemes
Comparative study of major classification schemesComparative study of major classification schemes
Comparative study of major classification schemesNadeem Nazir
 
Non Functional Requirement.
Non Functional Requirement.Non Functional Requirement.
Non Functional Requirement.Khushboo Shaukat
 
Search Engines Presentation
Search Engines PresentationSearch Engines Presentation
Search Engines PresentationJSCHO9
 
Introduction to Search Engines
Introduction to Search EnginesIntroduction to Search Engines
Introduction to Search EnginesNitin Pande
 
Functional requirements: Thinking Like A Pirate
Functional requirements: Thinking Like A PirateFunctional requirements: Thinking Like A Pirate
Functional requirements: Thinking Like A PirateAmye Scavarda
 
Search Engine Powerpoint
Search Engine PowerpointSearch Engine Powerpoint
Search Engine Powerpoint201014161
 
4150415
41504154150415
4150415kombi9
 

Viewers also liked (17)

Data mining
Data miningData mining
Data mining
 
Automatically mining facets for queries from their search results
Automatically mining facets for queries from their search resultsAutomatically mining facets for queries from their search results
Automatically mining facets for queries from their search results
 
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
 
Designing For Discovery With Faceted Navigation
Designing For Discovery With Faceted NavigationDesigning For Discovery With Faceted Navigation
Designing For Discovery With Faceted Navigation
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
The Four Pillars of Search Engine Optimization (SEO)
The Four Pillars of Search Engine Optimization (SEO)The Four Pillars of Search Engine Optimization (SEO)
The Four Pillars of Search Engine Optimization (SEO)
 
Faceted Classification System in Libraries
Faceted Classification System in LibrariesFaceted Classification System in Libraries
Faceted Classification System in Libraries
 
Working Of Search Engine
Working Of Search EngineWorking Of Search Engine
Working Of Search Engine
 
Ecommerce SEO: Boosting visibility with faceted navigation | Slides from Brig...
Ecommerce SEO: Boosting visibility with faceted navigation | Slides from Brig...Ecommerce SEO: Boosting visibility with faceted navigation | Slides from Brig...
Ecommerce SEO: Boosting visibility with faceted navigation | Slides from Brig...
 
Comparative study of major classification schemes
Comparative study of major classification schemesComparative study of major classification schemes
Comparative study of major classification schemes
 
Non Functional Requirement.
Non Functional Requirement.Non Functional Requirement.
Non Functional Requirement.
 
Search Engines Presentation
Search Engines PresentationSearch Engines Presentation
Search Engines Presentation
 
Introduction to Search Engines
Introduction to Search EnginesIntroduction to Search Engines
Introduction to Search Engines
 
Functional requirements: Thinking Like A Pirate
Functional requirements: Thinking Like A PirateFunctional requirements: Thinking Like A Pirate
Functional requirements: Thinking Like A Pirate
 
Search engines
Search enginesSearch engines
Search engines
 
Search Engine Powerpoint
Search Engine PowerpointSearch Engine Powerpoint
Search Engine Powerpoint
 
4150415
41504154150415
4150415
 

More from Daniel Tunkelang

Query Understanding and Ecommerce
Query Understanding and EcommerceQuery Understanding and Ecommerce
Query Understanding and EcommerceDaniel Tunkelang
 
Semantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce QueriesSemantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce QueriesDaniel Tunkelang
 
Helping Searchers Satisfice through Query Understanding
Helping Searchers Satisfice through Query UnderstandingHelping Searchers Satisfice through Query Understanding
Helping Searchers Satisfice through Query UnderstandingDaniel Tunkelang
 
Query Understanding: A Manifesto
Query Understanding: A ManifestoQuery Understanding: A Manifesto
Query Understanding: A ManifestoDaniel Tunkelang
 
Where should you put your data scientists?
Where should you put your data scientists?Where should you put your data scientists?
Where should you put your data scientists?Daniel Tunkelang
 
Data Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityData Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityDaniel Tunkelang
 
My Three Ex’s: A Data Science Approach for Applied Machine Learning
My Three Ex’s: A Data Science Approach for Applied Machine LearningMy Three Ex’s: A Data Science Approach for Applied Machine Learning
My Three Ex’s: A Data Science Approach for Applied Machine LearningDaniel Tunkelang
 
Web science - How is it different?
Web science - How is it different?Web science - How is it different?
Web science - How is it different?Daniel Tunkelang
 
Better Search Through Query Understanding
Better Search Through Query UnderstandingBetter Search Through Query Understanding
Better Search Through Query UnderstandingDaniel Tunkelang
 
Social Search in a Professional Context
Social Search in a Professional ContextSocial Search in a Professional Context
Social Search in a Professional ContextDaniel Tunkelang
 
Find and be Found: Information Retrieval at LinkedIn
Find and be Found: Information Retrieval at LinkedInFind and be Found: Information Retrieval at LinkedIn
Find and be Found: Information Retrieval at LinkedInDaniel Tunkelang
 
Search as Communication: Lessons from a Personal Journey
Search as Communication: Lessons from a Personal JourneySearch as Communication: Lessons from a Personal Journey
Search as Communication: Lessons from a Personal JourneyDaniel Tunkelang
 
Enterprise Search: How do we get there from here?
Enterprise Search: How do we get there from here?Enterprise Search: How do we get there from here?
Enterprise Search: How do we get there from here?Daniel Tunkelang
 
Big Data, We Have a Communication Problem
Big Data, We Have a Communication Problem Big Data, We Have a Communication Problem
Big Data, We Have a Communication Problem Daniel Tunkelang
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data ScientistDaniel Tunkelang
 
Information, Attention, and Trust: A Hierarchy of Needs
Information, Attention, and Trust: A Hierarchy of NeedsInformation, Attention, and Trust: A Hierarchy of Needs
Information, Attention, and Trust: A Hierarchy of NeedsDaniel Tunkelang
 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The PeopleDaniel Tunkelang
 
Content, Connections, and Context
Content, Connections, and ContextContent, Connections, and Context
Content, Connections, and ContextDaniel Tunkelang
 

More from Daniel Tunkelang (20)

Query Understanding and Ecommerce
Query Understanding and EcommerceQuery Understanding and Ecommerce
Query Understanding and Ecommerce
 
Semantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce QueriesSemantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce Queries
 
Helping Searchers Satisfice through Query Understanding
Helping Searchers Satisfice through Query UnderstandingHelping Searchers Satisfice through Query Understanding
Helping Searchers Satisfice through Query Understanding
 
MMM, Search!
MMM, Search!MMM, Search!
MMM, Search!
 
Enterprise Intelligence
Enterprise IntelligenceEnterprise Intelligence
Enterprise Intelligence
 
Query Understanding: A Manifesto
Query Understanding: A ManifestoQuery Understanding: A Manifesto
Query Understanding: A Manifesto
 
Where should you put your data scientists?
Where should you put your data scientists?Where should you put your data scientists?
Where should you put your data scientists?
 
Data Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityData Science: A Mindset for Productivity
Data Science: A Mindset for Productivity
 
My Three Ex’s: A Data Science Approach for Applied Machine Learning
My Three Ex’s: A Data Science Approach for Applied Machine LearningMy Three Ex’s: A Data Science Approach for Applied Machine Learning
My Three Ex’s: A Data Science Approach for Applied Machine Learning
 
Web science - How is it different?
Web science - How is it different?Web science - How is it different?
Web science - How is it different?
 
Better Search Through Query Understanding
Better Search Through Query UnderstandingBetter Search Through Query Understanding
Better Search Through Query Understanding
 
Social Search in a Professional Context
Social Search in a Professional ContextSocial Search in a Professional Context
Social Search in a Professional Context
 
Find and be Found: Information Retrieval at LinkedIn
Find and be Found: Information Retrieval at LinkedInFind and be Found: Information Retrieval at LinkedIn
Find and be Found: Information Retrieval at LinkedIn
 
Search as Communication: Lessons from a Personal Journey
Search as Communication: Lessons from a Personal JourneySearch as Communication: Lessons from a Personal Journey
Search as Communication: Lessons from a Personal Journey
 
Enterprise Search: How do we get there from here?
Enterprise Search: How do we get there from here?Enterprise Search: How do we get there from here?
Enterprise Search: How do we get there from here?
 
Big Data, We Have a Communication Problem
Big Data, We Have a Communication Problem Big Data, We Have a Communication Problem
Big Data, We Have a Communication Problem
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data Scientist
 
Information, Attention, and Trust: A Hierarchy of Needs
Information, Attention, and Trust: A Hierarchy of NeedsInformation, Attention, and Trust: A Hierarchy of Needs
Information, Attention, and Trust: A Hierarchy of Needs
 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The People
 
Content, Connections, and Context
Content, Connections, and ContextContent, Connections, and Context
Content, Connections, and Context
 

Recently uploaded

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 

Recently uploaded (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

Faceted Search Nycto Talk

  • 1. Faceted Search New York CTO Club December 9, 2009 Daniel Tunkelang, Google Otis Gospodneti!, Sematext
  • 2. Agenda Daniel: ! What is faceted search? ! Why use faceted search? ! Thoughts about design and user experience. Otis: ! What are Lucene and Solr? ! Why use an open-source search library? ! Thoughts about implementation.
  • 3. “Regular” Search Interface: ! User expresses information need as short query. ! Search engine returns ranked, pageable result set. User happy when... ! Top-ranked result satisfies information need. ! At least some result on first page is relevant. User unhappy when... ! No result on first page satisfies information need. ! Results misleadingly appear relevant (bait and switch).
  • 4. Relevance Is Subjective Relevance is defined as a measure of information conveyed by a document relative to a query. It is shown that the relationship between the document and the query, though necessary, is not sufficient to determine relevance. William Goffman, On relevance as a measure, 1964.
  • 6. Assumptions Are Dangerous ! self-awareness tf-idf PageRank ! self-expression ! model knows best ! answer is a document ! one-shot query
  • 7. What is Faceted Search? ! Best understood through examples. " See the following slides. " Or shop on almost any ecommerce site. ! Facets = multiple ways to organize information. " Often based on available structured information. " But not always, e.g., facets obtained via text mining. ! Typical interaction: " User starts with a full-text search. " Facets guide query refinement process.
  • 10. Faceted Search for Breakfast
  • 12. But Facets are Not a Silver Bullet...
        • Screen real estate is finite.
            • Choose facets wisely.
            • Choose facet values wisely for monster facets.
        • Multiple selection within a facet is powerful, but...
            • Has to be intuitive, especially AND vs. OR.
            • Even trickier for hierarchical facets.
        • Search relevance still matters!
            • Most faceted search applications rank results.
            • Irrelevant results lead to irrelevant facet refinements.
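The AND vs. OR point on slide 12 can be made concrete with a small sketch. The documents, the `tags` field, and the `select` helper below are all hypothetical and exist only for illustration. They show how the same two selections within one facet give different result sets under each interpretation.

```python
# Hypothetical in-memory documents; each carries a multivalued "tags" facet.
DOCS = [
    {"id": 1, "tags": {"waterproof", "compact"}},
    {"id": 2, "tags": {"waterproof"}},
    {"id": 3, "tags": {"compact"}},
    {"id": 4, "tags": {"tripod"}},
]


def select(docs, field, chosen, mode):
    """Return ids of docs that match the chosen values of one facet.

    mode="AND": a doc must carry every chosen value.
    mode="OR":  a doc must carry at least one chosen value.
    """
    chosen = set(chosen)
    if mode == "AND":
        matches = lambda values: chosen <= values
    elif mode == "OR":
        matches = lambda values: bool(chosen & values)
    else:
        raise ValueError("mode must be 'AND' or 'OR'")
    return [d["id"] for d in docs if matches(d[field])]


and_ids = select(DOCS, "tags", ["waterproof", "compact"], "AND")
or_ids = select(DOCS, "tags", ["waterproof", "compact"], "OR")
```

With AND, each extra selection narrows the results. With OR, each extra selection widens them, so the interface has to make clear which one the user is getting.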
  • 14. Deliver Precision and Recall
      Easier said than done!
      Ranking of facet values is an open research topic.
  • 15. Be Careful with Faceted Search!
      Cameras have artists?!
  • 17. Take-Aways
        • Faceted search addresses the subjectivity of relevance and information overload.
        • But deploying faceted search effectively requires that you think about user experience.
        • Recommended reading:
            • My thin book entitled Faceted Search
            • Marti Hearst's book on Search User Interfaces
            • Peter Morville's upcoming book on Search Patterns
  • 18. Faceted Search with Lucene & Solr. Otis Gospodnetić, Sematext
  • 19. What is / isn't Lucene
        • Free, ASL, Java IR library, Jar
        • Doug Cutting, ASF, 2001
        • Application agnostic: Indexing & Searching
        • High performance, scalable
        • No dependencies
        • Heavily ported
        • No: crawler, rich doc parser, turn-key solution
        • No: out-of-the-box faceted search capability... but...
  • 21. What is/isn't Solr
        • Indexing/Search server with HTTP API built on top of Lucene
        • Fast & scalable (distributed search, index replication)
        • XML, JSON, Ruby, Perl, PHP, javabin
        • No: crawler (but Nutch ==> Solr works)
        • Yes: rich text parser
        • Yes: Faceted Search out of the box!
  • 22. Solr and Faceted Search
        • 3 types of facets: Field Values (text), Dates, Queries.
        • “Text”: return counts for all/top terms in a field for a result set, e.g. categories a la Amazon
        • Dates: return counts for docs in specified date ranges
        • Queries: return counts for docs that also match a given query; handy for number ranges (think prices!)
  • 23. Facet Field Requirements
        • Must be indexed
        • Often not tokenized
        • Often not altered (lowercase, punctuation)
        • Storing not required
        • Multivalued fields OK
  • 24. Turn It On
        • 0 facets: http://host:80/solr/select?q=foo
        • 1 facet: http://host:80/solr/select?q=foo&facet=true&facet.field=category
        • N facets: http://host:80/solr/select?q=foo&facet=true&facet.field=category&facet.field=inStock
        • facet=true or facet=on
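The request URLs on slide 24 can be assembled programmatically. The following is a minimal sketch using only the Python standard library. `BASE` reuses the placeholder host from the slide, and `facet_url` is a hypothetical helper, not part of any Solr client.

```python
from urllib.parse import urlencode

BASE = "http://host:80/solr/select"  # placeholder host taken from the slide


def facet_url(query, facet_fields=()):
    """Build a select URL; faceting is switched on only if fields are given."""
    params = [("q", query)]
    if facet_fields:
        params.append(("facet", "true"))
        # facet.field is repeated once per field, so a list of pairs is used
        # instead of a dict (a dict could not hold the repeated key).
        params.extend(("facet.field", name) for name in facet_fields)
    return BASE + "?" + urlencode(params)


no_facets = facet_url("foo")
two_facets = facet_url("foo", ["category", "inStock"])
```

Passing a list of pairs to `urlencode` keeps the parameters in order and allows `facet.field` to appear more than once, which is how the slide requests N facets.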
  • 25. Text Facet Response
      <result numFound="4" start="0"/>
      <lst name="facet_counts">
        <lst name="facet_fields">
          <lst name="category">
            <int name="electronics">3</int>
            <int name="copier">0</int>
          </lst>
          <lst name="inStock">
            <int name="false">3</int>
            <int name="true">1</int>
          </lst>
        </lst>
      </lst>
        • facet.mincount=1 to avoid 0-count facet values
        • facet.limit=N to limit to top N facet values
        • facet.missing=true to catch uncategorized
        • lots of other options!
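A client usually turns the response on slide 25 into a simple mapping before rendering facet links. The sketch below parses the slide's XML with the standard library. It assumes the fragment is wrapped in a single `<response>` root element so that it is well-formed, and `field_facets` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The slide's fragment, wrapped in an assumed <response> root element.
SAMPLE = """<response>
<result numFound="4" start="0"/>
<lst name="facet_counts">
  <lst name="facet_fields">
    <lst name="category">
      <int name="electronics">3</int>
      <int name="copier">0</int>
    </lst>
    <lst name="inStock">
      <int name="false">3</int>
      <int name="true">1</int>
    </lst>
  </lst>
</lst>
</response>"""


def field_facets(xml_text):
    """Map each facet field to a list of (value, count) pairs, in document order."""
    root = ET.fromstring(xml_text)
    path = "./lst[@name='facet_counts']/lst[@name='facet_fields']/lst"
    facets = {}
    for field in root.findall(path):
        facets[field.get("name")] = [
            (value.get("name"), int(value.text)) for value in field.findall("int")
        ]
    return facets


facets = field_facets(SAMPLE)
# Client-side equivalent of facet.mincount=1: drop values with a zero count.
non_empty = {f: [(v, n) for v, n in vals if n > 0] for f, vals in facets.items()}
```

Filtering on the server with `facet.mincount=1` is normally preferable, since zero-count values then never leave Solr. The client-side filter is shown only to make the effect visible.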
  • 26. Date Facets
        • http://.../solr/select/?q=*:*&rows=0&facet=true&facet.date=timestamp&facet.date.start=NOW/DAY-5DAYS&facet.date.end=NOW/DAY%2B1DAY&facet.date.gap=%2B1DAY
        • (%2B1 ==> +1)
        • Solr Date Math Parser syntax: /HOUR, +2YEARS, -1DAY, /DAY+6MONTHS+3DAYS, +6MONTHS+3DAYS/DAY
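The %2B on slide 26 matters because a literal + in a query string is decoded as a space. The sketch below builds the same date-facet parameters with the standard library and shows what happens without encoding. `date_facet_query` is a hypothetical helper, and the field name `timestamp` comes from the slide.

```python
from urllib.parse import urlencode, parse_qs


def date_facet_query(field, start, end, gap):
    """Return an encoded query string for a date facet over the whole index."""
    return urlencode([
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.date", field),
        ("facet.date.start", start),
        ("facet.date.end", end),
        ("facet.date.gap", gap),
    ])


qs = date_facet_query("timestamp", "NOW/DAY-5DAYS", "NOW/DAY+1DAY", "+1DAY")

# Decoding the encoded string recovers the date math expression intact.
decoded_gap = parse_qs(qs)["facet.date.gap"]

# Decoding an unencoded + turns it into a space, which corrupts the expression.
broken_gap = parse_qs("facet.date.gap=+1DAY")["facet.date.gap"]
```

`urlencode` also percent-encodes characters such as `*`, `:` and `/`, so the generated string looks different from the hand-written URL on the slide but decodes to the same parameters.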
  • 27. Date Facet Response
      <result name="response" numFound="42" start="0"/>
      <lst name="facet_counts">
        <lst name="facet_dates">
          <lst name="timestamp">
            <int name="2007-08-11T00:00:00.000Z">1</int>
            <int name="2007-08-12T00:00:00.000Z">5</int>
            <int name="2007-08-13T00:00:00.000Z">3</int>
            <int name="2007-08-14T00:00:00.000Z">7</int>
            <int name="2007-08-15T00:00:00.000Z">2</int>
            <int name="2007-08-16T00:00:00.000Z">16</int>
            <str name="gap">+1DAY</str>
            <date name="end">2007-08-17T00:00:00Z</date>
          </lst>
  • 28. Query Facets
        • http://.../solr/select?q=shoes&rows=0&facet=true&facet.field=inStock&facet.query=price:[*+TO+500]&facet.query=price:[500+TO+*]
        • Avoids the bucket-at-index-time work-around
        • Keep queries disjoint
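On the "keep queries disjoint" advice: the two ranges in the slide's URL are both inclusive, so a document priced exactly 500 would be counted in both buckets. The sketch below shows one way to generate non-overlapping inclusive ranges. It assumes the field holds integer values, and the helper names and the `BASE` URL are hypothetical.

```python
from urllib.parse import urlencode, parse_qs

BASE = "http://host:80/solr/select"  # hypothetical placeholder host


def disjoint_range_queries(field, cut_points):
    """Build inclusive, non-overlapping range queries for an integer field.

    cut_points=[500] yields field:[* TO 499] and field:[500 TO *].
    """
    cuts = sorted(set(cut_points))
    lowers = ["*"] + [str(c) for c in cuts]
    uppers = [str(c - 1) for c in cuts] + ["*"]
    return [f"{field}:[{lo} TO {hi}]" for lo, hi in zip(lowers, uppers)]


def query_facet_url(q, field, cut_points):
    """Build a URL with one facet.query parameter per price bucket."""
    params = [("q", q), ("rows", "0"), ("facet", "true")]
    params += [("facet.query", fq) for fq in disjoint_range_queries(field, cut_points)]
    return BASE + "?" + urlencode(params)


buckets = disjoint_range_queries("price", [100, 500])
url = query_facet_url("shoes", "price", [100, 500])
```

For non-integer prices, subtracting 1 from each cut point leaves gaps between buckets, so the boundaries would need a different treatment. That case is outside this sketch.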
  • 29. Query Facet Response
      <result numFound="3" start="0"/>
      <lst name="facet_counts">
        <lst name="facet_queries">
          <int name="price:[* TO 500]">3</int>
          <int name="price:[500 TO *]">1</int>
        </lst>
        <lst name="facet_fields">
          <lst name="inStock">
            <int name="false">3</int>
            <int name="true">1</int>
          </lst>
        </lst>
      </lst>
  • 30. UI Integration
        • Use Filter Queries via fq
        • http://.../solr/select?q=shoes&facet=true&facet.field=category&fq=price:[0 TO 300]
        • http://.../solr/select?q=shoes&facet=true&facet.field=category&fq=price:[0 TO 300]&fq=inStock:true
        • Important: single request does it all
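In a UI, each facet value the user clicks typically becomes one more fq parameter, while the original query and facet fields stay unchanged. The following is a minimal sketch of that pattern. `refined_search_url` and `BASE` are hypothetical names, and the filter strings are the ones from slide 30.

```python
from urllib.parse import urlencode, parse_qs

BASE = "http://host:80/solr/select"  # hypothetical placeholder host


def refined_search_url(q, facet_fields, filters):
    """Build one request that returns results and facet counts for the current filters."""
    params = [("q", q), ("facet", "true")]
    params += [("facet.field", name) for name in facet_fields]
    # One fq per selected refinement; urlencode escapes spaces and brackets.
    params += [("fq", f) for f in filters]
    return BASE + "?" + urlencode(params)


# The user first picks a price range, then also restricts to in-stock items.
selected = ["price:[0 TO 300]"]
first_click = refined_search_url("shoes", ["category"], selected)
selected.append("inStock:true")
second_click = refined_search_url("shoes", ["category"], selected)
```

Because the facet parameters travel with every request, each response carries both the filtered results and the facet counts for the next refinement, which is the "single request does it all" point on the slide.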
  • 31. State of Lucene & Solr
        • Super healthy community, exploding development
        • Lucene 3.0 (released 2009-11-25):
            • Performance, faster range queries, clean API, better Unicode support, more non-English support
        • Solr 1.4 (released 2009-11-10):
            • Performance, new replication, DB indexing, rich-doc indexing, results clustering, faster response protocol, deduplication...
  • 32. Lucene, Solr, Enterprise
        • Free: Community
            • Lucene ~600 emails/month (dev: 2000/month)
            • Solr ~1300 emails/month (dev: 800/month)
        • Commercial: Support Subscriptions
            • Sematext
            • Lucid Imagination