Faceted Search

New York CTO Club
December 9, 2009

Daniel Tunkelang, Google
Otis Gospodnetić, Sematext
Agenda
Daniel:
• What is faceted search?
• Why use faceted search?
• Thoughts about design and user experience.

Otis:
• What are Lucene and Solr?
• Why use an open-source search library?
• Thoughts about implementation.
“Regular” Search
Interface:
• User expresses information need as short query.
• Search engine returns ranked, pageable result set.

User happy when...
• Top-ranked result satisfies information need.
• At least some result on first page is relevant.

User unhappy when...
• No result on first page satisfies information need.
• Results misleadingly appear relevant (bait and switch).
Relevance Is Subjective

Relevance is defined as a measure of
information conveyed by a document relative to
a query.

It is shown that the relationship between the
document and the query, though necessary, is
not sufficient to determine relevance.


William Goffman, On relevance as a measure, 1964.
Regular Search Experience
Assumptions Are Dangerous
(Diagram: tf-idf, PageRank)
• self-awareness
• self-expression
• model knows best
• answer is a document
• one-shot query
What is Faceted Search?
• Best understood through examples.
   - See the following slides.
   - Or shop on almost any ecommerce site.
• Facets = multiple ways to organize information.
   - Often based on available structured information.
   - But not always, e.g., facets obtained via text mining.
• Typical interaction:
   - User starts with a full-text search.
   - Facets guide query refinement process.
Faceted Search for News
Faceted Search for People
Faceted Search for Breakfast
But Facets are Not a Silver Bullet...
• Screen real estate is finite.
   - Choose facets wisely.
   - Choose facet values wisely for monster facets.
• Multiple selection within a facet is powerful, but...
   - Has to be intuitive, especially AND vs. OR.
   - Even trickier for hierarchical facets.
• Search relevance still matters!
   - Most faceted search applications rank results.
   - Irrelevant results lead to irrelevant facet refinements.
Exploring Information Science
Deliver Precision and Recall

Easier said than done!

Ranking of facet values is an open research topic.
Be Careful with Faceted Search!

Cameras have artists?!
Clarify, Then Refine
Take-Aways
• Faceted search addresses the subjectivity of relevance and information overload.
• But deploying faceted search effectively requires that you think about user experience.
• Recommended reading:
   - My thin book entitled Faceted Search
   - Marti Hearst's book on Search User Interfaces
   - Peter Morville's upcoming book on Search Patterns
Faceted Search with Lucene & Solr

Otis Gospodnetić, Sematext
What is / isn't Lucene
• Free (Apache Software License), Java IR library, a single JAR
• Doug Cutting, Apache Software Foundation, 2001
• Application-agnostic: indexing & searching
• High performance, scalable
• No dependencies
• Heavily ported to other languages
• No: crawler, rich document parser, turn-key solution
• No: out-of-the-box faceted search capability... but...
What is/isn't Solr
• Indexing/search server with an HTTP API, built on top of Lucene
• Fast & scalable (distributed search, index replication)
• XML, JSON, Ruby, Perl, PHP, javabin
• No: crawler (but Nutch can feed its output into Solr)
• Yes: rich document parser
• Yes: faceted search out of the box!
Solr and Faceted Search
• 3 types of facets: field values (text), dates, queries
• Field values ("text"): return counts for all/top terms in a field across the result set, e.g., categories à la Amazon
• Dates: return counts for docs in specified date ranges
• Queries: return counts for docs that also match a given query; handy for numeric ranges (think prices!)
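To make the field-value idea concrete, here is a minimal Python sketch of what a field facet computes: for each value of a field, the number of documents in the current result set that carry it. It is only an illustration under stated assumptions; Solr does this work server-side against its index. The in-memory `RESULTS` documents, the `field_facet` function, and its `vocabulary` parameter are hypothetical.

```python
from collections import Counter

# Hypothetical result set: each document maps a field name to a list of
# values, so multivalued fields are just lists with several entries.
RESULTS = [
    {"category": ["electronics"], "inStock": ["false"]},
    {"category": ["electronics"], "inStock": ["false"]},
    {"category": ["electronics"], "inStock": ["true"]},
    {"category": [], "inStock": ["false"]},
]


def field_facet(docs, field, mincount=0, limit=None, vocabulary=()):
    """Count documents per value of `field`, highest counts first.

    `vocabulary` (hypothetical) lists values known to exist in the whole
    index, so they can be reported with a count of 0 even when no document
    in `docs` has them; `mincount` and `limit` then trim the list.
    """
    counts = Counter({value: 0 for value in vocabulary})
    for doc in docs:
        # set() so a value repeated inside one document is counted once
        for value in set(doc.get(field, [])):
            counts[value] += 1
    ranked = [(v, c) for v, c in counts.most_common() if c >= mincount]
    return ranked if limit is None else ranked[:limit]


print(field_facet(RESULTS, "category", vocabulary=["electronics", "copier"]))
# [('electronics', 3), ('copier', 0)]
```

Here `mincount` and `limit` loosely mirror the `facet.mincount` and `facet.limit` request parameters described in the response slide; the counts are always relative to the result set, which is what lets facets guide refinement.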
Facet Field Requirements
• Must be indexed
• Often not tokenized
• Often not altered by analysis (no lowercasing, no punctuation stripping)
• Storing not required
• Multivalued fields OK
Turn It On
• 0 facets:
   - http://host:80/solr/select?q=foo
• 1 facet:
   - http://host:80/solr/select?q=foo&facet=true&facet.field=category
• N facets (repeat facet.field):
   - http://host:80/solr/select?q=foo&facet=true&facet.field=category&facet.field=inStock
• facet=true or facet=on
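Building these URLs by hand gets error-prone once values need escaping. A minimal Python sketch using the standard library's `urlencode` shows how the repeated `facet.field` parameter falls out naturally from a list of pairs. The `facet_url` helper is hypothetical, and `http://host:80/solr/select` is simply the placeholder host from the slide.

```python
from urllib.parse import urlencode


def facet_url(base, q, facet_fields=()):
    """Build a Solr select URL for query `q`, faceting on each field given."""
    params = [("q", q)]
    if facet_fields:
        params.append(("facet", "true"))
        # facet.field is repeated once per field, as in the N-facets URL
        params.extend(("facet.field", field) for field in facet_fields)
    return base + "?" + urlencode(params)


print(facet_url("http://host:80/solr/select", "foo", ["category", "inStock"]))
# http://host:80/solr/select?q=foo&facet=true&facet.field=category&facet.field=inStock
```

Passing a list of pairs rather than a dict is the design choice that matters here: a dict cannot hold the same key twice, but Solr expects `facet.field` once per faceted field.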
Text Facet Response
<result numFound="4" start="0"/>
<lst name="facet_counts">
 <lst name="facet_fields">
  <lst name="category">
   <int name="electronics">3</int>
   <int name="copier">0</int>
  </lst>
  <lst name="inStock">
   <int name="false">3</int>
   <int name="true">1</int>
  </lst>
 </lst>
</lst>

• facet.mincount=1 to avoid 0-count facet values
• facet.limit=N to limit to the top N facet values
• facet.missing=true to also count uncategorized documents (no value in the field)
• Lots of other options!
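On the client side, the response above has to be turned back into something a UI can render. A minimal Python sketch with the standard `xml.etree.ElementTree` module, assuming the slide's fragment is wrapped in a `<response>` root element (added here only so it parses on its own); the `parse_facet_fields` name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The slide's response fragment, wrapped in a <response> root (assumption).
SAMPLE = """<response>
  <result name="response" numFound="4" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_fields">
      <lst name="category">
        <int name="electronics">3</int>
        <int name="copier">0</int>
      </lst>
      <lst name="inStock">
        <int name="false">3</int>
        <int name="true">1</int>
      </lst>
    </lst>
  </lst>
</response>"""


def parse_facet_fields(xml_text):
    """Return {field: [(value, count), ...]} in the order Solr listed them."""
    root = ET.fromstring(xml_text)
    node = root.find("./lst[@name='facet_counts']/lst[@name='facet_fields']")
    if node is None:  # no field facets in this response
        return {}
    return {
        field.get("name"): [
            (item.get("name"), int(item.text)) for item in field.findall("int")
        ]
        for field in node.findall("lst")
    }


print(parse_facet_fields(SAMPLE))
```

Keeping the (value, count) pairs in a list, rather than a dict, preserves the order Solr returned them in, which is usually the order a facet panel should display.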
Date Facets
• http://.../solr/select/?q=*:*&rows=0&facet=true&facet.date=timestamp&facet.date.start=NOW/DAY-5DAYS&facet.date.end=NOW/DAY%2B1DAY&facet.date.gap=%2B1DAY
• %2B is the URL-encoded form of +, so %2B1DAY means +1DAY
• Solr Date Math Parser syntax: /HOUR, +2YEARS, -1DAY, /DAY+6MONTHS+3DAYS, +6MONTHS+3DAYS/DAY
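The `%2B` detail is easy to get wrong: a literal `+` in a query string is decoded as a space. A minimal Python sketch, assuming the slide's parameter names and the placeholder host `http://host:80/solr/select`, shows that `urlencode` handles the date math encoding automatically. The `date_facet_url` helper is hypothetical.

```python
from urllib.parse import urlencode


def date_facet_url(base, field, start, end, gap, q="*:*"):
    """Build a date-facet request; start/end/gap use Solr date math."""
    params = [
        ("q", q),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.date", field),
        ("facet.date.start", start),
        ("facet.date.end", end),
        ("facet.date.gap", gap),
    ]
    # urlencode percent-encodes '+' as %2B (and '/' as %2F), so the date
    # math arrives intact instead of '+' being decoded as a space.
    return base + "?" + urlencode(params)


print(date_facet_url("http://host:80/solr/select", "timestamp",
                     "NOW/DAY-5DAYS", "NOW/DAY+1DAY", "+1DAY"))
```

The encoded URL looks noisier than the slide's version because `:`, `*`, and `/` are also percent-encoded, but the server decodes them back to the same values.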
Date Facet Response
<result name="response" numFound="42" start="0"/>
<lst name="facet_counts">
 <lst name="facet_dates">
  <lst name="timestamp">
   <int name="2007-08-11T00:00:00.000Z">1</int>
   <int name="2007-08-12T00:00:00.000Z">5</int>
   <int name="2007-08-13T00:00:00.000Z">3</int>
   <int name="2007-08-14T00:00:00.000Z">7</int>
   <int name="2007-08-15T00:00:00.000Z">2</int>
   <int name="2007-08-16T00:00:00.000Z">16</int>
   <str name="gap">+1DAY</str>
   <date name="end">2007-08-17T00:00:00Z</date>
  </lst>
 </lst>
</lst>
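Note that the date buckets sit next to metadata entries (`gap`, `end`) in the same list. A minimal Python sketch that keeps only the `<int>` children as buckets, assuming the response above wrapped in a `<response>` root; `parse_date_facets` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# The slide's date facet response, wrapped in a <response> root (assumption).
SAMPLE = """<response>
  <result name="response" numFound="42" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_dates">
      <lst name="timestamp">
        <int name="2007-08-11T00:00:00.000Z">1</int>
        <int name="2007-08-12T00:00:00.000Z">5</int>
        <int name="2007-08-13T00:00:00.000Z">3</int>
        <int name="2007-08-14T00:00:00.000Z">7</int>
        <int name="2007-08-15T00:00:00.000Z">2</int>
        <int name="2007-08-16T00:00:00.000Z">16</int>
        <str name="gap">+1DAY</str>
        <date name="end">2007-08-17T00:00:00Z</date>
      </lst>
    </lst>
  </lst>
</response>"""


def parse_date_facets(xml_text, field):
    """Return [(bucket_start, count), ...] for one date-faceted field."""
    root = ET.fromstring(xml_text)
    path = ("./lst[@name='facet_counts']/lst[@name='facet_dates']"
            f"/lst[@name='{field}']")
    node = root.find(path)
    if node is None:
        return []
    buckets = []
    # Only <int> children are counts; <str name="gap"> and
    # <date name="end"> are metadata and are skipped.
    for child in node.findall("int"):
        start = datetime.strptime(child.get("name"), "%Y-%m-%dT%H:%M:%S.%fZ")
        buckets.append((start, int(child.text)))
    return buckets


for start, count in parse_date_facets(SAMPLE, "timestamp"):
    print(start.date(), count)
```

The bucket counts add up to 34, not the 42 in `numFound`: documents whose timestamp falls outside the requested start/end window land in no bucket.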
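Query Facets
• http://.../solr/select?q=shoes&rows=0&facet=true&facet.field=inStock&facet.query=price:[*+TO+500]&facet.query=price:[500+TO+*]
• Avoids the workaround of assigning documents to fixed buckets at index time
• Keep queries disjoint (note that [a TO b] includes both endpoints, so both queries above match a price of exactly 500)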
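Because square-bracket ranges are inclusive at both ends, "keep queries disjoint" is worth checking in code. A minimal Python sketch, assuming numeric prices and the placeholder host from the slides; `range_query`, `inclusive_ranges_overlap`, and `query_facet_url` are hypothetical helpers, and the 499.99 upper bound assumes prices with at most two decimal places.

```python
from urllib.parse import urlencode


def range_query(field, lo, hi):
    """Lucene range query with inclusive [ ] bounds; None becomes *."""
    lo_text = "*" if lo is None else str(lo)
    hi_text = "*" if hi is None else str(hi)
    return f"{field}:[{lo_text} TO {hi_text}]"


def inclusive_ranges_overlap(a, b):
    """True if inclusive ranges a=(lo, hi) and b=(lo, hi) share any value."""
    a_lo, a_hi = a
    b_lo, b_hi = b
    if a_hi is not None and b_lo is not None and a_hi < b_lo:
        return False  # a ends before b starts
    if b_hi is not None and a_lo is not None and b_hi < a_lo:
        return False  # b ends before a starts
    return True


def query_facet_url(base, q, field, ranges):
    """Build a request with one facet.query per (lo, hi) range."""
    params = [("q", q), ("rows", "0"), ("facet", "true")]
    params.extend(("facet.query", range_query(field, lo, hi)) for lo, hi in ranges)
    return base + "?" + urlencode(params)


print(inclusive_ranges_overlap((None, 500), (500, None)))     # True: both include 500
print(inclusive_ranges_overlap((None, 499.99), (500, None)))  # False
print(query_facet_url("http://host:80/solr/select", "shoes", "price",
                      [(None, 499.99), (500, None)]))
```

A check like this can run once when the price buckets are configured, rather than on every request.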
Query Facet Response
<result numFound="3" start="0"/>
<lst name="facet_counts">
 <lst name="facet_queries">
  <int name="price:[* TO 500]">3</int>
  <int name="price:[500 TO *]">1</int>
 </lst>
 <lst name="facet_fields">
  <lst name="inStock">
   <int name="false">3</int>
   <int name="true">1</int>
  </lst>
 </lst>
</lst>
UI Integration
• Use filter queries via fq (multiple fq parameters are ANDed together)
• http://.../solr/select?q=shoes&facet=true&facet.field=category&fq=price:[0 TO 300]
• http://.../solr/select?q=shoes&facet=true&facet.field=category&fq=price:[0 TO 300]&fq=inStock:true
• Important: a single request returns both the filtered results and their facet counts
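A UI typically keeps the user's facet selections as state and rebuilds the request from it. A minimal Python sketch of one common convention, which is an assumption and not the only design: values selected within one facet are ORed, and each facet becomes its own `fq`, so different facets are ANDed. `filter_queries` and `search_url` are hypothetical helpers, the field names in the example are made up, and values are wrapped in double quotes on the assumption that they contain no quote characters themselves. Range filters such as `price:[0 TO 300]` would be appended as extra `fq` strings.

```python
from urllib.parse import urlencode


def filter_queries(selections):
    """Turn {field: [values]} into fq strings: OR within a facet,
    one fq per facet so Solr ANDs the facets together."""
    fqs = []
    for field, values in selections.items():
        if not values:
            continue  # nothing selected for this facet
        # Quote each value so multi-word values stay intact
        # (assumes values contain no double quotes).
        ored = " OR ".join(f'"{value}"' for value in values)
        fqs.append(f"{field}:({ored})")
    return fqs


def search_url(base, q, facet_fields, selections):
    """One request: query, facet fields, and the current filters."""
    params = [("q", q), ("facet", "true")]
    params.extend(("facet.field", field) for field in facet_fields)
    params.extend(("fq", fq) for fq in filter_queries(selections))
    return base + "?" + urlencode(params)


print(filter_queries({"category": ["sneakers", "boots"], "brand": ["acme"]}))
# ['category:("sneakers" OR "boots")', 'brand:("acme")']
```

Keeping one `fq` per facet also makes it easy to remove a single facet's filter when the user deselects it, without rewriting the rest of the query.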
State of Lucene & Solr
• Super healthy community, exploding development
• Lucene 3.0 – 2009-11-25:
   - Performance, faster range queries, clean API, better Unicode support, more non-English support
• Solr 1.4 – 2009-11-10:
   - Performance, new replication, DB indexing, rich-doc indexing, results clustering, faster response protocol, deduplication...
Lucene, Solr, Enterprise
• Free: Community
   - Lucene ~600 emails/month (dev: 2000/month)
   - Solr ~1300 emails/month (dev: 800/month)
• Commercial: Support Subscriptions
   - Sematext
   - Lucid Imagination

More Related Content

Viewers also liked

Automatically mining facets for queries from their search results
Automatically mining facets for queries from their search resultsAutomatically mining facets for queries from their search results
Automatically mining facets for queries from their search resultsShakas Technologies
 
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...Earley Information Science
 
Designing For Discovery With Faceted Navigation
Designing For Discovery With Faceted NavigationDesigning For Discovery With Faceted Navigation
Designing For Discovery With Faceted NavigationJim Kalbach
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
The Four Pillars of Search Engine Optimization (SEO)
The Four Pillars of Search Engine Optimization (SEO)The Four Pillars of Search Engine Optimization (SEO)
The Four Pillars of Search Engine Optimization (SEO)Jonathon Colman
 
Faceted Classification System in Libraries
Faceted Classification System in LibrariesFaceted Classification System in Libraries
Faceted Classification System in LibrariesLaura Loveday Maury
 
Working Of Search Engine
Working Of Search EngineWorking Of Search Engine
Working Of Search EngineNIKHIL NAIR
 
Ecommerce SEO: Boosting visibility with faceted navigation | Slides from Brig...
Ecommerce SEO: Boosting visibility with faceted navigation | Slides from Brig...Ecommerce SEO: Boosting visibility with faceted navigation | Slides from Brig...
Ecommerce SEO: Boosting visibility with faceted navigation | Slides from Brig...Allotment Digital Marketing
 
Comparative study of major classification schemes
Comparative study of major classification schemesComparative study of major classification schemes
Comparative study of major classification schemesNadeem Nazir
 
Non Functional Requirement.
Non Functional Requirement.Non Functional Requirement.
Non Functional Requirement.Khushboo Shaukat
 
Search Engines Presentation
Search Engines PresentationSearch Engines Presentation
Search Engines PresentationJSCHO9
 
Introduction to Search Engines
Introduction to Search EnginesIntroduction to Search Engines
Introduction to Search EnginesNitin Pande
 
Functional requirements: Thinking Like A Pirate
Functional requirements: Thinking Like A PirateFunctional requirements: Thinking Like A Pirate
Functional requirements: Thinking Like A PirateAmye Scavarda
 
Search Engine Powerpoint
Search Engine PowerpointSearch Engine Powerpoint
Search Engine Powerpoint201014161
 
4150415
41504154150415
4150415kombi9
 

Viewers also liked (17)

Data mining
Data miningData mining
Data mining
 
Automatically mining facets for queries from their search results
Automatically mining facets for queries from their search resultsAutomatically mining facets for queries from their search results
Automatically mining facets for queries from their search results
 
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
SharePoint Jumpstart #3: Navigation, Metadata, & Faceted Search: Approaches &...
 
Designing For Discovery With Faceted Navigation
Designing For Discovery With Faceted NavigationDesigning For Discovery With Faceted Navigation
Designing For Discovery With Faceted Navigation
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
The Four Pillars of Search Engine Optimization (SEO)
The Four Pillars of Search Engine Optimization (SEO)The Four Pillars of Search Engine Optimization (SEO)
The Four Pillars of Search Engine Optimization (SEO)
 
Faceted Classification System in Libraries
Faceted Classification System in LibrariesFaceted Classification System in Libraries
Faceted Classification System in Libraries
 
Working Of Search Engine
Working Of Search EngineWorking Of Search Engine
Working Of Search Engine
 
Ecommerce SEO: Boosting visibility with faceted navigation | Slides from Brig...
Ecommerce SEO: Boosting visibility with faceted navigation | Slides from Brig...Ecommerce SEO: Boosting visibility with faceted navigation | Slides from Brig...
Ecommerce SEO: Boosting visibility with faceted navigation | Slides from Brig...
 
Comparative study of major classification schemes
Comparative study of major classification schemesComparative study of major classification schemes
Comparative study of major classification schemes
 
Non Functional Requirement.
Non Functional Requirement.Non Functional Requirement.
Non Functional Requirement.
 
Search Engines Presentation
Search Engines PresentationSearch Engines Presentation
Search Engines Presentation
 
Introduction to Search Engines
Introduction to Search EnginesIntroduction to Search Engines
Introduction to Search Engines
 
Functional requirements: Thinking Like A Pirate
Functional requirements: Thinking Like A PirateFunctional requirements: Thinking Like A Pirate
Functional requirements: Thinking Like A Pirate
 
Search engines
Search enginesSearch engines
Search engines
 
Search Engine Powerpoint
Search Engine PowerpointSearch Engine Powerpoint
Search Engine Powerpoint
 
4150415
41504154150415
4150415
 

More from Daniel Tunkelang

Query Understanding and Ecommerce
Query Understanding and EcommerceQuery Understanding and Ecommerce
Query Understanding and EcommerceDaniel Tunkelang
 
Semantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce QueriesSemantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce QueriesDaniel Tunkelang
 
Helping Searchers Satisfice through Query Understanding
Helping Searchers Satisfice through Query UnderstandingHelping Searchers Satisfice through Query Understanding
Helping Searchers Satisfice through Query UnderstandingDaniel Tunkelang
 
Query Understanding: A Manifesto
Query Understanding: A ManifestoQuery Understanding: A Manifesto
Query Understanding: A ManifestoDaniel Tunkelang
 
Where should you put your data scientists?
Where should you put your data scientists?Where should you put your data scientists?
Where should you put your data scientists?Daniel Tunkelang
 
Data Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityData Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityDaniel Tunkelang
 
My Three Ex’s: A Data Science Approach for Applied Machine Learning
My Three Ex’s: A Data Science Approach for Applied Machine LearningMy Three Ex’s: A Data Science Approach for Applied Machine Learning
My Three Ex’s: A Data Science Approach for Applied Machine LearningDaniel Tunkelang
 
Web science - How is it different?
Web science - How is it different?Web science - How is it different?
Web science - How is it different?Daniel Tunkelang
 
Better Search Through Query Understanding
Better Search Through Query UnderstandingBetter Search Through Query Understanding
Better Search Through Query UnderstandingDaniel Tunkelang
 
Social Search in a Professional Context
Social Search in a Professional ContextSocial Search in a Professional Context
Social Search in a Professional ContextDaniel Tunkelang
 
Find and be Found: Information Retrieval at LinkedIn
Find and be Found: Information Retrieval at LinkedInFind and be Found: Information Retrieval at LinkedIn
Find and be Found: Information Retrieval at LinkedInDaniel Tunkelang
 
Search as Communication: Lessons from a Personal Journey
Search as Communication: Lessons from a Personal JourneySearch as Communication: Lessons from a Personal Journey
Search as Communication: Lessons from a Personal JourneyDaniel Tunkelang
 
Enterprise Search: How do we get there from here?
Enterprise Search: How do we get there from here?Enterprise Search: How do we get there from here?
Enterprise Search: How do we get there from here?Daniel Tunkelang
 
Big Data, We Have a Communication Problem
Big Data, We Have a Communication Problem Big Data, We Have a Communication Problem
Big Data, We Have a Communication Problem Daniel Tunkelang
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data ScientistDaniel Tunkelang
 
Information, Attention, and Trust: A Hierarchy of Needs
Information, Attention, and Trust: A Hierarchy of NeedsInformation, Attention, and Trust: A Hierarchy of Needs
Information, Attention, and Trust: A Hierarchy of NeedsDaniel Tunkelang
 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The PeopleDaniel Tunkelang
 
Content, Connections, and Context
Content, Connections, and ContextContent, Connections, and Context
Content, Connections, and ContextDaniel Tunkelang
 

More from Daniel Tunkelang (20)

Query Understanding and Ecommerce
Query Understanding and EcommerceQuery Understanding and Ecommerce
Query Understanding and Ecommerce
 
Semantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce QueriesSemantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce Queries
 
Helping Searchers Satisfice through Query Understanding
Helping Searchers Satisfice through Query UnderstandingHelping Searchers Satisfice through Query Understanding
Helping Searchers Satisfice through Query Understanding
 
MMM, Search!
MMM, Search!MMM, Search!
MMM, Search!
 
Enterprise Intelligence
Enterprise IntelligenceEnterprise Intelligence
Enterprise Intelligence
 
Query Understanding: A Manifesto
Query Understanding: A ManifestoQuery Understanding: A Manifesto
Query Understanding: A Manifesto
 
Where should you put your data scientists?
Where should you put your data scientists?Where should you put your data scientists?
Where should you put your data scientists?
 
Data Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityData Science: A Mindset for Productivity
Data Science: A Mindset for Productivity
 
My Three Ex’s: A Data Science Approach for Applied Machine Learning
My Three Ex’s: A Data Science Approach for Applied Machine LearningMy Three Ex’s: A Data Science Approach for Applied Machine Learning
My Three Ex’s: A Data Science Approach for Applied Machine Learning
 
Web science - How is it different?
Web science - How is it different?Web science - How is it different?
Web science - How is it different?
 
Better Search Through Query Understanding
Better Search Through Query UnderstandingBetter Search Through Query Understanding
Better Search Through Query Understanding
 
Social Search in a Professional Context
Social Search in a Professional ContextSocial Search in a Professional Context
Social Search in a Professional Context
 
Find and be Found: Information Retrieval at LinkedIn
Find and be Found: Information Retrieval at LinkedInFind and be Found: Information Retrieval at LinkedIn
Find and be Found: Information Retrieval at LinkedIn
 
Search as Communication: Lessons from a Personal Journey
Search as Communication: Lessons from a Personal JourneySearch as Communication: Lessons from a Personal Journey
Search as Communication: Lessons from a Personal Journey
 
Enterprise Search: How do we get there from here?
Enterprise Search: How do we get there from here?Enterprise Search: How do we get there from here?
Enterprise Search: How do we get there from here?
 
Big Data, We Have a Communication Problem
Big Data, We Have a Communication Problem Big Data, We Have a Communication Problem
Big Data, We Have a Communication Problem
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data Scientist
 
Information, Attention, and Trust: A Hierarchy of Needs
Information, Attention, and Trust: A Hierarchy of NeedsInformation, Attention, and Trust: A Hierarchy of Needs
Information, Attention, and Trust: A Hierarchy of Needs
 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The People
 
Content, Connections, and Context
Content, Connections, and ContextContent, Connections, and Context
Content, Connections, and Context
 

Recently uploaded

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Recently uploaded (20)

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

Faceted Search Nycto Talk

  • 1. Faceted Search New York CTO Club December 9, 2009 Daniel Tunkelang, Google Otis Gospodneti!, Sematext
  • 2. Agenda Daniel: ! What is faceted search? ! Why use faceted search? ! Thoughts about design and user experience. Otis: ! What are Lucene and Solr? ! Why use an open-source search library? ! Thoughts about implementation.
  • 3. “Regular” Search Interface: ! User expresses information need as short query. ! Search engine returns ranked, pageable result set. User happy when... ! Top-ranked result satisfies information need. ! At least some result on first page is relevant. User unhappy when... ! No result on first page satisfies information need. ! Results misleadingly appear relevant (bait and switch).
  • 4. Relevance Is Subjective Relevance is defined as a measure of information conveyed by a document relative to a query. It is shown that the relationship between the document and the query, though necessary, is not sufficient to determine relevance. William Goffman, On relevance as a measure, 1964.
  • 6. Assumptions Are Dangerous ! self-awareness tf-idf PageRank ! self-expression ! model knows best ! answer is a document ! one-shot query
  • 7. What is Faceted Search? ! Best understood through examples. " See the following slides. " Or shop on almost any ecommerce site. ! Facets = multiple ways to organize information. " Often based on available structured information. " But not always, e.g., facets obtained via text mining. ! Typical interaction: " User starts with a full-text search. " Facets guide query refinement process.
  • 10. Faceted Search for Breakfast
  • 12. But Facets Are Not a Silver Bullet...
      • Screen real estate is finite.
          • Choose facets wisely.
          • Choose facet values wisely for monster facets.
      • Multiple selection within a facet is powerful, but...
          • It has to be intuitive, especially AND vs. OR.
          • It is even trickier for hierarchical facets.
      • Search relevance still matters!
          • Most faceted search applications rank results.
          • Irrelevant results lead to irrelevant facet refinements.
  • 14. Deliver Precision and Recall: easier said than done! Ranking of facet values is an open research topic.
  • 15. Be Careful with Faceted Search! Cameras have artists?!
  • 17. Take-Aways
      • Faceted search addresses the subjectivity of relevance and information overload.
      • But deploying faceted search effectively requires that you think about user experience.
      • Recommended reading:
          • My thin book entitled Faceted Search
          • Marti Hearst's book on Search User Interfaces
          • Peter Morville's upcoming book on Search Patterns
  • 18. Faceted Search with Lucene & Solr. Otis Gospodnetić, Sematext
  • 19. What Is / Isn't Lucene
      • Free, ASL-licensed Java IR library, shipped as a JAR
      • Doug Cutting, ASF, 2001
      • Application-agnostic: indexing and searching
      • High performance, scalable
      • No dependencies
      • Heavily ported
      • No: crawler, rich document parser, turn-key solution
      • No out-of-the-box faceted search capability... but...
  • 21. What Is / Isn't Solr
      • Indexing/search server with an HTTP API, built on top of Lucene
      • Fast & scalable (distributed search, index replication)
      • XML, JSON, Ruby, Perl, PHP, javabin
      • No: crawler (but Nutch ==> Solr works)
      • Yes: rich text parser
      • Yes: faceted search out of the box!
  • 22. Solr and Faceted Search
      • 3 types of facets: field values (text), dates, queries.
      • “Text”: return counts for all/top terms in a field for a result set, e.g., categories à la Amazon.
      • Dates: return counts for docs in specified date ranges.
      • Queries: return counts for docs that also match a given query; handy for number ranges (think prices!).
  • 23. Facet Field Requirements
      • Must be indexed
      • Often not tokenized
      • Often not altered (lowercase, punctuation)
      • Storing not required
      • Multivalued fields OK
  • 24. Turn It On
      • 0 facets:
          • http://host:80/solr/select?q=foo
      • 1 facet:
          • http://host:80/solr/select?q=foo&facet=true&facet.field=category
      • N facets:
          • http://host:80/solr/select?q=foo&facet=true&facet.field=category&facet.field=inStock
      • facet=true or facet=on
  • 25. Text Facet Response
      <result numFound="4" start="0"/>
      <lst name="facet_counts">
        <lst name="facet_fields">
          <lst name="category">
            <int name="electronics">3</int>
            <int name="copier">0</int>
          </lst>
          <lst name="inStock">
            <int name="false">3</int>
            <int name="true">1</int>
          </lst>
        </lst>
      </lst>
      • facet.mincount=1 to avoid 0-count facet values
      • facet.limit=N to limit to the top N facet values
      • facet.missing=true to catch uncategorized documents
      • Lots of other options!
  • 26. Date Facets
      • http://.../solr/select/?q=*:*&rows=0&facet=true&facet.date=timestamp&facet.date.start=NOW/DAY-5DAYS&facet.date.end=NOW/DAY%2B1DAY&facet.date.gap=%2B1DAY
      • (%2B1 ==> +1)
      • Solr Date Math Parser syntax: /HOUR, +2YEARS, -1DAY, /DAY+6MONTHS+3DAYS, +6MONTHS+3DAYS/DAY
  • 27. Date Facet Response
      <result name="response" numFound="42" start="0"/>
      <lst name="facet_counts">
        <lst name="facet_dates">
          <lst name="timestamp">
            <int name="2007-08-11T00:00:00.000Z">1</int>
            <int name="2007-08-12T00:00:00.000Z">5</int>
            <int name="2007-08-13T00:00:00.000Z">3</int>
            <int name="2007-08-14T00:00:00.000Z">7</int>
            <int name="2007-08-15T00:00:00.000Z">2</int>
            <int name="2007-08-16T00:00:00.000Z">16</int>
            <str name="gap">+1DAY</str>
            <date name="end">2007-08-17T00:00:00Z</date>
          </lst>
  • 28. Query Facets
      • http://.../solr/select?q=shoes&rows=0&facet=true&facet.field=inStock&facet.query=price:[*+TO+500]&facet.query=price:[500+TO+*]
      • Avoids the bucket-at-index-time workaround
      • Keep queries disjoint
  • 29. Query Facet Response
      <result numFound="3" start="0"/>
      <lst name="facet_counts">
        <lst name="facet_queries">
          <int name="price:[* TO 500]">3</int>
          <int name="price:[500 TO *]">1</int>
        </lst>
        <lst name="facet_fields">
          <lst name="inStock">
            <int name="false">3</int>
            <int name="true">1</int>
          </lst>
        </lst>
      </lst>
  • 30. UI Integration
      • Use filter queries via fq
      • http://.../solr/select?q=shoes&facet=true&facet.field=category&fq=price:[0+TO+300]
      • http://.../solr/select?q=shoes&facet=true&facet.field=category&fq=price:[0+TO+300]&fq=inStock:true
      • Important: a single request does it all
  • 31. State of Lucene & Solr
      • Super healthy community, exploding development
      • Lucene 3.0 (2009-11-25):
          • Performance, faster range queries, clean API, better Unicode support, more non-English support
      • Solr 1.4 (2009-11-10):
          • Performance, new replication, DB indexing, rich-document indexing, results clustering, faster response protocol, deduplication...
  • 32. Lucene, Solr, Enterprise
      • Free: community
          • Lucene ~600 emails/month (dev: 2000/month)
          • Solr ~1300 emails/month (dev: 800/month)
      • Commercial: support subscriptions
          • Sematext
          • Lucid Imagination
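Slide 7 describes the typical interaction: a full-text search, after which facets guide query refinement. The toy Python sketch below illustrates that loop over a hypothetical in-memory catalog; the documents and the `search`, `facet_counts`, and `refine` helpers are made up for illustration and say nothing about how Solr computes facets internally.

```python
from collections import Counter

# Hypothetical toy catalog: free text plus structured fields used as facets.
DOCS = [
    {"id": 1, "text": "compact digital camera", "category": "cameras", "brand": "Acme"},
    {"id": 2, "text": "digital photo frame", "category": "electronics", "brand": "Acme"},
    {"id": 3, "text": "camera tripod", "category": "accessories", "brand": "Zenith"},
    {"id": 4, "text": "digital camera bag", "category": "accessories", "brand": "Acme"},
]

def search(docs, query):
    """Naive full-text match: every query term must occur in the text."""
    terms = query.lower().split()
    return [d for d in docs if all(t in d["text"].split() for t in terms)]

def facet_counts(results, field):
    """Count the matching documents that carry each value of a field."""
    return Counter(d[field] for d in results)

def refine(results, field, value):
    """Apply a facet selection: keep only documents with that value."""
    return [d for d in results if d[field] == value]

# Step 1: full-text search; step 2: facets summarize the result set.
hits = search(DOCS, "camera")
print(facet_counts(hits, "category"))
# Step 3: the user picks a facet value; counts are recomputed on the subset.
hits = refine(hits, "category", "accessories")
print(facet_counts(hits, "brand"))
```

Because counts are always computed over the current result set, each refinement changes the values and counts offered for the remaining facets.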
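Slide 12 warns that multiple selection within a facet has to be intuitive about AND vs. OR. One common convention, assumed here rather than taken from the talk, is OR within a facet and AND across facets. The sketch below implements both within-facet modes over hypothetical products; the multivalued `color` field shows where the two modes diverge.

```python
# Hypothetical products; "color" is multivalued to show why AND vs. OR matters.
PRODUCTS = [
    {"id": 1, "brand": "Acme", "color": {"red"}},
    {"id": 2, "brand": "Acme", "color": {"blue"}},
    {"id": 3, "brand": "Zenith", "color": {"red", "blue"}},
    {"id": 4, "brand": "Zenith", "color": {"green"}},
]

def values(doc, field):
    """Return a field's values as a set, whether single- or multivalued."""
    v = doc[field]
    return v if isinstance(v, set) else {v}

def matches(doc, selections, within="or"):
    """Check a doc against {field: set_of_selected_values}.

    Selections in different facets are always ANDed. Within one facet,
    'or' keeps docs having ANY selected value; 'and' requires ALL of them.
    """
    for field, chosen in selections.items():
        have = values(doc, field)
        if within == "or" and not (have & chosen):
            return False
        if within == "and" and not chosen <= have:
            return False
    return True

def select(docs, selections, within="or"):
    """Return the ids of documents that satisfy the selections."""
    return [d["id"] for d in docs if matches(d, selections, within)]

sel = {"color": {"red", "blue"}}
print(select(PRODUCTS, sel, "or"))   # products that are red or blue
print(select(PRODUCTS, sel, "and"))  # products that are both red and blue
```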
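Slide 23 notes that facet fields are often not tokenized or altered. Facet counts are computed over indexed terms, so a tokenized field yields word-level values instead of whole categories. The toy sketch below uses hypothetical category strings and a crude lowercase/whitespace split standing in for a real Lucene analyzer; in Solr, an untokenized facet field would typically use a `string` field type.

```python
from collections import Counter

# Hypothetical category values on three matching documents.
CATEGORIES = ["Digital Cameras", "Digital Photo Frames", "Memory Cards"]

def tokenize(value):
    """Crude stand-in for an analyzer: lowercase, split on whitespace."""
    return value.lower().split()

def facet_on_tokens(values):
    """Facet counts if the field were tokenized: one entry per term."""
    counts = Counter()
    for v in values:
        counts.update(set(tokenize(v)))  # a doc counts once per distinct term
    return counts

def facet_on_whole_values(values):
    """Facet counts for an untokenized field: one entry per original value."""
    return Counter(values)

print(facet_on_tokens(CATEGORIES))        # 'digital' shows up as its own facet value
print(facet_on_whole_values(CATEGORIES))  # each category stays intact
```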
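The request URLs on Slide 24 repeat the `facet.field` parameter once per facet. A minimal sketch that builds them with Python's standard library; `host:80` is the slide's placeholder host, and `solr_select_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

def solr_select_url(base, query, facet_fields=()):
    """Build a Solr select URL; facet.field is repeated once per facet."""
    params = [("q", query)]
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", f) for f in facet_fields)
    # urlencode keeps repeated keys when given a list of (key, value) pairs.
    return base + "/select?" + urlencode(params)

BASE = "http://host:80/solr"  # placeholder host from the slide
print(solr_select_url(BASE, "foo"))
print(solr_select_url(BASE, "foo", ["category"]))
print(solr_select_url(BASE, "foo", ["category", "inStock"]))
```

Passing a list of pairs rather than a dict is what lets the same key appear several times.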
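To display the counts from Slide 25, a client walks the nested `lst` elements. A sketch using Python's `xml.etree.ElementTree`. The slide shows only a fragment, so the sample wraps it in a `response` root element (an assumption about the enclosing document), and `parse_facet_fields` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Slide 25's fragment, wrapped in an assumed <response> root element.
SAMPLE = """<response>
<result numFound="4" start="0"/>
<lst name="facet_counts">
  <lst name="facet_fields">
    <lst name="category">
      <int name="electronics">3</int>
      <int name="copier">0</int>
    </lst>
    <lst name="inStock">
      <int name="false">3</int>
      <int name="true">1</int>
    </lst>
  </lst>
</lst>
</response>"""

def parse_facet_fields(xml_text, mincount=0):
    """Return {field: {value: count}} from the facet_fields section.

    mincount filters zero counts on the client; in Solr you would
    normally pass facet.mincount=1 so they are never sent at all.
    """
    root = ET.fromstring(xml_text)
    fields = root.find("./lst[@name='facet_counts']/lst[@name='facet_fields']")
    result = {}
    if fields is None:  # no faceting section in this response
        return result
    for field in fields.findall("lst"):
        result[field.get("name")] = {
            c.get("name"): int(c.text)
            for c in field.findall("int")
            if int(c.text) >= mincount
        }
    return result

print(parse_facet_fields(SAMPLE))
print(parse_facet_fields(SAMPLE, mincount=1))
```

Slide 21 notes that Solr can also respond in JSON, which may be simpler to consume from a web front end.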
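Slides 26 and 27 request daily buckets from `NOW/DAY-5DAYS` to `NOW/DAY+1DAY`. A simplified sketch of how those bounds become buckets, assuming UTC, a whole-day gap, and a fixed `now` chosen so the buckets line up with the sample response. It emulates only `/DAY` rounding and day offsets, not Solr's full Date Math Parser, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote

def day_floor(t):
    """Emulate /DAY rounding: truncate to midnight, keeping the timezone."""
    return t.replace(hour=0, minute=0, second=0, microsecond=0)

def date_buckets(now, start_days=-5, end_days=1, gap=timedelta(days=1)):
    """Bucket starts for start=NOW/DAY-5DAYS, end=NOW/DAY+1DAY, gap=+1DAY."""
    start = day_floor(now) + timedelta(days=start_days)
    end = day_floor(now) + timedelta(days=end_days)
    buckets = []
    t = start
    while t < end:
        buckets.append(t)
        t += gap
    return buckets, end

now = datetime(2007, 8, 16, 15, 30, tzinfo=timezone.utc)  # assumed "NOW"
buckets, end = date_buckets(now)
for b in buckets:
    print(b.strftime("%Y-%m-%dT%H:%M:%S.000Z"))
print("end", end.strftime("%Y-%m-%dT%H:%M:%SZ"))

# A literal '+' in a query string decodes to a space, hence %2B on the slide.
print(quote("NOW/DAY+1DAY"))
```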
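Slide 28's advice to keep query facets disjoint follows from the square-bracket syntax: `[a TO b]` includes both endpoints, so an item priced exactly 500 matches both `price:[* TO 500]` and `price:[500 TO *]`. The sketch below emulates that counting over hypothetical prices with a made-up `Range` class; it is not Solr's range-query implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Range:
    """A price range; None means unbounded, like '*' in the slide's queries."""
    lo: Optional[float]
    hi: Optional[float]
    lo_inclusive: bool = True
    hi_inclusive: bool = True

    def contains(self, x):
        if self.lo is not None:
            if x < self.lo or (x == self.lo and not self.lo_inclusive):
                return False
        if self.hi is not None:
            if x > self.hi or (x == self.hi and not self.hi_inclusive):
                return False
        return True

def query_facet_counts(prices, ranges):
    """Count matching items per range, one count per facet query."""
    return [sum(r.contains(p) for p in prices) for r in ranges]

PRICES = [120.0, 300.0, 500.0]  # hypothetical result set

overlapping = [Range(None, 500), Range(500, None)]  # like [* TO 500], [500 TO *]
disjoint = [Range(None, 500, hi_inclusive=False), Range(500, None)]

print(query_facet_counts(PRICES, overlapping))  # the 500 item is counted twice
print(query_facet_counts(PRICES, disjoint))     # counts sum to len(PRICES)
```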
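Slide 30 applies the user's facet selections as filter queries (`fq`) on the same request that asks for facet counts. A sketch that turns hypothetical UI selections into `fq` parameters. The quoted-phrase and `field:(a OR b)` forms are Lucene query syntax, but the helper names, the selection format, the simplified quoting rule, and the choice to OR values within a facet are assumptions; `host:80` is the placeholder host from Slide 24.

```python
from urllib.parse import urlencode, parse_qsl

def term(value):
    """Leave simple alphanumeric values bare; quote anything else as a phrase."""
    if value.isalnum():
        return value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def filter_query(field, values):
    """One fq per facet; several selected values are ORed together."""
    if len(values) == 1:
        return f"{field}:{term(values[0])}"
    return f"{field}:(" + " OR ".join(term(v) for v in values) + ")"

def build_request(base, query, facet_fields, selections, price_range=None):
    """Facet counts and filters travel together in a single request."""
    params = [("q", query), ("facet", "true")]
    params += [("facet.field", f) for f in facet_fields]
    if price_range is not None:
        lo, hi = price_range
        params.append(("fq", f"price:[{lo} TO {hi}]"))
    params += [("fq", filter_query(f, vs)) for f, vs in selections.items()]
    return base + "/select?" + urlencode(params)

url = build_request("http://host:80/solr", "shoes", ["category"],
                    {"inStock": ["true"]}, price_range=(0, 300))
print(url)
print(parse_qsl(url.split("?", 1)[1]))  # decoded parameters, for readability
```

Keeping each facet's selections in a separate `fq` (rather than folding them into `q`) leaves the user's query text untouched while the filters narrow the result set.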