Started publishing some code here: https://github.com/Nagyman/django-faceted-search (lots still to do to make it more generally useful, as it was extracted from a larger project).
2. What is Faceted Navigation/Browsing?
• Visible options for narrowing a set of items based on metadata; interactive query building
• Important for e-commerce sites with a significant number of products; shared classifications/metadata
• Browse vs Search, Links vs. Forms
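The idea above can be sketched in plain Python, without any search engine: narrow a set of items by the selected metadata values, then count the remaining values so they can be shown as options. The catalogue and the function names are made up for illustration.

```python
from collections import Counter

# Hypothetical catalogue: every item shares the same metadata fields (facets).
PRODUCTS = [
    {"name": "Trail Runner", "brand": "Acme", "type": "shoe", "size": 9},
    {"name": "Road Racer", "brand": "Acme", "type": "shoe", "size": 10},
    {"name": "Summit Boot", "brand": "Peak", "type": "boot", "size": 9},
    {"name": "Camp Sandal", "brand": "Peak", "type": "sandal", "size": 10},
]


def narrow(items, selections):
    """Keep only the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(field) == value for field, value in selections.items())
    ]


def facet_counts(items, fields):
    """Count how many items carry each value of each facet field."""
    return {field: Counter(item[field] for item in items) for field in fields}


# The visitor clicked the "Acme" brand link; show what can be narrowed next.
results = narrow(PRODUCTS, {"brand": "Acme"})
counts = facet_counts(results, ["type", "size"])
```

Because the counts are computed from the narrowed results, every option offered leads to at least one item, which is what lets browsing by links avoid "No Results" pages.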
3.
4.
5. Engines with Faceted Search
• Commercial: Endeca, FAST Search Server (Microsoft, SharePoint)
• Open Source: Solr (backed by Apache Lucene), Whoosh, Sphinx (“multi-queries”)
• SQL - COUNT, GROUP BY, JOINs, oh my
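For comparison, here is roughly what the SQL route looks like for a single facet, using an in-memory SQLite database. The table, columns, and helper name are assumptions for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, brand TEXT, type TEXT);
    INSERT INTO product (name, brand, type) VALUES
        ('Trail Runner', 'Acme', 'shoe'),
        ('Road Racer', 'Acme', 'shoe'),
        ('Summit Boot', 'Peak', 'boot'),
        ('Camp Sandal', 'Peak', 'sandal');
""")


def brand_counts(conn, product_type=None):
    """Return [(brand, count)], optionally narrowed to one product type."""
    sql = "SELECT brand, COUNT(*) FROM product"
    params = []
    if product_type is not None:
        sql += " WHERE type = ?"
        params.append(product_type)
    sql += " GROUP BY brand ORDER BY COUNT(*) DESC, brand"
    return conn.execute(sql, params).fetchall()


all_brands = brand_counts(conn)
shoe_brands = brand_counts(conn, "shoe")
```

Every facet shown on the page needs its own GROUP BY query like this one, and multi-valued metadata such as categories adds a JOIN per facet, which is where the "oh my" comes from.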
6. Solr
• Full text, faceted, distributed, amazing performance (Java)
• Lucene syntax - title:"The Right Way" AND text:go, text:swim?ing, test~, craig nagy^4
• JSON/XML API - transparent with Haystack
• Run in a servlet container (e.g. Tomcat); Haystack provides easy local testing
(Jetty) - manage.py solr --start
• Data is denormalized; documents based on a schema
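Haystack builds these requests for you, but it can help to see what a faceted Solr query looks like on the wire. The sketch below only assembles the URL using Solr's standard q, fq, facet, facet.field, and wt parameters. The endpoint and the field names are assumptions, and values are not escaped.

```python
from urllib.parse import urlencode

# Assumed local Solr endpoint; adjust host, port, and core for your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_facet_url(query, facet_fields, filters=None, base_url=SOLR_SELECT_URL):
    """Assemble a select URL that asks for results plus facet counts."""
    params = [("q", query), ("wt", "json"), ("facet", "true")]
    # One facet.field parameter per field whose value counts we want back.
    params += [("facet.field", field) for field in facet_fields]
    # Each fq narrows the result set without affecting relevance scoring.
    # Note: values are interpolated as-is; real code should escape them.
    params += [
        ("fq", '%s:"%s"' % (field, value))
        for field, value in (filters or {}).items()
    ]
    return base_url + "?" + urlencode(params)


url = build_facet_url(
    'title:"The Right Way" AND text:go',
    ["brand_exact", "type_exact"],
    {"brand_exact": "Acme"},
)
```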
7. Solr Schema
• All fields to index in schema.xml. Haystack has tools to help generate this file
(manage.py build_solr_schema)
• _exact fields for facets, multiValued="true" for lists of data, data types
8. Haystack
• Provides QuerySet-like API to a number of search backends
haystack.query.SearchQuerySet
• sqs = SearchQuerySet().filter(content='foo',
pub_date__lte=datetime.date(2012, 1, 1))
• Mirror your models, and any value you want to index
• Automatic content_type handling
• Performance Tip: Don’t access unindexed data
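For faceting, SearchQuerySet also offers facet(), narrow(), and facet_counts(). A minimal sketch of chaining them is below. apply_facets is a hypothetical helper, and RecordingQuerySet is only a stand-in so the example runs without Haystack or Solr installed.

```python
def apply_facets(sqs, facet_fields, selected):
    """Request counts for facet_fields and narrow by the selected values.

    sqs is expected to behave like haystack.query.SearchQuerySet, where
    facet() and narrow() each return a new query set.
    """
    for field in facet_fields:
        sqs = sqs.facet(field)
    for field, value in selected:
        # Facet values are matched against the untokenized *_exact field.
        # A real implementation should escape the value first.
        sqs = sqs.narrow('%s_exact:"%s"' % (field, value))
    return sqs


class RecordingQuerySet:
    """Stand-in for SearchQuerySet that just records the calls made."""

    def __init__(self, calls=()):
        self.calls = tuple(calls)

    def facet(self, field):
        return RecordingQuerySet(self.calls + (("facet", field),))

    def narrow(self, query):
        return RecordingQuerySet(self.calls + (("narrow", query),))


sqs = apply_facets(RecordingQuerySet(), ["brand", "type"], [("brand", "Acme")])
```

With Haystack installed, passing SearchQuerySet() instead of the stand-in and then calling facet_counts() on the result should give the per-value counts under its 'fields' key.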
9. Haystack - Indexing
• Subclass haystack.indexes.SearchIndex
• prepare_<fieldname> or prepare callbacks
• manage.py update_index or real-time
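The prepare_<fieldname> convention can be illustrated with a toy class. This is not Haystack's implementation, only a plain-Python imitation of the pattern: for each field, a matching prepare_ hook is used if one exists, otherwise the attribute is copied straight from the object. All class and field names here are hypothetical.

```python
class MiniIndex:
    """Toy stand-in for a search index; not Haystack's implementation."""

    fields = ()

    def prepare(self, obj):
        """Build the flat document that would be sent to the search engine."""
        doc = {}
        for name in self.fields:
            hook = getattr(self, "prepare_%s" % name, None)
            # Use the per-field hook when defined, else mirror the attribute.
            doc[name] = hook(obj) if hook else getattr(obj, name)
        return doc


class Product:
    def __init__(self, title, brand, categories):
        self.title = title
        self.brand = brand
        self.categories = categories


class ProductIndex(MiniIndex):
    fields = ("title", "brand", "categories")

    def prepare_brand(self, obj):
        # Denormalize: store the brand's name, not a reference to it.
        return obj.brand["name"]

    def prepare_categories(self, obj):
        # Multi-valued field: a plain list of strings.
        return sorted(c.lower() for c in obj.categories)


doc = ProductIndex().prepare(
    Product("Trail Runner", {"name": "Acme"}, ["Shoes", "Outdoor"])
)
```

In Haystack itself the fields are declared as class attributes on the SearchIndex subclass, much like a Django model, and the hooks receive the model instance being indexed.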
11. Not Included
• Not included in Haystack
• Views, templates, utilities, generating URLs, handling special data types
• Custom Helpers: Searcher, FacetList, Facet, FacetItem, templatetags
• Extras: Facet Landing Pages (e.g. Etsy, Zappos, G Adventures)
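As an illustration of the URL generation mentioned above, here is one way a facet link could be built so that clicking it toggles a value on or off. The FacetItem name is borrowed from the slide, but its fields and the facet_url function are assumptions for this sketch, not the code from django-faceted-search.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class FacetItem:
    """One clickable value of a facet, with its result count for display."""
    field: str
    value: str
    count: int


def facet_url(path, selections, item):
    """Return a link that toggles item within the current selections.

    selections is the list of (field, value) pairs already applied.
    """
    pair = (item.field, item.value)
    if pair in selections:
        remaining = [s for s in selections if s != pair]
    else:
        remaining = selections + [pair]
    # Sort so the same combination always yields the same URL.
    remaining = sorted(remaining)
    return path + ("?" + urlencode(remaining) if remaining else "")


current = [("brand", "Acme")]
add_link = facet_url("/products/", current, FacetItem("type", "shoe", 2))
remove_link = facet_url("/products/", current, FacetItem("brand", "Acme", 2))
```

Sorting the selections keeps each combination at a single URL regardless of click order, which is friendlier to caching and to facet landing pages.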
* (size, price, brand, type, categories).
* improving findability
* like products (e.g. computers) not unlike products (books & cars)

Airbnb, Zappos, ToysRUs

Browse vs Search
Links vs Forms
Often little or “No Results”
Unpredictable combinations

SQL - Don’t torture yourself; not made for search

* Lucene: boosting, boolean, wildcards, ranges, grouping, etc. But Haystack takes care of this, unless you need more advanced searching
* avoid hitting your DB

starting schema provided with Haystack - django_ct, django_id important for differentiating content types
expected text field for full-text search

* filter, exclude, order_by, highlight

* Similar to Django model definition. If fields mirror model exactly, no extra work required. Otherwise use

* SearchIndex, a la Django model definitions. Denormalize your data here.
* text field template
* prepare fields for indexing