
Webinar: Solr's example/files: From bin/post to /browse and Beyond


Join Erik Hatcher, Lucidworks cofounder, Sr. Solutions Architect, and Lucene/Solr committer, for a webinar exploring how to build a personal document search app with the ease and power of Solr.

Published in: Technology

  1. 1. Solr’s example/files from bin/post to /browse and beyond
  2. 2. $ bin/post -c your_collection your_data/
  3. 3. http://localhost:8983/solr/<collection>/browse
  4. 4. example/files • Distilled, simple, document type navigation • Multi-lingual, localizable interface • Language detection and faceting • Phrase/shingle indexing and "tag cloud" faceting • E-mail address and URL index-time extraction • "Instant search" (as-you-type results)
  5. 5. quick start • $ bin/solr start • $ bin/solr create -c files -d example/files • $ bin/post -c files ~/Documents • $ open http://localhost:8983/solr/files/browse
  6. 6. The UI is The App
  7. 7. document type filtering
  8. 8. language detection and faceting
  9. 9. • instant search • localized interface • HTML safe highlighting
  10. 10. URLs are UI too! • /browse is stateless • all parameters for the view must be passed on the URL and/or set in config • config option: request handler definition (requires the core reload API) • config option: paramsets / params.json (real-time API, no reload)
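
The real-time route named on slide 10 can be sketched roughly as follows. This is a minimal illustration, assuming Solr's Request Parameters API endpoint at `/config/params`; the paramset name `browse_defaults` and its values are hypothetical placeholders, not the contents of the shipped params.json. The code only builds the request and does not send it.

```python
import json

SOLR = "http://localhost:8983/solr"  # local Solr address used in the quick start


def paramset_update(collection, name, params):
    """Build (url, body) for a 'set' command against the Request Parameters API."""
    url = f"{SOLR}/{collection}/config/params"
    body = json.dumps({"set": {name: params}}, sort_keys=True)
    return url, body


# Hypothetical paramset name and values, for illustration only.
url, body = paramset_update("files", "browse_defaults", {"facet": "on", "rows": 10})
print(url)
print(body)
```

POSTing that body as JSON updates the paramset in place, which matches the "no reload" note on the slide; a change to the request handler definition would instead need a core reload.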
  11. 11. browsing by document type: /browse?type=pdf
  12. 12. localizing the interface: /browse?locale=de_DE
  13. 13. URL comparison • /browse?type=html&locale=de_DE • /select?v.locale=de_DE&fq={!field%20f=doc_type%20v=html}&wt=velocity&v.template=browse&v.layout=layout&q=*:*&facet.query={!ex=type%20key=all_types}*:*&facet=on&facet.field={!ex=type}doc_type…
  14. 14. URL comparison: /browse • /browse?type=html&locale=de_DE • /select?v.locale=de_DE&fq={!field%20f=doc_type%20v=html}&wt=velocity&v.template=browse&v.layout=layout&q=*:*&facet.query={!ex=type%20key=all_types}*:*&facet=on&facet.field={!ex=type}doc_type…
  15. 15. URL comparison: type • /browse?type=html&locale=de_DE • /select?v.locale=de_DE&fq={!field%20f=doc_type%20v=html}&wt=velocity&v.template=browse&v.layout=layout&q=*:*&facet.query={!ex=type%20key=all_types}*:*&facet=on&facet.field={!ex=type}doc_type…
  16. 16. URL comparison: locale • /browse?type=html&locale=de_DE • /select?v.locale=de_DE&fq={!field%20f=doc_type%20v=html}&wt=velocity&v.template=browse&v.layout=layout&q=*:*&facet.query={!ex=type%20key=all_types}*:*&facet=on&facet.field={!ex=type}doc_type…
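
To make the comparison concrete, here is a small sketch of how the two short /browse parameters relate to the long /select form. It is an illustration only: the function name `expand_browse` is hypothetical, and it includes just the parameters visible on the slides (the slide's URL is truncated with "…", and that remainder is left out rather than guessed).

```python
from urllib.parse import parse_qsl, urlencode


def expand_browse(short_params):
    """Expand short /browse parameters into an explicit /select query string."""
    doc_type = short_params["type"]
    locale = short_params["locale"]
    long_params = [
        ("v.locale", locale),
        ("fq", f"{{!field f=doc_type v={doc_type}}}"),
        ("wt", "velocity"),
        ("v.template", "browse"),
        ("v.layout", "layout"),
        ("q", "*:*"),
        ("facet.query", "{!ex=type key=all_types}*:*"),
        ("facet", "on"),
        ("facet.field", "{!ex=type}doc_type"),
    ]
    # urlencode percent-encodes the local-params syntax so the URL is safe to paste
    return "/select?" + urlencode(long_params)


short = dict(parse_qsl("type=html&locale=de_DE"))
long_url = expand_browse(short)
print(long_url)
```

In the real configuration the expansion happens inside Solr through the request handler and paramsets, so the browser only ever needs the short form.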
  17. 17. Solr Tips and Tricks within • Indexing “pipeline”: language detection, document type identification, e-mail address and URL extraction, Top Phrases • Query “pipeline”: document type faceting and filtering, UI localisation/localization
  18. 18. implementation: language detection &facet.field=language (via params.json) conf/solrconfig.xml:
  19. 19. implementation: document type identification conf/update-script.js:
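
The script shown on this slide is not part of the transcript, so the following is only a guess at the general idea, written in Python rather than the JavaScript of conf/update-script.js. It assumes the document type is derived from a MIME content type; the function name and both lookup tables are hypothetical and are not copied from the shipped script.

```python
def doc_type_for(content_type):
    """Map a MIME content type to a coarse document type label (illustrative)."""
    if not content_type:
        return None  # no content type: leave doc_type unset
    # Drop parameters such as "; charset=UTF-8", then split type/subtype
    mime = content_type.split(";", 1)[0].strip().lower()
    main, _, sub = mime.partition("/")
    # Hypothetical lookup tables, for illustration only
    by_subtype = {"pdf": "pdf", "html": "html", "msword": "doc"}
    by_maintype = {"image": "image", "text": "text"}
    return by_subtype.get(sub) or by_maintype.get(main)


for ct in ["application/pdf", "text/html; charset=UTF-8", "image/png", None]:
    print(ct, "->", doc_type_for(ct))
```

Documents that end up without a doc_type are the ones a `type=unknown` filter would select.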
  20. 20. implementation: E-mail address and URL extraction • conf/email_url_types.txt: <URL> <EMAIL> • /select?fl=id,email_ss,url_ss&wt=csv • conf/managed-schema: • conf/update-script.js:
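
The slide points at a token-type approach (conf/email_url_types.txt lists `<URL>` and `<EMAIL>`), but the schema and script themselves are not in the transcript. The sketch below swaps in plain regular expressions to show the same outcome: two multi-valued fields, named `email_ss` and `url_ss` as in the slide's `fl` parameter. The patterns are deliberately simple assumptions and would let through some of the bogus addresses mentioned later in the deck.

```python
import re

# Simplified patterns, assumed for illustration; real tokenizers are stricter
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_emails_and_urls(content):
    """Return de-duplicated e-mail addresses and URLs found in the text."""
    emails = sorted(set(EMAIL_RE.findall(content)))
    # Trim sentence punctuation that the URL pattern may have swallowed
    urls = sorted(set(u.rstrip(".,;:)") for u in URL_RE.findall(content)))
    return {"email_ss": emails, "url_ss": urls}


sample = "Mail erik@example.com or see http://localhost:8983/solr/files/browse."
print(extract_emails_and_urls(sample))
```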
  21. 21. implementation: Top Phrases &facet.field=text_shingles conf/managed-schema:
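
As a rough picture of what faceting on a shingled field yields, the sketch below builds word shingles in plain Python and counts how many documents contain each one. It stands in for index-time shingling in Solr and is not the schema from the slide; the shingle sizes (2 to 3 words) and the function names are assumptions.

```python
from collections import Counter


def shingles(text, min_size=2, max_size=3):
    """Yield word n-grams (shingles) of min_size..max_size words."""
    words = text.lower().split()
    for size in range(min_size, max_size + 1):
        for i in range(len(words) - size + 1):
            yield " ".join(words[i:i + size])


def top_phrases(docs, limit=5):
    """Count each phrase once per document, as a facet count would."""
    counts = Counter()
    for doc in docs:
        counts.update(set(shingles(doc)))
    return counts.most_common(limit)


docs = ["solr search server", "open source search server", "search server"]
print(top_phrases(docs, limit=1))
```

Counting per document rather than per occurrence mirrors how facet counts work, which is why a phrase repeated many times in one file does not dominate the "tag cloud".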
  22. 22. implementation: document type faceting and filtering • &type=[doc|pdf|image|…|all|unknown] • fq={!switch v=$type tag=type case='*:*' case.all='*:*' case.unknown='-doc_type:[* TO *]' default=$type_fq} • type_fq={!field f=doc_type v=$type} • facet.field={!ex=type}doc_type • f.doc_type.facet.mincount=0 • f.doc_type.facet.missing=true • facet.query={!ex=type key=all_types}*:*
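
The `{!switch}` expression on this slide can be read as a small lookup with a fallback. The sketch below mimics that selection in Python to show which filter each `type` value produces; it is an emulation for illustration, not Solr code, and the function name is hypothetical.

```python
def resolve_type_filter(type_param):
    """Return the filter query the {!switch} expression selects for &type=..."""
    value = (type_param or "").strip()  # the switch input is trimmed before matching
    cases = {
        "": "*:*",                        # case='*:*' handles a blank or missing type
        "all": "*:*",                     # case.all
        "unknown": "-doc_type:[* TO *]",  # case.unknown: documents with no doc_type
    }
    if value in cases:
        return cases[value]
    # default=$type_fq, where type_fq={!field f=doc_type v=$type}
    return f"{{!field f=doc_type v={value}}}"


for t in [None, "all", "unknown", "pdf"]:
    print(repr(t), "->", resolve_type_filter(t))
```

Because the filter is tagged `type` and the facet parameters exclude that tag with `{!ex=type}`, the doc_type counts stay visible for every type even while one is selected.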
  23. 23. implementation: UI conf/params.json: conf/solrconfig.xml:
  24. 24. example/files: what’s next? • Fix e-mail and URL field names (<email>_ss and <url>_ss, with angle brackets in field names), and display these fields in /browse results rendering • Improve quality of extracted phrases • Extract, facet, and display acronyms • Add sorting controls, possibly some or all of: last modified date, created date, relevancy, and title • Perhaps add grouping by doc_type • Fix debug mode: it currently does not update the parsed query debug output (this is probably a bug in data-driven /browse as well) • Harden update-script: it currently errors if documents do not have a "content" field • Filter out bogus e-mail addresses
  25. 25. And beyond… • Leveraging • Analytics • Relevancy Tuning: signals feedback, parameter adjustments • Landing pages, scripting, etc
  26. 26. Analytics
  27. 27. Landing Pages