Multi faceted responsive search, autocomplete, feeds engine & logging
 


Presented by Remi Mikalsen, Search Engineer, The Norwegian Centre for ICT in Education

Learn how utdanning.no leverages open source technologies to deliver a blazing fast, multi-faceted, responsive search experience and a flexible and efficient feeds engine on top of Solr 3.6. Among the key open source projects that will be covered are Solr, Ajax-Solr, SolrPHPClient, Bootstrap, jQuery and Drupal. Notable highlights are ajaxified pivot facets, hierarchical facets with multiple parents, Ajax autocomplete with edge n-grams and grouping, integrating our search widgets on any external website, custom Solr logging, and using Solr to deliver Atom feeds. utdanning.no is a governmental website that collects, normalizes and publishes study information related to secondary school and higher education in Norway. With 1.2 million visitors each year and 12,000 indexed documents, we focus on precise information and a high degree of usability for students, potential students and counselors.

    Presentation Transcript

    • Multi-faceted responsive search, autocomplete, feeds engine and logging. Remi Mikalsen, Search Engineer, utdanning.no
    • Introduction: Remi Mikalsen, Search Engineer, utdanning.no
      «Utdanning.no is the official Norwegian national education and career portal, and includes an overview of education in Norway and more than 500 career descriptions» - utdanning.no
      «[...] Our main goals are to improve the quality of education and to improve learning outcomes and learning for children, pupils and students through use of ICT in education» - iktsenteret.no
    • utdanning.no
      - Drupal 7 & Solr 3.6
      - ~3 million visitors / year
      - ~12,000 documents
      - ~18,000,000 terms
      - ~260 fields
      - ~1 QPS (~9M searches / year)
      - ~8 ms latency
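The two traffic figures on this slide can be cross-checked with a line of arithmetic. Averaged over a full year, ~9M searches comes to well under one query per second, so the ~1 QPS figure presumably describes busy daytime periods rather than a 24-hour average (that reading is an assumption; the slide does not say).

```latex
\frac{9 \times 10^{6}\ \text{searches}}{365 \times 86\,400\ \text{s}}
  = \frac{9 \times 10^{6}}{3.1536 \times 10^{7}\ \text{s}}
  \approx 0.29\ \text{QPS}
```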
    • Data integration in the CMS
    • Fetch data:
      - Universities, colleges and community colleges: ~30 different endpoints, ~3500 documents
      - Folk high schools (non-academic): 1 national endpoint, ~650 documents
      - Secondary schools: 1 national endpoint, ~1100 documents
      - Higher education admissions (Samordna opptak): 1 national endpoint, ~1500 documents
      - Secondary schools metadata (Grep): 1 national endpoint, ~650 documents
      - Higher education metadata (NUS): 1 national endpoint, ~3500 documents
      - Professions metadata (STYRK): 2 national endpoints, ~1000 documents
      Transform & normalize: Drupal 7, ER-model
      Added value: editorial staff (professions, interviews, education summaries, etc.), ~1500 documents
      Solr 3.6: de-normalized, searchable
    • Indexing
    • Stack: Drupal 7, Apache Solr Search Integration 7.x-1.1, customized business logic, Solr 3.6
      Pros: basic Drupal integration; tracks document changes; some facet support; easily extendable
      Cons: lacks deep introspection; little de-normalization; hacky hierarchies (Drupal)
      Note: custom config files!
      - schema.xml (mainly dynamic fields)
      - solrconfig.xml (mainly a Drupal request handler)
      We added: deep introspection; data de-normalization; solid hierarchy support; pivot facet support; atomization; manual partial re-index
      schema.xml:
      - field types (auto)
      - various copy fields
      - better spell
      - bucket fields
      - autocomplete
    • Drupal DB: an organization (school) and its study programs, stored separately
      Solr documents: the organization (school) + all its study programs; each study program + its organization
    • <doc>
        <str name="id">394353</str>
        <bool name="bs_mainsearch">true</bool>
        <str name="bundle">org</str>
        <str name="bundle_name">Organization</str>
        <str name="label">ACME University</str>
        <str name="atom">[XML]</str>
        <arr name="related_nodes">
          <str>ACME Rocket Science</str>
          <str>Study program 2</str>
          <str>Study program N</str>
        </arr>
        <arr name="sm_geography_hierarchy">
          <str>1>California</str>
          <str>2>California>San Diego</str>
          <str>3>California>San Diego>Gaslamp Quarter</str>
        </arr>
        <str name="ss_menu_1">orgmenu</str>
        <str name="ss_menu_2">org</str>
      </doc>
    • <doc>
        <str name="id">394354</str>
        <bool name="bs_mainsearch">true</bool>
        <str name="bundle">he</str>
        <str name="bundle_name">Higher Education</str>
        <str name="label">ACME Rocket Science</str>
        <str name="atom">[XML]</str>
        <arr name="sm_offered_by">
          <str>ACME University</str>
        </arr>
        <arr name="sm_study_area">
          <str>Engineering</str>
          <str>Science</str>
        </arr>
        <long name="its_field_semesters">8</long>
        <str name="ss_menu_1">edumenu</str>
        <str name="ss_menu_2">he</str>
      </doc>
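The two example documents show the de-normalization idea: the organization document carries the labels of its study programs, and each study program document carries the organization that offers it. A minimal sketch of that step, assuming a simple input shape of organization rows and study program rows with an `orgId` foreign key; `denormalize` and the input field names are hypothetical, only the output field names are taken from the example documents.

```javascript
// Hypothetical sketch: turn relational rows into self-contained Solr documents.
// Input shape (assumed): orgs = [{ id, label }], programs = [{ id, orgId, label }].
function denormalize(orgs, programs) {
  const orgById = new Map(orgs.map((o) => [o.id, o]));

  // Group study programs by the organization that offers them.
  const programsByOrg = new Map();
  for (const p of programs) {
    if (!programsByOrg.has(p.orgId)) programsByOrg.set(p.orgId, []);
    programsByOrg.get(p.orgId).push(p);
  }

  const orgDocs = orgs.map((o) => ({
    id: String(o.id),
    bundle: 'org',
    label: o.label,
    // The organization document lists all of its study programs.
    related_nodes: (programsByOrg.get(o.id) || []).map((p) => p.label),
  }));

  const programDocs = programs.map((p) => ({
    id: String(p.id),
    bundle: 'he',
    label: p.label,
    // Each study program document carries the organization that offers it.
    sm_offered_by: orgById.has(p.orgId) ? [orgById.get(p.orgId).label] : [],
  }));

  return [...orgDocs, ...programDocs];
}
```

Because both sides of the relation are copied into the index, a search for "ACME University" can also match its study programs without any join at query time; the price is re-indexing related documents when a shared label changes.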
    • Searching
      - Site search
      - Embedded search
      - Feeds engine
    • Site search
    • Our goal: students, counselors and teachers must find what they are looking for
      How?
      - Interaction design (IxD) vs. graphical design
      - User testing, user testing and user testing (and experience)
      - Resulting in a GUI specification we must implement
    • Ajax-Solr is our JS framework: https://github.com/evolvingweb/ajax-solr/wiki/reuters-tutorial
      - Manages all querying
      - Widgets for interacting with and displaying results
      - Events fire search requests, which update the widgets
      We extended it heavily:
      - Developed all our widgets (10+)
      - Added logging (async, via Ajax, local and GA)
      - Distributed configuration (server + client)
      - Simplified initialization script
      But it also works out of the box!
    • Components: Solr 3.6; Solr proxy (~85 lines, SolrPhpClient r60); logger (~200 lines); JS library (~1700 lines, built on ajax-solr by evolvingweb) with default config; our website, which includes a copy of the JS library and calls Initialize(config)
      [Mockup: search page listing example results such as ACME Engineering, ACME Law and ACME Med]
      - Include JS library
      - Initialize
      - Set up HTML
      - Search! (and log)
    • Site search – widgets & faceting
      - Ajax Solr allows defining N widgets
      - «Everything» is a widget
      - A facet is an instance of a FacetWidget
      - Interaction with widgets may fire a query
      - All faceting is piped into one query
      - All widgets are updated after the Solr response
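The widget model on this slide (every widget contributes to one query, and every widget is refreshed from the same response) can be illustrated with a small self-contained model. This is a sketch of the pattern only: `SearchManager`, `FacetWidget` and their methods are hypothetical names and are not Ajax-Solr's actual API; the field names come from the example documents.

```javascript
// Hypothetical, minimal model of the "everything is a widget" pattern.
// These classes are illustrative only; they are NOT Ajax-Solr's API.
class SearchManager {
  constructor(q = '*:*') {
    this.q = q;
    this.widgets = [];
  }

  addWidget(widget) {
    this.widgets.push(widget);
  }

  // Pipe every widget's filters and facet fields into ONE set of Solr params.
  buildQuery() {
    const params = [['q', this.q], ['facet', 'true']];
    for (const w of this.widgets) {
      for (const fq of w.filters()) params.push(['fq', fq]);
      params.push(['facet.field', w.field]);
    }
    return params;
  }

  // After Solr responds, every widget refreshes from the same response.
  handleResponse(response) {
    for (const w of this.widgets) w.afterRequest(response);
  }
}

// A hypothetical single-choice facet widget.
class FacetWidget {
  constructor(field) {
    this.field = field;
    this.selected = null; // set when the user clicks a facet value
    this.counts = {};
  }

  filters() {
    return this.selected === null ? [] : [`${this.field}:"${this.selected}"`];
  }

  afterRequest(response) {
    // Solr's default JSON facet format is a flat [value, count, value, count, ...] list.
    const flat = response.facet_counts.facet_fields[this.field] || [];
    this.counts = {};
    for (let i = 0; i < flat.length; i += 2) this.counts[flat[i]] = flat[i + 1];
  }
}
```

Selecting a value on one widget only changes that widget's `filters()`; the manager rebuilds the single combined query, so every other facet's counts automatically reflect the new selection.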
    • Some facet widgets we have developed
      - Plain: facet values and facet counts in a list; multiple (AND) or single choice
      - Hierarchical: facet values and facet counts in a list; clicking on a facet value drills down into the hierarchy (facet.prefix + fq)
      - Dropdown: displays facet values in a dropdown list; useful for mobile devices in our responsive theme
      - Tagcloud: facet values in a tag cloud
      - Pivot facet: our menu system
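The hierarchical widget's "facet.prefix + fq" drill-down fits the depth-prefixed tokens seen in the `sm_geography_hierarchy` field of the example organization document. A sketch under that assumption: one function builds the tokens, the other builds the filter and per-field facet prefix for the next level. Both function names are hypothetical, and the exact scheme used on utdanning.no may differ.

```javascript
// Hypothetical sketch of depth-prefixed hierarchy tokens and drill-down params.
// ['California', 'San Diego'] -> ['1>California', '2>California>San Diego']
function hierarchyTokens(path) {
  return path.map((_, i) => `${i + 1}>${path.slice(0, i + 1).join('>')}`);
}

// Solr params to show the children of the currently selected node.
function drillDownParams(field, selectedPath) {
  const depth = selectedPath.length;
  const params = [['facet.field', field]];
  if (depth === 0) {
    // Top level: only list depth-1 values.
    params.push([`f.${field}.facet.prefix`, '1>']);
    return params;
  }
  // Restrict results to the selected node ...
  const node = `${depth}>${selectedPath.join('>')}`;
  params.push(['fq', `${field}:"${node}"`]);
  // ... and list only its direct children in the facet.
  params.push([`f.${field}.facet.prefix`, `${depth + 1}>${selectedPath.join('>')}>`]);
  return params;
}
```

Indexing every ancestor token on each document is what lets a document appear under several parents; the depth prefix keeps each facet request to a single level of the tree.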
    • Adding facets
      Config:
        var facets = {};
        facets['interests'] = new facetobject('tagcloud', 'field_interests', '#interests');
        facets['ispublic'] = new facetobject('plain', 'field_ispublic', '#ispublic');
        config['facets'] = facets;
      HTML:
        <ul id="interests"></ul>
        <ul id="ispublic"></ul>
      Initialize:
        Manager.addFacets(config);
    • Example widget code
        AjaxSolr.PlainFacetWidget = AjaxSolr.AbstractFacetWidget.extend({
          multivalue: true,
          target: null,            // HTML target id
          field: null,             // Solr field
          facet_display_limit: 5,  // Max facets to display before «See more»
          facet_field_sort: null,  // Optional facet sort
          dependencies: null,      // Conditional display of facet
          facet_display_more: 'See more',
          facet_display_less: 'See less',
          // ...
          init: function () { /* ... */ },
          beforeRequest: function () { /* ... */ },
          afterRequest: function () { /* ... */ }
        });
    • Site search – pivot facet
    • Pivot faceting allows you to facet within the results of the parent facet
      - http://wiki.apache.org/solr/SimpleFacetParameters
      Slight problem: we don't run Solr 4.x, and facet.pivot is only available from Solr 4.0!
    • Problem: menu facets shouldn't affect each other, but should affect the search results and the other facets
    • Our solution
      Solr document 1:
        <str name="ss_menu_1">orgmenu</str>
        <str name="ss_menu_2">org</str>
      Solr document 2:
        <str name="ss_menu_1">edumenu</str>
        <str name="ss_menu_2">higher_ed</str>
      Solr document 3:
        <str name="ss_menu_1">edumenu</str>
        <str name="ss_menu_2">secondary</str>
      Solr query when a top-level menu tab is selected:
        fq={!tag=ss_menu_1}ss_menu_1:edumenu
        &facet.field={!ex=ss_menu_1}ss_menu_1
      Solr query when a sub-level menu tab is selected:
        fq={!tag=ss_menu_1}ss_menu_1:edumenu
        &fq={!tag=ss_menu_1,ss_menu_2}ss_menu_2:higher_ed
        &facet.field={!ex=ss_menu_1}ss_menu_1
        &facet.field={!ex=ss_menu_2}ss_menu_2
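The tagging scheme on this slide can be generated mechanically. The key detail is that the sub-level filter carries both tags, so the top-level facet (which excludes `ss_menu_1`) ignores both filters, while the sub-level facet (which excludes only `ss_menu_2`) still counts within the selected top-level tab. A minimal sketch, assuming exactly two menu levels; `menuParams` and `toQueryString` are hypothetical helpers.

```javascript
// Sketch: build tagged filters and excluded facets for a two-level menu.
const MENU_LEVELS = ['ss_menu_1', 'ss_menu_2'];

function menuParams(selected) {
  if (selected.length > MENU_LEVELS.length) throw new Error('too many menu levels');
  const params = [];
  selected.forEach((value, i) => {
    // Tag each filter with its own level AND every level above it, so that
    // excluding a parent tag also drops the child filter.
    const tags = MENU_LEVELS.slice(0, i + 1).join(',');
    params.push(['fq', `{!tag=${tags}}${MENU_LEVELS[i]}:${value}`]);
  });
  // Facet on every selected level (at least the top one), excluding that
  // level's tag so its tab counts ignore its own selection.
  const facetLevels = Math.max(selected.length, 1);
  for (let i = 0; i < facetLevels; i++) {
    params.push(['facet.field', `{!ex=${MENU_LEVELS[i]}}${MENU_LEVELS[i]}`]);
  }
  return params;
}

// Raw (unencoded) query string, for readability only; encode the values
// (e.g. with URLSearchParams) before sending the request.
function toQueryString(params) {
  return params.map(([k, v]) => `${k}=${v}`).join('&');
}

console.log(toQueryString(menuParams(['edumenu'])));
// → fq={!tag=ss_menu_1}ss_menu_1:edumenu&facet.field={!ex=ss_menu_1}ss_menu_1
```

As on the slide, this omits `q` and `facet=true`, which a complete request also needs.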
    • Drawbacks
      - Can be VERY slow on large indexes with many unique terms in the facet
      Why do we do it?
      - Small index: 18M terms, 12K documents
      - Pivot facet fields have very few distinct values (5-8)!
    • Site search - autocomplete
    • Our goal: give our users the feeling that we've implemented a mind-reader
      How? With relevant, grouped suggestions* as they type in a search query
      Do we succeed? 50% of our «clicks to content» from searches come from autocomplete
    • Implementing autocomplete is «easy»:
      1) Ajax
      2) Detect keystrokes
      3) Send one request per keystroke
      4) Receive results, populate result list
      Techniques we employ:
      - Minimal payload (reduced fl)
      - But same boosts and qf as «normal» queries
      - group=true, group.field=, group.limit=
      - start_label^1.5 wild_label^1 wild_other^0.25
      - Caching (jsonp, cache=true)
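The techniques on this slide translate into a small set of request parameters. A sketch using the qf boosts listed above. The slide leaves the group field and limit blank, so they are caller-supplied here, and the values in the test are hypothetical. `defType=edismax`, `rows`, and the `fl` field list are likewise assumptions; `autocompleteParams` is a hypothetical helper.

```javascript
// Hypothetical sketch of one autocomplete request's Solr parameters.
// groupField/groupLimit are left to the caller because the slide leaves them unspecified.
function autocompleteParams(term, { groupField, groupLimit = 3 } = {}) {
  const params = new URLSearchParams({
    q: term,
    defType: 'edismax', // assumption: a dismax-style parser so qf boosts apply
    qf: 'start_label^1.5 wild_label^1 wild_other^0.25',
    fl: 'id,label,bundle_name', // minimal payload (assumed field list)
    rows: '10',
    wt: 'json',
  });
  if (groupField) {
    // Group suggestions, e.g. one group per content type.
    params.set('group', 'true');
    params.set('group.field', groupField);
    params.set('group.limit', String(groupLimit));
  }
  return params;
}
```

Keeping `fl` small while reusing the normal boosts means suggestions rank like full search results but cost little bandwidth per keystroke.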
    • Define field type:
        <fieldType name="startsWith" class="solr.TextField">
          <analyzer type="index">
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
            <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />
          </analyzer>
          <analyzer type="query">
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
          </analyzer>
        </fieldType>
      Define fields:
        <field name="start_label" type="startsWith" indexed="true" stored="false" multiValued="false"/>
      Copy fields:
        <copyField source="label" dest="start_label"/>
    • Define field type:
        <fieldType name="wildCardType" class="solr.TextField" omitNorms="true">
          <analyzer type="index">
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="70" side="front"/>
            <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="70" side="back"/>
          </analyzer>
          <analyzer type="query">
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
            <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" ignoreCase="false"/>
            <filter class="solr.NorwegianLightStemFilterFactory"/>
          </analyzer>
        </fieldType>
      Define fields:
        <field name="wild_label" type="wildCardType" indexed="true" stored="false" multiValued="false"/>
        <field name="wild_other" type="wildCardType" indexed="true" stored="false" multiValued="true"/>
      Copy fields:
        <copyField source="label" dest="wild_label"/>
        <copyField source="teaser" dest="wild_other"/>
        <copyField source="body" dest="wild_other"/>
        <copyField source="searchwords" dest="wild_other"/>
        <copyField source="related_nodes" dest="wild_other"/>
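To see why these two field types give prefix ("starts with") and infix ("wildcard") matches, the index-time chains can be imitated in plain JavaScript. This is a rough simulation, not Lucene: it ignores synonyms and stemming and just reproduces keyword tokenizing, lowercasing, the `[^a-z]` stripping, and the edge n-gram steps. Two side effects follow if the simulation is faithful to the filters. First, `[^a-z]` also strips the Norwegian letters æ, ø and å; both the index and query analyzers do this, so matching still works, but lossily. Second, because front grams stop at `maxGramSize="70"`, only the first 70 characters of each `wild_other` value (including `body`) become matchable.

```javascript
// Rough simulation of the "startsWith" index chain:
// KeywordTokenizer + LowerCase + PatternReplace([^a-z] -> '') + front EdgeNGram.
function startsWithTokens(text, min = 2, max = 25) {
  const s = text.toLowerCase().replace(/[^a-z]/g, '');
  const grams = [];
  for (let n = min; n <= Math.min(max, s.length); n++) grams.push(s.slice(0, n));
  return grams;
}

// Rough simulation of the "wildCardType" index chain:
// KeywordTokenizer + LowerCase + front EdgeNGram, then back EdgeNGram applied
// to every front gram. The result is every substring of length >= min that
// lies within the first `max` characters.
function wildCardTokens(text, min = 2, max = 70) {
  const s = text.toLowerCase();
  const grams = new Set();
  for (let n = min; n <= Math.min(max, s.length); n++) {
    const front = s.slice(0, n);
    for (let m = min; m <= Math.min(max, front.length); m++) {
      grams.add(front.slice(front.length - m));
    }
  }
  return grams;
}
```

For example, `startsWithTokens('ACME Rocket')` yields `ac`, `acm`, ..., `acmerocket`, so a user typing "acme ro" (stripped to `acmero` by the query chain) hits an indexed token, while `wildCardTokens` also contains inner fragments such as `ocke`.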
    • Embedded search
    • Our goal: let other sites search our data
      How? The exact same way we do ourselves
      Do we succeed? Two external sites are up and running, and a third is on its way
    • Components: Solr 3.6; Solr proxy (~85 lines, SolrPhpClient r60); logger (~200 lines); JS library (~1700 lines, built on ajax-solr by evolvingweb) with default config; the ACME website, which includes a copy of the JS library plus an ACME config that overrides the defaults
      [Mockup: ACME search page listing example results such as ACME Engineering, ACME Law and ACME Med]
      - Register with us
      - Include our JS library
      - Set up config
      - Set up HTML
      - Search! (and log)
    • <html>
        <head>
          <title>ACME Website</title>
          <!-- utdanning.no search framework -->
          <script src="/js/jquery.js"></script>
          <script src="http://example.com/solrservice/js-min/solr-search-full-min.js"></script>
          <script src="/js/search-init.js"></script>
        </head>
        <body>
          <!-- Search form -->
          <form>
            <input id="query" name="query" type="search" />
            <input type="submit" value="Search" />
          </form>
          <!-- Search results -->
          <div>
            <ul class="hits" id="hits"></ul>
          </div>
        </body>
      </html>
    • <script type="text/javascript">
        // ACME mockup init script
        var Manager; // Search manager object
        var uno_config = loadConfig('http://example.com/solrservice/.../acme.config');
        // Fully customizable search configuration, e.g.:
        uno_config['server']['qf'] = 'label^1.8 content^1.2';
        // Search box widget
        Manager.addPlainSearch(uno_config);
        // Result list widget
        Manager.addResults(uno_config);
        Manager.finalizeConfig(uno_config);
        Manager.doRequest(); // Optional
      </script>
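The «Default config» and «Config (override)» boxes in the embedded-search architecture suggest a layered configuration, where a site only states what differs from the defaults (like `uno_config['server']['qf']` above). A minimal sketch of such a merge, assuming configs are plain nested objects; `mergeConfig` is a hypothetical helper, not part of the utdanning.no library.

```javascript
// Hypothetical sketch: recursively merge a site override into default config.
// Nested plain objects are merged; any other override value replaces the default.
function isPlainObject(v) {
  return v !== null && typeof v === 'object' && !Array.isArray(v);
}

function mergeConfig(defaults, override) {
  const out = { ...defaults }; // never mutate the shared defaults
  for (const [key, value] of Object.entries(override)) {
    out[key] = isPlainObject(value) && isPlainObject(defaults[key])
      ? mergeConfig(defaults[key], value)
      : value;
  }
  return out;
}
```

Returning a new object keeps the default config reusable across several embedding sites on the same page load.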
    • Site owners have full control:
      - Add, edit and configure widgets
      - Query fields, boosts, etc.
      - Faceting
      - Styling
      - Pre-limit search to parts of our index
      Because we eat our own dog food!
    • Feeds engine
    • Our goal: deliver data in bulk to partner organizations
      How? A RESTful, searchable data endpoint that returns XML (Atom++)
      Do we succeed? A beta partner is up and running with stunning performance
    • Components: consumer query; feeds engine (~300 lines) with default config; Solr proxy (~85 lines, SolrPhpClient r60); Solr 3.6; logger (~200 lines)
    • Feeds engine
      - Parses the incoming query
      - Loads config (filters, weights, ...)
      - Transforms incoming query + config into a Solr URL
      - Sends it to the Solr proxy
      Solr proxy
      - Loads the Solr PHP Client library
      - Sends the search request and parses the response
      - Returns results to the feeds engine
      Feeds engine
      - Loads the logger and logs results
      - Picks out the Atom data from the response
      - Glues the results inside an Atom frame
      - Displays the feed
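The last steps on this slide are cheap because each Solr document already stores a ready-made Atom entry in its `atom` field (see the example documents), so the engine only wraps those entries in a feed frame. A sketch of that gluing step in JavaScript (the real engine is PHP); `buildFeed` and its options are hypothetical. Atom requires `id`, `title` and `updated` on the feed, and an author unless every entry provides one.

```javascript
// Hypothetical sketch: wrap pre-rendered Atom <entry> strings in a <feed>.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function buildFeed(docs, { id, title, updated = new Date() }) {
  // Each doc's "atom" field is assumed to hold one complete <entry> element.
  const entries = docs.map((d) => d.atom).filter(Boolean).join('\n');
  return [
    '<?xml version="1.0" encoding="utf-8"?>',
    '<feed xmlns="http://www.w3.org/2005/Atom">',
    `  <id>${escapeXml(id)}</id>`,
    `  <title>${escapeXml(title)}</title>`,
    `  <updated>${updated.toISOString()}</updated>`,
    entries,
    '</feed>',
  ].join('\n');
}
```

Rendering the entries at index time moves all per-document XML work out of the request path, which fits the stunning performance reported on the previous slide.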
    • Example requests:
      - http://example.com/data/atom/organizations
      - http://example.com/data/atom/organizations/10/2
      - http://example.com/data/atom/organizations?fq=type:HE
      - http://example.com/data/atom/organizations?fq=type:HE&q=law
      Consume with a feed reader
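The first feeds-engine step ("transforms incoming + config to Solr URL") can be sketched for request URLs like these. The slides do not say what the numeric path segments mean, so this sketch ASSUMES they are page size and page number, and it ASSUMES a hypothetical per-feed default filter (`bundle:org`) loaded from config; `feedRequestToSolr` and `FEED_CONFIG` are illustrative names only.

```javascript
// Hypothetical sketch: map /data/atom/<feed>[/<size>/<page>]?q=...&fq=... to Solr params.
// ASSUMPTIONS: numeric segments are page size and page number; defaults come from config.
const FEED_CONFIG = {
  organizations: { fq: 'bundle:org' }, // hypothetical default filter
};

function feedRequestToSolr(urlString) {
  const url = new URL(urlString);
  // e.g. ['data', 'atom', 'organizations', '10', '2']
  const [, , feed, size, page] = url.pathname.split('/').filter(Boolean);
  const config = FEED_CONFIG[feed];
  if (!config) throw new Error(`Unknown feed: ${feed}`);

  const rows = size ? parseInt(size, 10) : 10;
  const pageNo = page ? parseInt(page, 10) : 1;
  const params = new URLSearchParams({
    q: url.searchParams.get('q') || '*:*',
    rows: String(rows),
    start: String((pageNo - 1) * rows),
    fl: 'atom', // only the pre-rendered Atom entries are needed
  });
  // Config filters first, then any filters supplied by the consumer.
  params.append('fq', config.fq);
  for (const fq of url.searchParams.getAll('fq')) params.append('fq', fq);
  return params;
}
```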
    • Logging
    • How?
      - Logging back-end written in PHP that writes to a MySQL database
        - called asynchronously from the JS library
        - called inline in the feeds engine
      - Google Analytics (ga.js)
        - called from the JS library (search words and categories)
      What?
      - Search terms
      - Facets
      - User interaction
      - List of search results
      - Stack latency (JS, PHP, Solr)
      - Search domain
      - Session
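Logging "stack latency (JS, PHP, Solr)" is most useful when the three nested timings are split into per-layer costs. A sketch, assuming each log record holds the browser's round-trip time, the time spent inside the PHP proxy (which includes the Solr call), and Solr's own `QTime` from the response header; the field names are hypothetical.

```javascript
// Hypothetical sketch: derive per-layer latency from three nested timings (ms).
// jsMs ⊇ phpMs ⊇ solrQTimeMs, since each layer wraps the one below it.
function latencyBreakdown({ jsMs, phpMs, solrQTimeMs }) {
  return {
    solr: solrQTimeMs,              // query execution inside Solr
    proxy: phpMs - solrQTimeMs,     // PHP work plus the PHP-to-Solr hop
    clientAndNetwork: jsMs - phpMs, // browser, network and HTTP overhead
  };
}
```

With ~8 ms Solr latency, as reported earlier, most of a slow search would usually show up in the outer layers, which is exactly what such a split makes visible.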
    • Why?
      - Most popular queries with no results?
      - Most popular queries?
      - How does QPS affect latency?
      - Follow a user through search (interaction design & user testing)
      Displaying logs
      - Charts are generated with Google Chart Tools in Drupal
      - Other statistics can easily be explored with Drupal Views
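The first question on this slide, the most popular queries with no results, is a simple aggregation over the logged searches. A sketch in JavaScript, assuming log records of the hypothetical shape `{ query, numFound }`. With the real MySQL back-end, the same report would more naturally be a `GROUP BY` query.

```javascript
// Hypothetical sketch: most frequent queries that returned zero hits.
function topZeroResultQueries(logs, limit = 10) {
  const counts = new Map();
  for (const { query, numFound } of logs) {
    if (numFound !== 0) continue;
    // Normalize so "Law " and "law" count as the same query.
    const key = query.trim().toLowerCase();
    if (!key) continue;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([query, count]) => ({ query, count }));
}
```

Such a list points directly at missing synonyms or content, which is where the editorial staff can add value.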
    • Demo (includes responsiveness)
    • http://utdanning.no/sok
      http://utdanning.no/search
      http://utdanning.no/solrservice/utdanning.no
    • Technologies:
      - Drupal 7
      - Apache Solr Search Integration + custom indexing
      - Omega theme (responsiveness with Drupal) + custom JS
      - Ajax Solr + custom widgets
      - Solr PHP Client r60 + custom proxy
      - Bootstrap (responsiveness without Drupal)
      - jQuery
      - Google Chart Tools
    • Remi Mikalsen, remi.mikalsen@iktsenteret.no, iktsenteret.no