Building an easy-to-use search solution (for different languages)
Ivo Lukač @ J.Boye Aarhus 13: Web & Intranet Conference

  1. Building an easy-to-use search solution (for different languages)
     Ivo Lukač @ J.Boye Aarhus 13: Web & Intranet Conference, “Making search work” track
  2. Speaker
     • Co-owner of Netgen, a web development agency in Zagreb, Croatia
     • Started as a developer 11 years ago
     • Now I do a variety of things, but can best be described as an International Business Developer
     www.netgenlabs.com
  3. So I am still a developer! :)
  4. Use case
     • Regulatory reform project: cutting unneeded legislation, laws and/or procedures
     • Netgen is the technology implementation partner
     • Project led by Sense Consulting
     • Croatia, Egypt, Vietnam, Armenia, Iraq: mostly “exotic” countries
  5. We would rather work in Denmark, but it seems it doesn’t need such a solution :(
  6. How we use search
  7. Solution
     • In 2006: a simple filter
     • Today: a flexible information architecture powered by eZ Publish CMS, with Solr for search
     • Usually 70% common features, 30% customisation
     • Aiming for 90%/10%
     • If you are interested in the technical specifics, ask me later…
  8. Search features
     • Simple (default) and advanced search (with filters)
     • Full-text search on complex data, with boosting at attribute level
     • Filtering with multilevel tags/taxonomies
     • Stopwords
     • Search-time spellchecking based on indexed data
     • Sometimes using faceting on the result set
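Most of these search features correspond to standard Solr request parameters. The sketch below assumes Solr's extended DisMax (`edismax`) query parser, a hypothetical core URL and hypothetical field names (`title`, `summary`, `body`, `tags`); eZ Publish's Solr integration uses its own schema, so real names will differ. Stopwords and the spellcheck dictionary are configured on the Solr side (schema and `solrconfig.xml`), so the request itself only switches spellchecking on.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; the real core name and schema fields will differ.
SOLR_SELECT_URL = "http://localhost:8983/solr/articles/select"


def build_search_url(phrase: str, tags: tuple[str, ...] = (),
                     facet_fields: tuple[str, ...] = ()) -> str:
    params = [
        ("q", phrase),
        ("defType", "edismax"),            # extended DisMax query parser
        ("qf", "title^5 summary^2 body"),  # attribute-level boosts (hypothetical fields)
        ("spellcheck", "true"),            # needs a spellcheck component configured in Solr
    ]
    for tag in tags:
        # Filter queries restrict the result set without changing relevance scores.
        params.append(("fq", f'tags:"{tag}"'))
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return SOLR_SELECT_URL + "?" + urlencode(params)
```

Keeping tag filters in `fq` rather than in `q` means that choosing a taxonomy branch narrows the results without altering their ranking.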
  9. Additional features
     • Sometimes using multi search
     • Typing suggestions
     • Latest search phrase list
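Typing suggestions and the latest search phrase list can both be fed from one record of recent searches. A minimal sketch, assuming a simple in-memory store; the `RecentPhrases` class is hypothetical, and a production site would persist this data or use a search-engine suggester instead.

```python
from collections import deque


class RecentPhrases:
    """Hypothetical in-memory store of the latest distinct search phrases."""

    def __init__(self, max_size: int = 10) -> None:
        # A bounded deque drops the oldest phrase automatically when full.
        self._phrases = deque(maxlen=max_size)

    def add(self, phrase: str) -> None:
        phrase = " ".join(phrase.split())  # trim and collapse whitespace
        if not phrase:
            return
        if phrase in self._phrases:
            self._phrases.remove(phrase)  # searching again moves a phrase to the front
        self._phrases.appendleft(phrase)

    def latest(self) -> list[str]:
        """Newest first, for the 'latest search phrases' list."""
        return list(self._phrases)

    def suggest(self, prefix: str, limit: int = 5) -> list[str]:
        """Case-insensitive prefix matches, newest first, for typing suggestions."""
        p = prefix.casefold()
        return [ph for ph in self._phrases if ph.casefold().startswith(p)][:limit]
```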
  10. Challenges
  11. Characters
     • At the beginning we didn’t have Unicode; it was a mess!
     • Unicode solved a lot of problems, but not all
     • The same character can be encoded as more than one code point sequence, and this is not normalised by default
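The normalisation problem is easy to reproduce with the Croatian letter `č`, which Unicode allows either as one precomposed code point or as `c` followed by a combining caron. A minimal sketch using Python's standard `unicodedata` module, assuming NFC is the chosen target form; the same normalisation has to be applied both to indexed text and to incoming search phrases, otherwise the two spellings never match.

```python
import unicodedata


def normalise_for_index(text: str) -> str:
    """Normalise to NFC so canonically equivalent sequences compare equal."""
    return unicodedata.normalize("NFC", text)


precomposed = "\u010d"   # 'č' as a single code point (U+010D)
decomposed = "c\u030c"   # 'c' followed by COMBINING CARON (U+030C)

print(precomposed == decomposed)  # False
print(normalise_for_index(precomposed) == normalise_for_index(decomposed))  # True
print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 2 3
```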
  12. Indexing
     • Indexing files like Word, PDF or similar proved problematic due to character problems
     • Token delimiter configuration can be language-specific
     • Stemming is sometimes supported, sometimes not
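Language-dependent analysis can be pictured as a per-language table of token delimiters plus an optional stemmer. In this sketch the table is hypothetical, `toy_english_stem` is a deliberately crude stand-in for a real stemmer such as Snowball, and `xx` is a placeholder for a language without stemming support; none of it reflects Netgen's actual configuration.

```python
import re
from typing import Callable, Optional


def toy_english_stem(token: str) -> str:
    """Toy stand-in for a real stemmer (e.g. Snowball): strips a few suffixes."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


# Hypothetical per-language analysis settings; values are illustrative only.
ANALYSIS_CONFIG: dict[str, dict] = {
    # Hyphens and apostrophes split tokens; a stemmer is available.
    "en": {"delimiters": r"[\s.,;:!?()\"'\-]+", "stemmer": toy_english_stem},
    # Placeholder language: hyphenated words stay whole, no stemmer exists.
    "xx": {"delimiters": r"[\s.,;:!?()\"]+", "stemmer": None},
}


def analyse(text: str, lang: str) -> list[str]:
    """Split on the language's delimiters, lowercase, then stem if possible."""
    cfg = ANALYSIS_CONFIG[lang]
    tokens = [t.lower() for t in re.split(cfg["delimiters"], text) if t]
    stemmer: Optional[Callable[[str], str]] = cfg["stemmer"]
    return [stemmer(t) for t in tokens] if stemmer else tokens
```

Because the two settings produce different tokens for the same text, a query has to be analysed with the same language settings as the content it is matched against.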
  13. Searching
     • Search phrase input problems
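The slide does not say which input problems occurred. One common kind, assumed here purely for illustration, is a phrase containing stray whitespace, unnormalised characters, or characters that the query parser reads as syntax. The hypothetical `prepare_phrase` helper normalises to NFC, collapses whitespace, and backslash-escapes the characters the classic Lucene query parser treats as special.

```python
import unicodedata

# Characters the classic Lucene query parser treats as syntax.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def prepare_phrase(raw: str) -> str:
    """Hypothetical clean-up of a user's search phrase before querying."""
    text = unicodedata.normalize("NFC", raw)  # unify equivalent character sequences
    text = " ".join(text.split())             # trim and collapse whitespace
    # Backslash-escape syntax characters so they are searched literally.
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)
```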
  14. Blind work
     • The biggest challenge is that the developers don’t know the language
     • The first level of testing is very hard
     • We still can’t trust Google Translate
  15. What vehicle would you use to transport 10 cases of Heineken?
  16. How to overcome this?
  17. Main idea
     • Let’s try to assess search result quality
     • Use editors for rating (not the public)
     • Use the most frequently searched terms (we can’t test them all)
     • Rate the results above the fold
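Choosing the most frequently searched terms is a frequency count over the search log. A minimal sketch, assuming the log is available as a list of raw phrases and that phrases differing only in case or spacing should count as the same term; `terms_to_rate` is a hypothetical name.

```python
from collections import Counter


def terms_to_rate(search_log: list[str], n: int) -> list[tuple[str, int]]:
    """Return the n most frequent search terms with their counts."""
    # Treat phrases differing only in case or spacing as the same term.
    normalised = (" ".join(q.split()).casefold() for q in search_log)
    counts = Counter(term for term in normalised if term)
    return counts.most_common(n)
```

The counts returned alongside each term also indicate how popular that term is, which is useful when weighting quality scores by search frequency.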
  18. The tool
     • Integrated in the public site
     • Thumbs up/down buttons added for the first X results, shown only to editors
  19. Demo
     • Imported articles about CMS topics from various sources into a test instance
     • Rated the result quality for 7 search terms
     • Thumbs up/down for the top 3 suggested search results
     • Test periods are used to frame the test data
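A possible data model for the editors' thumbs up/down, assuming one record per editor per result and test periods defined as inclusive date ranges. All names are hypothetical. When several editors rate the same result, this sketch averages their votes (up = 1, down = 0); positions nobody rated in the period count as 0.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Rating:
    """One editor's thumbs up/down on one search result (hypothetical model)."""
    term: str
    position: int   # 1-based rank in the result list
    thumbs_up: bool
    rated_on: date


@dataclass(frozen=True)
class Period:
    """A test period used to frame rating data (inclusive dates)."""
    name: str
    start: date
    end: date

    def contains(self, day: date) -> bool:
        return self.start <= day <= self.end


def grades_per_term(ratings: list[Rating], period: Period,
                    top_k: int = 3) -> dict[str, list[float]]:
    """Average thumbs (up = 1, down = 0) per term and position within one period."""
    ups = defaultdict(lambda: [0] * top_k)
    totals = defaultdict(lambda: [0] * top_k)
    for r in ratings:
        if period.contains(r.rated_on) and 1 <= r.position <= top_k:
            totals[r.term][r.position - 1] += 1
            ups[r.term][r.position - 1] += int(r.thumbs_up)
    # Positions nobody rated in this period count as 0.
    return {term: [u / t if t else 0.0 for u, t in zip(ups[term], counts)]
            for term, counts in totals.items()}
```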
  20. Rating side
  21. Analysing side
  22. Rate measures
     • Discounted Cumulative Gain (DCG): the sum of ratings, discounted by position in the search results
     • Normalised Discounted Cumulative Gain (NDCG): the discounted rating sum normalised against the best possible outcome (to get a percentage as the unit)
     • Popularity-based NDCG: also takes into account the popularity of the search term
     http://en.wikipedia.org/wiki/Discounted_cumulative_gain
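The rate measures can be written down directly. This sketch uses the common discount from the cited Wikipedia article, DCG = Σ grade_i / log2(i + 1) over positions i = 1..p. Two assumptions are labelled in the code: the "best possible outcome" is taken to be every rated position getting a thumbs up (the Wikipedia article instead builds the ideal ranking from the known relevant documents), and "popularity-based NDCG" is read as a per-term NDCG average weighted by how often each term was searched.

```python
import math


def dcg(grades: list[float]) -> float:
    """Discounted Cumulative Gain: sum of grade_i / log2(i + 1), positions from 1."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(grades, start=1))


def ndcg(grades: list[float], max_grade: float = 1.0) -> float:
    """DCG divided by the DCG of the best possible outcome.

    Assumption: the best outcome is every rated position graded max_grade
    (e.g. all thumbs up), so with grades between 0 and max_grade the
    result is a fraction between 0 and 1.
    """
    ideal = dcg([max_grade] * len(grades))
    return dcg(grades) / ideal if ideal > 0 else 0.0


def popularity_ndcg(grades_by_term: dict[str, list[float]],
                    searches_by_term: dict[str, int]) -> float:
    """Per-term NDCG averaged with weights = how often each term was searched.

    Assumption: this is one possible reading of 'popularity-based NDCG'.
    """
    total = sum(searches_by_term.get(t, 0) for t in grades_by_term)
    if total == 0:
        return 0.0
    weighted = sum(ndcg(g) * searches_by_term.get(t, 0)
                   for t, g in grades_by_term.items())
    return weighted / total
```

Because the slides note that these numbers are mainly useful for comparing test periods rather than on their own, the natural use is to compute `popularity_ndcg` once per period and compare the values.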
  23. Known problems
     • What if good results are not showing? Then something bad is going on with the search engine
     • What if there is no good result?
     • What about new content added over time?
     • At the end of the day, the measurements are good for comparing test periods, but not meaningful by themselves
  24. Improvements
     • Opening rating to public users
     • Using clicks as ratings
     • Implementing a “Did you find what you were looking for?” feature
     • Integrating with analytics
     • Using rating data to boost particular items in search!
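Using rating data to boost particular items could be done by re-ranking the engine's results. A minimal sketch, assuming thumbs up/down counts per document are available (clicks could feed the same counters if treated as implicit thumbs up). `rating_boost`, `rerank` and the `prior` smoothing parameter are hypothetical: the prior adds imaginary neutral votes, so sparsely rated documents stay close to a neutral boost of 1.0 and a single vote has only a small effect.

```python
def rating_boost(ups: int, downs: int, prior: int = 5) -> float:
    """Map thumbs counts to a multiplicative boost between 0.5 and 1.5.

    'prior' (must be positive) adds neutral votes; with no votes the
    boost is exactly 1.0.
    """
    share_up = (ups + 0.5 * prior) / (ups + downs + prior)
    return 0.5 + share_up


def rerank(results: list[tuple[str, float]],
           votes: dict[str, tuple[int, int]],
           prior: int = 5) -> list[tuple[str, float]]:
    """Re-order (doc_id, engine_score) pairs by engine score x rating boost."""
    def boosted(item: tuple[str, float]) -> float:
        doc_id, score = item
        ups, downs = votes.get(doc_id, (0, 0))
        return score * rating_boost(ups, downs, prior)
    return sorted(results, key=boosted, reverse=True)
```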
  25. Questions now or later
      ivo@netgen.hr
      ilukac.com/twitter
      ilukac.com/facebook
      ilukac.com/gplus
      ilukac.com/linkedin
