Building an easy to use search solution (for different languages)
Ivo Lukač @ J.Boye Aarhus 13: Web & Intranet Conference
“Making search work” track

Speaker
• Co-owner of Netgen, a web development agency in Zagreb, Croatia
• Started as a developer 11 years ago
• Now I do a variety of things, but I can best be described as an International Business Developer

www.netgenlabs.com

So I am still a developer! :)

Use case
• Regulatory reform project: cutting unneeded legislation, laws and/or procedures
• Netgen is the technology implementation partner
• Project led by Sense Consulting
• Croatia, Egypt, Vietnam, Armenia, Iraq: mostly “exotic” countries

We would rather work in Denmark, but it seems Denmark doesn’t need such a solution :(

How we use search

Solution
• In 2006: a simple filter
• Today: a flexible information architecture powered by eZ Publish CMS, with Solr for search
• Usually 70% common features, 30% customisation
• Aiming for 90%/10%
• If you are interested in the technical specifics, ask me later…

Search features
• Simple (default) and advanced search (with filters)
• Full-text search on complex data, with boosting at the attribute level
• Filtering with multilevel tags/taxonomies
• Stopwords
• Search-time spellchecking based on indexed data
• Sometimes faceting on the result set

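A minimal sketch of how such a request might look against Solr's eDisMax query parser. The field names (`title`, `intro`, `body`, `tags`) and weights are hypothetical; a real eZ Publish/Solr schema will use its own fields. `qf` carries the attribute-level boosts, each selected tag becomes an `fq` filter query, and `spellcheck` only has an effect if a spellcheck component is configured for the request handler.

```python
from urllib.parse import urlencode

# Hypothetical fields and weights: a match in the title counts five times
# as much as a match in the body.
FIELD_BOOSTS = {"title": 5.0, "intro": 2.0, "body": 1.0}


def build_search_params(phrase, tag_filters=(), facet_fields=()):
    """Return Solr eDisMax request parameters as a list of (name, value) pairs."""
    params = [
        ("defType", "edismax"),
        ("q", phrase),
        # Attribute-level boosting, e.g. "title^5 intro^2 body^1".
        ("qf", " ".join(f"{field}^{boost:g}" for field, boost in FIELD_BOOSTS.items())),
        # Ask for spelling suggestions built from the indexed terms.
        ("spellcheck", "true"),
    ]
    # Every selected tag/taxonomy node narrows the result set without affecting scores.
    for field, value in tag_filters:
        params.append(("fq", f'{field}:"{value}"'))
    # Optional faceting on the result set.
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return params


params = build_search_params(
    "business registration",
    tag_filters=[("tags", "labour law")],
    facet_fields=["tags"],
)
query_string = urlencode(params)  # appended to the search handler URL
```

Returning a list of pairs rather than a dict keeps repeated parameters such as `fq` and `facet.field`, which Solr accepts multiple times.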
Additional features
• Sometimes using multi-search
• Typing suggestions
• List of latest search phrases

Challenges

Characters
• At the beginning we didn’t have Unicode, and it was a mess!
• Unicode solved a lot of problems, but not all
• The same character can be encoded as more than one code point sequence, which is not normalised by default

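To illustrate the normalisation point: in this sketch the "č" in "Lukač" is written once as a single precomposed code point and once as a plain "c" followed by a combining caron. The two strings render identically but differ in code points and UTF-8 bytes until both are normalised, here with Python's standard `unicodedata` module and the NFC form. The function name is illustrative; the key idea is applying the same normalisation at index time and at query time.

```python
import unicodedata


def normalise_for_search(text):
    """Map canonically equivalent strings to a single (NFC) representation."""
    return unicodedata.normalize("NFC", text)


precomposed = "Luka\u010d"       # 'č' as one code point (U+010D)
decomposed = "Luka" + "c\u030c"  # 'c' followed by COMBINING CARON (U+030C)

# Different code points and bytes, identical after normalisation.
same_raw = precomposed == decomposed                                                # False
same_normalised = normalise_for_search(precomposed) == normalise_for_search(decomposed)  # True
```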
Indexing
• Indexing files such as Word or PDF documents proved problematic due to character problems
• Token delimiter configuration can be language-specific
• Stemming is sometimes supported, sometimes not

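A sketch of what language-specific delimiters can mean in practice. The delimiter sets are hypothetical examples, not a real project configuration; in a Solr setup this normally lives in each language's field type analyzer rather than in application code. Splitting on apostrophes helps with French elisions such as "l'Europe" but would break English contractions such as "don't", so the two languages get different rules.

```python
import re

# Hypothetical delimiter sets per language (regular expressions).
TOKEN_DELIMITERS = {
    "en": r"[\s.,;:!?()\"]+",
    # French additionally splits on straight and typographic apostrophes (elision).
    "fr": r"[\s.,;:!?()\"'’]+",
}
# Fallback for languages without a specific rule: split on whitespace only.
DEFAULT_DELIMITERS = r"\s+"


def tokenize(text, lang):
    """Lowercase the text and split it on the delimiters configured for lang."""
    pattern = TOKEN_DELIMITERS.get(lang, DEFAULT_DELIMITERS)
    # re.split can yield empty strings at the edges; drop them.
    return [token for token in re.split(pattern, text.lower()) if token]
```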
Searching
• Search phrase input problems

Blind work
• The biggest challenge is that developers don’t know the language
• First-level testing is very hard
• We still can’t trust Google Translate

What vehicle would you use to transport 10 cases of Heineken?

How to overcome this?

Main idea
• Let’s try to assess search result quality
• Use editors for rating (not the public)
• Use the most frequently searched terms (we can’t test them all)
• Rate the results above the fold

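Picking which terms to rate can be as simple as counting phrases in the search log. This sketch assumes a hypothetical log given as a plain list of submitted phrases and applies only light normalisation (lowercasing and collapsing whitespace) before counting.

```python
from collections import Counter


def terms_to_rate(search_log, top_n=10):
    """Return the top_n most frequently searched phrases with their counts."""
    # Light normalisation so that "Tax " and "tax" count as the same phrase.
    counts = Counter(" ".join(phrase.lower().split()) for phrase in search_log)
    return counts.most_common(top_n)


log = ["tax", "Tax ", "permits", "tax", "permits", "visa"]
top_terms = terms_to_rate(log, top_n=2)
```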
The tool
• Integrated into the public site
• Thumbs up/down buttons added to the first X results, shown only to editors

Demo
• Articles about CMS topics imported into a test instance from various sources
• Rating the result quality of 7 search terms
• Thumbs up/down for the 3 suggested search results
• Test periods are used to frame the test data

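One possible way to store the editors' votes so they can be analysed per test period, sketched with a hypothetical `Rating` record. The scoring rule is an assumption: thumbs up counts as 1, thumbs down as 0, votes from several editors on the same position are averaged, and unrated positions count as 0.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    # Hypothetical record stored when an editor clicks thumbs up/down.
    period: str      # test period label
    term: str        # search phrase that was rated
    position: int    # 1-based rank of the rated result
    thumbs_up: bool


def relevance_lists(ratings):
    """Group ratings into {(period, term): [rel at rank 1, rel at rank 2, ...]}."""
    per_position = defaultdict(lambda: defaultdict(list))
    for r in ratings:
        per_position[(r.period, r.term)][r.position].append(1.0 if r.thumbs_up else 0.0)
    result = {}
    for key, positions in per_position.items():
        last = max(positions)
        # Average the votes per position; unrated positions are treated as 0.
        result[key] = [
            sum(positions[p]) / len(positions[p]) if p in positions else 0.0
            for p in range(1, last + 1)
        ]
    return result


ratings = [
    Rating("P1", "tax", 1, True),
    Rating("P1", "tax", 1, False),
    Rating("P1", "tax", 3, True),
    Rating("P2", "tax", 1, True),
]
lists = relevance_lists(ratings)
```

Keying by test period keeps each period's data separate, so the same terms can be compared across periods.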
Rating side

Analysing side

Rate measures
• Discounted Cumulative Gain (DCG): the sum of ratings, discounted by position in the search results
• Normalised Discounted Cumulative Gain (NDCG): the discounted rating sum normalised against the best possible outcome (to get a percentage as the unit)
• Popularity-based NDCG: also takes into account the popularity of the search term
http://en.wikipedia.org/wiki/Discounted_cumulative_gain

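The DCG and NDCG measures can be sketched as follows, using the common definition DCG = sum of rel_i / log2(i + 1) over ranks i = 1..p, with the "best possible outcome" taken as the same ratings sorted in descending order (a common simplification). The slide does not say exactly how popularity is combined; the weighted average by search count below is an assumption.

```python
import math


def dcg(rels):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) over ranks i = 1..p."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(rels, start=1))


def ndcg(rels):
    """DCG divided by the DCG of the same ratings in the best possible order (0..1)."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0


def popularity_weighted_ndcg(rels_by_term, searches_by_term):
    """Assumed popularity weighting: per-term NDCG averaged by search count."""
    total = sum(searches_by_term[term] for term in rels_by_term)
    if total == 0:
        return 0.0
    weighted = sum(ndcg(rels) * searches_by_term[term] for term, rels in rels_by_term.items())
    return weighted / total


# "tax" has both good results on top (NDCG 1.0); "visa" has its only good
# result in third place (NDCG 0.5). "tax" is searched three times as often.
score = popularity_weighted_ndcg({"tax": [1, 1, 0], "visa": [0, 0, 1]}, {"tax": 3, "visa": 1})
# score == (1.0 * 3 + 0.5 * 1) / 4 == 0.875
```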
Known problems
• What if good results are not showing? Then something is wrong with the search engine
• What if there is no good result?
• What about new content added over time?
• At the end of the day, the measurements are good for comparing test periods, but are not meaningful by themselves

Improvements
• Open rating to public users
• Use clicks as ratings
• Implement a “Did you find what you were looking for?” feature
• Integrate with analytics
• Use rating data to boost particular items in search!

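A rough sketch of the last idea, assuming the ratings are aggregated into net votes (ups minus downs) per document for the phrase being searched, and that documents are addressed by a hypothetical `id` field. Solr's dismax and eDisMax parsers accept additive boost queries through the `bq` parameter; the boost formula and cap here are arbitrary choices for illustration.

```python
def boost_queries(net_votes, max_boost=5.0):
    """Turn {doc_id: net votes} for one search phrase into Solr bq clauses.

    Only positively rated documents are boosted; the boost grows with the
    net vote count and is capped at max_boost.
    """
    clauses = []
    for doc_id, votes in sorted(net_votes.items()):
        if votes > 0:
            boost = min(1.0 + votes, max_boost)
            clauses.append(f'id:"{doc_id}"^{boost:g}')
    return clauses


# Hypothetical net votes collected for one search phrase.
clauses = boost_queries({"42": 3, "7": -1, "9": 10})
# Each clause is sent as its own bq parameter alongside q and qf.
bq_params = [("bq", clause) for clause in clauses]
```

Capping the boost keeps a handful of enthusiastic votes from overriding the text relevance entirely.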
Questions now or later
ivo@netgen.hr
ilukac.com/twitter
ilukac.com/facebook
ilukac.com/gplus
ilukac.com/linkedin
