Building an easy to
use search solution
(for different languages)

Ivo Lukač @ J.Boye Aarhus 13: Web & Intranet Conference
“Making search work” track
Speaker
• Co-owner of Netgen - web development
agency, Zagreb, Croatia

• Started as a developer 11 years ago
• Now I do a variety of things, but can best be
described as International Business Developer

www.netgenlabs.com
So I am still a developer! :)

Use case
• Regulatory reform project: cutting unneeded
legislation, laws and/or procedures

• Netgen is the technology implementation partner
• Project led by Sense Consulting
• Croatia, Egypt, Vietnam, Armenia, Iraq - mostly
“exotic” countries

We would rather work in
Denmark, but it seems that
it doesn’t need such a
solution :(

How we use search
Solution
• In 2006: a simple filter
• Today: a flexible information architecture powered by
eZ Publish CMS, with Solr for search

• Usually 70% common features, 30% customisation 
• Aiming for 90%/10%
• If you are interested in tech specifics, ask me later…

Search features
• Simple (default) and advanced search (with filters)
• Full text search on complex data, boosting on attribute level
• Filtering with multilevel tags/taxonomies
• Stopwords
• Search time spelling based on indexed data
• Sometimes using faceting on result set
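
Several of the features above map onto standard Solr request parameters. The sketch below only builds a query URL and does not contact a server. The field names (`title`, `summary`, `body`, `tag_ids`), the boost values and the base URL are hypothetical and not taken from the talk. It also assumes the request handler has the spellcheck component configured.

```python
from urllib.parse import urlencode


def build_search_url(base_url, phrase, tag_id=None):
    """Build a Solr select URL for a search phrase (illustrative only)."""
    params = [
        ("q", phrase),
        # eDisMax query parser: allows per-field boosts via "qf".
        ("defType", "edismax"),
        # Attribute-level boosting: hypothetical fields and weights.
        ("qf", "title^3 summary^2 body"),
        # Spelling suggestions built from the indexed data.
        ("spellcheck", "true"),
        # Faceting on the result set, here by a hypothetical tag field.
        ("facet", "true"),
        ("facet.field", "tag_ids"),
    ]
    if tag_id is not None:
        # Filter query: restrict results to one tag without affecting scoring.
        params.append(("fq", f"tag_ids:{tag_id}"))
    return f"{base_url}?{urlencode(params)}"


url = build_search_url("http://localhost:8983/solr/demo/select", "public procurement", tag_id=42)
print(url)
```

Stopwords are usually configured in the field type's analyser chain in the Solr schema rather than per request, so they do not appear here.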

Additional features
• Sometimes using multi search
• Typing suggestions
• Latest search phrase list

Challenges
Characters
• At the beginning we didn’t have Unicode; it was a mess!

• Unicode solved a lot of problems but not all
• The same character can be represented by more than one
byte sequence, which is not normalised by default
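
The normalisation problem can be illustrated with a small sketch. It assumes NFC as the target form and uses the Croatian letter "č" as the example; the talk names neither. The idea is to apply the same normalisation to both the indexed text and the search phrase.

```python
import unicodedata


def normalise(text: str) -> str:
    """Return the NFC (composed) form so equal-looking strings compare equal."""
    return unicodedata.normalize("NFC", text)


# "č" as one precomposed code point (U+010D) ...
composed = "ko\u010da"
# ... and as "c" followed by a combining caron (U+0063 U+030C).
decomposed = "koc\u030ca"

print(composed == decomposed)                        # False: different code points
print(normalise(composed) == normalise(decomposed))  # True after normalisation
```

Without this step, a document indexed in one form will not match a query typed in the other, even though both look identical on screen.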

Indexing
• Indexing files like Word, PDF or similar proved
to be problematic due to character problems

• token delimiter configuration could be
language specific

• stemming sometimes supported, sometimes
not

Searching
• search phrase input problems

Blind work
• the biggest challenge is that developers don’t know the
language

• first level of testing is very hard
• still can’t trust Google Translate

What vehicle would you
use to transport 10 cases
of Heineken?

How to overcome this?
Main idea
• let’s try to assess search result quality
• use editors for rating (not the public)
• use most frequently searched terms (we
can’t test all)

• rate results above the fold

The tool
• integrated in the public site
• added thumbs up/down buttons for the first X
results, shown only to editors

Demo
• imported articles about CMS topics from various
sources into a test instance

• rating result quality of 7 search terms
• Thumbs up/down for the 3 suggested search results
• Test periods are used for framing test data

Rating side
Analysing side
Rate measures
• Discounted Cumulative Gain (DCG) - rate sum
discounted based on position in search results

• Normalised Discounted Cumulative Gain (NDCG) -
discounted rate sum normalised against best possible
outcome (to get percentage as the unit)

• Popularity based NDCG - takes into account the
popularity of the search term

http://en.wikipedia.org/wiki/Discounted_cumulative_gain
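
The measures above can be sketched as follows, using the log2 discount from the Wikipedia article linked above. Several things here are assumptions, not details from the talk: thumbs up/down are encoded as 1/0, the "best possible outcome" is the same ratings sorted best-first, and the popularity based variant is a search-count-weighted average of per-term NDCG. The tool shown in the talk may define these differently.

```python
import math


def dcg(ratings):
    """Sum of ratings, each discounted by log2(position + 1), positions from 1."""
    return sum(r / math.log2(pos + 1) for pos, r in enumerate(ratings, start=1))


def ndcg(ratings):
    """DCG divided by the DCG of the ideal (best-first) ordering; 0.0 if no good result."""
    ideal = dcg(sorted(ratings, reverse=True))
    return dcg(ratings) / ideal if ideal > 0 else 0.0


def popularity_ndcg(rated_terms):
    """Average NDCG over terms, weighted by how often each term was searched.

    rated_terms maps a search term to (ratings, search_count).
    """
    total = sum(count for _, count in rated_terms.values())
    if total == 0:
        return 0.0
    return sum(ndcg(r) * count for r, count in rated_terms.values()) / total


# Hypothetical ratings for the first 3 results of two search terms.
print(dcg([1, 0, 1]))
print(ndcg([1, 0, 1]))
print(popularity_ndcg({"cms": ([1, 1, 0], 30), "solr": ([0, 0, 0], 10)}))
```

Note that a term with no thumbs-up at all scores 0.0 here by convention, which matches the "what if there is no good result?" concern on the next slide: the measure cannot tell a bad ranking from missing content.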
Known problems
• What if good results are not showing? - something bad
is going on with the search engine

• what if there is no good result?
• what about new content added over time?
• at the end of the day the measurements are good for
comparing test periods; they are not meaningful by
themselves

Improvements
• opening rating to public users
• using clicks as ratings
• implement a “did you find what you were looking for?”
feature

• integrate with analytics
• use rating data to boost particular items in search!

Questions now or later
ivo@netgen.hr
ilukac.com/twitter
ilukac.com/facebook
ilukac.com/gplus
ilukac.com/linkedin
