Episerver and search engines


Slides from Episerver meetup 20.4.2017.

Published in: Technology

  1. 1. EPISERVER AND SEARCH ENGINES CASE: EVIRA Episerver Meetup 19.4.2017 Mikko Huilaja
  2. 2. AGENDA › A short glimpse at past and modern search engines › Case Evira › Environments & Cloud › CMS and Elasticsearch combination › More practical stuff
  4. 4. SEARCH ENGINES 2014 (AND BEFORE) › We (Solita) had 3 different types of search solutions 1. Google Search Appliance • Only in your own data center • Requires an up-front investment (the box) 2. Episerver Find • Only in the cloud (in Ireland) • Pricing depends on content items, languages and QPS 3. Lucene search • Only on one server • Built into Episerver • Free
  5. 5. OPTIONS AND STRENGTHS › Google Search Appliance • Crawler-driven search engines are excellent for multiplatform environments (web, extranet, document bank, blogs) • Excellent statistics › Episerver Find • Easy to install and start using • No server to host • Excellent API › Lucene search • OOTB in Episerver • Requires only files
  6. 6. OPTIONS AND WEAKNESSES › Google Search Appliance • Speaks only XML • Sort by metadata: “The sorting occurs only on the 1000 most relevant results for the specific query” › Episerver Find • It’s a bit expensive • Limited dev options › Lucene search • Built into Episerver -> hard to customize • Error-prone • Full-text search only
  7. 7. SEARCH ENGINES 2017 1. Google Search Appliance • Google ended development in 2016 2. Episerver Find • Find has become a key component in many web sites and in the DXC platform 3. Lucene search • Episerver’s focus is on Find 4. Elasticsearch • Elastic has built a whole family of products around Elasticsearch
  8. 8. CRAWLER VS EVENT-DRIVEN › Event-driven engines fit a wider variety of use cases (projects) • For example access rights management, real-time updates • Might need more time to install › Crawler-driven engines often have lots of easy-to-use OOTB features • Don’t need that much customizing • Customizing is expensive Event-driven search engines fit nicely into Episerver projects
  9. 9. EPISERVER FIND › First released under the name Truffler at the end of 2011 › Event-driven search engine › Built on top of Elasticsearch, which is built on top of Lucene › Its father is Joel Abrahamsson
  10. 10. ELASTICSEARCH › Open source project › Built on top of Lucene and Java › Allows communication only through a REST API and JSON › Various platforms have client libraries to ease the communication (.NET, Java, JavaScript) › It’s built to be a distributed and scalable search engine › Elasticsearch is a key product with a whole family of products around it (Kibana, Logstash, Beats, Monitoring, Alerting, Machine Learning)
  11. 11. Most/all search engines are using Lucene in the background
  12. 12. CASE EVIRA
  13. 13. CASE: EVIRA › Evira is the Finnish Food Safety Authority. • Lots of official documentation • Lots of content editors • Contains mostly text, documents, forms and table data • Few images and little rich content › The same project also contains Evira’s intranet, so the same architecture had to work for the intranet as well.
  14. 14. WWW.EVIRA.FI Go and test the search!
  15. 15. › Search with URL parameters › Facet groups › Ordering › Search word highlights › Customizable search results › “Did you mean this” feature › Filters and facets › Easily customizable facets › Fallback wildcard search › File search for most common document types
  17. 17. KEY COMPONENTS › Episerver CMS (Content editing, UI and master data store) • Platform for content editing with many languages, versioning, document bank, metadata, etc. • Master data and primary data source for Elasticsearch › Elasticsearch (Search and performance) • Global search and efficient way to query large data sets with full-text support › Azure (Cloud platform and scale) • Azure contains all the environments, files, data, backups, monitoring, maintenance jobs, etc.
  18. 18. CUSTOMIZABLE PLATFORM › Episerver CMS • From medium size to very large projects • Easily customizable front-end and pluggable/extendable back-end › Elasticsearch • From the smallest to very large projects • Runs locally on your laptop, in the cloud or in a private data center › Azure Cloud • From the smallest to very large projects • IaaS and PaaS options
  19. 19. SETUP OPTIONS › Developer’s laptop › On premise / Private cloud › Azure IaaS on virtual machines (one or many) › PaaS on Azure App Services and Elastic Cloud [Diagram components: IIS Web Server, Web Server, Web Site, SQL Server, SQL Database, Blob Storage, Search VM, Elasticsearch]
  20. 20. ELASTIC IS NOT JUST FOR SEARCH › It’s a performance tool. It makes querying large data sets much more efficient than tools like SQL Server or many other search tools › We use Elastic for: › Global search › Internal searches and listings: Products news, announcements, comments, files, RSS, sitemap › Handling large data sets, for example some migrations › Analytics and statistics • Site visitor analytics • Search usage analytics › 404 statistics › Error logging and log analysis › Monitoring servers [Diagram labels: Full-text search, listings, performance; Analytics, statistics]
  21. 21. NOT A CRAWLER › We have integrated Elasticsearch with Episerver’s events › Real-time (1 or 2 seconds latency) • Long latencies often cause multiple other problems › We can send more data than what’s visible (for example access rights) Real-time is really hard to gain if it’s not built into the architecture
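
The event-driven integration described on this slide can be sketched roughly as follows. This is a minimal in-memory illustration, not the deck's actual code: `ContentItem`, `ContentEvents` and `SearchIndexer` are hypothetical stand-ins for Episerver's content events and an Elasticsearch client, and a dictionary plays the role of the index.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a CMS content item (not an Episerver type).
public class ContentItem
{
    public int Id { get; set; }
    public string Title { get; set; }
    // Not visible on the page, but sent to the index for access filtering.
    public string[] AllowedRoles { get; set; }
}

// Hypothetical stand-in for the CMS event source.
public class ContentEvents
{
    public event Action<ContentItem> Published;
    public event Action<int> Deleted;

    public void RaisePublished(ContentItem item) => Published?.Invoke(item);
    public void RaiseDeleted(int id) => Deleted?.Invoke(id);
}

// In-memory stand-in for the search index, keyed by content id.
public class SearchIndexer
{
    public Dictionary<int, ContentItem> Index { get; } = new Dictionary<int, ContentItem>();

    public SearchIndexer(ContentEvents events)
    {
        // Index (or re-index) on every publish, remove on delete.
        // No crawler has to discover the change later.
        events.Published += item => Index[item.Id] = item;
        events.Deleted += id => Index.Remove(id);
    }
}
```

Because the indexer reacts to the event itself, the index can also carry fields such as `AllowedRoles` that a crawler reading rendered pages would never see.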
  23. 23. CQRS WITH CMS (TRADITIONAL FORMAT) [Diagram labels: Web Site, Commands, Queries, Index, Episerver CMS, SQL Server database, Elasticsearch]
  24. 24. CQRS WITH CMS (AS WE USE IT) [Diagram labels: Web Site, Commands, Queries, Simple Queries, Index, Episerver CMS, SQL Server database, Elasticsearch]
  25. 25. CQRS WITH CMS (AS EPISERVER FIND USES IT) [Diagram labels: Web Site, Commands, Queries, Simple Queries, Index, Episerver CMS, SQL Server database, Episerver Find, Elasticsearch] Notes: returns only the IDs; GetContentResult() method
  26. 26. FIND PROJECTIONS › CQRS pattern in the traditional format › Querying Find without IContent › Does not use the Episerver cache or database › Not all IContent (for example BlogPost) properties exist in the index, and some might not be up to date (for example FriendlyUrl and AccessRights) var result = client.Search<BlogPost>() .Select(x => new SearchResult { Title = x.Title, Author = x.Author.Name }) .GetResult();
  27. 27. WHY ELASTIC WITH CMS › Content Management Systems are generally good for managing content, files, content relations, hierarchy, language variations, content versions, access rights, user management, model type management and CACHING › They often handle content in a hierarchical structure › So querying a page and querying parent or child pages often comes straight from the cache and does not even make a database query. › But CMSes often lack good tools for querying across hierarchies
  28. 28. CHOOSE THE BEST TOOL › Use Episerver/CMS for simple queries • If you need to query just one object, sibling objects or child objects from fewer than 2 hierarchy levels › Use Elasticsearch/Find • For everything else › Except don’t use Elasticsearch/Find: • If 1-2 seconds of latency is too much • If there are transaction requirements • If the Find host is too far away (lag) or the SLA requirements for the feature are higher than Find can provide (or use it with a cache)
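
The decision rules on this slide could be written down as a small routing function. This is only a sketch: `QueryProfile`, `QueryTarget` and `QueryRouter` are hypothetical names, falling back to the CMS when the search engine is ruled out is an assumption, and the hosting distance and SLA exception is left out because it depends on the environment rather than on the query.

```csharp
using System;

public enum QueryTarget { Cms, Search }

// Hypothetical description of a query, used only for this illustration.
public class QueryProfile
{
    // True when the query targets one object, its siblings or its children.
    public bool SingleObjectSiblingsOrChildren { get; set; }
    // Number of hierarchy levels the query spans.
    public int HierarchyLevels { get; set; }
    // True when 1-2 seconds of indexing latency is too much.
    public bool NeedsImmediateConsistency { get; set; }
    // True when the feature has transaction requirements.
    public bool NeedsTransactions { get; set; }
}

public static class QueryRouter
{
    public static QueryTarget Choose(QueryProfile q)
    {
        // Exceptions first: the search index is eventually consistent and
        // not transactional (assumption: such queries go to the CMS instead).
        if (q.NeedsImmediateConsistency || q.NeedsTransactions)
            return QueryTarget.Cms;

        // Simple queries are served well by the CMS and its cache.
        bool simple = q.SingleObjectSiblingsOrChildren && q.HierarchyLevels < 2;
        return simple ? QueryTarget.Cms : QueryTarget.Search;
    }
}
```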
  29. 29. ELASTIC INDEX = QUERY DATABASE › We can always recreate the Elastic index from the SQL Server “master data” › That’s why we don’t really need multiple nodes or shards [Diagram labels: Reindex, Get all the data, Elasticsearch Index, Episerver CMS, SQL Server database]
  30. 30. ELASTICSEARCH.NET & NEST › Official .NET Elasticsearch clients › Elasticsearch.Net & NEST make the usage strongly typed: • No JSON • No typos • Every value has a type • The IDE will help you › Code example: public class Tweet { public string[] HashTags; ... } var response = client.Search<Tweet>(s => s .From(0) .Size(10) .Query(q => q.Term(t => t.HashTags, "elasticsearch") ) );
  31. 31. MAPPINGS ARE LIKE SCHEMA IN DB › NEST will automatically map most of the types but not all › Separate string types: • Text (analyzed), the default type for strings • Keyword (not analyzed): keyword fields are only searchable by their exact value › Automating the mappings will help a lot in the long run › Mappings are normally generated automatically based on the content you insert into the index, but sometimes you need custom mappings. › Code example: public class Tweet { [Text] public string Content; [Keyword] public string Url; [Keyword] public string[] HashTags; ... }
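
To make the difference between the two string types concrete, here is a deliberately simplified sketch. It does not call Elasticsearch: `FieldMatching` is a hypothetical helper, and the lowercasing and splitting is only a rough stand-in for what an analyzer does to a text field.

```csharp
using System;
using System.Linq;

public static class FieldMatching
{
    private static readonly char[] Separators = { ' ', ',', '.', '!', '?', ';', ':' };

    // Text-like behaviour: the value is lowercased and split into tokens,
    // and the (lowercased) search term matches if it equals any token.
    public static bool MatchesText(string fieldValue, string term)
    {
        var tokens = fieldValue.ToLowerInvariant()
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        return tokens.Contains(term.ToLowerInvariant());
    }

    // Keyword-like behaviour: the whole value is one token and must match
    // exactly, including case.
    public static bool MatchesKeyword(string fieldValue, string term)
    {
        return string.Equals(fieldValue, term, StringComparison.Ordinal);
    }
}
```

This is why free-text content suits a text field, while URLs and hashtags, which are filtered or faceted by exact value, suit keyword fields.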
  32. 32. SCORING OPTIMIZATION › Boosting fields is the most important scoring customization › We normally have 3 fields which we boost with different values: • Titles (boost 2.0) • FullTextField (boost 1.5) • ExtraContent (boost 1.0)
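
The effect of the three boost values can be illustrated with a small weighted sum. This is a simplification for intuition only, not Elasticsearch's actual scoring formula: `BoostedScoring` is a hypothetical helper, the per-field scores are made-up inputs, and the field name `Title` stands for the slide's "Titles" entry.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BoostedScoring
{
    // Boost values taken from the slide.
    public static readonly Dictionary<string, double> Boosts = new Dictionary<string, double>
    {
        { "Title", 2.0 },
        { "FullTextField", 1.5 },
        { "ExtraContent", 1.0 }
    };

    // fieldScores: hypothetical raw relevance per field for one document.
    // Fields without a configured boost count with a factor of 1.0.
    public static double Score(IDictionary<string, double> fieldScores)
    {
        return fieldScores.Sum(kv =>
            Boosts.TryGetValue(kv.Key, out var boost) ? kv.Value * boost : kv.Value);
    }
}
```

With equal raw scores, a hit in the title ends up with twice the weight of a hit in the extra content, so title matches rise to the top of the result list.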
  33. 33. SCORING OPTIMIZATION › Script scoring allows us to boost results with certain properties: • Search result type • Number of internal links • Depth in hierarchy • Recently published / edited • Popularity by user visits › Requires that dynamic scripting is enabled in Elasticsearch. Not all hosting providers allow it.
  34. 34. SUMMARY › Event-driven search engines fit nicely into Episerver projects › Episerver Find is built on top of Elasticsearch › Elasticsearch/Find fits with most CMSes because they lack good search tools › The CQRS pattern will help with performance, but choose wisely how to use it › Invest in making your platform customizable, so that it fits your next project too.
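
As an illustration of the "recently published / edited" case, a script score could multiply the base relevance by a factor that shrinks with age. The formula below is an assumption made up for this sketch, not the one used in the Evira project, and `RecencyBoost` and `scaleDays` are hypothetical names.

```csharp
using System;

public static class RecencyBoost
{
    // Multiplies the base score by a factor between 1.0 and 2.0.
    // A document edited today gets the full bonus (x2.0); the bonus is
    // halved once the age reaches scaleDays and keeps shrinking after that.
    public static double Apply(double baseScore, double ageInDays, double scaleDays = 30.0)
    {
        if (ageInDays < 0) ageInDays = 0; // guard against clock skew
        double multiplier = 1.0 + 1.0 / (1.0 + ageInDays / scaleDays);
        return baseScore * multiplier;
    }
}
```

The same shape works for the other properties on the slide, such as hierarchy depth or visit counts, by swapping the input that drives the multiplier.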