The SAS Search Journey: Using AI to Move from Google to Lucidworks - Alex Flynn, SAS
1. The SAS Search Journey:
Using AI to Move from Google
to Lucidworks
Alex Flynn
Senior Manager, IT Operations - SAS
@aeflynn
#Activate18 #ActivateSearch
2. Agenda
• Where Did We Start?
• Welcome to Fusion
• Search Themes
• Ranking and Relevance
• Indexing and Content
• Future Plans
• Why We Chose Lucidworks
3. Where Did We Start?
• Had TWO solutions for Enterprise – Need ONE
  • INTRANET / Corporate Internal Sites
    • Google Search Appliance
    • a.k.a. Yellow Box
  • INTERNET / Customer Facing Sites
    • Verity UltraSeek
    • Formerly Inktomi Enterprise Search
4. Welcome to Fusion!
• Staff – how many developers does it take to create an Enterprise Site?
• Issues out of the Box – Hired Lucidworks Professional Services (PS)
• How many pages to Index?
• Language Support
5. AI – Search: How Bad Is It?
• Current Search Complaints
• “I can’t find….”
• Check logs
• Time spent on Pages, Abandon Rates
• Google…..
• Data - Results
• SAS Data Sets
• Manual before Automation
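The "check logs" step can be illustrated in a few lines. This is a minimal Python sketch, not SAS's actual process (which imported the data into SAS data sets). It assumes a hypothetical log already parsed into (query, result_count, clicked) tuples. It normalizes queries, ranks them by frequency, and flags queries that returned nothing or never led to a click, which are likely sources of "I can’t find…." complaints.

```python
from collections import Counter

def summarize_search_log(entries, top_n=5):
    """Summarize a search log.

    `entries` is an iterable of (query, result_count, clicked) tuples; this
    record layout is a hypothetical stand-in for a real search log export.
    Returns the most frequent queries, queries that returned no results, and
    queries that were never followed by a click (a rough abandonment signal).
    """
    counts = Counter()
    zero_results = Counter()
    clicks = Counter()
    for query, result_count, clicked in entries:
        q = " ".join(query.lower().split())  # normalize case and whitespace
        counts[q] += 1
        if result_count == 0:
            zero_results[q] += 1
        if clicked:
            clicks[q] += 1
    # Counter returns 0 for missing keys, so unclicked queries are easy to spot.
    never_clicked = sorted(q for q in counts if clicks[q] == 0)
    return {
        "top_queries": counts.most_common(top_n),
        "zero_results": sorted(zero_results),
        "never_clicked": never_clicked,
    }

# Hypothetical sample log entries.
log = [
    ("Viya", 120, True),
    ("viya ", 120, False),
    ("GDPR", 45, True),
    ("enterprise miner pricing", 0, False),
    ("hadoop", 300, False),
]
summary = summarize_search_log(log, top_n=2)
print(summary["top_queries"])
print(summary["never_clicked"])
```

A frequency list alone hides failures, so the sketch reports zero-result and never-clicked queries separately; those are the terms worth presenting to site owners first.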
11. Ranking and Relevance
• June 7 Release
  • Broadened fields considered in search result ranking
  • Initial excludes / Content clean up
  • Changed operator to AND
  • Adjusted search engine configuration
• More Data, More AI
• June 21 Release
  • De-boosted specific content
  • More excludes
  • Slight boost for home pages in the Sales & Marketing Areas
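Changes like these (AND operator, extra ranking fields, the Title*2 / Content*1.5 boosts from the speaker notes, and boosts and de-boosts for specific content) map naturally onto Apache Solr's eDisMax query parameters. The Python sketch below assumes those standard Solr parameters reach Solr through the search layer. The field names (`title`, `content`, `keywords`, `description`, `section`, `page_type`), the "archive" content being de-boosted, and all boost values are hypothetical, not SAS's actual configuration.

```python
from urllib.parse import urlencode

def build_edismax_params(user_query, home_page_boost=1.2, archive_penalty=2):
    """Build Solr eDisMax request parameters for a relevance-tuned search.

    All field names and boost values are illustrative assumptions.
    """
    return {
        "q": user_query,
        "defType": "edismax",
        # Require every query term to match (the "operator AND" change).
        # mm=100% states the same requirement explicitly for eDisMax.
        "q.op": "AND",
        "mm": "100%",
        # Query fields with boosts: title x2, body text x1.5, plus the newly
        # added keywords and description fields at the default weight.
        "qf": "title^2 content^1.5 keywords description",
        # Boost queries are additive. The first gives Sales & Marketing home
        # pages a slight lift; the second "de-boosts" archived pages by
        # boosting every document that is NOT archived.
        "bq": [
            f'(section:"Sales & Marketing" AND page_type:home)^{home_page_boost}',
            f"(*:* -page_type:archive)^{archive_penalty}",
        ],
    }

params = build_edismax_params("visual analytics")
# doseq=True emits one bq=... pair per list element.
print(urlencode(params, doseq=True))
```

Solr has no true negative boost in `bq`, so the usual workaround is boosting everything except the unwanted content, as the second boost query does.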
12. Indexing and Content – Round 1
• viya
• gdpr
• visual investigator
• visual analytics
• client
• sas global forum
• data management
• fraud
• office analytics
• analytics in action
• accenture
• tableau
• enterprise miner pricing
50+% Top Result / 60+% In Top 3
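The "Top Result" and "In Top 3" figures presumably measure the share of these tracked terms whose intended page comes back first, or within the first three results. A minimal Python sketch of that calculation, assuming a hypothetical mapping from each term to its expected URL and to the ranked URLs the engine returned (all URLs below are invented sample data):

```python
def hit_rate_at_k(expected, results, k):
    """Fraction of search terms whose expected page is in the top k results.

    expected: dict mapping term -> the URL site owners want returned.
    results:  dict mapping term -> ranked list of URLs from the engine.
    """
    if not expected:
        return 0.0
    hits = sum(1 for term, url in expected.items()
               if url in results.get(term, [])[:k])
    return hits / len(expected)

# Hypothetical expected pages and engine results.
expected = {
    "viya": "/software/viya.html",
    "gdpr": "/gdpr.html",
    "tableau": "/partners/tableau.html",
    "fraud": "/solutions/fraud.html",
}
results = {
    "viya": ["/software/viya.html", "/viya-trial.html"],
    "gdpr": ["/news.html", "/gdpr.html"],
    "tableau": ["/home.html", "/rss.html", "/news.html", "/partners/tableau.html"],
    "fraud": ["/solutions/fraud.html"],
}
print(f"Top result: {hit_rate_at_k(expected, results, 1):.0%}")  # 50%
print(f"In top 3:   {hit_rate_at_k(expected, results, 3):.0%}")  # 75%
```

Re-running the same term list after each release gives a comparable number from round to round.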
14. Indexing and Content – Round 2
• accenture
• aml
• analytics
• analytics experience
• analytics in action
• ax 2017
• banking events
• beacon accounts
• ci
• ci 360
• ci360
• clientiq
• cloud
• communications
• competitors
• contextual analysis
• customer decision hub
• customer intelligence
• data management
• data management locale
• demos
• ebc
• enterprise miner
• enterprise miner fees
• enterprise miner packaging
• enterprise miner policy
• enterprise miner pricing
• esp
• event stream processing
• events
• executive briefing center
• fees
• forecast server
• fraud
• gdpr
• GDPR
• government
• informatica
• internal case studies
• machine learning
• manufacturing webinar
• marketing automation pricing
• Mips
• Mips table
• office analytics
• omnichannel webinar
• paxata
• pricing
• retail
• risk
• sales highlights
• sas global forum
• sgf
• strata
• strategic accounts
• tableau
• text data language pack
• text language
• training
• upgrades
• vdmml
• visual analytics
• visual investigator
• visual investigator demo
• visual investigator presentation
• viya
• viya trial
60+% Top Result / 70+% In Top 3
15. Welcome to Fusion - Today
• Staff – Maintained by 3 Developers, including 1 Team Lead
• Work with Partners:
  • Documentation, Support, Marketing
• Upgrades!
  • How do you move from 2.x to 3.x to 4.x?
  • Upgrade in place? Or clean index?
  • Security Scans surface jar file issues
  • Use Customer Success Manager
16. Future Plans
• Fusion 4
• AEM “push” Logic, to index new content immediately
• Auto-suggest
• Spellcheck
• Use an API to import data from Fusion to SAS Visual Analytics dashboard
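Auto-suggest is on the roadmap. One common approach ranks previously logged queries that start with what the user has typed by how often they were searched. The Python sketch below is a hedged illustration of that idea only, not Fusion's built-in suggester, and the query counts are hypothetical.

```python
import bisect

class PrefixSuggester:
    """Suggest completions from logged queries, most frequent first."""

    def __init__(self, query_counts):
        # Normalize queries and merge counts for variants like "GDPR"/"gdpr".
        self.counts = {}
        for q, n in query_counts.items():
            key = " ".join(q.lower().split())
            self.counts[key] = self.counts.get(key, 0) + n
        # Keep queries sorted so all matches for a prefix are contiguous.
        self.sorted_queries = sorted(self.counts)

    def suggest(self, prefix, limit=3):
        prefix = " ".join(prefix.lower().split())
        start = bisect.bisect_left(self.sorted_queries, prefix)
        matches = []
        for q in self.sorted_queries[start:]:
            if not q.startswith(prefix):
                break  # past the contiguous block of prefix matches
            matches.append(q)
        # Highest count first; alphabetical order breaks ties.
        matches.sort(key=lambda q: (-self.counts[q], q))
        return matches[:limit]

# Hypothetical query counts, e.g. aggregated from search logs.
suggester = PrefixSuggester({
    "viya": 500, "viya trial": 120, "visual analytics": 300,
    "visual investigator": 90, "visual investigator demo": 40, "GDPR": 80,
})
print(suggester.suggest("vi"))
print(suggester.suggest("visual i"))
```

Sorting once and using binary search keeps each lookup cheap; a production suggester would also need to filter out misspelled or low-quality queries before they are offered as suggestions.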
17. Why We Chose Lucidworks
• Data Analytics and ability to use AI
• Cross Data Center Replication (CDCR)
• Administration of SOLR is so easy!
• Connectors
• Query and Index Pipelines – we can modify data and make it SEO friendly *before* indexing.
• Ongoing SOLR Support
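Fusion index pipelines are configured as stages inside Fusion. The plain-Python function below does not use Fusion's stage API. It only illustrates the kind of transformation a stage might perform before a page is indexed. The document keys (`title`, `body`, `keywords`, `description`), the " | SAS" title-suffix pattern, and the sample page are all hypothetical.

```python
import re

# Hypothetical pattern for a site-name suffix such as " | SAS" or " - SAS".
SITE_SUFFIX = re.compile(r"\s*[|\-]\s*SAS\s*$")

def clean_document(doc, max_description=160):
    """Normalize one crawled page before it is indexed.

    `doc` is a dict with hypothetical keys: title, body, keywords, description.
    Returns a new dict; the input is left unchanged.
    """
    out = dict(doc)
    # Drop the trailing site name so titles rank on their distinctive words.
    out["title"] = SITE_SUFFIX.sub("", doc.get("title", "")).strip()
    # Split a comma-separated keywords meta tag into a de-duplicated list.
    keywords = []
    for kw in (k.strip().lower() for k in doc.get("keywords", "").split(",")):
        if kw and kw not in keywords:
            keywords.append(kw)
    out["keywords"] = keywords
    # Fall back to the start of the body text when no description is set.
    if not doc.get("description"):
        body = " ".join(doc.get("body", "").split())
        out["description"] = body[:max_description]
    return out

page = {
    "title": "SAS Viya | SAS",
    "body": "Product overview   page text.",
    "keywords": "Viya, analytics, viya, ",
}
result = clean_document(page)
print(result)
```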
Organic growth for an Enterprise Search need.
Google Search Appliance (GSA) in 2004, back when Google shipped the appliance with a t-shirt. Moved to the “Yellow Box” in 2008. By the end, it was indexing almost 9 million pages.
Google announced the GSA's end of life in 2016, with support ending in 2017-18. Renewal costs increased exponentially as the end date got closer.
UltraSeek: Licensed through Hewlett Packard, Java API, circa 200. Before that, we used AltaVista and then Inktomi (2000). Inktomi sold the product to Verity, where it became Verity UltraSeek (2002). The last release came out in 2005, and we were still running it in 2017.
Started with a development team of 4, a project manager, and company partners (Marketing, Support, site owners).
No architect, so we used Lucidworks Professional Services for cluster and infrastructure recommendations.
How many pages? Intranet: 8.8 million reduced to 3.6 million pages, in 2 collections. Internet: 570K documents in 13 collections.
Internet: Started with an English-language country site (first Iceland, then others).
SAS is a data company, so we gathered data from logs, anecdotal comments, and SEO.
Google
Quick links look like ad space
Very outdated design with CSS problems
Bad results: users want the right results fast, and it fails miserably. In this search for Hadoop, the first result is the home page, the second is the RSS feeds page, the third is the News page, and so on.
So we know the problem. How do we get to where we want to be?
Gather data from search logs and Fusion signals, and import it into SAS for analysis.
Present the terms to site owners. Work on relevance (more about that in a few slides).
Here is where we ended up.
New, modern, simple, and clean design for the search results page
Authored and localized in Adobe AEM. Countries have created their own localized versions of these search results pages in their local language.
Search refinements through facets are available but aren’t displayed by default.
When replacing a search tool, there are factors to consider:
Ranking and Relevance – Ranking Parameters: What should be in the Top 5? Top 10? Top 20?
Configuration Changes: Define Audience - Marketing Dept? Support?
Indexing and Content – use this time to clean up old and irrelevant content, and engage content providers to eliminate, boost, or enhance content with SEO tools (metadata tags, SEO plugins for content management systems, like Yoast for WordPress).
Where to start? You can’t do everything at once, so use an Agile development process with 2-week sprints.
Intranet content went from 8.2 million pages to 3.6 million, then to 900K.
Internet content was reduced by selecting only the content we needed and creating a limited number of collections.
First group of “key words” (search terms)
Operator: AND instead of OR
Ranking Fields: Added keywords, description
Relevance Boosts: Title*2, Content*1.5
Other Settings: Obey robots.txt, Respect redirects
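"Obey robots.txt" means the crawler checks each site's robots.txt rules before fetching a page. In Fusion this is handled by the web connector's own settings. Purely to illustrate the rule, here is a sketch using Python's standard `urllib.robotparser`. The robots.txt rules and URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Disallow: /search
"""

def allowed_urls(urls, user_agent="*", robots_txt=ROBOTS_TXT):
    """Return only the URLs that robots.txt permits this user agent to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]

candidates = [
    "https://www.example.com/software/viya.html",
    "https://www.example.com/archive/2009/press.html",
    "https://www.example.com/search?q=viya",
]
print(allowed_urls(candidates))
```

Disallowing internal search result pages, as the second rule does, also keeps the engine from indexing its own results pages.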
Second group of “key words” (search terms): dozens more
Staff: down to 3 people for two global enterprise websites (internal, and external customer-facing).
Upgrades: Fusion 2.x to 3.x, then Fusion 3.1.3 to 4.0.1 for intranet search. No “in place” upgrade script yet.
How many people are still on version 2? Version 3? Has anyone migrated to Version 4? (show of hands)
For the Fusion 2.x to 3.x upgrade on our intranet search clusters, we contacted the Lucidworks customer support and customer success teams.
For the internet search Fusion upgrade, we decided to install Fusion 4.x directly on new clusters to avoid going from 2.4.5 to 3.x to 4.x.
We now have to upgrade Fusion 3.1.3 to Fusion 4.0.1 for Intranet Search. No “in place” script yet.
Per policy, we scan for security vulnerabilities, and we have run into several jars with ‘HIGH’ severity vulnerabilities. Each time we upgrade, we have to manually replace these vulnerable jars and open a Lucidworks support ticket to confirm with Lucidworks that we have not violated any terms and conditions.
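Because the flagged jars have to be found again after every upgrade, a small script can at least list them. The Python sketch below only reports matches and replaces nothing. It assumes a hypothetical install directory and a hypothetical set of jar file names taken from the security scan report. The demo builds a throwaway directory, and its file names are invented.

```python
import tempfile
from pathlib import Path

def find_flagged_jars(install_dir, flagged_names):
    """Return paths of jar files under install_dir whose file name was flagged.

    flagged_names: set of jar file names from a security scan report
    (hypothetical input; the report format depends on the scanner used).
    """
    root = Path(install_dir)
    return sorted(p for p in root.rglob("*.jar") if p.name in flagged_names)

# Demo on a temporary directory standing in for an install location.
with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp, "apps", "libs")
    lib.mkdir(parents=True)
    for name in ["example-lib-1.2.3.jar", "safe-lib-2.0.jar"]:
        (lib / name).touch()
    flagged = find_flagged_jars(tmp, {"example-lib-1.2.3.jar"})
    flagged_found = [p.name for p in flagged]
print(flagged_found)
```

Keeping the flagged list in a file under version control makes the post-upgrade check repeatable before the support ticket is opened.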
Thanks to Lucidworks for their ongoing support, as well as listening to our feedback.
Fusion 4 – our plans
AEM Push Logic
Auto-suggest, Spell Check
Use an API to import data directly from Fusion to a SAS dashboard (right now, import is manual)
Data Analytics – we want to use our own data to make the Search Engine “smart” – boost results, using Signals and SAS
Cross Data Center Replication – important in our High Availability search
Unable to administer SOLR on our own – it’s a powerful tool, but needs powerful support
Ongoing SOLR Support
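The goal of using Signals and SAS to boost results can be illustrated with a simple click-count model: aggregate click signals per query and document, then turn the counts into boost weights. This Python sketch shows the general idea only. It is not Fusion's signal aggregation. The signal format, the `id` field, the minimum-click threshold, and the weighting formula (1 + natural log of the click count) are all assumptions.

```python
import math
from collections import defaultdict

def click_boosts(signals, min_clicks=2):
    """Aggregate click signals into per-query document boost weights.

    signals: iterable of (query, doc_id) click events (hypothetical format).
    Documents with fewer than min_clicks clicks for a query get no boost.
    Returns {query: {doc_id: weight}}, with weight = 1 + ln(clicks).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for query, doc_id in signals:
        counts[" ".join(query.lower().split())][doc_id] += 1
    boosts = {}
    for query, docs in counts.items():
        weights = {d: round(1 + math.log(n), 2)
                   for d, n in docs.items() if n >= min_clicks}
        if weights:
            boosts[query] = weights
    return boosts

def boost_query(weights):
    """Render weights as a Solr boost-query string, e.g. id:"a"^1.69."""
    return " ".join(f'id:"{d}"^{w}' for d, w in sorted(weights.items()))

# Hypothetical click signals.
signals = ([("Viya", "viya-home")] * 3
           + [("viya", "viya-trial")] * 2
           + [("viya", "news")])
boosts = click_boosts(signals)
print(boosts)
print(boost_query(boosts["viya"]))
```

The logarithm damps runaway popularity, so a heavily clicked page gets a stronger boost without drowning out everything else; the threshold keeps one-off clicks from counting.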