7 February 2020 | Paris - Bercy
L’Événement Search Marketing
PARIS 2020
TEACHING MACHINES &
HUMANS TO IMPROVE
SITE SEARCH RESULTS
JP SHERMAN
MANAGER OF SEARCH & FINDABILITY
STATE OF SITE SEARCH
Search is much larger than
search engines.
@JPSHERMAN
BIGGER THAN GOOGLE?
Now, what about those non-search engine searches?
Amazon, Facebook, Sohu, Weibo, Reddit, Instagram, Twitter, eBay…
Web Search
App Search
GOOGLE’S 2 TRILLION PER YEAR SEARCH VOLUME
Maintaining Site search
will
● Increase Conversions
● Reduce Abandonment
● Reinforce Expertise
● Deliver a Good User &
Brand Experience
SEARCH AS A BEHAVIOR IS FRACTURED
THERE ARE MORE WAYS TO SEARCH THAN EVER.
Search isn’t just a search
engine. It’s in an
application, in IoT, in smart
devices
Findability Is:
● Understanding “How”
● Understanding
Selection
● Understanding Behavior
● Understanding Intent
IF THEY’RE SEARCHING ON YOUR SITE...
IF THEY DON’T FIND IT,
THEY WILL LEAVE YOU.
THEY THINK YOU HAVE WHAT THEY’RE LOOKING FOR.
If a user cannot find what
they’re looking for, they
know that Google is less
than a second away.
● They think you have
what they want
● They’re probably right
● If it’s not findable
● They’re gone.
IF THEY FIND IT, DO BALLOONS DROP?
THAT’S THE EXPECTATION.
NO.
USERS REMEMBER THEIR SITE SEARCH
EXPERIENCE
USERS ARE NOT KIND.
Clever girl...
A poor search experience
is remembered.
● Some trust is lost
● They’ll go to Google
● They may find what
they’re looking for.
● Let's hope your
competitor doesn’t rank.
SEARCH BEHAVIOR: HOW … NOT WHAT...
USERS SCAN WITH PURPOSE AND INTENT
Passive Search Active Search
Users apply criteria as they
scan through your results
● They have acceptance and
rejection criteria
● They spend less than a second
scanning a snippet
● Perception of Value is Critical
SITE SEARCH BEHAVIORAL SCIENCE
INFORMATION SCENT TRAILS
USERS LOOK FOR “INFORMATION SCENT TRAILS”
USERS SCAN FOR
PATTERNS
● They include elements
of or related to their
intent
● They look at textual,
image proximities
● Active vs. Passive
Scanning
● Value Signals.
INFORMATION SCENT TRAILS
A QUICK EXAMPLE
An intent-based word cloud.
● Users scan
● When words match
intent
● Acceptance & Rejection
Criteria.
● One will lead to an
information trail.
TYPES PROPERTIES
USER PERCEPTION OF VALUE
WITH INTENT, USERS LOOK FOR VALUE
Results for “Road Bikes”
● sigh.
● They all look alike
● Which one is good?
USER PERCEPTION OF VALUE
WITH INTENT, USERS LOOK FOR VALUE
Results for “Road Bikes”
● Value applied as metadata.
● Triggers for behavior
● Which one is better?
THINGS HUMANS CAN DO TO IMPROVE
RESULTS
SPOILER ALERT: IT’S A LOT OF THE STUFF WE ALREADY DO
Actionable Tasks to Improve Site Search
Results:
● Keyword Metadata
● Synonym Lists
● Boosted Results
● SERP Features
● Clickstream Data
● Personalization
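Two of the tasks above, synonym lists and boosted results, can be sketched in a few lines. This is a minimal illustration and not any search platform's API: the synonym table, the boost table, the URLs and the function names are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical synonym list and boost table; a real site would curate these.
SYNONYMS: Dict[str, Set[str]] = {
    "bike": {"bicycle", "cycle"},
    "tire": {"tyre"},
}
BOOSTED: Dict[str, str] = {"road bike": "/guides/road-bikes"}


def expand_query(query: str) -> List[Set[str]]:
    """Return one set of acceptable terms per query word."""
    return [{word} | SYNONYMS.get(word, set()) for word in query.lower().split()]


def matches(document: str, expanded: List[Set[str]]) -> bool:
    """A document matches when every query word, or a synonym, appears in it."""
    words = set(document.lower().split())
    return all(words & group for group in expanded)


def search(query: str, docs: Dict[str, str]) -> List[str]:
    """Return matching URLs, with the boosted URL (if any) pinned first."""
    expanded = expand_query(query)
    hits = [url for url, text in docs.items() if matches(text, expanded)]
    boosted = BOOSTED.get(query.lower())
    if boosted is not None:
        hits = [boosted] + [url for url in hits if url != boosted]
    return hits
```

With this sketch, a query for "bike tire" would match a page that only says "bicycle tyre", and a query for "road bike" would always show the hand-picked guide first.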
THINGS HUMANS CAN DO TO IMPROVE
RESULTS
IMPROVE THE SERP DESIGN
Actionable Tasks to Improve
Site Search Results:
● SERP Features
AUTOCOMPLETE / AUTOSUGGEST
FACETS
KEYMATCH
KNOWLEDGE GRAPH
NATURAL RESULTS
THINGS HUMANS CAN DO TO IMPROVE
RESULTS
LOCATION CAN BE A STRONG SIGNAL OF INTENT
Actionable Tasks to Improve
Site Search Results:
● Personalization
Keyword: Bike Tires
Location Bias Can Deliver Intent
Saint-Brieuc Bay: Road Bike Tires
Portes du Soleil: Mountain Bike Tires
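The location example on this slide can be sketched as a small re-ranking step. This is only an illustration under stated assumptions: the region-to-terrain table, the `category` field on each result and the function name are hypothetical, and a real system would derive them from its own data.

```python
from typing import Dict, List

# Hypothetical mapping from a visitor's region to the terrain it implies.
REGION_TERRAIN: Dict[str, str] = {
    "Saint-Brieuc Bay": "road",
    "Portes du Soleil": "mountain",
}


def rerank_by_location(results: List[Dict[str, str]], region: str) -> List[Dict[str, str]]:
    """Move results whose category matches the region's terrain to the top.

    sorted() is stable, so the engine's original order is kept within each group.
    """
    terrain = REGION_TERRAIN.get(region)
    if terrain is None:
        return list(results)
    return sorted(results, key=lambda result: result["category"] != terrain)
```

The same "bike tires" query would then lead with road tires for a visitor on the coast and with mountain tires for a visitor in the Alps, while an unknown region leaves the order untouched.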
THINGS HUMANS CAN DO TO IMPROVE
RESULTS
MEASURE HUMAN BEHAVIOR
Users apply criteria as they scan
through your results
● Measure consumption & conversion
● Measure dwell time
● Measure time from query to conversion
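Dwell time and time from query to conversion can both be read from a clickstream log. The sketch below assumes a hypothetical log format (one event per row with a session id, an event kind and a timestamp in seconds); the event names are illustrative and not taken from any analytics product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    session: str
    kind: str    # "query", "click", "leave" or "convert" (hypothetical names)
    time: float  # seconds since the start of the log


def time_between(events: List[Event], start_kind: str, end_kind: str) -> Dict[str, float]:
    """Per session, seconds from the first start_kind event to the first later end_kind event."""
    starts: Dict[str, float] = {}
    durations: Dict[str, float] = {}
    for event in sorted(events, key=lambda e: e.time):
        if event.kind == start_kind:
            starts.setdefault(event.session, event.time)
        elif event.kind == end_kind and event.session in starts:
            durations.setdefault(event.session, event.time - starts[event.session])
    return durations
```

Calling `time_between(log, "click", "leave")` gives a rough dwell time per session, and `time_between(log, "query", "convert")` gives the time from query to conversion. Sessions that never reach the end event are simply absent from the result.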
THINGS HUMANS CAN DO TO IMPROVE
RESULTS
SPOILER ALERT: IT’S A LOT OF THE STUFF WE ALREADY DO
Design your SERP for the user.
● SERP Design
● Accessibility for people with
visual impairments
● Snippet Design
● Features
● Disambiguation
SO.. UH… WHAT’S THE POINT?
Site Search is a massive behavior across the web.
1. Simple changes to the search platform & content will pay off
2. Users who search your site think you have what they want
3. Metadata and what is displayed in the SERP influences CTR
4. Use boosting of content to quickly rank on your site-search
5. Consider Design
6. Consider Accessibility
Don’t Be Google. Google has to figure out “everything”. You don't.
Be Better Than Google.
YOU CAN DO A LOT TO MAKE SITE SEARCH BETTER, BUT THERE’S
MORE
UNDERSTANDING CONTEXT
AT RED HAT, WE SELL FREE SOFTWARE.
THIS IS WHO WE ARE
WE SUPPORT PEOPLE FIRST
WHICH MEANS THAT WE ARE A SUBSCRIPTION & SUPPORT COMPANY.
THIS IS MY CONTEXT
HAPPY PEOPLE RECOGNIZE VALUE
THE FASTER PEOPLE FIND ANSWERS TO THEIR SUPPORT NEEDS, THE HAPPIER
THEY ARE.
THIS IS HOW WE SUCCEED.
SEARCH INTENT IS REALLY HARD
PEOPLE LOOK FOR INFORMATION…. UNIQUELY.
PEOPLE CAN BE WEIRD
my linux is broken
SETTING UP THE MACHINE TO LEARN
WOULDN’T IT BE GREAT IF WE COULD PREDICT INTENT?
MAGIC ISN’T REQUIRED.
RISE OF THE MACHINES
- GOALS
- REDUCE ZERO RESULTS
- IMPROVE MATCHING
- IMPROVE CTR
SOME THINGS TO REMEMBER
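Two of these goals, fewer zero-result queries and a higher CTR, are easy to track from a query log. A minimal sketch, assuming one log row per query with hypothetical `results` and `clicks` fields:

```python
from typing import Dict, List


def search_kpis(log: List[Dict[str, int]]) -> Dict[str, float]:
    """Compute the zero-result rate and CTR from one log row per query.

    Each row is assumed to carry 'results' (hits returned) and 'clicks'
    (results clicked); both field names are hypothetical.
    """
    total = len(log)
    if total == 0:
        return {"zero_result_rate": 0.0, "ctr": 0.0}
    zero = sum(1 for row in log if row["results"] == 0)
    clicked = sum(1 for row in log if row["clicks"] > 0)
    return {"zero_result_rate": zero / total, "ctr": clicked / total}
```

CTR is counted here as the share of queries with at least one click, which is one common definition among several.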
FIRST, LET’S LOOK AT THE DATA YOU HAVE.
STRUCTURE IS VERY IMPORTANT
- UNSTRUCTURED DATA
- STRUCTURED DATA
NEXT, LET’S LOOK AT YOUR INFORMATION
ARCHITECTURE
STRUCTURE IS STILL IMPORTANT
- UNSTRUCTURED IA
- STRUCTURED IA
NOW, LET’S LOOK AT THE PLAN
START SIMPLE, INCREASE COMPLEXITY
Keyword search
Taxonomies
Entity Extraction
Ontologies
Query intent
Query classification
Semantic parsing
Clustering
Relevancy Tuning
Signals
A/B Testing
LTR
Self Learning Goal
Reference:
https://www.slideshare.net/treygrainger/intent-algorithms
Trey Grainger, SVP Engineering Lucidworks
THE PLATFORM OF THE ENGINE
SEARCH PLATFORM
FREE, OPEN-SOURCE & POWERFUL - LUCENE IS POWERFUL.
Lucene is a Java-based, free and open-source search library. It underpins, or has inspired,
several search tools:
● Apache Nutch
● Apache Solr
● Compass
● CrateDB
● DocFetcher
● Elasticsearch
● KinoSearch
● Swiftype
THE MACHINE LEARNING APPLICATION
DATA SET
THESE ARE YOUR QUERIES OR CONTENT YOU WANT TO LEARN FROM…
FOR EXAMPLE.
“ARE THESE CATS OR DOGS?”
DATA SETS, TRAINING SETS AND HOW TO MEASURE.
THE MACHINE LEARNING APPLICATION
TRAINING SET
PARTITION YOUR DATA SET INTO 90% TRAINING AND 10% EVALUATION.
DATA SETS, TRAINING SETS AND HOW TO MEASURE.
DATA SET: TRAINING SET: EVALUATION
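The 90/10 partition can be sketched with the standard library alone. The function name, the fixed seed and the default fraction are assumptions made for this illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def partition(items: Sequence[T], eval_fraction: float = 0.10, seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the data set and split it into (training, evaluation)."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    eval_size = round(len(shuffled) * eval_fraction)
    return shuffled[eval_size:], shuffled[:eval_size]
```

Shuffling before splitting matters: if the data set is ordered by date or by product, taking the last 10% as-is would give an evaluation set that does not resemble the training set.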
THE MACHINE LEARNING APPLICATION
TO QUANTIFY “DOG-NESS” OR
“CAT-NESS” A POWERFUL
FORMULA IS THE OKAPI BM25
FORMULA.
DATA SETS, TRAINING SETS AND HOW TO MEASURE.
Good resources on data science formulation.
https://www.datasciencecentral.com/profiles/blogs/140-machine-learning-formulas
Reference:
https://www.amazon.com/Solr-Action-Trey-Grainger/dp/1617291021
Reference:
https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables
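For readers who want to see the formula at work, here is a small sketch of Okapi BM25 in its textbook form, using the idf variant described in the Elastic article referenced above. The defaults k1 = 1.2 and b = 0.75 are common starting values; the function name and the pre-tokenized input are assumptions of this sketch, and a production engine's scoring may differ in detail.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score every tokenized document against the query with Okapi BM25.

    Assumes docs is a non-empty list of token lists.
    """
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            freq = term_freq[term]
            if freq == 0:
                continue
            # Rare terms get a higher idf than common ones.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: long documents need more matches to score the same.
            norm = freq + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores
```

In the cats-and-dogs framing, scoring the query `["cat"]` gives the highest "cat-ness" to the document that mentions cats most often relative to its length, and zero to a document that never mentions them.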
THE MACHINE LEARNING APPLICATION
DATA OUTPUT AND FRACTIONAL SCORES.
HOW CLOSE TO “CAT-NESS” IS THIS?
DATA SETS, TRAINING SETS AND HOW TO MEASURE.
0.99732134 0.87569821 0.62587471 0.0000111
THE MACHINE LEARNING APPLICATION
WHAT IF THE QUERY ISN’T “CATS” BUT “FUZZY ANIMALS”?
THIS IS A CORE VALUE OF STRUCTURED MARKUP AS AN ATTRIBUTE SIGNAL
DATA SETS, TRAINING SETS AND HOW TO MEASURE.
0.9632115 0.9178585 0.9244844 0.9371025
@dawnieando is an inspiration to all of us.
THE RECIPE FOR MACHINE LEARNING
LEARN TO RANK PLUGIN (LTR)
IT’S NOT AS HARD AS YOU MAY THINK.
LTR (Learning to Rank) is a plugin for Lucene-based engines that applies machine
learning to ranking. It uses a wide variety of ranking signals, grouped into three kinds:
● QUERY INDEPENDENT: Looks only at the indexed document
● QUERY DEPENDENT: Looks at both the query and the document, most often a TF-IDF score
● QUERY LEVEL FEATURES: Looks only at the query
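The three kinds of signal can be made concrete with a toy feature vector, the row a learning-to-rank model would train on. This is a sketch and not the LTR plugin's API: the document fields (`body`, `inbound_links`) and the choice of features are hypothetical.

```python
from typing import Dict, List


def feature_vector(query: str, doc: Dict[str, object]) -> List[float]:
    """Build one training row with a feature of each kind.

    The document fields ('body', 'inbound_links') are hypothetical.
    """
    query_terms = query.lower().split()
    body_terms = str(doc["body"]).lower().split()

    # Query-independent: depends on the document alone.
    inbound_links = float(doc["inbound_links"])

    # Query-dependent: depends on both; here a plain term-overlap count
    # stands in for a TF-IDF or BM25 score.
    overlap = float(sum(body_terms.count(term) for term in query_terms))

    # Query-level: depends on the query alone.
    query_length = float(len(query_terms))

    return [inbound_links, overlap, query_length]
```

A model is then trained on many such rows, each paired with a relevance judgment, so that it learns how much each feature should count.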
HOW MACHINES PREDICT INTENT
A SUPPORT ORGANIZATION’S
PRIMARY TASK IS TO HELP
THIS IS THE CORE OF FINDABILITY
- TO DELIVER THE RIGHT
INFORMATION
- AT THE RIGHT TIME
- TO RESOLVE AN ISSUE
- QUICKLY
- THROUGH ANY CHANNEL
THROUGH THE CONTEXT OF SUPPORT
CONFIGURING SYSTEM FOR EXPERIMENTS
STARTING SMALL:
- WE SELECTED A SINGLE PRODUCT
- RED HAT OPENSHIFT
- WE SELECTED A SINGLE USE-CASE
- TROUBLESHOOTING
- WE SELECTED THE CONTENT
- SOLUTION CONTENT TYPE
MORE DATA SETS, TRAINING SETS AND EVALUATION SETS
CONFIGURING SYSTEM FOR EXPERIMENTS
STARTING SMALL:
- SOLUTION CONTENT IS 30% OF
CONTENT.
MORE DATA SETS, TRAINING SETS AND EVALUATION SETS
DOCUMENTATION
VIDEOS
ARTICLES
SECURITY
PRODUCT
SOLUTION
CONFIGURING SYSTEM FOR EXPERIMENTS
APPROXIMATELY 100K INDIVIDUAL
PIECES OF CONTENT.
- 90K WENT TO TRAINING
PARTITION
- 10K WENT TO EVALUATION
MORE DATA SETS, TRAINING SETS AND EVALUATION SETS
RUN THE RELEVANCY ALGORITHM
RUN THE EVALUATION
- FIRST CHECK:
- DOES THIS LOOK RIGHT?
- CONFIRM/ CORRECT SAMPLE
- DEFINE “SUCCESS”
WHAT PERCENT EQUALS
“GOOD”
LET’S ASSUME FAILURE. WHAT
CAN BE DONE?
THIS REQUIRES SOME HUMAN INTERVENTION
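One way to put a number on "what percent equals good" is to count how often the expected document shows up in the top results for each evaluation query. The sketch below assumes a hypothetical set of human-confirmed judgments (query mapped to the document id it should return) and takes the search function as a parameter; the cut-off k = 5 is likewise an assumption.

```python
from typing import Callable, Dict, List


def success_rate(judgments: Dict[str, str], search: Callable[[str], List[str]], k: int = 5) -> float:
    """Fraction of evaluation queries whose expected document is in the top k."""
    if not judgments:
        return 0.0
    hits = sum(1 for query, expected in judgments.items() if expected in search(query)[:k])
    return hits / len(judgments)
```

The team then picks its own threshold, for example treating a run as a success only when the rate reaches an agreed percentage, and goes back to tuning when it falls short.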
RELEVANCY TUNING: THE MACHINE PARTS
FIXING THE MACHINE FIRST
- CHECK “WEIGHTS”:
- SIGNAL WEIGHT
- CTR WEIGHT
- IMPRESSION WEIGHT
- SYNONYM WEIGHT
- INTERNAL LINK WEIGHT
- SERP IMPRESSIONS
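Checking weights is easier with a picture of what they do. In the simplest case, each signal is normalized and the relevancy score is their weighted sum. The weight values and signal names below are hypothetical; the slide lists the signals but not their values.

```python
from typing import Dict

# Hypothetical weights; the slide lists the signals but not their values.
WEIGHTS: Dict[str, float] = {
    "ctr": 0.4,
    "impressions": 0.1,
    "synonym_match": 0.2,
    "internal_links": 0.3,
}


def weighted_score(signals: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Combine signals already normalized to [0, 1] into one relevancy score."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())
```

If one signal dominates the ranking in ways the evaluation shows to be wrong, lowering its weight and re-running the evaluation is the first thing to try.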
RELEVANCY TUNING: THE METADATA PARTS
TUNING THE UNDERLYING STRUCTURE OF THE DATA
- CHECK “WEIGHTS”:
- TITLE, DESCRIPTION &
KEYWORD METADATA
- STRUCTURED MARKUP
- SCHEMA
- PAGE COMPONENTS
ABSTRACT
CONCLUSION
COMMENTS
- ENTITIES/ TAXONOMIES/
ONTOLOGIES
RELEVANCY TUNING: THE CONTENT PARTS
PRACTICAL EXAMPLE OF WHY CONTENT IS KING
- CHECK CONTENT:
- IS YOUR CONTENT DESCRIPTIVE?
- DOES IT HAVE DIVERSE
LANGUAGE/ WORDS?
- IS IT ORGANIZED?
- ARE THERE DIFFERENT CONTENT
TYPES?
THIS IS “GOOD SEO FOR CONTENT”
TUNING THE MACHINE FOR IMPROVEMENT
TRAINING DATA
- CONTROL GROUP
EVALUATION DATA
- EXPERIMENTAL GROUP
USE CTR AS A SUCCESS
SIGNAL
DETERMINE SIGNIFICANCE
CONFIRM WITH METRICS.
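Determining significance for a CTR difference can be done with a two-proportion z-test. The slide does not name a test, so this choice is an assumption of the sketch, as are the function and parameter names.

```python
import math
from typing import Tuple


def ctr_z_test(clicks_a: int, views_a: int, clicks_b: int, views_b: int) -> Tuple[float, float]:
    """Two-proportion z-test on CTR; returns (z, two-sided p-value).

    Group A is the control, group B the experiment; views must be positive.
    """
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if std_err == 0:
        return 0.0, 1.0
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A small p-value (0.05 is a common cut-off) suggests the CTR change is unlikely to be noise. With small samples the normal approximation is weak, so it is worth waiting for enough impressions before drawing conclusions.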
PUTTING IT ALL TOGETHER
CONTENT QUALITY
- WRITE GREAT, DESCRIPTIVE CONTENT
METADATA & IA
- DON’T IGNORE YOUR METADATA, SCHEMA & STRUCTURE
MACHINE RELEVANCY
- THE MACHINE WILL ATTEMPT TO UNDERSTAND, BUT TUNING REQUIRES HUMANS.
MEASUREMENT
- DEFINE WHAT SUCCESS IS, SEGMENT & MEASURE CTR.
MACHINE LEARNING & INTENTION DETECTION IS A BALANCING ACT
THANK YOU SO MUCH
PLEASE FEEL FREE TO TALK TO ME.
BECAUSE NO ONE DOES THIS ALONE… THANK YOU:
- JASON BARNARD
- MANIKANDAN SIVANESAN
- JIM SCARBOROUGH
- DAWN ANDERSON
- MARIANNE SWEENY
- TREY GRAINGER
- CHARLIE HULL
- JAIMIE ALBERICO
- HAMLET BATISTA
- BRITNEY MULLER
- MARTHA VAN BERKEL
- GRANT INGERSOLL
- JR OAKES
- MICHAEL KING
- MARK TRAPHAGEN
- BARRY ADAMS
- JENN HOFFMAN
- JENNY HALASZ

How Humans & Machines Can Improve Site Search Results - Search Y: Paris
