SIKM Leaders July 2012 - Understanding your Search Log (pekadad)
Presentation used for the SIKM Leaders call for July 2012. Covers the challenges of the long tail of your search log and some ideas for grappling with them.
SIKM Leaders July 2012 - Understanding your Search Log
1. Search analytics – Understanding the long tail
SIKM Leaders July 2012
Lee Romero
blog.leeromero.org
July 2012
2. About me
My background and early career are both in software engineering. I've worked in the knowledge management field for the last 12+ years, almost all of it in the technology of KM.
I've worked with various search solutions for the last 7-8 years, and spent most of that time trying to figure out how to measure their usefulness and improve them in any way I can.
I've spoken at both Enterprise Search Summit and Taxonomy Boot Camp twice.
My writings on search analytics have been featured by a number of experts in the field, including Lou Rosenfeld and Avi Rappoport.
3. Search Analytics
Definition: Search analytics is the field of analyzing and aggregating usage statistics of your search solution to understand user behavior and to improve the experience.
Some search analytics work focuses on SEO / SEM activities (for internet searches). The focus here is enterprise search, so we will primarily be concerned with improving the user experience.
Further, I will primarily focus here on keyword search and understanding the user language found in search logs.
Always remember: analytics without action does not have much value.
5. Understanding your search log
For enterprise search solutions [1], the "80-20" rule is not true.
The language variability is very high in a couple of ways (covered in the next few slides).
Yet having a good understanding of the language, frequency, and commonality in your search log is critical to being able to make sustainable improvements to your search.
The remainder of this presentation first provides some evidence supporting my claim and then will cover some ideas and research into this problem.
[1] This does not seem to apply equally to e-commerce solutions.
6. Some facts about search terms
There's an anecdote that goes something like, "80% of your searches are from 20% of your search terms."
• Equivalently, some will say that you can make significant impact by paying attention to a few of your most common terms (you can, but in limited ways)
Fact: in enterprise search solutions the curve is much shallower.
[Chart: the inverted power curve for two different solutions I'm currently working with]
In the second case, it takes 13% of terms to cover 50% of searches, and that is over 7,000 distinct terms in a typical month!
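The coverage statistic above (what fraction of distinct terms you need to cover a given share of searches) can be computed directly from a search log once it has been reduced to per-term counts. A minimal sketch (the toy log and term names are invented for illustration):

```python
from collections import Counter

def coverage_stats(term_counts, target=0.5):
    """Return the fraction of distinct terms needed to cover
    `target` fraction of all searches."""
    total = sum(term_counts.values())
    covered = 0
    # Walk terms from most to least frequent, accumulating searches.
    for i, (term, count) in enumerate(term_counts.most_common(), start=1):
        covered += count
        if covered / total >= target:
            return i / len(term_counts)  # fraction of distinct terms used
    return 1.0

# Toy log: a steep "80-20" curve hits 50% coverage with very few terms;
# a shallow enterprise curve needs far more.
log = Counter({"vpn": 50, "timesheet": 30, "expense report": 10,
               "badge": 5, "parking": 3, "cafeteria": 2})
print(coverage_stats(log))  # 1 of 6 distinct terms covers half the searches
```

Running this monthly over a real log is how you would verify how shallow (or steep) your own solution's curve is.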
7. Some facts about search terms: part 2
Another myth: a large percent of searches repeat over and over again.
Fact: on enterprise search solutions, there is surprisingly little commonality month-to-month.
Over a recent six-month period, which saw a total of ~289K distinct search terms, only 11% of terms occurred in more than 1 month!

# of months   # terms    % of searches
1             257,665    89.2%
2              17,994     6.2%
3               5,790     2.0%
4               2,900     1.0%
5               2,019     0.7%
6               2,340     0.8%
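The distribution in the table above (how many terms appear in exactly k of the months studied) can be derived from per-month term sets. A sketch, with invented example terms:

```python
from collections import Counter

def month_spread(monthly_logs):
    """Given one set of distinct search terms per month, report how many
    terms appeared in exactly k months, for each k."""
    months_per_term = Counter()
    for terms in monthly_logs:
        for term in terms:
            months_per_term[term] += 1
    distribution = Counter(months_per_term.values())
    return dict(sorted(distribution.items()))

logs = [{"vpn", "badge"}, {"vpn", "payroll"}, {"vpn", "badge", "w2"}]
print(month_spread(logs))  # → {1: 2, 2: 1, 3: 1}
```

Weighting each bucket by its search counts, rather than distinct terms, gives the "% of searches" column.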
8. Some facts about search terms: part 3
Another myth: a good percentage of your search terms will repeat in sequential periods.
Fact: There is much more churn even month-to-month than you might expect. In the period studied, only about 13% of terms repeated from one month to the next (covering about 36% of searches).
9. What to do with your search log?
The summary of the previous slides:
• It is hard to understand a decent percentage of terms within a given time period (month)!
• If you could do that, the problem during the next time period isn't that much easier!
The next sections describe a couple of research projects I've been working on to tackle these issues.
11. Categorizing your users' language
Given the challenges previously laid out, using the search log to understand user needs seems very challenging.
Beyond the first several dozen terms, it is hard to understand what users are looking for.
• And those several dozen terms cover a vanishingly small percentage of all searches!
However, it would be very useful to understand your users' information needs if we could somehow understand the entirety of the search log.
How do we handle this? Categorize the search terms!
12. Categorizing your users' language, p2
So we need to categorize search terms to really be able to understand our users' information needs.
To do this, we face two challenges:
1. What categorization scheme should we use?
2. How do we apply categorization in a repeatable, scalable and manageable way?
For the first challenge, I would recommend you use your taxonomy (you do have one, right?).
The second challenge is a bit more difficult but is addressed later in this deck.
13. Categories to use
Proposal: Start with your own taxonomy and its vocabularies as the categories into which search terms are grouped.
Some searches will not fit into any of these categories, so you can anticipate the need to add further categories.
As an aside, this exercise actually provides a great measurement tool for your taxonomy:
• You can quantitatively assess the percent of your users' language that is classifiable with your taxonomy
• A number you may wish to drive up over time (through evolution of your taxonomy)
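The taxonomy-coverage measurement suggested here is straightforward once terms have been run through categorization. A sketch, weighting by search frequency (an assumption; you could equally count distinct terms), with invented example data:

```python
def taxonomy_coverage(term_counts, categorized_terms):
    """Percent of searches whose term was classifiable with the taxonomy."""
    total = sum(term_counts.values())
    covered = sum(count for term, count in term_counts.items()
                  if term in categorized_terms)
    return 100.0 * covered / total

counts = {"vpn": 60, "timesheet": 30, "obscure query": 10}
print(taxonomy_coverage(counts, {"vpn", "timesheet"}))  # → 90.0
```

Tracking this number month over month is one concrete way to drive taxonomy evolution.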
14. Automating categorization
Now we turn to the hairier challenge: how can we categorize search terms?
To describe the problem, we have:
1. A set of categories, which may be hierarchically related (most taxonomies are)
2. A set of search terms, as entered by users, that need to be assigned to those categories
[Diagram: a column of user search terms on one side, a set of (possibly hierarchical) categories on the other, with a question mark over how to map terms to categories]
15. Automating categorization, p2
The proposed solution is based on a couple of concepts:
1. You can think of this categorization problem as search!
2. You are taking each search term and searching in an index in which the potential search results are categories!
Question: What is the "body" of what you are searching?
Answer: Previously-categorized search terms!
Using this approach, you can consider the set of previously-categorized search terms as a corpus against which to search.
• You can apply all of the same heuristics to this search as any search:
• Word matching (not string matching)
• Stemming
• Relevancy (word ordering, proximity, # of matches, etc.)
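The categorization-as-search idea can be sketched with a tiny index: each category's "document" is the bag of words from its previously-categorized terms, and an incoming term is searched against it. The suffix-stripping stemmer and word-overlap score below are crude stand-ins for whatever stemming and relevancy a real search engine provides, and the categories and terms are invented:

```python
def stem(word):
    """Toy stemmer: strip a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def build_index(categorized):
    """categorized: {category: [previously categorized terms]}.
    Index each category by the stemmed words of its terms."""
    return {cat: {stem(w) for t in terms for w in t.lower().split()}
            for cat, terms in categorized.items()}

def match(term, index):
    """Return (category, score) pairs ranked by word overlap."""
    words = {stem(w) for w in term.lower().split()}
    scored = [(cat, len(words & vocab) / len(words))
              for cat, vocab in index.items()]
    return sorted((cs for cs in scored if cs[1] > 0),
                  key=lambda cs: cs[1], reverse=True)

index = build_index({
    "Networking": ["vpn access", "wireless setup"],
    "HR": ["payroll schedule", "vacation policy"],
})
print(match("vpn setup guide", index))  # Networking ranks first (2 of 3 words)
```

Word matching (not string matching), stemming, and a relevancy score each map onto a line of this sketch.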
16. Automating categorization, p3
Here's a depiction of this solution:
[Diagram: sets of previously-categorized terms, the categories, and the new search terms all feed a matching process, shown as a red oval. The matching process takes as input the search terms to be categorized, along with the set of categories and previously-matched search terms, and produces as output a set of categories associated with the new search terms.]
17. Automating categorization, p4: Bootstrapping
This approach depends on matching to previously-categorized terms.
• Every time you categorize a new search term, you expand the set of categorized terms, enabling more matches in the future
Bootstrapping: You can take the names of the categories (the terms in your taxonomy) as the first set of "categorized search terms".
• This allows you to start with no search terms having been categorized at all
• You run a first round of matching against the categories to find first-level matches
• Take those that seem like "good" matches and pull those into the set of categorized search terms for a second iteration, etc.
• Using this in initial testing resulted in 10% of distinct terms from a month being associated with at least one category
Another aspect: Any manual categorization of common search terms will add to the success of categorization.
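The bootstrapping loop described above can be sketched end to end: seed the categorized set with the taxonomy's own category names, then iterate, promoting confident matches so later rounds can match more terms. The word-overlap score and threshold are toy stand-ins for a real relevancy measure, and the taxonomy and terms are invented:

```python
def score(term, seed_terms):
    """Best word-overlap between `term` and any seed phrase (toy relevancy)."""
    words = set(term.lower().split())
    best = 0.0
    for seed in seed_terms:
        overlap = words & set(seed.lower().split())
        best = max(best, len(overlap) / len(words))
    return best

def bootstrap(taxonomy, uncategorized, threshold=0.5, rounds=3):
    """taxonomy: {category: [category-name phrases]} used as the seed set."""
    categorized = {cat: list(names) for cat, names in taxonomy.items()}
    remaining = set(uncategorized)
    for _ in range(rounds):
        newly = {}
        for term in remaining:
            # Assign the term to its best-scoring category, if above threshold.
            best_cat, best = max(
                ((cat, score(term, seeds)) for cat, seeds in categorized.items()),
                key=lambda cs: cs[1])
            if best >= threshold:
                newly[term] = best_cat
        if not newly:
            break  # no new matches; a human would review or extend here
        for term, cat in newly.items():
            categorized[cat].append(term)  # grow the seed set for the next round
            remaining.discard(term)
    return categorized, remaining

taxonomy = {"Payroll": ["payroll"], "Travel": ["travel policy"]}
terms = ["payroll calendar", "payroll calendar 2012", "travel booking"]
cats, leftover = bootstrap(taxonomy, terms)
```

Note how "payroll calendar 2012" only matches in the second round, once "payroll calendar" has itself been promoted into the seed set; that is the bootstrapping effect.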
18. Automating categorization, p5: Iterative
[Diagram: the matching process repeated in rounds; each round's new categorizations join the previously-categorized terms that feed the next round's matching of the remaining search terms]
19. Automating categorization, p5: Iterative
This approach also needs to be applied iteratively
• You start with a set of categorized search terms and a new set of
(uncategorized) search terms
• You then apply this matching to the uncategorized search terms, getting a set
of newly-categorized search terms (with some measure of probability of
“correctness” of the match, i.e., relevancy)
• You pull in the newly-categorized search terms and run the matching process
again
• Each time, as you expand the set of categorized search terms (from a
previous match), you increase the possibility of more matches (in
subsequent matches)
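The loop described above can be sketched in Python as follows (a simplified illustration; `match_fn` and the threshold stand in for the real relevancy measure):

```python
def categorize_iteratively(categorized, uncategorized, match_fn, threshold=0.4):
    # Repeat the matching pass until a round produces no new categorizations
    remaining = set(uncategorized)
    while remaining:
        newly = {}
        for term in remaining:
            for known, category in categorized.items():
                if match_fn(term, known) >= threshold:  # relevancy gate
                    newly[term] = category
                    break
        if not newly:
            break
        categorized.update(newly)   # a larger corpus for the next round
        remaining -= set(newly)
    return categorized
```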
20. Automating categorization, p6: Iterative
It will be beneficial to have a human review the set of matches for
each iteration and determine if they are accurate enough
• The measurement of relevancy is intended to do this but would likely only be
partially successful
Over time, using this process, you build up a larger and larger set of
categorized search terms
• This makes it more likely in future iterations that more terms will be
categorizable
21. Automating categorization, p7: No matches
There will always be search terms that do not get matched.
• This may be because the terminology used does not match
• This may be because there are no categories in the global taxonomy that
would be useful for categorization
The first issue would require a human to recognize the association
(thus, categorizing the term and then enabling matches on future
uses of that term)
The second issue would require adding in new categories (not part
of the global taxonomy)
• And then categorizing the term into the newly-added category(ies)
22. Summary
With this approach, we can take a set of search terms at any time
and categorize them (partially) automatically
• Over time, the accuracy of the matching will improve through human review-
and-approval of matches
We then are able to relate these information needs to a variety of
other pieces of data:
• Volume of content available to users – significant mismatches can highlight
need for new content
• Rating of content in these categories – can highlight that a particular area of
interest has content but it isn’t quality content
• Downloads of content in these categories – could highlight navigational
issues (e.g., when a category is much more highly represented in search
than in downloads)
This does not require directly working with end-users and is scalable
23. Additional benefits: Measuring your taxonomy
As mentioned earlier, part of the challenge will be that there will be
terms that do not match the starting categories (i.e., the global
taxonomy)
This actually highlights some valuable insight obtainable from this:
• We can identify gaps in our taxonomy (terms requiring new categories)
• We can identify areas of our taxonomy where we have many search terms
associated with a taxonomy term and consider if we need to either add or
split search terms in order to better match our users’ real language
• We can identify areas of the taxonomy that are of little use in terms of the
language used by our users
24. Additional benefits: Linguistic statistics
Word counts – independent of term usage, what are the most common individual words?

Word        Distinct Terms   Searches
management  3128             8283
sap         1931             3873
strategy    1414             3728
business    1558             3599
it          1343             2992
process     1515             2920
data        1264             2899
project     1249             2823
model       1296             2791
plan        987              2170
Word networks – we can understand the inter-relationships between individual words (which pairs occur commonly together, which words occur commonly for a given word)
These are not as much about information needs as about understanding the language
users use (so this insight can help shape categorization)
These are also very useful to prioritize your efforts in reviewing your search logs
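As a rough illustration (not the original tooling), both the word counts and the word-network pairs can be derived from a search log in a few lines of Python:

```python
from collections import Counter
from itertools import combinations

def word_stats(search_log):
    # search_log: iterable of (search_term, number_of_searches) pairs
    distinct = Counter()   # how many distinct terms contain each word
    weighted = Counter()   # word frequency weighted by search volume
    pairs = Counter()      # co-occurring word pairs, for the word network
    for term, searches in search_log:
        words = sorted(set(term.lower().split()))
        for w in words:
            distinct[w] += 1
            weighted[w] += searches
        for a, b in combinations(words, 2):
            pairs[(a, b)] += 1
    return distinct, weighted, pairs
```

`Counter.most_common()` on each of these then yields exactly the kinds of rankings shown in the table above.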
25. Additional benefits: Comparing to your content space
With the statistics described in the previous slide, you could conceivably compare them to the same analysis applied to your
“content space”
For example, derive the statistics for the titles of content available in
your search
• Do you find significant differences? This could represent differences in the
names people apply to things and what they expect to use to find the content
Another interesting angle is to use other controlled lists as the
matched terms in a category
• People names (applying this, we found that about 8% of terms match a person’s name)
• Client names
27. The Problem
Search sucks!
Yes, the common refrain from many users – “search doesn’t return
what I’m looking for” or “I can never find what I’m looking for”
There are many tools available to improve the users’ experience,
including:
• Improving the UI
• Improving the content included
• Manipulating settings in the engine to modify relevancy
calculations, possibly even the engine itself
The challenge for many of these is, once you make a change, how
do you know it has improved the results?
28. A solution?
One way to assess the impact is to have a set of users perform
either a set of pre-defined searches or a set of their own searches
and then evaluate the quality of results
The challenge with this is that it is very labor-intensive, can take a long time in calendar terms, and is hard to do iteratively.
An alternative could be to automate this evaluation!
It is important to keep in mind that this is not about the relevancy of
the results or determining whether the engine is returning the
“right” items
• It’s about assessing the user-perceived quality of a set of
results given a set of criteria for a search
29. Automating evaluation
The idea is to automate some of the analysis of the quality of the
result set by examining properties of the result set
This approach attempts to perform a simple test similar to what a
human user would do in scanning a set of search results
• It uses the data returned by the search engine and displayed on
the first page of results
• It does not do a “deep” review of content
30. The approach
The algorithm takes the following approach:
• For each search term, it executes the query against the search
engine and retrieves the results
‒For each individual result, it calculates a quality score from 0.0 to
1.0 (a higher score implies the result looks like a better result)
‒The individual scores for a search term’s set of results are
averaged to get a single score for that search term
• In addition, the current POC outputs data in a tabular format
including most of the individual elements returned by the search
engine along with the derived score
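A minimal Python sketch of this driver loop; `run_query` and `score_result` are hypothetical stand-ins for the engine client and the per-result scoring function:

```python
def score_search_term(term, run_query, score_result, page_size=10):
    # run_query and score_result are hypothetical stand-ins for your
    # search engine client and per-result quality function
    results = run_query(term)
    first_page = results[:page_size]     # only what the user sees first
    if not first_page:
        return 0.0                       # no results at all: worst score
    scores = [score_result(term, r) for r in first_page]
    return sum(scores) / len(scores)     # one averaged score per term
```

Running this over a list of terms and writing each term's score plus the raw engine fields to a row gives the tabular output the POC produces.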
31. What are we looking at in assessing quality?
Facets that influence quality
• Focusing primarily on user-visible aspects:
‒ First page
‒ Result set size
‒ Snippet
‒ Title
‒ Age
‒ Uniqueness of title
32. What are we looking at in assessing quality?
Factors that influence quality
• Only examining the first page of results
• Similarity / dissimilarity of keywords to title
• Similarity / dissimilarity of keywords to excerpt
• Uniqueness of titles within the result set (just first page)
• Size of total result set
• Age of results
• Looking for specific “known” targets
• (one “cheat”) Presence of keywords in “concepts” identified by the engine
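As an illustration only, a per-result score could combine these factors as a weighted sum; the factor formulas and weights below are assumptions for the sketch, not the POC's actual values:

```python
import re

def _overlap(keywords, text):
    # Fraction of the search keywords that appear in the given text
    kws = set(re.findall(r"[a-z0-9]+", keywords.lower()))
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(kws & words) / len(kws) if kws else 0.0

def result_quality(keywords, title, snippet, age_days, total_results,
                   weights=(0.35, 0.35, 0.15, 0.15)):
    # Illustrative factors and weights, not the original POC's values
    w_title, w_snippet, w_age, w_size = weights
    age_factor = max(0.0, 1.0 - age_days / 365.0)    # newer looks better
    size_factor = min(1.0, total_results / 10.0)     # tiny result sets hurt
    return (w_title * _overlap(keywords, title)
            + w_snippet * _overlap(keywords, snippet)
            + w_age * age_factor
            + w_size * size_factor)
```

Keeping the weights as a parameter is what later makes it possible to tune the automated score against human judgments.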
33. What are we looking at in assessing quality?
Others that may be explored
• Balance across sources of content (does it match overall ratio?)
• Ratings of individual results
• Web domain of content (following an internet expectation that “some sources are
better than others”)
• Match of terms could be altered to consider synonyms
• Examining taxonomy values
‒ Could apply matching to taxonomy values?
‒ Could be a “bonus” to items that have taxonomy?
• May want to make weights (e.g., impact of age) consider source or class of
content
• Currently, in our search engine, best bets are automatically included.
‒ Would prefer to have them not included to see where they end up organically.
• Also, in our search engine, the exact order on a page has not been replicated, so we can’t include the exact result order as a factor
34. Validating the approach
Does this reflect how a human user would perceive the quality?
• This idea seems reasonable, but do we really have a way to determine if it is valid?
‒Or, do we run the risk that this would lead to “local maxima” for the factors measured without meaningfully improving the user’s experience?
• So far, I have 2 independent ways to assess this
‒Comparing the results of this against a human assessment
‒Comparing the results of this against other factors that have been
used as indicators of quality in the past
35. Validating the approach, p2
Comparing against a human assessment
• One of our on-going operations in GCKM is to review the quality of
results for a very small number of terms
‒The chart below takes the output of the most recent such review for a subset of our “super search terms” and compares it against the programmatically calculated quality
‒There is at least a correlation between the automated score (the Y axis) and the manual score (the X axis)

[Scatter chart: Automated Score (Y) vs. Manual Score (X), with trend line y = 0.2781x + 0.3826, R² = 0.5803]
36. Validating the approach, p3
Comparing against searches/term
• Within our search program, we use the ratio of searches per visit for a term as an indicator of the quality of the results
‒The more pages of results a user looks at for a term, the harder it is for the user to find what they are looking for
‒The following chart compares searches/visit (X-axis) against the automated quality score (Y-axis)
‒Again, we can see that there is a correlation, though perhaps not as strong as that against the manual review

[Scatter chart: automated quality score (Y) vs. searches/visit (X), with trend line y = -0.6857x + 55.234, R² = 0.5225]
37. Validating the approach, p4
Summing up
• At this point, I am confident that the quality assessment we are producing automatically reflects the user’s general experience.
‒On individual items, it can vary significantly, but in aggregate it appears to be valid
‒I have not yet dug into this, but the automation allows the weight of each factor to be adjusted, so it may be possible to bring the automated score closer still to the “real” quality of the results
38. Additional benefits of this tool
Better analysis
• Given that this utility can output data in a spreadsheet format, this
presents some other capabilities
‒Estimate total “search impressions” for specific targets
• Analyze “search impressions” vs. usage
‒Analyze spread of returned results across sources
‒Analyze quality along a variety of dimensions (source,
taxonomy values, etc.)
‒Comparing result sets between terms that should show similar results
• E.g., how similar are the results really for two synonyms?
‒Also, comparing result sets along a temporal dimension
• How much change is there from one month (week) to the next?
‒Analyzing factors by depth into the “long tail”
‒Evaluating the quality of results for auto-complete terms
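For the result-set comparisons above (synonyms, month-over-month change), a simple Jaccard similarity over document identifiers is one way to quantify overlap; a minimal sketch:

```python
def result_overlap(results_a, results_b):
    # Jaccard similarity of two result sets (e.g., lists of document URLs):
    # 1.0 means identical sets, 0.0 means no documents in common
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 1.0          # two empty result sets are trivially identical
    return len(a & b) / len(a | b)
```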
39. Quality of results split by taxonomy on the content
Better analysis - examples
• Quality of results averaged over the service area assigned to
content
[Bar chart: “Quality by Service Area of content” – average quality score by service area (Enterprise Applications, Human Capital (Consulting), Outsourcing, Strategy & Operations, Technology Integration), ranging roughly 33–38, with the overall average marked for comparison]
40. Quality of results by depth into the “long tail”
Better analysis - examples
• A chart of the quality of the result pages by how far into the long
tail a search term is
[Chart: “Quality by Depth into the ‘long tail’” – quality score vs. rank of the search term, from the head out to ~17,500; quality declines slowly with depth, power-law fit y = 55.685x^-0.14, R² = 0.5253]
41. Quality over time – comparing before and after an upgrade
Better analysis - examples
• This chart shows the # of terms by their change in quality through
an upgrade of our search engine – overall change was +2%!
[Histogram: “Change in Quality through an upgrade” – number of terms bucketed by percent change in quality, from roughly -46% to +81%; bars left of zero got worse, bars to the right improved]
42. And, finally
For more about search analytics, I would highly recommend:
• “Search Analytics for your Site” by Lou Rosenfeld
• www.searchtools.com – edited by Avi Rappoport
Also, you can find my own writings on search analytics (along with a
variety of other KM topics) on my blog:
• blog.leeromero.org