Text Analytics Forum / November 7, 2018 / © 2018 IBM Corporation
Applying NLP & Machine Learning to Keyword Analysis
—
Dan Segal
Corporate Taxonomist, IBM
Business Drivers
• Keyword governance
• Design of smart content platforms; e.g.,
  • Automated tagging
  • Assisted authoring
  • Chatbots
• SEO platform migration, aka “The Big Sort”
  • Over 300K keywords across 10 business units
  • Out-of-date classifications
  • In need of cleanup and pruning
Challenges of Keyword Classification
• Not a lot of text for ML to process
• Without context, keywords are just strings
• Strings can be ambiguous:
  • cloud
  • hybrid
  • paas
  • storage
• Looking for semantic similarity, not just syntactic
Ranking URLs Put Keywords into Context
• Page rank suggests user intent
• Page presents the keyword in context
• For text analysis, the page is a proxy for the keyword
Solution Approach / Workflow
• Goal: repeatable, scalable automated classification
• Select Keywords
• Use NLP API to scrape SERPs and identify Concepts
  • Pre-trained NLP for first-pass Concept identification
  • Grouping of co-occurring keywords by Concepts
• Normalize Concepts to taxonomy
  • Concept cleanup and normalization
• Train Classifier API with Keyword/Concept pairs
  • Supervised learning using keyword / Concept training sets
• Classify Keywords with Classifier API
• Refine training data to improve performance
Concept Identification
Workflow step: Use NLP API to scrape SERPs and identify Concepts
• Compile list of keywords with top 10 SERPs for each
• Pass URLs to NLP API
• API returns a list of high-level Concepts for each URL
• Concepts are derived from pre-trained model
• High-throughput: 60-70 URLs / minute
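The concept-identification step can be sketched roughly as follows. This is a minimal illustration, not the presenter's scripts: `fetch_concepts` is a hypothetical stand-in for whatever NLP API call returns (concept, relevance) pairs for a URL, and the sample URLs, concepts, and scores are invented for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Concept = Tuple[str, float]  # (concept label, relevance score)


def identify_concepts(
    keyword_urls: Dict[str, List[str]],
    fetch_concepts: Callable[[str], Iterable[Concept]],
    top_n: int = 10,
) -> Dict[str, Dict[str, float]]:
    """Map each keyword to the concepts found on its top-ranking URLs.

    When a concept appears on several URLs for the same keyword,
    the highest relevance score seen is kept.
    """
    result: Dict[str, Dict[str, float]] = defaultdict(dict)
    for keyword, urls in keyword_urls.items():
        for url in urls[:top_n]:  # only the top-ranking URLs are used
            for label, relevance in fetch_concepts(url):
                best = result[keyword].get(label, 0.0)
                result[keyword][label] = max(best, relevance)
    return dict(result)


# Hypothetical canned responses standing in for a real NLP API.
FAKE_RESPONSES = {
    "https://example.com/a": [("Cloud computing", 0.95), ("Platform as a service", 0.80)],
    "https://example.com/b": [("Cloud computing", 0.70)],
}


def fake_fetch(url: str) -> List[Concept]:
    return FAKE_RESPONSES.get(url, [])


concepts = identify_concepts(
    {"paas": ["https://example.com/a", "https://example.com/b"]},
    fake_fetch,
)
print(concepts)
```

Passing the fetch function in as a parameter keeps the grouping logic testable without network access; a real run would swap `fake_fetch` for a call to the chosen NLP service.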
Concept Cleanup and Normalization
Workflow step: Normalize Concepts to taxonomy
• Identify groups of keywords that cluster around Concepts
• Reduce noise by filtering for:
  • Relevance score
  • Query volume
  • Number of unique keywords per group
  • Number of occurrences per keyword
  • Duplicate and excluded Concepts
• Outputs:
  • Training sets for ML classifier
  • Seed Concepts for taxonomy
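One way to picture the noise filters listed above is the sketch below. All thresholds, field names, and sample rows are assumptions made for illustration; the slides do not state the actual cutoff values or data layout.

```python
from collections import Counter, defaultdict
from typing import AbstractSet, Dict, List, NamedTuple, Set


class Hit(NamedTuple):
    keyword: str
    concept: str
    relevance: float   # relevance score returned with the concept
    query_volume: int  # search volume for the keyword


def build_training_groups(
    hits: List[Hit],
    min_relevance: float = 0.8,        # assumed threshold
    min_volume: int = 100,             # assumed threshold
    min_occurrences: int = 2,          # assumed threshold
    min_keywords_per_group: int = 2,   # assumed threshold
    excluded: AbstractSet[str] = frozenset(),
) -> Dict[str, List[str]]:
    """Group keywords by concept after applying the noise filters."""
    kept = [
        h for h in hits
        if h.concept not in excluded
        and h.relevance >= min_relevance
        and h.query_volume >= min_volume
    ]
    # How often each keyword/concept pair occurred across ranking URLs.
    occurrences = Counter((h.keyword, h.concept) for h in kept)

    groups: Dict[str, Set[str]] = defaultdict(set)
    for (keyword, concept), count in occurrences.items():
        if count >= min_occurrences:
            groups[concept].add(keyword)  # the set drops duplicate pairs

    return {
        concept: sorted(keywords)
        for concept, keywords in groups.items()
        if len(keywords) >= min_keywords_per_group
    }


sample_hits = [
    Hit("paas", "Cloud computing", 0.95, 5000),
    Hit("paas", "Cloud computing", 0.90, 5000),
    Hit("hybrid cloud", "Cloud computing", 0.85, 3000),
    Hit("hybrid cloud", "Cloud computing", 0.88, 3000),
    Hit("hybrid cloud", "Hybrid vehicle", 0.40, 3000),   # low relevance
    Hit("x-force", "X-Force", 0.99, 800),                # excluded concept
    Hit("x-force", "X-Force", 0.97, 800),
    Hit("storage", "Computer data storage", 0.90, 50),   # low query volume
]

training_groups = build_training_groups(sample_hits, excluded={"X-Force"})
print(training_groups)
```

The surviving groups can serve both outputs named on the slide: keyword/Concept pairs for classifier training, and the Concept labels themselves as seed terms for the taxonomy.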
Weeding Out the “Laughers”
Retrieved Concept:
X-Force
http://dbpedia.org/resource/X-Force
Keyword:
x-force
ranking URLs
Keyword Classification
Workflow step: Classify Keywords with Classifier API
• INPUT: keyword list (CSV)
• Trained Classifier API
• OUTPUT: keyword classes (JSON)
• Post-processing
• CLASSIFIED KEYWORDS (CSV, RDF)
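The post-processing step from JSON to CSV might look something like this sketch. The JSON shape (a `text` field plus a `classes` list of `class_name` / `confidence` entries) is an assumption about the classifier response, and the `needs_review` column and its 0.5 cutoff are invented here to illustrate where human review could plug in.

```python
import csv
import io
import json
from typing import Iterable


def classes_to_csv(responses: Iterable[dict], min_confidence: float = 0.5) -> str:
    """Flatten classifier responses into CSV, one row per keyword."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["keyword", "keyword_group", "confidence", "needs_review"])
    for response in responses:
        # Keep only the highest-confidence class for each keyword.
        best = max(response["classes"], key=lambda c: c["confidence"])
        needs_review = best["confidence"] < min_confidence
        writer.writerow([
            response["text"],
            best["class_name"],
            f"{best['confidence']:.2f}",
            "yes" if needs_review else "no",
        ])
    return buffer.getvalue()


# Invented sample responses in the assumed JSON shape.
raw = """
[
  {"text": "hybrid cloud",
   "classes": [{"class_name": "Cloud computing", "confidence": 0.92},
               {"class_name": "Storage", "confidence": 0.05}]},
  {"text": "x-force",
   "classes": [{"class_name": "Security", "confidence": 0.41},
               {"class_name": "Cloud computing", "confidence": 0.33}]}
]
"""

csv_text = classes_to_csv(json.loads(raw))
print(csv_text)
```

Low-confidence rows are flagged rather than dropped, so a reviewer can correct them and feed the corrections back into the training data.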
Connecting the Parts
• Keywords
• Keyword Group (from Topic taxonomy)
• Primary Brand (from Product taxonomy)
• Business Unit (from Product taxonomy)
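As a rough sketch of how a classified keyword could be connected to its Keyword Group, Primary Brand, and Business Unit in RDF, the snippet below writes N-Triples lines by hand. The `http://example.org/keywords/` namespace, the predicate names (`inKeywordGroup`, `hasPrimaryBrand`, `inBusinessUnit`), and the sample row are all hypothetical; the actual ontology's vocabulary is not given in the slides.

```python
from typing import Iterable, List, NamedTuple
from urllib.parse import quote

BASE = "http://example.org/keywords/"  # hypothetical namespace


class ClassifiedKeyword(NamedTuple):
    keyword: str
    keyword_group: str   # from the Topic taxonomy
    primary_brand: str   # from the Product taxonomy
    business_unit: str   # from the Product taxonomy


def iri(kind: str, label: str) -> str:
    """Build an IRI such as <.../group/cloud-computing> from a label."""
    slug = quote(label.lower().replace(" ", "-"), safe="-")
    return f"<{BASE}{kind}/{slug}>"


def to_ntriples(rows: Iterable[ClassifiedKeyword]) -> List[str]:
    """Emit one N-Triples line per keyword relationship."""
    lines: List[str] = []
    for row in rows:
        subject = iri("keyword", row.keyword)
        lines.append(f"{subject} <{BASE}inKeywordGroup> {iri('group', row.keyword_group)} .")
        lines.append(f"{subject} <{BASE}hasPrimaryBrand> {iri('brand', row.primary_brand)} .")
        lines.append(f"{subject} <{BASE}inBusinessUnit> {iri('unit', row.business_unit)} .")
    return lines


# Invented sample row, for illustration only.
triples = to_ntriples([
    ClassifiedKeyword("hybrid cloud", "Cloud computing", "Example Brand", "Example Unit"),
])
for line in triples:
    print(line)
```

In practice an RDF library or an ontology tool would handle serialization; the hand-written lines here only show the shape of the links.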
Incorporating Keywords into the Ontology
[Diagram: ontology graph. Nodes: Keywords (Branded keywords, Unbranded keywords), Business Units, Products, Topics, Industries, Job Role. Relationship labels: keyword for, markets product, for industry, solution for, has need.]
Lessons Learned
Execution
• Pre-trained models enable first-pass, high-throughput annotation and sorting of hundreds of thousands of keywords and their SERPs.
• For optimum results, the process requires customization, parametrization, and human review.
Solution
• Off-the-shelf APIs provide rapid access to NLP and ML tooling with low technical debt.
• Keyword analysis time reduced from months to weeks.
Architecture
• A robust taxonomy is essential for consolidating API-generated categories and integrating keywords into the ontology.
Next Steps
• Custom models for domain-specific annotation
• Applying the ontology; e.g., site navigation, search, autotagging, AI-assisted authoring
Thank you
Dan Segal
Corporate Taxonomist
—
ibm.com
Backup slides
Toolset
Tool | Use | Examples
Natural language processing (NLP) | Text extraction; Concept identification | Watson NLU (API)
Machine learning | Text classification; Keyword clustering | Watson NLC (API)
Data transformation | Data cleanup; Data conversion, e.g., ML outputs (JSON) to CSV and RDF | OpenRefine
Ontology management | Domain modeling; Keyword / Group associations | TopBraid Composer; TopBraid EVN
Sample scripts and data available at: https://github.com/dan-segal/keyword-classification/
