FeatureRank - An Insight Data Science Consulting Project
Kuhan Wang
October 8th, 2015
Consulting Scenario
Company X wishes to maximize user engagement through
optimal placement of advertisements on content URLs.
Ad Type: Tourism. Keywords: Cuba, Package Tour, Airplane.
Ad Type X. Keywords: Keyword 1, Keyword 2, Keyword 3, ..., Keyword N.
Example: Tourism ads not ideal on investment content URL.
A Pipeline to Analyze Textual Features
Developed and implemented a pipeline to analyze the
importance of textual features on content URLs relative to
engagement.
Pipeline stages (iterative): Begin -> Scrape URL -> Process Text -> Model Features -> Extract Keywords -> Update Keywords -> Collect Data, Reiterate
User Engagement Data
[Figure: Summary of Engagement Data. Axes: Occurrences, Counts. Categories: Page Loaded, Ad Viewed, Ad Clicked.]
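A summary like this can be tallied directly from a raw engagement log. The sketch below is illustrative only: the column names (`url`, `event`) and the event labels are assumptions, since the slides do not show the actual CSV schema.

```python
import csv
import io
from collections import Counter

# Hypothetical engagement log; the real CSV layout is not given in the slides.
SAMPLE_CSV = """url,event
http://example.com/a,page_loaded
http://example.com/a,ad_viewed
http://example.com/a,ad_clicked
http://example.com/b,page_loaded
http://example.com/b,ad_viewed
http://example.com/c,page_loaded
"""


def summarize_engagement(csv_text):
    """Count how often each engagement event occurs in the log."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["event"] for row in reader)


counts = summarize_engagement(SAMPLE_CSV)
for event in ("page_loaded", "ad_viewed", "ad_clicked"):
    print(f"{event:12s} {counts[event]}")
```

Reading from a file instead of a string only requires passing an open file handle to `csv.DictReader`.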
Modeling
Attempted linear regression.
Classify engagement as yes/no.
- Features are bags of words from content URL.
[Figure: Logistic Classification Model. Probability [%] versus Word Count (0 to 10), for Ad Clicked and Ad Not Clicked.]
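To make the yes/no classification on bag-of-words features concrete, here is a minimal sketch using only the standard library. It is not the project's code: the toy documents, labels, learning rate, and epoch count are invented for illustration, and the delivered pipeline may well have used a library implementation instead.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count word occurrences."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(docs, labels, epochs=500, lr=0.5):
    """Fit per-word weights and a bias by batch gradient descent on log-loss."""
    bags = [bag_of_words(d) for d in docs]
    vocab = sorted({w for bag in bags for w in bag})
    weights = {w: 0.0 for w in vocab}
    bias = 0.0
    n = len(docs)
    for _ in range(epochs):
        grad_w = {w: 0.0 for w in vocab}
        grad_b = 0.0
        for bag, y in zip(bags, labels):
            p = sigmoid(bias + sum(weights[w] * c for w, c in bag.items()))
            err = p - y  # gradient of log-loss with respect to the logit
            grad_b += err
            for w, c in bag.items():
                grad_w[w] += err * c
        bias -= lr * grad_b / n
        for w in vocab:
            weights[w] -= lr * grad_w[w] / n
    return weights, bias


def predict_proba(text, weights, bias):
    """Probability of a click; words unseen in training contribute nothing."""
    bag = bag_of_words(text)
    return sigmoid(bias + sum(weights.get(w, 0.0) * c for w, c in bag.items()))


# Toy data invented for illustration: 1 = ad clicked, 0 = ad not clicked.
docs = [
    "mortgage rates and home loan advice",
    "refinance your home mortgage today",
    "football scores from the weekend",
    "weekend recipes for the family",
]
labels = [1, 1, 0, 0]

weights, bias = train_logistic(docs, labels)
p_click = predict_proba("home mortgage guide", weights, bias)
p_no_click = predict_proba("weekend football", weights, bias)
print(f"home mortgage guide: {p_click:.2f}")
print(f"weekend football:    {p_no_click:.2f}")
```

The learned per-word weights are what make this model useful for keyword extraction: a large positive weight marks a word whose presence raises the predicted click probability.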
Validation
Randomly split data into training/test sets.
- Generate distribution of validation scores.
[Figure: Distribution of Precision vs Recall. Axes: Precision (0.55 to 0.85), Recall (0.4 to 0.85); color scale: Number of MC Toys (0 to 30). Marker: ⟨Precision, Recall⟩.]
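The repeated random splitting can be sketched as follows. Each "toy" reshuffles the data, refits on the training part, and records precision and recall on the held-out part, which gives a distribution of scores rather than a single number. The synthetic data, the 30% test fraction, the number of toys, and the simple threshold model standing in for the real classifier are all assumptions made for illustration.

```python
import random
import statistics


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def fit_threshold(samples):
    """Stand-in model: a cut halfway between the two class means."""
    pos = [x for x, y in samples if y == 1]
    neg = [x for x, y in samples if y == 0]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2


def run_toys(samples, n_toys=100, test_fraction=0.3, seed=0):
    """Repeat the random train/test split and collect (precision, recall)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_toys):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_fraction)
        test, train = shuffled[:n_test], shuffled[n_test:]
        threshold = fit_threshold(train)
        y_true = [y for _, y in test]
        y_pred = [1 if x > threshold else 0 for x, _ in test]
        scores.append(precision_recall(y_true, y_pred))
    return scores


# Synthetic one-feature data: clicked pages tend to have a higher feature value.
data_rng = random.Random(42)
samples = [(data_rng.gauss(3.0, 1.0), 1) for _ in range(100)]
samples += [(data_rng.gauss(1.0, 1.0), 0) for _ in range(100)]

scores = run_toys(samples)
precisions = [p for p, _ in scores]
recalls = [r for _, r in scores]
print(f"precision: {statistics.mean(precisions):.2f} +/- {statistics.stdev(precisions):.2f}")
print(f"recall:    {statistics.mean(recalls):.2f} +/- {statistics.stdev(recalls):.2f}")
```

The spread of the two lists is the quantity of interest: a wide spread signals that a single train/test split would give an unreliable score.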
Deliverables
Extracted keywords:
Rank  Ad Type 1  Ad Type 2       Ad Type 3    Ad Type 4
1     debt       coordinator     mortgage     gold
2     gift       administrative  home         0
3     profit     minimum         procurement  stock
4     check      minimum wage    loan         fund
5     balance    reports         trustee      event
The pipeline, written in Python, is delivered to the company for
implementation.
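One simple way to produce a ranked keyword list of this kind is to score each n-gram by how strongly its presence correlates with clicks. The slides do not state which score was used, so the sketch below substitutes a plain Pearson (phi) correlation between two 0/1 indicators; the toy documents and click labels are invented.

```python
import math
import re


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n_max=2):
    """Return the set of all 1..n_max-grams in the token list."""
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


def phi(present, clicked):
    """Pearson correlation between two equal-length 0/1 lists."""
    n = len(present)
    mx, my = sum(present) / n, sum(clicked) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(present, clicked))
    vx = sum((x - mx) ** 2 for x in present)
    vy = sum((y - my) ** 2 for y in clicked)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0


def rank_ngrams(docs, clicked, top=5):
    """Rank n-grams by correlation with clicks; ties break alphabetically."""
    doc_grams = [ngrams(tokenize(d)) for d in docs]
    vocab = set().union(*doc_grams)
    scored = [(phi([int(g in dg) for dg in doc_grams], clicked), g) for g in vocab]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [g for _, g in scored[:top]]


# Toy data invented for illustration: 1 = ad clicked, 0 = not clicked.
docs = [
    "minimum wage rises again",
    "new minimum wage law",
    "gold price update",
    "stock market update",
]
clicked = [1, 1, 0, 0]

top_keywords = rank_ngrams(docs, clicked, top=3)
print(top_keywords)
```

Including bigrams is what allows a two-word entry such as "minimum wage" to appear in the ranking alongside single words.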
About Myself
PhD Particle Physics, McGill University, researcher on the
Large Hadron Collider.
Lead the search for microscopic black holes as part of the
ATLAS Collaboration.
More about the project and myself at http://kuhanw.zohosites.com/.
Backup
[Figure: Feature Frequency/Documents (0 to 1) versus Relative Number of Documents [%] (log scale, 10^-4 to 1), for Ad Type 1.]
FeatureRank
Kuhan Wang, Insight Data Science
October 2, 2015
Abstract
FeatureRank is a software tool for extracting correlations between text
ngram features and user engagement, thereby optimizing the placement
of financial widgets on URL articles.
1 Directory Structure
• /
  processing.py: Pre-processing to parse relevant information from engagement CSV files.
  crawl.py: A simple web crawler that pulls the title and <p> tag text from URLs.
  FeatureRank.py: Driver file to execute main functions.
  feature_extraction_model.py: The core program that contains the machine learning algorithms.
  post_processing.py: Post-processing to produce evaluation metrics and ngram rankings.
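As an illustration of the text extraction that crawl.py is described as doing, the sketch below pulls the title and <p> text out of an HTML string with the standard library's `html.parser`. The class name, helper function, and sample page are hypothetical, and the real crawl.py may use a different parser. Fetching the page over HTTP is left out so the example runs offline.

```python
from html.parser import HTMLParser


class TitleAndParagraphParser(HTMLParser):
    """Collect the text inside <title> and <p> elements."""

    def __init__(self):
        super().__init__()
        self._current = None  # tag currently being captured, if any
        self.title = ""
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._current = "title"
        elif tag == "p":
            self._current = "p"
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        # Inline tags such as </b> do not end the capture; only the
        # matching </title> or </p> does.
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "p":
            self.paragraphs[-1] += data


def extract_text(html):
    """Return (title, list of non-empty paragraph strings)."""
    parser = TitleAndParagraphParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), [p.strip() for p in parser.paragraphs if p.strip()]


# Hypothetical page content used in place of a live URL.
SAMPLE_HTML = """<html><head><title>Mortgage basics</title></head>
<body><h1>Ignored heading</h1>
<p>Fixed rates stay <b>constant</b> over the term.</p>
<p>Variable rates can change.</p></body></html>"""

title, paragraphs = extract_text(SAMPLE_HTML)
print(title)
for paragraph in paragraphs:
    print(paragraph)
```

The extracted title and paragraphs are the raw text from which bag-of-words or n-gram features would then be built.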
