Detecting Online Commercial Intention (OCI)

  1. Detecting Online Commercial Intention (OCI). Honghua Dai, Zaiqing Nie, Lee Wang, Lingzhi Zhao, Ji-Rong Wen, Ying Li. WWW ’06. Advisor: Chia-Hui Chang. Student: Teng-Kai Fan. Date: 2009-08-24
  2. Outline <ul><li>Introduction </li></ul><ul><li>Defining OCI (Online Commercial Intention) </li></ul><ul><li>Learning Online Commercial Intention </li></ul><ul><ul><li>Web Page OCI Detector </li></ul></ul><ul><ul><li>Query OCI Detector </li></ul></ul><ul><li>Experiments </li></ul><ul><li>Conclusion </li></ul>
  3. Introduction <ul><li>Two major online user activities: </li></ul><ul><ul><li>Browsing activity </li></ul></ul><ul><ul><li>Searching activity </li></ul></ul><ul><li>Three categories of users’ search intentions: </li></ul><ul><ul><li>Navigational: reach a particular website. </li></ul></ul><ul><ul><li>Informational: acquire information from Web pages. </li></ul></ul><ul><ul><li>Transactional: perform some “web-mediated” activity. </li></ul></ul>
  4. Introduction (cont.) <ul><li>OCI (Online Commercial Intention): determining whether a user intends to make a purchase or to participate in a commercial service. </li></ul>
  5. Defining OCI (Online Commercial Intention) <ul><li>OCI is defined as a function that maps a query or a Web page to a binary value: Commercial or Non-Commercial. </li></ul><ul><li>The goal is to compute two functions: </li></ul><ul><ul><li>OCI: Q → {Commercial, Non-Commercial} (for queries) </li></ul></ul><ul><ul><li>OCI: P → {Commercial, Non-Commercial} (for Web pages) </li></ul></ul>
  6. Learning Online Commercial Intention <ul><li>Taxonomy-based approach </li></ul><ul><ul><li>Uses existing concept hierarchies or categories. </li></ul></ul><ul><li>Machine learning approach </li></ul><ul><ul><li>Extracts features from page content and builds classifiers on those features. </li></ul></ul><ul><ul><li>Labeling process: training labels are obtained by human evaluation. </li></ul></ul>
  7. Web Page OCI Detector <ul><li>Input: a Web page P </li></ul><ul><li>Output: the OCI (Commercial or Non-Commercial) of P </li></ul><ul><li>Classifier: SVM </li></ul>
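The detector above uses an SVM, but the slide gives no training details. As a minimal sketch, the code below trains a linear SVM with Pegasos-style stochastic subgradient descent on the hinge loss. This training method is swapped in for illustration and is not necessarily the authors'; it also assumes each page is already a numeric feature vector and that Commercial/Non-Commercial are encoded as +1/-1 (a hypothetical encoding). A real system would more likely use an existing SVM library such as LIBSVM or scikit-learn's `LinearSVC`.

```python
from typing import List, Sequence

COMMERCIAL, NON_COMMERCIAL = 1, -1  # hypothetical label encoding


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X: List[List[float]], y: List[int],
                     lam: float = 0.01, epochs: int = 200) -> List[float]:
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    Returns a weight vector whose last component is the bias term
    (the bias is regularized together with the other weights here).
    """
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)                     # decreasing step size
            xa = list(xi) + [1.0]                     # constant feature for the bias
            margin = yi * _dot(w, xa)                 # margin under the current weights
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step (regularization)
            if margin < 1.0:                          # hinge loss active: move toward yi * xa
                w = [wj + eta * yi * xj for wj, xj in zip(w, xa)]
    return w


def predict(w: List[float], x: Sequence[float]) -> int:
    """Label a feature vector by the sign of its SVM score."""
    score = _dot(w, list(x) + [1.0])
    return COMMERCIAL if score >= 0 else NON_COMMERCIAL


# Toy usage: two linearly separable classes in a 2-feature space.
X = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
y = [COMMERCIAL, NON_COMMERCIAL, COMMERCIAL, NON_COMMERCIAL]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

The regularization strength `lam` and the epoch count are illustrative defaults, not values from the paper.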
  8. Keyword Extraction and Selection <ul><li>Keyword extraction: keywords are taken from both the inner text and the tag attributes of all training pages. </li></ul><ul><li>Feature selection: </li></ul><ul><ul><ul><li>Pr(k | C): the probability that keyword k occurs in a Web page belonging to class C. </li></ul></ul></ul>
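The slide defines Pr(k | C) but not the rule used to select keywords from it. The sketch below estimates Pr(k | C) as the fraction of class-C training pages that contain k, then ranks keywords with a hypothetical criterion, |Pr(k | commercial) - Pr(k | non-commercial)|. It assumes each page has already been reduced to a set of keywords; the class names and `top_n` are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def keyword_class_probs(pages: Iterable[Tuple[Set[str], str]]) -> Dict[str, Dict[str, float]]:
    """Estimate Pr(k | C) as the fraction of class-C pages that contain keyword k."""
    doc_count: Counter = Counter()
    kw_count: Dict[str, Counter] = {}
    for keywords, label in pages:
        doc_count[label] += 1
        kw_count.setdefault(label, Counter()).update(set(keywords))  # count each keyword once per page
    return {label: {k: n / doc_count[label] for k, n in counts.items()}
            for label, counts in kw_count.items()}


def select_keywords(probs: Dict[str, Dict[str, float]], pos: str = "commercial",
                    neg: str = "non-commercial", top_n: int = 2) -> List[str]:
    """Hypothetical selection rule: keep the top_n keywords by |Pr(k|pos) - Pr(k|neg)|."""
    p, n = probs.get(pos, {}), probs.get(neg, {})
    vocab = set(p) | set(n)
    score = {k: abs(p.get(k, 0.0) - n.get(k, 0.0)) for k in vocab}
    return sorted(vocab, key=lambda k: (-score[k], k))[:top_n]  # ties broken alphabetically


pages = [({"price", "buy", "camera"}, "commercial"),
         ({"buy", "cart"}, "commercial"),
         ({"history", "camera"}, "non-commercial"),
         ({"history", "museum"}, "non-commercial")]
print(select_keywords(keyword_class_probs(pages)))  # ['buy', 'history']
```

Here "camera" scores 0 because it appears in half the pages of each class, so it carries no class signal under this rule.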
  9. Keyword Extraction and Selection (cont.) <ul><li>Two properties are defined for each keyword k in a page p: </li></ul><ul><li>A page p with n keywords can then be represented in 2n dimensions: </li></ul>
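The formulas for the two per-keyword properties did not survive in this transcript. Purely as an illustration, the sketch below assumes a hypothetical pair: how often k occurs in the page's inner text and how often it occurs in tag attribute values, which echoes the two extraction sources on the previous slide. It uses Python's standard `html.parser` to separate the two and interleaves the counts into a 2n-dimensional vector.

```python
import re
from html.parser import HTMLParser
from typing import List

TOKEN = re.compile(r"[a-z0-9]+")


class _TextAndAttrs(HTMLParser):
    """Collects lowercase tokens from inner text and from tag attribute values separately."""

    def __init__(self) -> None:
        super().__init__()
        self.text_tokens: List[str] = []
        self.attr_tokens: List[str] = []

    def handle_starttag(self, tag, attrs):
        for _name, value in attrs:
            if value:  # valueless attributes arrive as None
                self.attr_tokens.extend(TOKEN.findall(value.lower()))

    def handle_data(self, data):
        self.text_tokens.extend(TOKEN.findall(data.lower()))


def page_vector(html: str, keywords: List[str]) -> List[float]:
    """Map a page to 2n features: (inner-text count, attribute count) per keyword (hypothetical)."""
    parser = _TextAndAttrs()
    parser.feed(html)
    parser.close()
    vec: List[float] = []
    for k in keywords:
        vec.append(float(parser.text_tokens.count(k)))  # hypothetical property 1
        vec.append(float(parser.attr_tokens.count(k)))  # hypothetical property 2
    return vec


html = '<a href="/buy" title="Buy now">Buy cheap camera</a><img alt="camera">'
print(page_vector(html, ["buy", "camera", "museum"]))  # [1.0, 2.0, 1.0, 1.0, 0.0, 0.0]
```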
  10. Query OCI Detector <ul><li>Four types of data sources for query OCI: </li></ul><ul><ul><li>Constituent terms of the search query. </li></ul></ul><ul><ul><ul><li>E.g., “airline ticket deals”, “digital camera price”. </li></ul></ul></ul><ul><ul><li>Content of the top landing pages recommended by the search engine. </li></ul></ul><ul><ul><li>Content of the search result page. </li></ul></ul><ul><ul><ul><li>Including titles, short descriptions, and URL links. </li></ul></ul></ul><ul><ul><li>The number of user clicks on landing pages recommended by the search engine. </li></ul></ul>
  11. Detecting OCI based on Top Search Result Landing Pages <ul><li>Uses the top-10 result pages returned by MSN. </li></ul><ul><li>Applies the Web page OCI detector to each of the top-10 landing pages. </li></ul>
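How the per-page labels are combined into a query label is on the following slide, whose content did not survive. The sketch below uses a hypothetical aggregation: the query is labeled commercial when the fraction of its top-N landing pages judged commercial reaches a threshold. `page_oci` stands in for the Web page OCI detector, and the 0.5 threshold is an assumption.

```python
from typing import Callable, List

COMMERCIAL, NON_COMMERCIAL = 1, -1  # hypothetical label encoding


def query_oci_from_landing_pages(landing_pages: List[str],
                                 page_oci: Callable[[str], int],
                                 top_n: int = 10,
                                 threshold: float = 0.5) -> int:
    """Hypothetical aggregation: commercial if the share of commercial
    pages among the top_n landing pages is at least `threshold`."""
    pages = landing_pages[:top_n]
    if not pages:
        raise ValueError("no landing pages to aggregate")
    commercial = sum(1 for p in pages if page_oci(p) == COMMERCIAL)
    return COMMERCIAL if commercial / len(pages) >= threshold else NON_COMMERCIAL


# Usage with a stub page detector (a real one would be the trained page classifier).
stub = lambda page: COMMERCIAL if "buy" in page else NON_COMMERCIAL
pages = ["buy camera", "camera review", "buy lens", "camera history"]
print(query_oci_from_landing_pages(pages, stub))  # 1 (2 of 4 pages are commercial)
```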
  12. Detecting OCI based on Top Search Result Landing Pages (cont.)
  13. Detecting OCI based on First Search Result Page
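This slide's body did not survive. Slide 10 lists the search result page's titles, short descriptions, and URL links as a data source, so a plausible reading is that these fields are pooled into one pseudo-document and classified like a Web page. The sketch below makes that assumption: it merges the fields into one bag of words and produces keyword counts that a page-level classifier could consume. The `SearchResult` type and its field names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

TOKEN = re.compile(r"[a-z0-9]+")


@dataclass
class SearchResult:
    title: str
    snippet: str  # the short description shown under the title
    url: str


def result_page_tokens(results: List[SearchResult]) -> List[str]:
    """Merge titles, snippets and URL tokens of one result page into a single bag of words."""
    tokens: List[str] = []
    for r in results:
        for field in (r.title, r.snippet, r.url):
            tokens.extend(TOKEN.findall(field.lower()))
    return tokens


def result_page_vector(results: List[SearchResult], keywords: List[str]) -> List[float]:
    """Count each selected keyword in the pooled result-page text."""
    tokens = result_page_tokens(results)
    return [float(tokens.count(k)) for k in keywords]


results = [
    SearchResult("Digital Camera Deals", "Buy a digital camera at a low price",
                 "http://shop.example.com/camera"),
    SearchResult("Camera - History", "The history of the camera",
                 "http://en.example.org/wiki/Camera"),
]
print(result_page_vector(results, ["buy", "price", "camera", "history"]))  # [1.0, 1.0, 6.0, 2.0]
```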
  14. Experiments <ul><li>Data </li></ul><ul><ul><li>1408 US English queries. </li></ul></ul><ul><ul><li>The first search result page was collected for each of the 1408 queries. </li></ul></ul><ul><ul><li>The top 10 landing pages were collected for each of the 1408 queries. </li></ul></ul><ul><ul><li>26186 English Web pages were picked at random. </li></ul></ul><ul><li>Labeling Analysis </li></ul>
  15. Evaluation Methodology <ul><li>For the Web page OCI detector, because the classes are unbalanced, all commercial pages and an equal number of non-commercial pages were selected to train the model. </li></ul><ul><li>For the query OCI detector: </li></ul><ul><ul><li>Models based on the first search result page and on the top-N result landing pages are compared. </li></ul></ul><ul><ul><li>3-fold cross-validation is used. </li></ul></ul><ul><li>Measures: Precision, Recall, and F-measure </li></ul>
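The two evaluation procedures above can be sketched as follows. The slide says all commercial pages plus an equal number of non-commercial pages were used, but not how the non-commercial pages were chosen, so random undersampling is an assumption here; the seeds are likewise illustrative. The second function yields the train/test splits for k-fold cross-validation (k = 3 on the slide).

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def balance_by_undersampling(pos: Sequence[T], neg: Sequence[T],
                             seed: int = 0) -> List[Tuple[T, int]]:
    """Keep every commercial (+1) example and an equal-size random sample of
    non-commercial (-1) examples (random choice is an assumption)."""
    rng = random.Random(seed)
    sampled_neg = rng.sample(list(neg), k=min(len(pos), len(neg)))
    return [(x, 1) for x in pos] + [(x, -1) for x in sampled_neg]


def k_fold_splits(items: Sequence[T], k: int = 3,
                  seed: int = 0) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) pairs for k-fold cross-validation over a shuffled copy."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


data = balance_by_undersampling(["p1", "p2"], [f"n{i}" for i in range(10)])
print(len(data))  # 4: both positives plus two sampled negatives
for train, test in k_fold_splits(list(range(9)), k=3):
    print(len(train), len(test))  # 6 3 on each of the three folds
```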
  16. Evaluating Page OCI Detector <ul><li>CP (Precision), CR (Recall), CF (F-measure) </li></ul>
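As a worked illustration of the three measures, the sketch below computes precision, recall, and F-measure with the commercial class treated as positive. That "C" in CP/CR/CF denotes the commercial class, and that the F-measure is the balanced F1, are both assumptions; labels are encoded as +1/-1 for illustration.

```python
from typing import Sequence, Tuple

COMMERCIAL = 1  # hypothetical encoding of the positive class


def commercial_prf(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the commercial class (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == COMMERCIAL and p == COMMERCIAL)
    fp = sum(1 for t, p in pairs if t != COMMERCIAL and p == COMMERCIAL)
    fn = sum(1 for t, p in pairs if t == COMMERCIAL and p != COMMERCIAL)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# 3 commercial pages, 2 detected (TP = 2, FN = 1); 1 non-commercial page misflagged (FP = 1).
print(commercial_prf([1, 1, 1, -1, -1], [1, 1, -1, 1, -1]))  # each value is 2/3
```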
  17. Evaluating Page OCI Detector (cont.)
  18. Evaluating Query OCI Detector
  19. OCI Analysis for a Stratified Query Sample based on Query Frequency <ul><li>Queries were divided into five frequency levels: Single, Very low, Low, Mid, and High. </li></ul><ul><li>10000 queries were randomly selected for each level. </li></ul><ul><li>Observation: the high-frequency query set has a larger proportion of queries with commercial intention. </li></ul>
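A minimal sketch of this stratified analysis follows. It assumes each query is already tagged with one of the five frequency levels (the slide does not give the level boundaries) and uses a hypothetical `is_commercial` callable in place of the human or model labels. The per-level sample size is a parameter; the study used 10000 per level, while the toy data here is tiny.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

LEVELS = ["Single", "Very low", "Low", "Mid", "High"]


def stratified_sample(queries: List[Tuple[str, str]], per_level: int,
                      seed: int = 0) -> Dict[str, List[str]]:
    """queries holds (query, level) pairs; draw up to per_level queries from each level."""
    by_level: Dict[str, List[str]] = defaultdict(list)
    for q, level in queries:
        by_level[level].append(q)
    rng = random.Random(seed)
    return {lvl: rng.sample(by_level[lvl], k=min(per_level, len(by_level[lvl])))
            for lvl in LEVELS}


def commercial_share(sample: Dict[str, List[str]],
                     is_commercial: Callable[[str], bool]) -> Dict[str, float]:
    """Fraction of sampled queries per level that are labeled commercial."""
    return {lvl: (sum(1 for q in qs if is_commercial(q)) / len(qs) if qs else 0.0)
            for lvl, qs in sample.items()}


queries = [(f"q{i}", LEVELS[i % 5]) for i in range(50)]  # toy data: 10 queries per level
sample = stratified_sample(queries, per_level=3)
print({lvl: len(qs) for lvl, qs in sample.items()})  # 3 per level
```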
  20. Conclusion <ul><li>The authors present a framework for building machine learning models that learn the OCI of queries and Web pages from the content of arbitrary Web pages. </li></ul>