Detecting Online Commercial Intention (OCI)

Detecting Online Commercial Intention (OCI) - Presentation Transcript

  1. Detecting Online Commercial Intention (OCI)
    • Authors: Honghua Dai, Zaiqing Nie, Lee Wang, Lingzhi Zhao, Ji-Rong Wen, Ying Li (WWW ’06)
    • Advisor: Chia-Hui Chang
    • Student: Teng-Kai Fan
    • Date: 2009-08-24
  2. Outline
    • Introduction
    • Defining OCI (Online Commercial Intention)
    • Learning Online Commercial Intention
      • Web Page OCI Detector
      • Query OCI Detector
    • Experiment
    • Conclusion
  3. Introduction
    • Two major online user activities:
      • Browsing activity
      • Searching activity
    • Three categories for user’s search intention:
      • Navigational: reach a particular web site.
      • Informational: acquire information on web pages.
      • Transactional: perform some “web-mediated” activity.
  4. Introduction cont.
    • OCI (Online Commercial Intention): whether a user intends to purchase a product or participate in a commercial service.
  5. Defining OCI (Online Commercial Intention)
    • OCI is defined as a function from a query or a Web page to a binary value: Commercial or Non-Commercial.
    • The goal is to compute two functions, one over queries (Q) and one over Web pages (P):
      • OCI : Q -> {Commercial, Non-Commercial}
      • OCI : P -> {Commercial, Non-Commercial}
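The two signatures above can be written down as types. This is a minimal sketch, assuming queries and pages are plain strings; the names `Intent`, `QueryOCI`, `PageOCI`, and `toy_query_oci` are hypothetical, and the keyword rule is only a placeholder, not the model described in the slides.

```python
from enum import Enum
from typing import Callable


class Intent(Enum):
    COMMERCIAL = "Commercial"
    NON_COMMERCIAL = "Non-Commercial"


# Type aliases mirroring the two functions on the slide.
QueryOCI = Callable[[str], Intent]  # OCI : Q -> {Commercial, Non-Commercial}
PageOCI = Callable[[str], Intent]   # OCI : P -> {Commercial, Non-Commercial}


def toy_query_oci(query: str) -> Intent:
    """Placeholder rule (an assumption): flag a few purchase-related words."""
    cues = {"price", "deals", "buy"}
    words = set(query.lower().split())
    return Intent.COMMERCIAL if words & cues else Intent.NON_COMMERCIAL
```

Any learned detector with the same signature could be swapped in for the placeholder.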
  6. Learning Online Commercial Intention
    • Taxonomy-based
      • Using existing concept hierarchies or categories.
    • Machine learning approach
      • Extracting features from page content and building the classifiers based on those features.
      • Labeling Process: Human-evaluation approach.
  7. Web Page OCI Detector
    • Input: a Web Page P
    • Output: OCI (commercial or non-commercial) of P
    • Classifier: SVM
  8. Keyword Extraction and Selection
    • Keyword extraction: both inner text and tag attributes of all the training data.
    • Feature selection:
      • Pr( k | C ): the probability of the keyword k occurring in a Web page belonging to class C.
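One way to estimate Pr( k | C ) from labeled training pages is sketched below. It assumes the probability is estimated as the fraction of class-C pages that contain k; the transcript does not preserve the selection formula, so this estimator and the function name are assumptions.

```python
from collections import Counter


def keyword_class_probability(pages, labels):
    """Estimate Pr(k | C) as the fraction of class-C pages containing keyword k.

    pages: list of keyword lists, one per page.
    labels: parallel list of class names.
    """
    pages_per_class = Counter(labels)
    containing = {c: Counter() for c in pages_per_class}
    for keywords, label in zip(pages, labels):
        # Count each keyword at most once per page.
        containing[label].update(set(keywords))
    return {
        c: {k: n / pages_per_class[c] for k, n in counts.items()}
        for c, counts in containing.items()
    }
```

Keywords whose probability differs strongly between the two classes would be natural candidates to keep as features.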
  9. Keyword Extraction and Selection cont.
    • Two properties are defined for each keyword k in a page p:
    • A page p with n keywords can be represented as a vector in 2*n dimensions:
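The 2*n layout can be illustrated as follows. The transcript does not preserve the two keyword properties, so this sketch uses hypothetical stand-ins (the keyword's count in inner text and its count in tag attributes) purely to show how two values per keyword yield 2*n dimensions.

```python
def page_vector(vocabulary, inner_text_tokens, tag_attribute_tokens):
    """Return a 2*n vector: two values for each of the n keywords in `vocabulary`.

    The two per-keyword properties used here are hypothetical stand-ins;
    the original definitions are not preserved in the transcript.
    """
    vector = []
    for keyword in vocabulary:
        vector.append(inner_text_tokens.count(keyword))     # stand-in property 1
        vector.append(tag_attribute_tokens.count(keyword))  # stand-in property 2
    return vector
```

With a vocabulary of n selected keywords, every page maps to a vector of the same length, which is what a classifier such as an SVM expects.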
  10. Query OCI Detector
    • Four types of data sources for query OCI:
      • Constituent terms of search query.
        • Ex.: “airline ticket deals”, “digital camera price”.
      • Content of top landing pages recommended by search engine.
      • Content of search result page.
        • Including title, short descriptions, and URL links.
      • The number of user clicks on landing pages recommended by search engine.
  11. Detecting OCI based on Top Search Result Landing Pages
    • Use the top 10 result pages returned by MSN.
    • Apply the Web page OCI detector to each of the top 10 landing pages.
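How the per-page results are combined into a query-level decision is not preserved in the transcript. The sketch below assumes a simple majority vote over the top-N landing pages; the function name, the `page_is_commercial` callback, and the `threshold` parameter are all hypothetical.

```python
from typing import Callable, Sequence


def query_oci_from_landing_pages(
    landing_pages: Sequence[str],
    page_is_commercial: Callable[[str], bool],
    top_n: int = 10,
    threshold: float = 0.5,
) -> bool:
    """Label a query commercial if more than `threshold` of its top-N pages are.

    Majority voting is an assumption made for illustration only.
    """
    pages = landing_pages[:top_n]
    if not pages:
        return False
    commercial = sum(1 for page in pages if page_is_commercial(page))
    return commercial / len(pages) > threshold
```

A learned combination of the page-level outputs could replace the fixed threshold without changing the interface.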
  12. Detecting OCI based on Top Search Result Landing Pages cont.
  13. Detecting OCI based on First Search Result Page
  14. Experiments
    • Data
      • 1408 US English queries.
      • Collect the first search result page for 1408 queries.
      • Collect the top 10 landing pages for 1408 queries
      • Randomly pick 26186 English Web pages.
    • Labeling Analysis
  15. Evaluation Methodology
    • For the Web page OCI detector, because the classes are unbalanced, they selected all commercial pages and an equal number of non-commercial pages to train the model.
    • For query OCI detector:
      • Compare the model based on the first search result page with the model based on the top N result landing pages.
      • Using 3-fold cross validation.
    • Measures: Precision, Recall and F-Measure
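The balancing step for the page detector described above can be sketched as random undersampling. This is a minimal illustration, assuming string labels and a fixed random seed; the function name and label values are hypothetical.

```python
import random


def balance_by_undersampling(pages, labels, minority="commercial", seed=0):
    """Keep every minority-class page plus an equal number of random other pages."""
    minority_items = [(p, l) for p, l in zip(pages, labels) if l == minority]
    majority_items = [(p, l) for p, l in zip(pages, labels) if l != minority]
    rng = random.Random(seed)
    # Never ask for more majority pages than exist.
    k = min(len(minority_items), len(majority_items))
    balanced = minority_items + rng.sample(majority_items, k)
    rng.shuffle(balanced)
    return balanced
```

Fixing the seed keeps the sampled training set reproducible across runs.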
  16. Evaluating Page OCI Detector
    • CP (Precision), CR (Recall), CF (F-measure)
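The three measures can be computed for the commercial class as sketched below. The slides do not state which F-measure weighting is used, so the balanced F1 form is an assumption, as are the function name and label values.

```python
def commercial_prf(y_true, y_pred, positive="commercial"):
    """Return (precision, recall, F1) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Zero denominators are mapped to 0.0 here so the function never raises on an empty class.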
  17. Evaluating Page OCI Detector cont.
  18. Evaluating Query OCI Detector
  19. OCI Analysis for a Stratified Query Sample based on Query Frequency
    • Query frequency is divided into five levels: Single, Very low, Low, Mid, and High.
    • 10,000 queries are randomly selected from each level.
    • Observation: Query sets with higher frequency contain a larger proportion of queries with commercial intention.
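The stratified sampling described above can be sketched as follows. The slides name the five levels but not their boundaries, so the frequency cut-offs in `LEVEL_BOUNDS` are hypothetical, as are the function names.

```python
import random
from collections import defaultdict

# Hypothetical upper bounds per level; the slides give only the level names.
LEVEL_BOUNDS = [("Single", 1), ("Very low", 10), ("Low", 100), ("Mid", 1000)]


def frequency_level(count):
    """Map a query's frequency to one of the five level names."""
    for name, upper in LEVEL_BOUNDS:
        if count <= upper:
            return name
    return "High"


def stratified_sample(query_counts, per_level, seed=0):
    """Draw up to `per_level` queries at random from each frequency level."""
    by_level = defaultdict(list)
    for query, count in sorted(query_counts.items()):
        by_level[frequency_level(count)].append(query)
    rng = random.Random(seed)
    return {
        level: rng.sample(queries, min(per_level, len(queries)))
        for level, queries in by_level.items()
    }
```

Calling `stratified_sample(query_counts, 10000)` would correspond to the sample size on the slide, given a query log with enough queries in each level.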
  20. Conclusion
    • They present a framework for building machine learning models that learn the OCI of queries and Web pages from Web page content.
