Profiling a Person With Search Log Data
 

presented at ASIST Conference 2009

© All Rights Reserved

    Profiling a Person With Search Log Data: Presentation Transcript

    • Profiling a Person With Log Data. Jim Jansen, College of Information Sciences and Technology, The Pennsylvania State University, [email_address]. Interested in how much descriptive information we can generate about a person by leveraging search log data.
    • What Did We Find Out?
      • We can tell quite a lot!
      • The State of Web Search
    • The Power of Search and the Web
      • Search is the top online activity
      • Search drives over 7 billion monthly queries in the U.S.
      • Online activity has a huge impact on people’s daily lives:
        • 70 minutes less with family
        • 30 minutes less TV
        • 8.5 minutes less sleep
      Sources: comScore, U.S., Feb. ’06; Stanford Institute for the Quantitative Study of Society, Nov. ’05
    • Analysis of Search Marketplace: holding fairly stable over the last year or so, albeit with some Bing flux
    • Search Logs
      • Contain the trace data recorded when a person visits the search engine, submits a query, views results, etc.
      • On one hand, logs have been criticized for not being rich enough (i.e., they record behaviors but not the ‘why’ factors)
      • On the other hand, logs have been criticized for recording too much about us (i.e., logging a lot of personal information about a person)
      How much can we learn about a person from the data stored in search logs? Specifically, how rich a searcher profile can we build of what a person is doing and why they are doing it, and can we predict what they are going to do next?
    • An illustrative example
    • How much can we tell from a single query?
      • ASIS&T is an acronym for the American Society for Information Science and Technology
      • Good probability that this user is an academic, a researcher, a librarian, or a student in one of these disciplines
      • Leveraging demographic information:
        • 57 percent female / 43 percent male probability
        • 66.2 percent chance the user works in the information science field
        • 55.6 percent probability this user has a master’s degree
    • How much can we tell from a single query?
      • Leveraging demographic information (cont’d):
        • 32.3 percent probability this user has a doctorate
        • 53 percent likelihood the user works in academia
      • Using the IP address, we can locate the geographical area
      • Based on time, we could infer that:
        • this person is searching for the conference’s schedule for travel planning (if the query is submitted prior to the meeting)
        • or looking for presentations or papers from the meeting (if the query is submitted after the conference).
      Theoretically, we can tell a lot! However, with billions of queries per month, we can’t do the analysis by hand as in this example. To develop user profiles, we need automated methods.
      Research Question: How complete a profile can one develop for a Web search engine user from search log data? [(a) what the user is doing, (b) what the user is interested in, and (c) what the user intends to do]
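The time-based inference in the single-query example can be sketched as a small rule. This is only an illustration, not the presenter's method: the conference dates are placeholder values, the function name is hypothetical, and the "during the meeting" case is an added assumption because the slides cover only queries submitted before or after the meeting.

```python
from datetime import date

# Placeholder conference dates, assumed purely for illustration.
CONFERENCE_START = date(2009, 11, 6)
CONFERENCE_END = date(2009, 11, 11)


def infer_intent_from_time(query_date, start=CONFERENCE_START, end=CONFERENCE_END):
    """Guess what a conference-related query is after, based on its date."""
    if query_date < start:
        # Query submitted prior to the meeting.
        return "schedule or travel planning"
    if query_date > end:
        # Query submitted after the conference.
        return "presentations or papers"
    # Not covered by the slides; labeled as unclear rather than guessed.
    return "unclear (submitted during the meeting)"


before = infer_intent_from_time(date(2009, 10, 1))
after = infer_intent_from_time(date(2009, 12, 1))
```

A rule like this is cheap to apply across a large log, which is the point the slide makes about needing automated methods.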
    • Specific aspects with automated methods …
      • Location: where the user is. Method: IP look-up script
      • Geographical interest: where the user is going. Method: query term usage
      • Topical interest: what the user is interested in. Method: tools like Open Calais
      • Topical complexity: how motivated the user is. Method: n-grams pattern analysis
      • Content desires: informational, navigational, or transactional. Method: binary tree, k-means clustering
      • Commercial intent: eCommerce related. Method: tools like MSN adLabs
      • Purchase intent: getting ready to buy. Method: session analysis
      • Potential to click on a link: will the user click on a link. Method: time series analysis
      • Gender: demographic targeting/personalization. Method: tools like MSN adLabs
      • User identification: specific user targeting. Method: needs a whole lot of data
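The slide lists n-gram pattern analysis among the automated methods. As a minimal sketch of the general idea, assuming whitespace tokenization and a handful of made-up queries (the function names and sample data are hypothetical, not taken from the presentation), recurring term sequences can be counted like this:

```python
from collections import Counter


def query_ngrams(query, n=2):
    """Split a query into lowercase terms and return its n-grams as tuples."""
    terms = query.lower().split()
    return [tuple(terms[i:i + n]) for i in range(len(terms) - n + 1)]


def count_ngrams(queries, n=2):
    """Count how often each n-gram occurs across a list of queries."""
    counts = Counter()
    for q in queries:
        counts.update(query_ngrams(q, n))
    return counts


# Made-up queries, for illustration only.
sample_queries = [
    "cheap flights vancouver",
    "cheap flights vancouver november",
    "vancouver conference hotel",
]
bigram_counts = count_ngrams(sample_queries)
most_common_bigram = bigram_counts.most_common(1)[0][0]
```

Frequent n-grams give a rough signal of recurring patterns in a user's queries; a real analysis would need far more data and more careful tokenization.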
    • A comment about user identification
      Given a group of folks who use a search engine, …
      • we can tell a lot about a person within that group from search logs (i.e., behaviors) …
      • … identifying a particular individual is much more difficult with just search logs (probably takes ~12–18 months of data).
    • User Profiling Framework
      • Classify user aspects into two levels: internal and external.
      • Internal aspects refer to attributes of the users themselves.
      • External aspects relate to the behavior or interest of the users.
      • Interaction between internal and external aspects:
        • Can infer external aspects from internal aspects.
        • External aspects reflect internal aspects.
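One way to picture the two-level framework is as a record with an internal part and an external part. The sketch below is an assumption-laden illustration: the class and helper names are hypothetical, and placing gender under internal and topical interest under external is our reading of the definitions above, since the slides do not assign specific aspects to levels. The 57/43 split reuses the figures from the single-query example.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical container for the two levels of user aspects."""
    internal: dict = field(default_factory=dict)  # attributes of the user
    external: dict = field(default_factory=dict)  # behavior or interests


def most_likely(distribution):
    """Return the label with the highest probability in a distribution."""
    return max(distribution, key=distribution.get)


profile = UserProfile()
# Assumed placement: gender as an internal aspect (an attribute of the user).
profile.internal["gender"] = {"female": 0.57, "male": 0.43}
# Assumed placement: topical interest as an external aspect (an interest).
profile.external["topical_interest"] = "information science"

likely_gender = most_likely(profile.internal["gender"])
```

Keeping probabilities rather than hard labels reflects that these aspects are inferred from logs, not observed directly.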
    • Thank you! (open for questions and further discussion) Jim Jansen, College of Information Sciences and Technology, The Pennsylvania State University, [email_address]
    • Search logs have some common fields, such as time, queries, results, etc. We can enrich the log with additional fields.
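As a rough sketch of what a log record with common fields and a few added fields might look like: the field names below, the derived fields (query length, hour of day), and the sample record are all assumptions for illustration and do not describe any real search engine's log schema.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class LogRecord:
    """Hypothetical search log record with the common fields from the slide."""
    timestamp: datetime
    query: str
    results: list = field(default_factory=list)  # result URLs shown, if logged
    extra: dict = field(default_factory=dict)    # fields added by enrichment


def enrich(record):
    """Add simple derived fields to a record (illustrative choices only)."""
    terms = record.query.split()
    record.extra["query_length"] = len(terms)            # number of terms
    record.extra["hour_of_day"] = record.timestamp.hour  # 0 to 23
    return record


# Made-up record, for illustration only.
record = enrich(LogRecord(datetime(2009, 11, 7, 14, 30), "asis&t annual meeting"))
```

Derived fields like these are computed once and stored alongside the raw record, so later analyses do not need to re-parse every query.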