The geographical life of search

    The geographical life of search - Presentation Transcript

    1. The Geographical Life of Search Ricardo Baeza-Yates(1), Christian Middleton(2), Carlos Castillo(1) (1) Yahoo! Research; Barcelona, Spain (2) Universitat Pompeu Fabra; Barcelona, Spain
    2. Contents • Introduction • Experimental Framework • Generic Top-Level Domains (gTLD) • Country Top-Level Domains (ccTLD) • Internal Search Traffic of a Country • Traffic Among Continents and Countries • Geographic Dynamics • Conclusions
    3. Introduction • Three-fold goals a) Understand how search engines are used from a geographic point of view b) The search process implies a social network; geographic user behavior gives information about this network and about how the Web reflects society c) Search traffic among countries is relevant to the development of distributed search engines
    4. Introduction • Main objective “Describe how users behave, based on their location and clicked URL” • Explain where a user need comes from and where it is resolved • Implies traffic of information and transactions
    5. Experimental Framework • Top-Level Domains (TLD) – Generic (gTLD) -- .org, .com, .edu, … – Country Code (ccTLD) -- .cl, .es, .fr, … • Query log – 840M clicked queries from Yahoo! log (2008) – Eliminated 0.01% of inter-country traffic – 593K unique gTLD hosts – 166K unique ccTLD hosts
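
The gTLD/ccTLD split described on this slide can be sketched in a few lines. This is only an illustration, not the authors' pipeline: the `GENERIC_TLDS` set is an assumed, partial list, the function names are hypothetical, and the two-letter rule relies on the convention that two-letter ASCII TLDs are country codes. Bare host names are assumed to carry no port.

```python
from urllib.parse import urlsplit

# Assumed, illustrative subset of generic TLDs; not the list used in the study.
GENERIC_TLDS = {"com", "org", "net", "edu", "gov", "mil", "biz", "info"}


def tld_of(url_or_host: str) -> str:
    """Return the lower-cased last label of a URL's host or of a bare host name."""
    if "://" in url_or_host:
        host = urlsplit(url_or_host).hostname or ""
    else:
        host = url_or_host
    return host.rstrip(".").rsplit(".", 1)[-1].lower()


def classify_tld(url_or_host: str) -> str:
    """Label a host as 'gTLD', 'ccTLD', or 'other'."""
    tld = tld_of(url_or_host)
    if tld in GENERIC_TLDS:
        return "gTLD"
    # Two-letter ASCII TLDs are treated as country codes (.cl, .es, .fr, ...).
    if len(tld) == 2 and tld.isascii() and tld.isalpha():
        return "ccTLD"
    return "other"
```

For example, `classify_tld("http://www.upf.es/index.html")` yields `"ccTLD"`, while a host under `.com` yields `"gTLD"`.
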
    6. Generic TLD (gTLD) • Traffic contribution to .com • Most countries send 65% of their traffic to .com (the lowest share is still >45%) • Some countries have significant servers on .com: ◦ Canada, Spain, Great Britain
    7. Generic TLD (gTLD) • Traffic to other gTLDs
    8. Generic TLD (gTLD) • Which countries host gTLDs? • The US ranks first for all domains • .gov & .edu are highly concentrated (use of second-level domains) • .biz, .net, .mil are geographically spread
    9. Server Location – Vanity TLDs • Vanity score • Hosting probability ◦ Non-uniform ◦ Median 0.66
    10. Internal search traffic of a country “Visits to pages in which both the searcher and the clicked page are in the same country.” • Ratio of internal destination: • Ratio of traffic from internal sources:
    11. Internal search traffic of a country • Sites on the domain are relevant only to people from that country. • People are interested only in sites from their country.
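
The two ratios named on this slide lost their formulas in the transcript, so the sketch below rests on assumed definitions: the internal-destination ratio is taken as the share of a country's outgoing clicks that stay in the country, and the internal-source ratio as the share of clicks arriving at a country's pages that originate in it. The function name and the `(searcher_country, page_country)` input format are hypothetical, and how a page is assigned to a country (ccTLD or server location) is left open.

```python
from collections import Counter


def internal_ratios(clicks):
    """Compute per-country internal traffic ratios.

    clicks: iterable of (searcher_country, page_country) pairs.
    Returns {country: {"internal_destination": r1, "internal_source": r2}},
    with None where a country has no outgoing or no incoming clicks.
    """
    outgoing = Counter()   # clicks issued by searchers in each country
    incoming = Counter()   # clicks landing on pages of each country
    internal = Counter()   # clicks where searcher and page share a country
    for src, dst in clicks:
        outgoing[src] += 1
        incoming[dst] += 1
        if src == dst:
            internal[src] += 1

    result = {}
    for c in set(outgoing) | set(incoming):
        result[c] = {
            "internal_destination": internal[c] / outgoing[c] if outgoing[c] else None,
            "internal_source": internal[c] / incoming[c] if incoming[c] else None,
        }
    return result
```

Under these assumed definitions, a country whose searchers rarely leave its own sites scores high on the first ratio, and a country whose sites attract few foreign visitors scores high on the second.
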
    12. Traffic to ccTLD ◦ At least 4% of outgoing traffic ◦ Omitted 60 countries with internal traffic ◦ Significance of .uk and .ru
    13. Source & Destination Entropy • Source entropy • Destination entropy
    14. Intra-country & Incoming Entropy • Inverse correlation (r = -0.772) “when ccTLD has narrow set of traffic sources, country is one of the most important sources”
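
The slide does not say which entropy is used. A minimal sketch, assuming Shannon entropy in bits over the distribution of source countries for a given ccTLD (and, symmetrically, of destination ccTLDs for a given country), could look like this. All function names and the `(source_country, destination_cctld)` input format are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy in bits of the distribution given by raw counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts)


def source_entropy(clicks, cctld):
    """Entropy of the source countries of clicks landing on `cctld`."""
    sources = Counter(src for src, dst in clicks if dst == cctld)
    return shannon_entropy(sources.values())


def destination_entropy(clicks, country):
    """Entropy of the destination ccTLDs clicked by searchers in `country`."""
    destinations = Counter(dst for src, dst in clicks if src == country)
    return shannon_entropy(destinations.values())
```

Low source entropy means a ccTLD draws its traffic from few countries; high values indicate diversified sources.
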
    15. Traffic amongst Continents • .uk visited from Europe, Oceania, and Africa • Diversified traffic in South America • .es sites highly influential in South America
    16. Traffic Similarity • Linguistic/geographic characteristics: a) Indo-European b) French c) Dutch d) Swazi e) Tetum f) English
    17. Traffic Similarity • Analyzed 24 demographic features • Laplacian Eigenmaps over the country similarity graph ◦ Feature x that minimizes: • Most relevant features: ◦ Oil consumption ◦ Latitude of center of country ◦ Human Development Index (HDI) ◦ Number of mobile phones ◦ Total area
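
The formula on this slide did not survive in the transcript. For orientation only, the standard Laplacian Eigenmaps objective is reproduced below; whether the slide used exactly this form is an assumption. Here $w_{ij}$ is the traffic-similarity weight between countries $i$ and $j$, $x_i$ is the value assigned to country $i$, $D$ is the diagonal degree matrix with $D_{ii} = \sum_j w_{ij}$, and $L = D - W$ is the graph Laplacian.

```latex
\min_{x} \; \sum_{i,j} w_{ij}\,(x_i - x_j)^2 \;=\; 2\, x^{\top} L\, x
\qquad \text{subject to} \quad x^{\top} D\, x = 1, \quad x^{\top} D\, \mathbf{1} = 0 .
```

A vector $x$ with a small value of this sum changes little between countries joined by heavy edges, so a demographic feature that scores low varies smoothly over the similarity graph.
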
    18. Similarity between Continents
    19. Geographic Dynamics • Compared with the 2005 Yahoo! log • Similar traffic to gTLDs, and similar location of servers hosting gTLDs. • In 2005, European ccTLDs had more diversified sources, while in 2008 it was the Asian ccTLDs. • Demographic features: ◦ Latitude, HDI, unemployment rate, migration, and area of country. – This confirms our observation that countries sharing latitude or human development have similar traffic.
    20. Conclusions • Analyzed data from a large query log to describe how users behave based on their location and the URL clicked. • Findings: ◦ .com has the largest share of traffic – Most hosts are located in the US ◦ Inverse correlation between incoming traffic entropy and internal referral ratio. ◦ Traffic among countries is concentrated in a few domains (.uk, .ru) – linguistic & cultural ◦ At the continent level, .uk is the most common domain among continents.
    21. Conclusions • Findings: ◦ Users located in North America, Europe, Asia, and Africa visit a similar set of domains. ◦ Users in South America and Oceania show different behavior. ◦ Results show that language can be more important than geography, based on search similarity. ◦ From the demographic analysis, we observed that countries at similar latitudes or with similar HDI share traffic destinations.

    CC Attribution License