

  • 1. Clustering and Exploring Search Results using Timeline Constructions (Omar Alonso, Michael Gertz, Ricardo Baeza-Yates) presented by Anil Kumar Attuluri, 10/17/2011
  • 2. Outline
      • Motivation
      • Background
      • Methods and Prototype
      • Evaluation
      • Conclusion
      • Examples
  • 3. Motivation
  • 4. Temporal Information
  • 5. Temporal Information
  • 6. Survey results (using Amazon Mechanical Turk)
    • Q. Do you think current timelines for organizing and clustering search results (such as in Google's timeline) are useful for some of your daily search activities?
    • 76% answered "yes"
    • Q. Do you use timelines to explore search results?
    • 71% answered  "yes"
  • 7. Use Cases
      • History - information about a place or a person during a past period. For example, there is no decent timeline of events during World War II.
      • Research - information about a topic, sorted from oldest to newest. For example, no timeline of volcanic activity in the US is available.
      • Events - details of the soccer World Cups listed on a timeline. Timeline-based lists are not available.
  • 8. Background
  • 9. Hit lists and Clustering
    • Hit lists
      • A hit list is the set of all documents retrieved for a search query.
      • The documents are sorted by rank.
    • Clustering
      • Clustering is the process of grouping the search results (hit list) into clusters identified by cluster labels.
      • Useful for providing a better exploration interface to the end user.
  • 10. TimeML
      • TimeML is a formal specification language for events and temporal expressions.
      • EVENT - A fresh flow of lava, gas and debris erupted there Saturday.
      • TIMEX3 - June 11, 1989, or the summer of 2002.
      • SIGNAL - They will investigate the role of the US before, during and after the genocide.
      • LINKS - John drove to Boston. During his drive he ate a donut.  
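To make the TIMEX3 tag concrete, here is a minimal Python sketch that reads temporal expressions out of a TimeML-style fragment. The sample sentence and the helper name `extract_timex3` are illustrative assumptions; the `tid`, `type`, and `value` attributes follow the TimeML TIMEX3 convention.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated fragment: one explicit date marked up as TIMEX3.
ANNOTATED = (
    "<TimeML>A fresh flow of lava erupted on "
    '<TIMEX3 tid="t1" type="DATE" value="1989-06-11">June 11, 1989</TIMEX3>.'
    "</TimeML>"
)

def extract_timex3(xml_text):
    """Return (surface text, normalized value) pairs for every TIMEX3 tag."""
    root = ET.fromstring(xml_text)
    return [(tag.text, tag.get("value")) for tag in root.iter("TIMEX3")]

print(extract_timex3(ANNOTATED))
```

The normalized `value` is what a timeline can be built from; the surface text is what the reader sees in the document.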
  • 11. Amazon Mechanical Turk (AMT)
      • Amazon's platform for Human Intelligence Tasks (HITs): tasks performed by humans because computers cannot yet complete them.
      • Requesters place HITs; Workers perform them.
      • An Application Programming Interface lets Requesters submit HITs and retrieve the results.
  • 12. Methods and Prototype
  • 13. Time Annotated Document Model
    •   Time and Timelines
      • A chronon is an atomic time interval; here, a single day. Ex: May 10, 2011.
      • Granules are contiguous sequences of chronons. Ex: week, month, year.
      • Granule composition has a lattice structure.
      • Timelines = {T_d (day), T_w (week), T_m (month), T_y (year)}
      • Chronons are ordered by a precedence relationship.
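The relationship between chronons and granules can be sketched in Python as follows. This is an illustration, not the authors' implementation: the function name `to_granule`, the label formats, and the use of ISO weeks for T_w are assumptions.

```python
from datetime import date

def to_granule(chronon: date, granularity: str) -> str:
    """Map a day-level chronon to the label of the granule that contains it."""
    if granularity == "day":
        return chronon.isoformat()
    if granularity == "week":
        # Assumption: weeks follow the ISO 8601 calendar.
        iso_year, iso_week, _ = chronon.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if granularity == "month":
        return f"{chronon.year}-{chronon.month:02d}"
    if granularity == "year":
        return str(chronon.year)
    raise ValueError(f"unknown granularity: {granularity}")

chronon = date(2011, 5, 10)
for g in ("day", "week", "month", "year"):
    print(g, to_granule(chronon, g))

# Precedence between chronons is plain date ordering.
print(date(2011, 5, 9) < chronon)
```

Note that weeks do not nest inside months or years, which is why granule composition forms a lattice rather than a simple chain.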
  • 14. Time Annotated Document Model
    •   Temporal Expressions
      • Document timestamp collected during crawling.
      • Explicit temporal expressions. Ex: March 12, 2005.
      • Implicit temporal expressions. Ex: Columbus Day 2008.
      • Relative temporal expressions. Ex: two days from now.
  • 15. Time Annotated Document Model
    •   Temporal Document Profile
      • The temporal document profile is defined as: tdp: D → [E × C × P]*
    •         E = E_e ∪ E_i ∪ E_r (explicit, implicit, and relative temporal expressions)
    •         C = set of all chronons
    •         P = set of all positions of a temporal expression in a document
      • Simply put, tdp consists of tuples of the form (e_i, c_i, p_i).
      • The tuples in tdp are organized as follows:
    •        (explicit set, implicit set, dts, relative set), where dts is the document timestamp
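One possible in-memory shape for a temporal document profile is sketched below. The class name `TemporalEntry`, the `kind` field, the use of character offsets as positions, and the position -1 for the document timestamp are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Ordering of the tuple groups as listed on the slide.
KIND_ORDER = {"explicit": 0, "implicit": 1, "dts": 2, "relative": 3}

@dataclass(frozen=True)
class TemporalEntry:
    expression: str  # e_i: the temporal expression as written
    chronon: date    # c_i: the day the expression resolves to
    position: int    # p_i: offset in the document (assumed: character offset)
    kind: str        # one of the KIND_ORDER keys

def build_tdp(entries):
    """Order entries as (explicit, implicit, dts, relative), then by position."""
    return sorted(entries, key=lambda e: (KIND_ORDER[e.kind], e.position))

profile = build_tdp([
    TemporalEntry("two days from now", date(2008, 10, 15), 120, "relative"),
    TemporalEntry("crawl timestamp", date(2008, 10, 13), -1, "dts"),
    TemporalEntry("Columbus Day 2008", date(2008, 10, 13), 40, "implicit"),
    TemporalEntry("March 12 2005", date(2005, 3, 12), 5, "explicit"),
])
for entry in profile:
    print(entry.kind, entry.expression, entry.chronon)
```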
  • 16. Timeline Construction and Document Exploration
    •   Constructing a Time Outline     
      • Chronons are extracted from the hit list L_q.
      • Minimum and Maximum chronons describe the lower and upper bound of time outline.
      • Documents are organized in a temporal range which forms the time outline.
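The bounds of the time outline can be computed directly from the extracted chronons. In this sketch the hit list is assumed to be a mapping from a document id to the chronons found in that document; `time_outline` is a hypothetical helper name.

```python
from datetime import date

def time_outline(hit_list):
    """Return (earliest, latest) chronon over all documents, or None if empty."""
    chronons = [c for doc_chronons in hit_list.values() for c in doc_chronons]
    if not chronons:
        return None
    return min(chronons), max(chronons)

hits = {
    "d1": [date(2006, 6, 9), date(2010, 6, 11)],
    "d2": [date(1998, 7, 12)],
}
print(time_outline(hits))
```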
  • 17. Timeline Construction and Document Exploration
    •   Document Clustering
      • Chronons are normalized to a granularity g.
    •       g - granularity: day, week, month, or year
      • Documents are mapped to clusters.
      • The main cluster and hot spots are determined.
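The clustering step can be sketched as follows. This is not the TCluster algorithm itself: the label formats are assumptions, and taking the main cluster to be the one holding the most documents is an assumed definition, since the slides do not define it.

```python
from collections import defaultdict
from datetime import date

# Assumed label format per granularity (week omitted for brevity).
FORMATS = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}

def cluster_by_time(doc_chronons, g="year"):
    """Normalize each chronon to granularity g and group document ids by label."""
    clusters = defaultdict(set)
    for doc_id, chronons in doc_chronons.items():
        for chronon in chronons:
            clusters[chronon.strftime(FORMATS[g])].add(doc_id)
    return dict(clusters)

def main_cluster(clusters):
    """Assumption: the main cluster is the one with the most documents."""
    return max(clusters, key=lambda label: len(clusters[label]))

docs = {
    "d1": [date(2006, 6, 9), date(2010, 6, 11)],
    "d2": [date(2010, 7, 11)],
    "d3": [date(2010, 6, 12)],
}
by_year = cluster_by_time(docs, "year")
print(by_year, main_cluster(by_year))
```

A document that mentions several dates lands in several clusters, which is why the clusters are sets of document ids rather than a partition.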
  • 18. Timeline Construction and Document Exploration
    •   Ranking Documents in a Cluster
      • Ranks are determined as follows.
      • Given two documents d and d', d is ranked higher than d' if either of the following two conditions holds.
    •      1. rank(d, y_j) > rank(d', y_j)
    •      2. rank(d, y_j) = rank(d', y_j) and d is ranked higher in L_q than d'
    •          L_q - the set of result documents of a query q
    •          y_j - a cluster
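The two ranking conditions amount to sorting by the in-cluster rank first and breaking ties by position in L_q. The sketch below assumes the values rank(d, y_j) are already available as a dictionary, because the slides do not say how they are computed; `rank_in_cluster` is a hypothetical name.

```python
def rank_in_cluster(cluster_docs, cluster_rank, hit_list):
    """Order the documents of one cluster y_j.

    cluster_rank[d] stands for rank(d, y_j); higher is better (assumed given).
    hit_list is L_q in its original order, used only to break ties.
    """
    position = {doc: i for i, doc in enumerate(hit_list)}
    return sorted(cluster_docs, key=lambda d: (-cluster_rank[d], position[d]))

L_q = ["a", "b", "c"]
scores = {"a": 1, "b": 2, "c": 2}
print(rank_in_cluster({"a", "b", "c"}, scores, L_q))
```

Here "b" and "c" share the same in-cluster rank, so "b" comes first because it appears earlier in L_q.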
  • 19. Timeline Construction and Document Exploration
    •   Cluster Exploration
      • A cluster can be refined along the timeline to explore its results at a finer granularity.
    •      Ex: refine T_y into T_m or T_w
    • Temporal Snippets
      • Temporal snippets outline the main events in a document. They are created by pulling the most relevant sentences that contain temporal expressions. The TSnippet algorithm is used.
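As a rough illustration of the idea, and explicitly not the TSnippet algorithm, the following sketch keeps the first few sentences that contain a four-digit year. The year pattern, the sentence splitter, and the name `temporal_snippet` are simplifying assumptions.

```python
import re

# Assumption: a four-digit year between 1000 and 2099 marks a temporal expression.
YEAR = re.compile(r"\b(?:1\d{3}|20\d{2})\b")

def temporal_snippet(text, max_sentences=2):
    """Join the first max_sentences sentences that mention a year."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    dated = [s for s in sentences if YEAR.search(s)]
    return " ".join(dated[:max_sentences])

TEXT = (
    "The eruption began on June 11, 1989. "
    "Scientists were surprised. "
    "Activity resumed in 2002."
)
print(temporal_snippet(TEXT))
```

A real implementation would rely on the tagged temporal expressions instead of a regular expression and would score sentences by relevance.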
    •   Document Annotation Pipeline
      • First, extract time-related metadata, such as the document timestamp, from the Web server at crawl time.
      • Second, run the POS tagger on each document; it tags parts of speech and inserts the sentence delimiters needed for temporal document annotation.
      • Third, run a temporal expression tagger based on the TimeML standard. The resulting XML markup (the tdp) is added to the document.
    •   Exploratory User Interface
  • 22. Evaluation
  • 23. Evaluation
    •   Evaluation guidelines         
      • Precision - fraction of retrieved documents that are relevant. All relevant documents must be included in the timeline.
      • Presentation - displaying the timeline in an intuitive graphical user interface.
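Precision as defined above is a ratio of set sizes. A minimal sketch, with made-up document ids:

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant (0.0 if none retrieved)."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)

# Two of the four retrieved documents are relevant.
print(precision(["d1", "d2", "d3", "d4"], {"d1", "d3", "d9"}))
```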
  • 24. Evaluation
    •   DMOZ 
      • It is a multilingual open content directory. The World Cup category was picked for evaluation.
      • The results showed that the TCluster algorithm generated more clusters and therefore proved to be more precise.
    • TimeBank  
      • It contains news articles that have been annotated using TimeML. 
      • Using the temporal expressions in documents increased the number of clusters discovered by TCluster by 50%.
  • 25. Evaluation
    •   Relevance Evaluation using AMT
      • Goal was to evaluate the quality of search results using TCluster in combination with temporal snippets.
      • Ten random informational queries for Wikipedia featured articles were used. The average response was 4.04% (with an 80% agreement level).
      • The top ten most active topics on Twitter were used. The average response was 4.33% (with an 80% agreement level).
  • 26. Conclusion
  • 27. Conclusion
      • A framework for making search applications time-aware.
      • The TCluster algorithm provides flexibility, allowing users not only to explore the results over a timeline but also to explore them at multiple time granularities.
      • A user engaged in time-related investigations would benefit from this model when traditional information retrieval and search engines cannot offer much.
  • 28. Examples
  • 29. Google search based on time
  • 30. timesearch.info
  • 31. historyworld.net
  • 32. LinkedIn timeline
  • 33. Thank You!