1. News Headlines: What They Can Tell Us?
Sahisnu Mazumder, Bazir Bishnoi and Dhaval Patel
Department of Computer Science and Engineering
Indian Institute of Technology, Roorkee, India.
MOTIVATION: The Focused News Search
Online news media content is dynamic, voluminous and constantly evolving. At any
point in time, a reader may want to know:
When was news related to “ipl auction” at its peak?
What are the top-5 news sources that have significantly covered “Narendra Modi” in
the past 5 months?
What are the top trending news concepts at present?
Answering such queries with current news search engines is very difficult. We need an
information harvesting platform that tracks the news content published in online news
media, analyzes it and provides real-time news analytics to the reader. Indirectly, such a
news analytics platform helps the reader decide which news concepts to explore, which
timeline to follow and where to go next. In this way, focused news search can be performed
on the web.
I-CARE
2014
Time-aware News Concept Graph: Capturing Temporal Dynamics of News Concepts and Their Relationships
News Data Analytics and Applications
Perspective I: Potentiality and Bias of News Sources
Perspective II: What News Concepts are Popular and When?
News Analytics Applications
Why News Headlines?
Harvesting News Headlines: The First Step (Data Set Collection)
Generic Structure of TNCG
We build a news crawler that searches a predefined list of 87 news websites at
30-minute intervals and collects information about the news headlines published
during each interval.
Each news headline is stored as a record with a unique headline ID, the headline text,
a start timestamp, an end timestamp, the source ID, the source URL and the category of
the source (e.g., business, sports, technology).
News headlines are concise, editorially vetted summaries that represent the key
idea of the corresponding news article.
News headlines are useful for discovering and studying news concepts and their
relationships over time.
Properties of Links labelled with the “Is_Related_To” relationship: Initial
Start_timestamp (time when the concept pair co-occurred for the first time),
Last end_timestamp (time when the concept pair co-occurred for the last time),
Duration_List (list of timestamps of the concept pair's co-occurrences) and
Relationship support (frequency of co-occurrence).
Properties of Concept Nodes: Concept_Name (name of the concept), Concept_Type
(whether or not it is a personal entity) and Attribute_List (list of attribute values
if it is a personal entity).
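The node and link properties above can be sketched as two Python dataclasses. This is an illustration under stated assumptions, not the authors' implementation: the class and method names are hypothetical, and the design derives Initial Start_timestamp, Last end_timestamp and Relationship support from a sorted Duration_List so the three values can never drift out of sync with it.

```python
import bisect
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ConceptNode:
    concept_name: str                  # Concept_Name
    is_personal_entity: bool = False   # Concept_Type
    attribute_list: list = field(default_factory=list)  # Attribute_List (personal entities only)


@dataclass
class IsRelatedToLink:
    concept_pair: frozenset
    duration_list: list = field(default_factory=list)  # Duration_List, kept sorted

    def add_cooccurrence(self, ts: datetime) -> None:
        # Insert in timestamp order so first/last lookups stay O(1).
        bisect.insort(self.duration_list, ts)

    @property
    def initial_start_timestamp(self) -> datetime:
        return self.duration_list[0]

    @property
    def last_end_timestamp(self) -> datetime:
        return self.duration_list[-1]

    @property
    def relationship_support(self) -> int:
        return len(self.duration_list)


# Usage with made-up timestamps.
node = ConceptNode("narendra_modi", is_personal_entity=True)
link = IsRelatedToLink(frozenset({"kkr", "ipl_7"}))
link.add_cooccurrence(datetime(2014, 6, 1, 18, 0))
link.add_cooccurrence(datetime(2014, 5, 11, 9, 30))  # out of order on purpose
```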
Weekly frequency of news related to “ipl” (sports) published by three news sources
over 22 weeks of the past 5 months.
Weekly frequency of news related to “narendra_modi” (politics) published by three news
sources over 22 weeks of the past 5 months.
indianexpress published more news about “ipl” than indiatoday and nbcsports.
During weeks 11 to 20, the rate of publishing “ipl”-related news rose significantly
because of the IPL tournament in India.
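Per-source weekly counts like those plotted above can be obtained by bucketing headlines into weeks. A minimal sketch, assuming each headline is available as a (publish date, source, set of concepts) tuple; `weekly_frequency`, `peak_week` and the toy data are hypothetical and are not the poster's measurements.

```python
from collections import Counter
from datetime import date


def weekly_frequency(headlines, concept, start):
    """Count headlines mentioning `concept` per (source, week).

    headlines: iterable of (publish_date, source, set_of_concepts)
    start:     first day of week 1
    """
    counts = Counter()
    for day, source, concepts in headlines:
        if concept in concepts and day >= start:
            week = (day - start).days // 7 + 1
            counts[(source, week)] += 1
    return counts


def peak_week(counts, source):
    """Week in which `source` published the most headlines on the concept."""
    per_week = {w: c for (s, w), c in counts.items() if s == source}
    return max(per_week, key=per_week.get) if per_week else None


# Toy data, not the poster's measurements.
toy = [
    (date(2014, 1, 7), "indianexpress", {"ipl"}),
    (date(2014, 4, 15), "indianexpress", {"ipl", "kkr"}),
    (date(2014, 4, 16), "indianexpress", {"ipl"}),
    (date(2014, 4, 16), "nbcsports", {"ipl"}),
]
counts = weekly_frequency(toy, "ipl", start=date(2014, 1, 6))
print(peak_week(counts, "indianexpress"))  # 15
```

Summing a source's counts over a window gives the kind of per-source total used to compare coverage across news sources.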
In the case of “narendra modi”, zeenews dominated hindustantimes and thehindu,
and the rate of publishing news related to “narendra modi” was high during
weeks 8 to 18 (election time).
Time-aware Query Expansion: returns the set of related concepts that co-occurred with
an input news concept within a specified input time span. The Active Concepts in the
figure on the left show the result of time-aware query expansion for “ipl_7” in week 17.
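Time-aware query expansion can be sketched as a filter over each link's Duration_List. The sketch assumes the TNCG links are held in a plain dict keyed by concept pair; the function name, the toy graph and the chosen week boundaries are hypothetical.

```python
from datetime import datetime


def expand_query(links, concept, t_from, t_to):
    """Return concepts linked to `concept` by at least one co-occurrence
    whose timestamp falls inside [t_from, t_to].

    links: dict mapping frozenset({a, b}) -> list of co-occurrence timestamps.
    """
    related = set()
    for pair, timestamps in links.items():
        if concept in pair and any(t_from <= t <= t_to for t in timestamps):
            (other,) = pair - {concept}  # the other endpoint of the link
            related.add(other)
    return related


# Toy graph, not the poster's data.
links = {
    frozenset({"ipl_7", "kkr"}): [datetime(2014, 5, 20)],
    frozenset({"ipl_7", "auction"}): [datetime(2014, 2, 12)],
    frozenset({"kkr", "eden_gardens"}): [datetime(2014, 5, 21)],
}
week = (datetime(2014, 5, 19), datetime(2014, 5, 25, 23, 59))
print(expand_query(links, "ipl_7", *week))  # {'kkr'}
```

“auction” is dropped because its only co-occurrence with “ipl_7” falls outside the window, which is what distinguishes this from plain neighbour lookup.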
Entity Relationship Mining: finding the top-k relations of a given entity, and their
evolution pattern, over a given time span. E.g., “rahul gandhi” was cited more often
with “narendra modi” than with “arvind kejriwal” during Election 2014.
Other News Analytics Applications: Temporal Rank-aware News Concept
Recommendation, Concept-based Community Discovery, News Trend Analysis, etc.
See paper “News Headlines: What They Can Tell Us?”
for details.
Objective: In this paper, we use the news headlines published in online news media
to develop a news analytics platform.
[Figure panels: News Webpage, Partial TNCG]
Cloud of related concepts for “kkr” from 11 May to 8 June 2014
News Concepts: nouns and collocations of nouns and numbers, e.g., “ipl-7”, “narendra
modi”, “election 2014”.
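Concept extraction along these lines can be sketched as grouping maximal runs of nouns and numbers. The sketch assumes the headline has already been part-of-speech tagged with Penn Treebank tags (for example by NLTK's `pos_tag`; the poster does not say which tagger is used), and it requires each run to contain at least one noun so that bare numbers are not kept. The function name and the underscore-joining convention are illustrative assumptions.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # Penn Treebank noun tags
NUMBER_TAG = "CD"                         # Penn Treebank cardinal-number tag


def extract_concepts(tagged_tokens):
    """Group maximal runs of nouns/numbers into concepts.

    tagged_tokens: list of (token, penn_tag) pairs.
    A run becomes a concept only if it contains at least one noun,
    so a bare number such as "5" is not kept.
    """
    concepts, run, has_noun = [], [], False
    # A sentinel pair flushes the final run.
    for token, tag in list(tagged_tokens) + [("", None)]:
        if tag in NOUN_TAGS or tag == NUMBER_TAG:
            run.append(token.lower())
            has_noun = has_noun or tag in NOUN_TAGS
        else:
            if run and has_noun:
                concepts.append("_".join(run))
            run, has_noun = [], False
    return concepts


tagged = [("Narendra", "NNP"), ("Modi", "NNP"), ("wins", "VBZ"),
          ("Election", "NN"), ("2014", "CD")]
print(extract_concepts(tagged))  # ['narendra_modi', 'election_2014']
```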
A Time-aware News Concept Graph (TNCG) is a property graph where nodes are news
concepts and two nodes are connected by a link if the two concepts co-occur in the same
news headline.
Given a set of news headlines, we extract the news concepts from each headline and
then build the TNCG. The figure below shows the 3-step process of constructing a
TNCG from a single news headline.
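A compact sketch of TNCG construction from already-extracted concepts: every pair of distinct concepts in the same headline gets a link, and the headline's timestamp is appended to that link's Duration_List. This is one plausible reading under assumptions, not the authors' 3-step procedure; `build_tncg`, the input layout and the toy headlines are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from itertools import combinations


def build_tncg(headlines):
    """Build a TNCG from (timestamp, concepts) pairs.

    Returns (nodes, links): nodes is a set of concept names; links maps
    frozenset({a, b}) to the sorted list of co-occurrence timestamps
    (the Duration_List), so its length is the relationship support.
    """
    nodes = set()
    links = defaultdict(list)
    for ts, concepts in headlines:
        distinct = sorted(set(concepts))        # ignore repeats within one headline
        nodes.update(distinct)                  # a concept seen again reuses its node
        for a, b in combinations(distinct, 2):  # link every pair in the same headline
            links[frozenset((a, b))].append(ts)
    for timestamps in links.values():
        timestamps.sort()
    return nodes, dict(links)


# Toy headlines, not the poster's data.
headlines = [
    (datetime(2014, 5, 20, 10, 0), ["kkr", "ipl_7"]),
    (datetime(2014, 5, 21, 9, 30), ["kkr", "ipl_7", "eden_gardens"]),
]
nodes, links = build_tncg(headlines)
print(len(links[frozenset(("kkr", "ipl_7"))]))  # 2
```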
[Figure: news concepts that died out in the past 3 months and concepts that emerged in the past 3 months; concepts shown: ipl_7, gulab_gang, satya_nadella, aiims, kkr, gaza, israel, ebola, kashmir_flood, isro]