News Headlines: What They Can Tell Us?
Sahisnu Mazumder, Bazir Bishnoi and Dhaval Patel
Department of Computer Science and Engineering
Indian Institute of Technology, Roorkee, India.
MOTIVATION: The Focused News Search
Online news media content is dynamic, voluminous and constantly evolving. At any point in time, a reader may want to know:
• When was news related to “ipl auction” at its peak?
• What are the top-5 news sources that have talked significantly about “Narendra Modi” in the past 5 months?
• What are the top trending news concepts at present?
Answering such queries with current news search engines is very difficult. We need an information harvesting platform that tracks the news content published in online news media, analyzes it and provides real-time news analytics to the reader. Indirectly, such an analytics platform helps the reader decide which news concepts to explore, which timeline to follow and where to go next. In this way, focused news search can be performed on the web.
I-CARE
2014
Time-aware News Concept Graph : Capturing Temporal
Dynamics of News Concepts and Their Relationships
News Data Analytics and Applications
Perspective I: Potential and Bias of News Sources
Perspective II: What News Concepts Are Popular and When?
News Analytics Applications
Why News Headlines?
Harvesting News Headlines: The First Step (Data Set Collection)
Generic Structure of TNCG
• We build a news crawler that crawls a predefined list of 87 news websites every 30 minutes and collects information about the news headlines published during that interval.
• Each news headline is stored as a record with a unique headline ID, headline text, start-timestamp, end-timestamp, source ID, source URL and the category of the source (e.g., business, sports, technology).
• News headlines are summarized, audited text that captures the key idea of the corresponding news article.
• News headlines are useful for discovering and studying news concepts and their relationships over time.
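The headline record described above can be sketched as a Python dataclass. This is a minimal illustration, not the authors' crawler: the input tuple format, the integer ID counter, the de-duplication on (source, text), and the reading of start-timestamp and end-timestamp as the bounds of the 30-minute crawl interval are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import count

CRAWL_INTERVAL = timedelta(minutes=30)  # the crawler revisits all sites every 30 min


@dataclass(frozen=True)
class HeadlineRecord:
    headline_id: int
    text: str
    start_timestamp: datetime  # assumed: start of the crawl interval
    end_timestamp: datetime    # assumed: end of the crawl interval
    source_id: str
    source_url: str
    category: str              # category of the source, e.g. "sports"


_ids = count(1)  # hypothetical ID scheme: a simple process-wide counter


def records_for_interval(scraped, window_start, seen):
    """Convert one crawl cycle's output into HeadlineRecords.

    scraped: iterable of (source_id, source_url, category, text) tuples
             (a hypothetical format for what the crawler returns).
    seen:    set of (source_id, normalized text) pairs already stored, so a
             headline still on a front page in a later cycle is not stored twice.
    """
    window_end = window_start + CRAWL_INTERVAL
    records = []
    for source_id, source_url, category, text in scraped:
        key = (source_id, text.strip().lower())
        if key in seen:
            continue
        seen.add(key)
        records.append(HeadlineRecord(next(_ids), text.strip(), window_start,
                                      window_end, source_id, source_url, category))
    return records
```

Keeping `seen` outside the function lets consecutive 30-minute cycles share it, so only newly published headlines produce records.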
• Properties of links labelled with the “Is_Related_To” relationship: Initial Start_timestamp (when the concept pair first co-occurred), Last end_timestamp (when the concept pair last co-occurred), Duration_List (list of timestamps at which the concept pair co-occurred) and Relationship support (frequency of co-occurrence).
• Properties of concept nodes: Concept_Name (name of the concept), Concept_Type (whether or not it is a personal entity) and Attribute_List (list of attribute values if it is a personal entity).
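A minimal in-memory sketch of the node and link properties listed above. It assumes links are undirected and keyed by the concept pair, and that Duration_List stores the start timestamp of each co-occurrence. The class names `ConceptNode`, `RelatedLink` and `TNCG` are hypothetical; the poster does not specify a storage backend.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ConceptNode:
    concept_name: str                                   # Concept_Name
    is_personal_entity: bool = False                    # Concept_Type
    attribute_list: list = field(default_factory=list)  # Attribute_List (personal entities only)


@dataclass
class RelatedLink:
    """An "Is_Related_To" link between two concepts."""
    initial_start: datetime                             # Initial Start_timestamp
    last_end: datetime                                  # Last end_timestamp
    duration_list: list = field(default_factory=list)   # Duration_List (assumed: start timestamps)
    support: int = 0                                    # Relationship support

    def record(self, start: datetime, end: datetime) -> None:
        """Register one more co-occurrence of the concept pair."""
        self.initial_start = min(self.initial_start, start)
        self.last_end = max(self.last_end, end)
        self.duration_list.append(start)
        self.support += 1


class TNCG:
    def __init__(self) -> None:
        self.nodes = {}  # concept_name -> ConceptNode
        self.links = {}  # frozenset({a, b}) -> RelatedLink (undirected)

    def add_cooccurrence(self, a: str, b: str, start: datetime, end: datetime) -> None:
        if a == b:
            return  # a concept is not linked to itself
        for name in (a, b):
            self.nodes.setdefault(name, ConceptNode(name))
        key = frozenset((a, b))
        if key not in self.links:
            self.links[key] = RelatedLink(start, end)
        self.links[key].record(start, end)
```

Keying links by a `frozenset` makes ("kkr", "ipl_7") and ("ipl_7", "kkr") the same link, which matches co-occurrence being symmetric.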
Frequency of publishing news related to “ipl” (related to sports) by three news sources over 22 weeks of the past 5 months.
Frequency of publishing news related to “narendra_modi” (related to politics) by three news sources over 22 weeks of the past 5 months.
indianexpress published more news about “ipl” than indiatoday and nbcsports. During weeks 11 to 20, the rate of publishing “ipl”-related news rose significantly because of the IPL tournament in India.
For “narendra modi”, the news source zeenews dominated hindustantimes and thehindu, and the rate of publishing news related to “narendra modi” was high during weeks 8 to 18 (election time).
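The per-source weekly counts behind these plots amount to a simple aggregation. The sketch below assumes each headline has already been reduced to a hypothetical `(source_id, timestamp, concepts)` tuple; the function name `weekly_frequency` and the 0-based week index are illustrative choices, not taken from the paper.

```python
from collections import Counter
from datetime import datetime


def weekly_frequency(headlines, concept, period_start, n_weeks=22):
    """Count headlines mentioning `concept`, per source and per week.

    headlines: iterable of (source_id, timestamp, concepts) tuples
               (a hypothetical pre-processed format).
    Returns {source_id: [count in week 1, ..., count in week n_weeks]};
    list index 0 corresponds to the first week after period_start.
    """
    counts = Counter()
    for source_id, ts, concepts in headlines:
        week = (ts - period_start).days // 7  # negative for headlines before the period
        if 0 <= week < n_weeks and concept in concepts:
            counts[source_id, week] += 1
    sources = sorted({source for source, _ in counts})
    return {s: [counts[s, w] for w in range(n_weeks)] for s in sources}
```

Summing each source's list and sorting would also give a rough answer to the “top-5 news sources” question from the motivation.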
Time-aware Query Expansion: returns the set of related concepts that co-occurred with an input news concept within a specified input time span. The Active Concepts in the figure on the left show the result of time-aware query expansion for “ipl_7” in week 17.
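One way to read time-aware query expansion is as a filter over each link's co-occurrence timestamps. In this sketch a plain dict stands in for the TNCG, and ranking the returned concepts by their in-span co-occurrence count is an added assumption (the poster only says a set is returned).

```python
from datetime import datetime


def expand_query(cooccurrences, concept, t_from, t_to):
    """Return concepts that co-occurred with `concept` within [t_from, t_to],
    ranked by how often they co-occurred in that span (ties broken by name).

    cooccurrences: dict mapping frozenset({a, b}) -> list of co-occurrence
    timestamps, a stand-in for the Duration_List of an Is_Related_To link.
    """
    scores = {}
    for pair, timestamps in cooccurrences.items():
        if concept not in pair or len(pair) != 2:
            continue
        (other,) = pair - {concept}
        n = sum(1 for t in timestamps if t_from <= t <= t_to)
        if n:
            scores[other] = n
    return sorted(scores, key=lambda c: (-scores[c], c))
```

A concept that co-occurred with the query only outside the span (for example, at an auction months earlier) is dropped, which is what makes the expansion time-aware.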
Entity Relationship Mining: finding the top-k relations of a given entity, and their evolution pattern, over a given time span. E.g., “rahul gandhi” was cited more often with “narendra modi” than with “arvind kejriwal” during Election 2014.
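Entity relationship mining can be sketched the same way: count the concepts co-cited with the entity inside the time span, keep the top k, and use a per-week count series as a simple stand-in for the “evolution pattern”. The input format, the weekly bucketing and the function name `top_k_relations` are assumptions, not the paper's method.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def top_k_relations(headlines, entity, t_from, t_to, k=3, bucket=timedelta(weeks=1)):
    """Top-k concepts co-cited with `entity` in [t_from, t_to), with per-bucket counts.

    headlines: iterable of (timestamp, set_of_concepts) tuples (hypothetical format).
    Returns [(related_concept, total_count, per_bucket_counts), ...].
    """
    n_buckets = -(-(t_to - t_from) // bucket)  # ceiling division of two timedeltas
    series = defaultdict(lambda: [0] * n_buckets)
    totals = Counter()
    for ts, concepts in headlines:
        if entity not in concepts or not (t_from <= ts < t_to):
            continue
        b = (ts - t_from) // bucket  # timedelta // timedelta gives an int
        for other in concepts - {entity}:
            totals[other] += 1
            series[other][b] += 1
    return [(c, n, series[c]) for c, n in totals.most_common(k)]
```

The per-bucket list shows whether a relation was steady or concentrated in a few weeks, which is the kind of evolution the poster refers to.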
Other News Analytics Applications:
Temporal Rank-aware News Concept Recommendation, Concept-based Community Discovery, News Trend Analysis, etc.
See the paper “News Headlines: What They Can Tell Us?” for details.
Objective: In this paper, we use the news headlines published in online news media to develop a news analytics platform.
Partial TNCG
News Webpage
Partial TNCG
Cloud of related concepts for “kkr” during 11th May to 8th June, 2014
News Concepts: nouns and collocations of nouns and numbers, e.g., “ipl-7”, “narendra modi”, “election 2014”.
A Time-aware News Concept Graph (TNCG) is a property graph in which nodes are news concepts and two nodes are connected by a link if the two concepts co-occur in the same news headline.
Given a set of news headlines, we extract the news concepts from each headline and then build the TNCG. The figure below shows the 3-step process of constructing the TNCG from a single news headline.
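The poster does not spell out the three steps, so the sketch below follows one plausible reading: (1) extract concepts, (2) add them as nodes, (3) link every pair that co-occurs in the headline. Noun and collocation extraction normally needs a part-of-speech tagger; here it is replaced by a greedy longest-match lookup in a hypothetical known-concept vocabulary, and the graph is a plain adjacency dict whose edge values count co-occurrences.

```python
import re
from itertools import combinations


def extract_concepts(headline, vocabulary, max_len=3):
    """Stand-in for noun/collocation extraction: greedy longest match of
    1..max_len-word phrases against a known vocabulary (hypothetical).
    Multi-word concepts are joined with '_' (e.g. "narendra modi" -> "narendra_modi")."""
    words = re.findall(r"[a-z0-9]+", headline.lower())
    found, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in vocabulary:
                found.append(phrase.replace(" ", "_"))
                i += n
                break
        else:  # no phrase starting at word i is a known concept
            i += 1
    return found


def add_headline(graph, headline, vocabulary):
    """Step 1: extract concepts; step 2: add them as nodes;
    step 3: link every pair of concepts that co-occur in the headline."""
    concepts = sorted(set(extract_concepts(headline, vocabulary)))
    for c in concepts:
        graph.setdefault(c, {})
    for a, b in combinations(concepts, 2):
        graph[a][b] = graph[a].get(b, 0) + 1  # co-occurrence count (support)
        graph[b][a] = graph[b].get(a, 0) + 1
    return concepts
```

For example, with the vocabulary `{"kkr", "ipl 7"}`, the headline “KKR clinch IPL-7 title” yields the concepts `ipl_7` and `kkr` and one link between them.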
Figure: concepts that died out and concepts that emerged in the past 3 months (ipl_7, gulab_gang, satya_nadella, aiims, kkr, gaza, israel, ebola, kashmir_flood, isro).
