SlideShare a Scribd company logo
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 06 | June -2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 14
A Review on Topic Detection and Term-Term Relation Analysis in Big
Data
K Swanthana1, K Swapnika2
1,2 Assistant professor, Department of CSE, Gokaraju Rangaraju Institute of Engineering and Technology,
Telangana, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Topic models provide a way to aggregate
vocabulary from a document corpus to form latent topics. In
particular, Latent Dirichlet Allocation (LDA)isoneofthemost
popular topic modeling approaches. Learning the meaningful
topic models with massive documentcollectionswhichcontain
millions of documents and billions of tokens is challenging.
Topic detection is a tool to detect topics from media attracts
much attention. Generally, a topic is characterized by a set of
informative keywords/terms. Traditional approaches are
usually based on various topic models, suchasLatentDirichlet
Allocation (LDA). They cluster terms into a topic by mining
semantic relations between terms. However, they neglect the
co-occurrence relations found across the document, which
leads to the detection of incompleteinformation. Furthermore,
the inability to discover latent co-occurrence relations via the
context or other bridge terms will prevent the important but
rare topics being detected. To solve this problem we integrate
semantic relations and implicit co-occurrence relations for
topic detection. Specifically, the approach combines multiple
relations into a term graph and detects topics from the graph
using a graph analytical method. By combing mutually
complementary relations it cannot detect the topics more
effectively. This approach can mine important rare topics by
leveraging latent co-occurrence relations in huge corpus.
Key Words: Topic model, LDA, Semantic, Co-
occurrence, Latent Co-occurrence, Corpus
1. INTRODUCTION
Due to the rapid development of internet, the amount of
online text is experiencing explosive growth by means of
online news media and social media. The rich information
within the text data can be utilized to reveal some
meaningful trends/topics or the evolution of certain social
phenomena, such as the presidential election of the USA.
Besides, it can also be exploited for detecting some
emergency events or natural disasters, such as 2014
Shanghai Stampede1 and 2014 Kangding earthquake.2 In
general, an event is considered as something non-trivial
happening at a specific date/time and in a specific location
[1]. A topic can be considered as a kind of “abstract” event
which consists of some “concrete” events with semantic
relatedness, here “events” and “topics”areinter-changeable.
Topic detection is a sub-task ofTopic DetectionandTracking
(TDT) [2]. It aims at detecting topics or trends from various
text corpuses such as online media. Topic detection as a
fundamental problem of information retrieval can help the
decision makers to efficiently detect meaningful topics.
Therefore, it has attracted much attention such as public
opinion monitoring, decision supporting and emergency
management.
Automatic identification of semantic content of
documents has become increasingly important due to its
effectiveness in many tasks, including information retrieval,
information filtering and organization of documents
collections in digital libraries. The identification of the
topic(s) that a document addresses increases our
understanding of that document, the characteristics of the
collection as a whole and the interplay between distinct
topics. In collections where the temporal ordering of
documents is not of importance, studying a snapshot of the
collection at any given time is sufficient to deduct as much
information as possible about the various topics of interest
in the collection. On the other hand, many document
collections exhibit temporal relationships that are often
times utilized to aid the topic discoveryprocess.Theanalysis
of the temporal dimension of collections has become an
important field of study in many applications, including
weblog topic mining [5], evolution of author and paper
networks [4], news event analysis [6] and social interaction
of researchers [7,3].
In topic detection,a topicisgenerallyrepresentedas
a set of descriptive and collocated terms. Firstly, document
clustering techniques are applied in topic detection to
cluster content-similar documents and extract keywords
from clustered document sets as the representation of the
topics. Currently, most of the approaches use various topic
models, a type of generative probabilisticmodel suchasLDA
[8], pLSA [9] and their extensions, to detect topics. Among
them, LDA has been proved to be a powerful algorithm
because of its ability on mining the semantic information
from the text data. Terms having semantic relations with
each other are collected as a topic.
Latent Dirichlet Allocation (LDA) [10] has been shown to be
a highly effective unsupervised learning methodology for
finding distinct topics in document collections. LDA is a
generative process that models each document as a mixture
of topics where each topic corresponds to a multinomial
distribution over words. The learned document-topic and
topic-word distributions are then used to identify the best
topics for the documents and the most descriptivewordsfor
each topic. However, the original formulationofLDAfocuses
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 06 | June -2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 15
on analyzing a snapshot of collections and views the
collection as being generated at a single point in time.
However, there are two challenges to existing topic
model based approaches. Firstly, as claimed by Sayyadi and
Raschid [1], “Current topic modeling methods do not
explicitly consider word co-occurrences”. “Co-occurrence”
means two terms co-occur in the same document.
Unfortunately, due to the fact that“Extendingtopic modeling
to include co-occurrence can be a computationally daunting
challenge” [1], their proposed graph analytical approach
only made an approximation to this extension: they merely
took into account co-occurrence information alone while
ignoring semantic information. Therefore, how to combine
semantic relations and co-occurrence relations to
complement each other remains to be a challenge. Secondly,
existing approaches usually focus on detectingprominent or
distinct topics by mining explicit semantic relations or
frequent co-occurrence relations. However, they neglect to
uncover latent co-occurrence relations. The inability to
uncover latent relations prevents the important but rare
topics, which are hidden in large scale and noisy data
collections, from being detected. Such important rare topics
have two attributes: significant for human decision making
but rare that cannot be discovered easily [11]. In other
words, their features are commonly implicit or latent. Here
gives some examples: the latent omenssuchastheabnormal
behaviors of some animals may reveal that the disasters
such as earthquake will occur soon; the early incubations of
the disease may trigger the subsequent cancer. The reason
why such topics are commonly ignored is: they differ from
the distinct topics indicating common patterns;besides,they
are not outliers or noises in the sense of anomaly detection
[11].
2. RELATED WORK
Numerous statistical approaches for modeling text
documents have been proposed to model text documents.
Probabilistic latentsemantic analysis(pLSA)[12]modelsthe
generation of each document through activating multiple
topics over words. This model improves upon the singular
value decomposition based latent semantic analysis (LSA)
[13] which cannot handle polysemy. pLSA, on the other
hand, uses a distribution indexed by the trainingdocuments,
leading to the fact that the number of parameters to be
estimated grows linearly with the number of training
documents in thecollection. Thus,practical applicationswith
large training documents are susceptible to over fittingwith
this model. Latent Dirichlet Allocation (LDA) [10] is another
generative model that has become popular in recent years
that overcomes the over fitting problem of pLSA by using a
Dirichlet for modeling the distribution of topics for each
document.
To uncover latent co-occurrencerelationsanddiscover
important rare topics Chance Discovery (CD) theory and
Idea Discovery (ID) theory are used. These are defined as a
rare but important event or situation which has a strong
impact on human decision making [11,14]. CD and ID as
extensions of Knowledge Discoveryhave beenusedtodetect
chances by uncovering latent co-occurrencerelationsamong
terms. In the ID process, text data is analyzed and converted
into a term graph by mining co-occurrence relations, where
latent co-occurrence relations are uncovered and visualized
to capture chances. Latent co-occurrence relations means
there may be no frequent co-occurrence relations between
two terms (the terms do not frequently co-occur in thesame
documents); but, the two terms can be implicitly
related/linked by considering the “context” of one of the
terms or other bridge terms. Here “context” denotes
neighbors of the term, which are strongly interconnected in
the form of a community (cluster). In other words, latent co-
occurrence relations between two terms cannot be
measured in an isolated term-term view; the context of the
term should be taken into account. To address these
challenges, We use an integrated approach to integrate
semantic information and co-occurrenceinformationamong
terms for topic detection. Specifically, the approach fuses
multiple types of relations into a uniform term graph by
incorporating ID theory with topic modeling method.
Firstly, an Idea Discovery algorithm called Idea Graph is
adopted to mine co-occurrence relations (especially latent
co-occurrence relations) for converting the corpus into a
term graph. Then, a semantic relations extraction approach
is proposed based on LDA to enrich the graph with semantic
information. Lastly, a graph analytical method is presented
to exploit the graph for detecting topics.
To the best of our knowledge, the coupling of ID and topic
model for topic detection has not been seen as follows:
(1) It can detect topics more effectively to support human
decision making by combing mutually complementary
relations: semantic relations and co-occurrence relations.
(2) It can mine important rare topics by leveraginglatent co-
occurrence relations, which may aid human to perceive the
topics with great significance.
3. TOPIC MODELING APPROACHES
Topic Detection and Tracking (TDT) is an integral partof the
Translingual Information Detection, Extraction, and
Summarization (TIDES) program [2]. TDT mainly contains
two sub-task: Topic Detection and Topic Tracking. Topic
detection (Event detection) aims at detecting novel
topics/events from text corpus while topic tracking is
dedicated to tracking the evolution of existing topics over
temporal dimension. Topic detection has attracted much
attention in machine learning, information retrieval and
social media modeling [1,11,15]. Specifically,topicdetection
can be classified into two types: New Event Detection (NED)
and Retrospective Event Detection (RED).
NED aims atdetectingnewlyencounteredtopics/events
from online text streams [15]. Rill et al. [16] proposed a
system to detect emerging political topics in Twitter. The
detected topics can be used to extend existing knowledge
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 06 | June -2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 16
bases for better concept-level sentiment analysis. Hou et al.
[17] proposed a multifaceted news analysis approach to
detect events from online news. They represented newsasa
link-centric heterogeneous network and formalized news
analysis and mining task aslink discoveryproblem.Based on
that, they presented a unified probabilistic model for topic
extraction and inner relationship discovery within events.
RED is dedicated to discovering the events from the
historical corpus in an offline way [18]. Yang et al. [18]
proposed an agglomerative clustering algorithm, named
augmented Group Average Clustering, to clusterarticlesinto
events. They also employed an iterative bucketing and re-
clustering model proposed by Cutting et al. [19] to control
the tradeoff between cluster quality and computational
efficiency. Zeng and Zhang [20] presented a variable space
Markov model for topic detection,whereseveral steps based
on space computation and a hierarchal clustering algorithm
are proposed to tackle the issuesofdocumentimbalanceand
topic transition. We note our paper only addresses RED
problem.
3.1 Probabilistic approach
As mentioned previously, some RED approaches employ
topic modeling, a type of theprobabilistic modeling,todetect
topics from the text data. A famous topic model named
Latent Dirichlet Allocation (LDA) [8] is a three-level
hierarchical Bayesian model. Each document is represented
as a finite mixture over an underlying set of latent topics,
where each topic is characterized by a distribution over
terms. Terms having strong semantic relations with each
other are clustered as a topic’s features or representation.
There are plenty of improved versions of LDA. For example,
several knowledge-based topic models have been proposed
to incorporate prior domain knowledge from the user to
generate coherent topics [21–26]. Among which, Chenand
Liu [22] proposed the AMC algorithm, topic modeling with
automatically generated Must-links and Cannot-links, to
incorporate the knowledge automatically mined from the
past learning/modeling results, which can help future
learning. Xu et al. [26] used a knowledge based topic model
to extract implicit features of product reviews for opinion
mining task.3
3.3 Graph analytical approach
Other approaches on topic detection leverage the graph
analytical method to detect the topics within the graph or
network. Sayyadi and Raschid [1] proposed a graph
analytical approach for topic detection. They used a Key
Graph algorithm [27] to convert text data into a term graph
based on co-occurrence relations between terms. Then they
employed a community detection approach to partition the
graph. Eventually, eachcommunityisregardedasa topicand
terms within the community are considered as the topic’s
features.
4. CONCLUSIONS
In this paper, we explored the solution for the problem of
ignoring the latent i.e implicit meaning of the terms which
co-occur in the documents by integrating semantic relations
with implicit or frequent co-occurrence relations and latent
co-occurrence relations for important topics detection. In
this approach topic modeling is incorporated with Chance
Discovery (CD) to capture the semantic relations and co-
occurrence relations, which facilitates effective topic
detection. Idea Discovery (ID) is to convert the analyzed
data into a term graph by mining co-occurrence relations.In
future there is a need and extension for this approach for
new event detection (NED) to satisfy the need of Big Data
era. Finally we can extend this approach to represent the
topic hierarchy, where the events as the basic elements of
the topic to be detected.
REFERENCES
[1] H. Sayyadi, L. Raschid , A graph analytical approach
for topic detection, ACMTrans. Internet Technol 13
(2) (2013) 4, 23.
[2] J. Allan, J.G. Carbonell, G. Doddington, J. Yamron, Y.
Yang, Topic detection and tracking pilot study, in:
Proceedings of the DARPA Broadcast News
Transcription and Understanding Workshop, 1998,
pp. 194–218.
[3] L. Backstrom, D. Huttenlocher, J. Kleinberg, and X.
Lan. Group formation in large social networks:
membership, growth, and evolution. In KDD ’06:
Proceedings of the 12th ACM SIGKDD international
conference onKnowledgediscoveryanddata mining,
pages 44–54, New York, NY, USA, 2006. ACM Press.
[4] K. Brner, J. T. Maru, and R. L. Goldstone. The
simultaneous evolution of author and paper
networks. Proceedings of the National Academy of
Sciences, 101 Suppl 1:5266–5273, April 2004.
[5] Q. Mei, C. Liu, H. Su, and C. Zhai. A probabilistic
approach to spatiotemporal themepattern miningon
weblogs. In WWW ’06: Proceedings of the 15th
international conference on World Wide Web, pages
533–542, New York, NY, USA, 2006. ACM Press.
[6] Q. Mei and C. Zhai. Discovering evolutionary theme
patterns from text: an exploration of temporal text
mining. In KDD ’05: Proceeding of the eleventh ACM
SIGKDD international conference on Knowledge
discovery in data mining, pages 198–207, New York,
NY, USA, 2005. ACM Press.
[7] D. Zhou, X. Ji, H. Zha, and C. L. Giles. Topic evolution
and social interactions: how authors effect research.
In CIKM ’06: Proceedings of the 15th ACM
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 06 | June -2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 17
international conference on Information and
knowledge management, pages 248–257, New York,
NY, USA, 2006. ACM Press.
[8] D.M. Blei, A.Y. Ng, M.I. Jordan, Latent Dirichlet
allocation, Adv. J. Mach. Learn. Res. 3 (2003) 993–
1022.
[9] T. Hofmann, Probabilistic latent semantic analysis,
UAI (1999) 289–296.
[10] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet
allocation. Journal of Machine Learning Research,
3:993–1022, 2003.
[11] Y. Ohsawa, Chance discoveries for making decisions
in complex real world, New Gen. Comput. 20 (2)
(2002) 143–163.
[12] T. Hofmann. Probabilistic latent semantic indexing.
In SIGIR ’99: Proceedings of the 22nd annual
international ACM SIGIR conferenceonResearchand
25 development in information retrieval, pages 50–
57, New York, NY, USA, 1999. ACM Press.
[13] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W.
Furnas, and R. A. Harshman. Indexing by latent
semantic analysis. Journal of the American Societyof
Information Science, 41(6):391–407, 1990.
[14] H.Wang, Y. Ohsawa, Y. Nishihara, Innovationsupport
system for creative product design based on chance
discovery, Expert Syst. Appl. 39 (5) (2012) 4890–
4897.
[15] J. Allan, R. Papka, V. Lvrenko , On-line new event
detection and tracking, SIGIR (1998) 37–45.
[16] S Rill, D Reinel, J. Scheidt, et al., Politwi: early
detection of emerging political topics on twitter and
the impact on concept-level sentiment analysis,
Knowl. Based Syst. 69 (2014) 24–33.
[17] L. Hou, J. Li, Z. Wang, et al., NewsMiner: Multifaceted
news analysis for event search,Knowl. BasedSyst.76
(2015) 17–29.
[18] Y. Yang, T. Pierce, J.G. Carbonell, A study on
retrospective and on-line event detection, SIGIR
(1998) 28–36.
[19] D.R. Cutting, D.R. Karger, J.O. Pedersen, 803 J.W.
Tukey, Scatter/gather: A cluster-based approach to
browsing large document collections, SIGIR (1992)
318–329.
[20] J. Zeng, S. Zhang, Variable space hidden Markov
model for topic detection and analysis,Knowl.Based
Syst. 20 (7) (2007) 607–613.
[21] D. Andrzejewski, X. Zhu, M. Craven, Incorporating
domain knowledge into topic modeling via Dirichlet
forest priors, ICML (2009) 25–32.
[22] Z. Chen, B. Liu,Mining topics in documents: standing
on the shoulders of bigdata,KDD(2014)1116–1125.
[23] Z. Chen, A.Mukherjee, B.Liu,M.Hsu,M.Castellanos,R.
Ghosh, Exploiting domain knowledge in aspect
extraction, EMNLP (2013) 1655–1667.
[24] A. Mukherjee, B. Liu, Aspect extraction through
semi-supervised modeling, ACL (2012) 339–348.
[25] X. Fu, K. Yang, J.Z. Huang, L. Cui, Dynamic non-
parametric joint sentiment topic mixture model,
Knowl. Based Syst. 82 (2015) 102–114.
[26] H. Xu, F. Zhang, W. Wang, Implicit feature
identification in Chinese reviews using explicit topic
mining model, Knowl. Based Syst. 76 (2015) 166–
175.
[27] Y. Ohsawa, N.E. Benson, Y. Masahiko, KeyGraph:
automatic indexing by co-occurrencegraphbasedon
building construction metaphor, ADL (1998) 12–18.

More Related Content

What's hot

Identification of User Aware Rare Sequential Pattern in Document Stream An Ov...
Identification of User Aware Rare Sequential Pattern in Document Stream An Ov...Identification of User Aware Rare Sequential Pattern in Document Stream An Ov...
Identification of User Aware Rare Sequential Pattern in Document Stream An Ov...ijtsrd
 
Challenging Issues and Similarity Measures for Web Document Clustering
Challenging Issues and Similarity Measures for Web Document ClusteringChallenging Issues and Similarity Measures for Web Document Clustering
Challenging Issues and Similarity Measures for Web Document ClusteringIOSR Journals
 
Qualitative analysis/cluster analysis/NVivo analysis /content analysis Interp...
Qualitative analysis/cluster analysis/NVivo analysis /content analysis Interp...Qualitative analysis/cluster analysis/NVivo analysis /content analysis Interp...
Qualitative analysis/cluster analysis/NVivo analysis /content analysis Interp...HennaAnsari
 
A Novel Approach of Data Driven Analytics for Personalized Healthcare through...
A Novel Approach of Data Driven Analytics for Personalized Healthcare through...A Novel Approach of Data Driven Analytics for Personalized Healthcare through...
A Novel Approach of Data Driven Analytics for Personalized Healthcare through...IJMTST Journal
 
Identical Users in Different Social Media Provides Uniform Network Structure ...
Identical Users in Different Social Media Provides Uniform Network Structure ...Identical Users in Different Social Media Provides Uniform Network Structure ...
Identical Users in Different Social Media Provides Uniform Network Structure ...IJMTST Journal
 
Q&A from ARDC webinar: ‘Data management in NHMRC’s revised National Statemen...
Q&A from  ARDC webinar: ‘Data management in NHMRC’s revised National Statemen...Q&A from  ARDC webinar: ‘Data management in NHMRC’s revised National Statemen...
Q&A from ARDC webinar: ‘Data management in NHMRC’s revised National Statemen...ARDC
 
An Advanced IR System of Relational Keyword Search Technique
An Advanced IR System of Relational Keyword Search TechniqueAn Advanced IR System of Relational Keyword Search Technique
An Advanced IR System of Relational Keyword Search Techniquepaperpublications3
 
Directed versus undirected network analysis of student essays
Directed versus undirected network analysis of student essaysDirected versus undirected network analysis of student essays
Directed versus undirected network analysis of student essaysRoy Clariana
 
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...IJCSEA Journal
 
Text Analytics 2014: User Perspectives on Solutions and Providers
Text Analytics 2014: User Perspectives on Solutions and ProvidersText Analytics 2014: User Perspectives on Solutions and Providers
Text Analytics 2014: User Perspectives on Solutions and ProvidersSeth Grimes
 
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONSEXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONSeditorijettcs
 
Towards Vagueness-Aware Semantic Data
Towards Vagueness-Aware Semantic DataTowards Vagueness-Aware Semantic Data
Towards Vagueness-Aware Semantic DataPanos Alexopoulos
 
Masters Project - FINAL - Public
Masters Project - FINAL - PublicMasters Project - FINAL - Public
Masters Project - FINAL - PublicMichael Hay
 
Tutorial Cognition - Irene
Tutorial Cognition - IreneTutorial Cognition - Irene
Tutorial Cognition - IreneSSSW
 
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...Anastasija Nikiforova
 
ExpertDiscoveryUsingLatentDirichletAllocation
ExpertDiscoveryUsingLatentDirichletAllocationExpertDiscoveryUsingLatentDirichletAllocation
ExpertDiscoveryUsingLatentDirichletAllocationAleksandr Rogozin
 
Managing 'Big Data' in the social sciences: the contribution of an analytico-...
Managing 'Big Data' in the social sciences: the contribution of an analytico-...Managing 'Big Data' in the social sciences: the contribution of an analytico-...
Managing 'Big Data' in the social sciences: the contribution of an analytico-...CILIP MDG
 

What's hot (17)

Identification of User Aware Rare Sequential Pattern in Document Stream An Ov...
Identification of User Aware Rare Sequential Pattern in Document Stream An Ov...Identification of User Aware Rare Sequential Pattern in Document Stream An Ov...
Identification of User Aware Rare Sequential Pattern in Document Stream An Ov...
 
Challenging Issues and Similarity Measures for Web Document Clustering
Challenging Issues and Similarity Measures for Web Document ClusteringChallenging Issues and Similarity Measures for Web Document Clustering
Challenging Issues and Similarity Measures for Web Document Clustering
 
Qualitative analysis/cluster analysis/NVivo analysis /content analysis Interp...
Qualitative analysis/cluster analysis/NVivo analysis /content analysis Interp...Qualitative analysis/cluster analysis/NVivo analysis /content analysis Interp...
Qualitative analysis/cluster analysis/NVivo analysis /content analysis Interp...
 
A Novel Approach of Data Driven Analytics for Personalized Healthcare through...
A Novel Approach of Data Driven Analytics for Personalized Healthcare through...A Novel Approach of Data Driven Analytics for Personalized Healthcare through...
A Novel Approach of Data Driven Analytics for Personalized Healthcare through...
 
Identical Users in Different Social Media Provides Uniform Network Structure ...
Identical Users in Different Social Media Provides Uniform Network Structure ...Identical Users in Different Social Media Provides Uniform Network Structure ...
Identical Users in Different Social Media Provides Uniform Network Structure ...
 
Q&A from ARDC webinar: ‘Data management in NHMRC’s revised National Statemen...
Q&A from  ARDC webinar: ‘Data management in NHMRC’s revised National Statemen...Q&A from  ARDC webinar: ‘Data management in NHMRC’s revised National Statemen...
Q&A from ARDC webinar: ‘Data management in NHMRC’s revised National Statemen...
 
An Advanced IR System of Relational Keyword Search Technique
An Advanced IR System of Relational Keyword Search TechniqueAn Advanced IR System of Relational Keyword Search Technique
An Advanced IR System of Relational Keyword Search Technique
 
Directed versus undirected network analysis of student essays
Directed versus undirected network analysis of student essaysDirected versus undirected network analysis of student essays
Directed versus undirected network analysis of student essays
 
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
 
Text Analytics 2014: User Perspectives on Solutions and Providers
Text Analytics 2014: User Perspectives on Solutions and ProvidersText Analytics 2014: User Perspectives on Solutions and Providers
Text Analytics 2014: User Perspectives on Solutions and Providers
 
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONSEXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
 
Towards Vagueness-Aware Semantic Data
Towards Vagueness-Aware Semantic DataTowards Vagueness-Aware Semantic Data
Towards Vagueness-Aware Semantic Data
 
Masters Project - FINAL - Public
Masters Project - FINAL - PublicMasters Project - FINAL - Public
Masters Project - FINAL - Public
 
Tutorial Cognition - Irene
Tutorial Cognition - IreneTutorial Cognition - Irene
Tutorial Cognition - Irene
 
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...
 
ExpertDiscoveryUsingLatentDirichletAllocation
ExpertDiscoveryUsingLatentDirichletAllocationExpertDiscoveryUsingLatentDirichletAllocation
ExpertDiscoveryUsingLatentDirichletAllocation
 
Managing 'Big Data' in the social sciences: the contribution of an analytico-...
Managing 'Big Data' in the social sciences: the contribution of an analytico-...Managing 'Big Data' in the social sciences: the contribution of an analytico-...
Managing 'Big Data' in the social sciences: the contribution of an analytico-...
 

Similar to IRJET-A Review on Topic Detection and Term-Term Relation Analysis in Big Data

An improved technique for ranking semantic associationst07
An improved technique for ranking semantic associationst07An improved technique for ranking semantic associationst07
An improved technique for ranking semantic associationst07IJwest
 
Great model a model for the automatic generation of semantic relations betwee...
Great model a model for the automatic generation of semantic relations betwee...Great model a model for the automatic generation of semantic relations betwee...
Great model a model for the automatic generation of semantic relations betwee...ijcsity
 
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...IRJET Journal
 
Efficient Way to Identify User Aware Rare Sequential Patterns in Document Str...
Efficient Way to Identify User Aware Rare Sequential Patterns in Document Str...Efficient Way to Identify User Aware Rare Sequential Patterns in Document Str...
Efficient Way to Identify User Aware Rare Sequential Patterns in Document Str...ijtsrd
 
Increasing the Investment’s Opportunities in Kingdom of Saudi Arabia By Study...
Increasing the Investment’s Opportunities in Kingdom of Saudi Arabia By Study...Increasing the Investment’s Opportunities in Kingdom of Saudi Arabia By Study...
Increasing the Investment’s Opportunities in Kingdom of Saudi Arabia By Study...AIRCC Publishing Corporation
 
INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY...
INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY...INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY...
INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY...ijcsit
 
Meliorating usable document density for online event detection
Meliorating usable document density for online event detectionMeliorating usable document density for online event detection
Meliorating usable document density for online event detectionIJICTJOURNAL
 
IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...
IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...
IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...IRJET Journal
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGdannyijwest
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGdannyijwest
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGIJwest
 
Scraping and Clustering Techniques for the Characterization of Linkedin Profiles
Scraping and Clustering Techniques for the Characterization of Linkedin ProfilesScraping and Clustering Techniques for the Characterization of Linkedin Profiles
Scraping and Clustering Techniques for the Characterization of Linkedin Profilescsandit
 
Scraping and clustering techniques
Scraping and clustering techniquesScraping and clustering techniques
Scraping and clustering techniquescsandit
 
Singapore Management UniversityInstitutional Knowledge at Si.docx
Singapore Management UniversityInstitutional Knowledge at Si.docxSingapore Management UniversityInstitutional Knowledge at Si.docx
Singapore Management UniversityInstitutional Knowledge at Si.docxjennifer822
 
Singapore Management UniversityInstitutional Knowledge at Si.docx
Singapore Management UniversityInstitutional Knowledge at Si.docxSingapore Management UniversityInstitutional Knowledge at Si.docx
Singapore Management UniversityInstitutional Knowledge at Si.docxedgar6wallace88877
 
Document Retrieval System, a Case Study
Document Retrieval System, a Case StudyDocument Retrieval System, a Case Study
Document Retrieval System, a Case StudyIJERA Editor
 
06877 Topic Implicit Association TestNumber of Pages 1 (Doub.docx
06877 Topic Implicit Association TestNumber of Pages 1 (Doub.docx06877 Topic Implicit Association TestNumber of Pages 1 (Doub.docx
06877 Topic Implicit Association TestNumber of Pages 1 (Doub.docxsmithhedwards48727
 
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...IJCSEA Journal
 
International Journal of Computer Science, Engineering and Applications (IJCSEA)
International Journal of Computer Science, Engineering and Applications (IJCSEA)International Journal of Computer Science, Engineering and Applications (IJCSEA)
International Journal of Computer Science, Engineering and Applications (IJCSEA)IJCSEA Journal
 

Similar to IRJET-A Review on Topic Detection and Term-Term Relation Analysis in Big Data (20)

An improved technique for ranking semantic associationst07
An improved technique for ranking semantic associationst07An improved technique for ranking semantic associationst07
An improved technique for ranking semantic associationst07
 
Great model a model for the automatic generation of semantic relations betwee...
Great model a model for the automatic generation of semantic relations betwee...Great model a model for the automatic generation of semantic relations betwee...
Great model a model for the automatic generation of semantic relations betwee...
 
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...
 
Efficient Way to Identify User Aware Rare Sequential Patterns in Document Str...
Efficient Way to Identify User Aware Rare Sequential Patterns in Document Str...Efficient Way to Identify User Aware Rare Sequential Patterns in Document Str...
Efficient Way to Identify User Aware Rare Sequential Patterns in Document Str...
 
Increasing the Investment’s Opportunities in Kingdom of Saudi Arabia By Study...
Increasing the Investment’s Opportunities in Kingdom of Saudi Arabia By Study...Increasing the Investment’s Opportunities in Kingdom of Saudi Arabia By Study...
Increasing the Investment’s Opportunities in Kingdom of Saudi Arabia By Study...
 
INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY...
INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY...INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY...
INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY...
 
Meliorating usable document density for online event detection
Meliorating usable document density for online event detectionMeliorating usable document density for online event detection
Meliorating usable document density for online event detection
 
IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...
IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...
IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
Scraping and Clustering Techniques for the Characterization of Linkedin Profiles
Scraping and Clustering Techniques for the Characterization of Linkedin ProfilesScraping and Clustering Techniques for the Characterization of Linkedin Profiles
Scraping and Clustering Techniques for the Characterization of Linkedin Profiles
 
Scraping and clustering techniques
Scraping and clustering techniquesScraping and clustering techniques
Scraping and clustering techniques
 
Singapore Management UniversityInstitutional Knowledge at Si.docx
Singapore Management UniversityInstitutional Knowledge at Si.docxSingapore Management UniversityInstitutional Knowledge at Si.docx
Singapore Management UniversityInstitutional Knowledge at Si.docx
 
Singapore Management UniversityInstitutional Knowledge at Si.docx
Singapore Management UniversityInstitutional Knowledge at Si.docxSingapore Management UniversityInstitutional Knowledge at Si.docx
Singapore Management UniversityInstitutional Knowledge at Si.docx
 
IJET-V3I2P23
IJET-V3I2P23IJET-V3I2P23
IJET-V3I2P23
 
Document Retrieval System, a Case Study
Document Retrieval System, a Case StudyDocument Retrieval System, a Case Study
Document Retrieval System, a Case Study
 
06877 Topic Implicit Association TestNumber of Pages 1 (Doub.docx
06877 Topic Implicit Association TestNumber of Pages 1 (Doub.docx06877 Topic Implicit Association TestNumber of Pages 1 (Doub.docx
06877 Topic Implicit Association TestNumber of Pages 1 (Doub.docx
 
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
 
International Journal of Computer Science, Engineering and Applications (IJCSEA)
International Journal of Computer Science, Engineering and Applications (IJCSEA)International Journal of Computer Science, Engineering and Applications (IJCSEA)
International Journal of Computer Science, Engineering and Applications (IJCSEA)
 

More from IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTUREIRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsIRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASIRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProIRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemIRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesIRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web applicationIRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignIRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...IRJET Journal
 

More from IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Recently uploaded

ONLINE CAR SERVICING SYSTEM PROJECT REPORT.pdf
ONLINE CAR SERVICING SYSTEM PROJECT REPORT.pdfONLINE CAR SERVICING SYSTEM PROJECT REPORT.pdf
ONLINE CAR SERVICING SYSTEM PROJECT REPORT.pdfKamal Acharya
 
Introduction to Casting Processes in Manufacturing
Introduction to Casting Processes in ManufacturingIntroduction to Casting Processes in Manufacturing
Introduction to Casting Processes in Manufacturingssuser0811ec
 
Automobile Management System Project Report.pdf
Automobile Management System Project Report.pdfAutomobile Management System Project Report.pdf
Automobile Management System Project Report.pdfKamal Acharya
 
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data Stream
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data StreamKIT-601 Lecture Notes-UNIT-3.pdf Mining Data Stream
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data StreamDr. Radhey Shyam
 
Online resume builder management system project report.pdf
Online resume builder management system project report.pdfOnline resume builder management system project report.pdf
Online resume builder management system project report.pdfKamal Acharya
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.PrashantGoswami42
 
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdfRESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdfKamal Acharya
 
Courier management system project report.pdf
Courier management system project report.pdfCourier management system project report.pdf
Courier management system project report.pdfKamal Acharya
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdfKamal Acharya
 
KIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and Clustering
KIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and ClusteringKIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and Clustering
KIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and ClusteringDr. Radhey Shyam
 
Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Aryaabh.arya
 
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical EngineeringIntroduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical EngineeringC Sai Kiran
 
2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edgePaco Orozco
 
Introduction to Machine Learning Unit-5 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-5 Notes for II-II Mechanical EngineeringIntroduction to Machine Learning Unit-5 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-5 Notes for II-II Mechanical EngineeringC Sai Kiran
 
A CASE STUDY ON ONLINE TICKET BOOKING SYSTEM PROJECT.pdf
A CASE STUDY ON ONLINE TICKET BOOKING SYSTEM PROJECT.pdfA CASE STUDY ON ONLINE TICKET BOOKING SYSTEM PROJECT.pdf
A CASE STUDY ON ONLINE TICKET BOOKING SYSTEM PROJECT.pdfKamal Acharya
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfPipe Restoration Solutions
 
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWING
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWINGBRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWING
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWINGKOUSTAV SARKAR
 
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and VisualizationKIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and VisualizationDr. Radhey Shyam
 
Scaling in conventional MOSFET for constant electric field and constant voltage
Scaling in conventional MOSFET for constant electric field and constant voltageScaling in conventional MOSFET for constant electric field and constant voltage
Scaling in conventional MOSFET for constant electric field and constant voltageRCC Institute of Information Technology
 
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptxCloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptxMd. Shahidul Islam Prodhan
 

Recently uploaded (20)

ONLINE CAR SERVICING SYSTEM PROJECT REPORT.pdf
ONLINE CAR SERVICING SYSTEM PROJECT REPORT.pdfONLINE CAR SERVICING SYSTEM PROJECT REPORT.pdf
ONLINE CAR SERVICING SYSTEM PROJECT REPORT.pdf
 
Introduction to Casting Processes in Manufacturing
Introduction to Casting Processes in ManufacturingIntroduction to Casting Processes in Manufacturing
Introduction to Casting Processes in Manufacturing
 
Automobile Management System Project Report.pdf
Automobile Management System Project Report.pdfAutomobile Management System Project Report.pdf
Automobile Management System Project Report.pdf
 
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data Stream
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data StreamKIT-601 Lecture Notes-UNIT-3.pdf Mining Data Stream
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data Stream
 
Online resume builder management system project report.pdf
Online resume builder management system project report.pdfOnline resume builder management system project report.pdf
Online resume builder management system project report.pdf
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
 
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdfRESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
 
Courier management system project report.pdf
Courier management system project report.pdfCourier management system project report.pdf
Courier management system project report.pdf
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
 
KIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and Clustering
KIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and ClusteringKIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and Clustering
KIT-601 Lecture Notes-UNIT-4.pdf Frequent Itemsets and Clustering
 
Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Arya
 
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical EngineeringIntroduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
 
2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge
 
Introduction to Machine Learning Unit-5 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-5 Notes for II-II Mechanical EngineeringIntroduction to Machine Learning Unit-5 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-5 Notes for II-II Mechanical Engineering
 
A CASE STUDY ON ONLINE TICKET BOOKING SYSTEM PROJECT.pdf
A CASE STUDY ON ONLINE TICKET BOOKING SYSTEM PROJECT.pdfA CASE STUDY ON ONLINE TICKET BOOKING SYSTEM PROJECT.pdf
A CASE STUDY ON ONLINE TICKET BOOKING SYSTEM PROJECT.pdf
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWING
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWINGBRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWING
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWING
 
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and VisualizationKIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
 
Scaling in conventional MOSFET for constant electric field and constant voltage
Scaling in conventional MOSFET for constant electric field and constant voltageScaling in conventional MOSFET for constant electric field and constant voltage
Scaling in conventional MOSFET for constant electric field and constant voltage
 
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptxCloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
 

IRJET-A Review on Topic Detection and Term-Term Relation Analysis in Big Data

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 06 | June -2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 14 A Review on Topic Detection and Term-Term Relation Analysis in Big Data K Swanthana1, K Swapnika2 1,2 Assistant professor, Department of CSE, Gokaraju Rangaraju Institute of Engineering and Technology, Telangana, India ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - Topic models provide a way to aggregate vocabulary from a document corpus to form latent topics. In particular, Latent Dirichlet Allocation (LDA)isoneofthemost popular topic modeling approaches. Learning the meaningful topic models with massive documentcollectionswhichcontain millions of documents and billions of tokens is challenging. Topic detection is a tool to detect topics from media attracts much attention. Generally, a topic is characterized by a set of informative keywords/terms. Traditional approaches are usually based on various topic models, suchasLatentDirichlet Allocation (LDA). They cluster terms into a topic by mining semantic relations between terms. However, they neglect the co-occurrence relations found across the document, which leads to the detection of incompleteinformation. Furthermore, the inability to discover latent co-occurrence relations via the context or other bridge terms will prevent the important but rare topics being detected. To solve this problem we integrate semantic relations and implicit co-occurrence relations for topic detection. Specifically, the approach combines multiple relations into a term graph and detects topics from the graph using a graph analytical method. By combing mutually complementary relations it cannot detect the topics more effectively. This approach can mine important rare topics by leveraging latent co-occurrence relations in huge corpus. Key Words: Topic model, LDA, Semantic, Co- occurrence, Latent Co-occurrence, Corpus 1. INTRODUCTION Due to the rapid development of internet, the amount of online text is experiencing explosive growth by means of online news media and social media. The rich information within the text data can be utilized to reveal some meaningful trends/topics or the evolution of certain social phenomena, such as the presidential election of the USA. Besides, it can also be exploited for detecting some emergency events or natural disasters, such as 2014 Shanghai Stampede1 and 2014 Kangding earthquake.2 In general, an event is considered as something non-trivial happening at a specific date/time and in a specific location [1]. A topic can be considered as a kind of “abstract” event which consists of some “concrete” events with semantic relatedness, here “events” and “topics”areinter-changeable. Topic detection is a sub-task ofTopic DetectionandTracking (TDT) [2]. It aims at detecting topics or trends from various text corpuses such as online media. Topic detection as a fundamental problem of information retrieval can help the decision makers to efficiently detect meaningful topics. Therefore, it has attracted much attention such as public opinion monitoring, decision supporting and emergency management. Automatic identification of semantic content of documents has become increasingly important due to its effectiveness in many tasks, including information retrieval, information filtering and organization of documents collections in digital libraries. The identification of the topic(s) that a document addresses increases our understanding of that document, the characteristics of the collection as a whole and the interplay between distinct topics. In collections where the temporal ordering of documents is not of importance, studying a snapshot of the collection at any given time is sufficient to deduct as much information as possible about the various topics of interest in the collection. On the other hand, many document collections exhibit temporal relationships that are often times utilized to aid the topic discoveryprocess.Theanalysis of the temporal dimension of collections has become an important field of study in many applications, including weblog topic mining [5], evolution of author and paper networks [4], news event analysis [6] and social interaction of researchers [7,3]. In topic detection,a topicisgenerallyrepresentedas a set of descriptive and collocated terms. Firstly, document clustering techniques are applied in topic detection to cluster content-similar documents and extract keywords from clustered document sets as the representation of the topics. Currently, most of the approaches use various topic models, a type of generative probabilisticmodel suchasLDA [8], pLSA [9] and their extensions, to detect topics. Among them, LDA has been proved to be a powerful algorithm because of its ability on mining the semantic information from the text data. Terms having semantic relations with each other are collected as a topic. Latent Dirichlet Allocation (LDA) [10] has been shown to be a highly effective unsupervised learning methodology for finding distinct topics in document collections. LDA is a generative process that models each document as a mixture of topics where each topic corresponds to a multinomial distribution over words. The learned document-topic and topic-word distributions are then used to identify the best topics for the documents and the most descriptivewordsfor each topic. However, the original formulationofLDAfocuses
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 06 | June -2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 15 on analyzing a snapshot of collections and views the collection as being generated at a single point in time. However, there are two challenges to existing topic model based approaches. Firstly, as claimed by Sayyadi and Raschid [1], “Current topic modeling methods do not explicitly consider word co-occurrences”. “Co-occurrence” means two terms co-occur in the same document. Unfortunately, due to the fact that“Extendingtopic modeling to include co-occurrence can be a computationally daunting challenge” [1], their proposed graph analytical approach only made an approximation to this extension: they merely took into account co-occurrence information alone while ignoring semantic information. Therefore, how to combine semantic relations and co-occurrence relations to complement each other remains to be a challenge. Secondly, existing approaches usually focus on detectingprominent or distinct topics by mining explicit semantic relations or frequent co-occurrence relations. However, they neglect to uncover latent co-occurrence relations. The inability to uncover latent relations prevents the important but rare topics, which are hidden in large scale and noisy data collections, from being detected. Such important rare topics have two attributes: significant for human decision making but rare that cannot be discovered easily [11]. In other words, their features are commonly implicit or latent. Here gives some examples: the latent omenssuchastheabnormal behaviors of some animals may reveal that the disasters such as earthquake will occur soon; the early incubations of the disease may trigger the subsequent cancer. The reason why such topics are commonly ignored is: they differ from the distinct topics indicating common patterns;besides,they are not outliers or noises in the sense of anomaly detection [11]. 2. RELATED WORK Numerous statistical approaches for modeling text documents have been proposed to model text documents. Probabilistic latentsemantic analysis(pLSA)[12]modelsthe generation of each document through activating multiple topics over words. This model improves upon the singular value decomposition based latent semantic analysis (LSA) [13] which cannot handle polysemy. pLSA, on the other hand, uses a distribution indexed by the trainingdocuments, leading to the fact that the number of parameters to be estimated grows linearly with the number of training documents in thecollection. Thus,practical applicationswith large training documents are susceptible to over fittingwith this model. Latent Dirichlet Allocation (LDA) [10] is another generative model that has become popular in recent years that overcomes the over fitting problem of pLSA by using a Dirichlet for modeling the distribution of topics for each document. To uncover latent co-occurrencerelationsanddiscover important rare topics Chance Discovery (CD) theory and Idea Discovery (ID) theory are used. These are defined as a rare but important event or situation which has a strong impact on human decision making [11,14]. CD and ID as extensions of Knowledge Discoveryhave beenusedtodetect chances by uncovering latent co-occurrencerelationsamong terms. In the ID process, text data is analyzed and converted into a term graph by mining co-occurrence relations, where latent co-occurrence relations are uncovered and visualized to capture chances. Latent co-occurrence relations means there may be no frequent co-occurrence relations between two terms (the terms do not frequently co-occur in thesame documents); but, the two terms can be implicitly related/linked by considering the “context” of one of the terms or other bridge terms. Here “context” denotes neighbors of the term, which are strongly interconnected in the form of a community (cluster). In other words, latent co- occurrence relations between two terms cannot be measured in an isolated term-term view; the context of the term should be taken into account. To address these challenges, We use an integrated approach to integrate semantic information and co-occurrenceinformationamong terms for topic detection. Specifically, the approach fuses multiple types of relations into a uniform term graph by incorporating ID theory with topic modeling method. Firstly, an Idea Discovery algorithm called Idea Graph is adopted to mine co-occurrence relations (especially latent co-occurrence relations) for converting the corpus into a term graph. Then, a semantic relations extraction approach is proposed based on LDA to enrich the graph with semantic information. Lastly, a graph analytical method is presented to exploit the graph for detecting topics. To the best of our knowledge, the coupling of ID and topic model for topic detection has not been seen as follows: (1) It can detect topics more effectively to support human decision making by combing mutually complementary relations: semantic relations and co-occurrence relations. (2) It can mine important rare topics by leveraginglatent co- occurrence relations, which may aid human to perceive the topics with great significance. 3. TOPIC MODELING APPROACHES Topic Detection and Tracking (TDT) is an integral partof the Translingual Information Detection, Extraction, and Summarization (TIDES) program [2]. TDT mainly contains two sub-task: Topic Detection and Topic Tracking. Topic detection (Event detection) aims at detecting novel topics/events from text corpus while topic tracking is dedicated to tracking the evolution of existing topics over temporal dimension. Topic detection has attracted much attention in machine learning, information retrieval and social media modeling [1,11,15]. Specifically,topicdetection can be classified into two types: New Event Detection (NED) and Retrospective Event Detection (RED). NED aims atdetectingnewlyencounteredtopics/events from online text streams [15]. Rill et al. [16] proposed a system to detect emerging political topics in Twitter. The detected topics can be used to extend existing knowledge
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 06 | June -2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 16 bases for better concept-level sentiment analysis. Hou et al. [17] proposed a multifaceted news analysis approach to detect events from online news. They represented newsasa link-centric heterogeneous network and formalized news analysis and mining task aslink discoveryproblem.Based on that, they presented a unified probabilistic model for topic extraction and inner relationship discovery within events. RED is dedicated to discovering the events from the historical corpus in an offline way [18]. Yang et al. [18] proposed an agglomerative clustering algorithm, named augmented Group Average Clustering, to clusterarticlesinto events. They also employed an iterative bucketing and re- clustering model proposed by Cutting et al. [19] to control the tradeoff between cluster quality and computational efficiency. Zeng and Zhang [20] presented a variable space Markov model for topic detection,whereseveral steps based on space computation and a hierarchal clustering algorithm are proposed to tackle the issuesofdocumentimbalanceand topic transition. We note our paper only addresses RED problem. 3.1 Probabilistic approach As mentioned previously, some RED approaches employ topic modeling, a type of theprobabilistic modeling,todetect topics from the text data. A famous topic model named Latent Dirichlet Allocation (LDA) [8] is a three-level hierarchical Bayesian model. Each document is represented as a finite mixture over an underlying set of latent topics, where each topic is characterized by a distribution over terms. Terms having strong semantic relations with each other are clustered as a topic’s features or representation. There are plenty of improved versions of LDA. For example, several knowledge-based topic models have been proposed to incorporate prior domain knowledge from the user to generate coherent topics [21–26]. Among which, Chenand Liu [22] proposed the AMC algorithm, topic modeling with automatically generated Must-links and Cannot-links, to incorporate the knowledge automatically mined from the past learning/modeling results, which can help future learning. Xu et al. [26] used a knowledge based topic model to extract implicit features of product reviews for opinion mining task.3 3.3 Graph analytical approach Other approaches on topic detection leverage the graph analytical method to detect the topics within the graph or network. Sayyadi and Raschid [1] proposed a graph analytical approach for topic detection. They used a Key Graph algorithm [27] to convert text data into a term graph based on co-occurrence relations between terms. Then they employed a community detection approach to partition the graph. Eventually, eachcommunityisregardedasa topicand terms within the community are considered as the topic’s features. 4. CONCLUSIONS In this paper, we explored the solution for the problem of ignoring the latent i.e implicit meaning of the terms which co-occur in the documents by integrating semantic relations with implicit or frequent co-occurrence relations and latent co-occurrence relations for important topics detection. In this approach topic modeling is incorporated with Chance Discovery (CD) to capture the semantic relations and co- occurrence relations, which facilitates effective topic detection. Idea Discovery (ID) is to convert the analyzed data into a term graph by mining co-occurrence relations.In future there is a need and extension for this approach for new event detection (NED) to satisfy the need of Big Data era. Finally we can extend this approach to represent the topic hierarchy, where the events as the basic elements of the topic to be detected. REFERENCES [1] H. Sayyadi, L. Raschid , A graph analytical approach for topic detection, ACMTrans. Internet Technol 13 (2) (2013) 4, 23. [2] J. Allan, J.G. Carbonell, G. Doddington, J. Yamron, Y. Yang, Topic detection and tracking pilot study, in: Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, 1998, pp. 194–218. [3] L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD ’06: Proceedings of the 12th ACM SIGKDD international conference onKnowledgediscoveryanddata mining, pages 44–54, New York, NY, USA, 2006. ACM Press. [4] K. Brner, J. T. Maru, and R. L. Goldstone. The simultaneous evolution of author and paper networks. Proceedings of the National Academy of Sciences, 101 Suppl 1:5266–5273, April 2004. [5] Q. Mei, C. Liu, H. Su, and C. Zhai. A probabilistic approach to spatiotemporal themepattern miningon weblogs. In WWW ’06: Proceedings of the 15th international conference on World Wide Web, pages 533–542, New York, NY, USA, 2006. ACM Press. [6] Q. Mei and C. Zhai. Discovering evolutionary theme patterns from text: an exploration of temporal text mining. In KDD ’05: Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pages 198–207, New York, NY, USA, 2005. ACM Press. [7] D. Zhou, X. Ji, H. Zha, and C. L. Giles. Topic evolution and social interactions: how authors effect research. In CIKM ’06: Proceedings of the 15th ACM
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 06 | June -2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 17 international conference on Information and knowledge management, pages 248–257, New York, NY, USA, 2006. ACM Press. [8] D.M. Blei, A.Y. Ng, M.I. Jordan, Latent Dirichlet allocation, Adv. J. Mach. Learn. Res. 3 (2003) 993– 1022. [9] T. Hofmann, Probabilistic latent semantic analysis, UAI (1999) 289–296. [10] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003. [11] Y. Ohsawa, Chance discoveries for making decisions in complex real world, New Gen. Comput. 20 (2) (2002) 143–163. [12] T. Hofmann. Probabilistic latent semantic indexing. In SIGIR ’99: Proceedings of the 22nd annual international ACM SIGIR conferenceonResearchand 25 development in information retrieval, pages 50– 57, New York, NY, USA, 1999. ACM Press. [13] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Societyof Information Science, 41(6):391–407, 1990. [14] H.Wang, Y. Ohsawa, Y. Nishihara, Innovationsupport system for creative product design based on chance discovery, Expert Syst. Appl. 39 (5) (2012) 4890– 4897. [15] J. Allan, R. Papka, V. Lvrenko , On-line new event detection and tracking, SIGIR (1998) 37–45. [16] S Rill, D Reinel, J. Scheidt, et al., Politwi: early detection of emerging political topics on twitter and the impact on concept-level sentiment analysis, Knowl. Based Syst. 69 (2014) 24–33. [17] L. Hou, J. Li, Z. Wang, et al., NewsMiner: Multifaceted news analysis for event search,Knowl. BasedSyst.76 (2015) 17–29. [18] Y. Yang, T. Pierce, J.G. Carbonell, A study on retrospective and on-line event detection, SIGIR (1998) 28–36. [19] D.R. Cutting, D.R. Karger, J.O. Pedersen, 803 J.W. Tukey, Scatter/gather: A cluster-based approach to browsing large document collections, SIGIR (1992) 318–329. [20] J. Zeng, S. Zhang, Variable space hidden Markov model for topic detection and analysis,Knowl.Based Syst. 20 (7) (2007) 607–613. [21] D. Andrzejewski, X. Zhu, M. Craven, Incorporating domain knowledge into topic modeling via Dirichlet forest priors, ICML (2009) 25–32. [22] Z. Chen, B. Liu,Mining topics in documents: standing on the shoulders of bigdata,KDD(2014)1116–1125. [23] Z. Chen, A.Mukherjee, B.Liu,M.Hsu,M.Castellanos,R. Ghosh, Exploiting domain knowledge in aspect extraction, EMNLP (2013) 1655–1667. [24] A. Mukherjee, B. Liu, Aspect extraction through semi-supervised modeling, ACL (2012) 339–348. [25] X. Fu, K. Yang, J.Z. Huang, L. Cui, Dynamic non- parametric joint sentiment topic mixture model, Knowl. Based Syst. 82 (2015) 102–114. [26] H. Xu, F. Zhang, W. Wang, Implicit feature identification in Chinese reviews using explicit topic mining model, Knowl. Based Syst. 76 (2015) 166– 175. [27] Y. Ohsawa, N.E. Benson, Y. Masahiko, KeyGraph: automatic indexing by co-occurrencegraphbasedon building construction metaphor, ADL (1998) 12–18.