International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 09 | Sep -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 680
Mining Users Rare Sequential Topic Patterns from Tweets based on
Topic Extraction
Bhakti Patil1, Sachin Takmare2, Rahul Mirajkar3, Pramod Kharade4
1Student, Dept. of Computer Science & Engineering, Bharati Vidyapeeth’s College of Engg,
Kolhapur, Maharashtra, India.
2,3,4 Professor, Dept. of Computer Science & Engineering, Bharati Vidyapeeth’s College of Engg,
Kolhapur, Maharashtra, India.
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Twitter is an online news and social networking
service where users spontaneously post and interact with
messages called "tweets". Most existing work is dedicated to
discovering the abstract "topics" that occur in a collection of
documents and treats those topics as discrete, so when a
specific user publishes successive documents, the sequential
relation between their topics is ignored. In this paper, we
propose a different approach that detects users’ Sequential
Topic Patterns (STPs), which characterize and reveal
personalized and abnormal behaviors of users, and we
formulate the problem of mining Users Rare Sequential Topic
Patterns (URSTPs) from tweets. URSTPs are rare for all users
as a whole but relatively frequent for some specific users, so
this approach can be applied in many real-life scenarios, such
as real-time monitoring of abnormal user behaviors. We
present a group of algorithms that solve this mining problem
in several phases: preprocessing to extract probabilistic
topics, identifying sessions for different users, generating all
the STP candidates, and selecting URSTPs by user-aware
rarity analysis of the derived STPs. Experiments show that
our approach can find special users and interpretable
URSTPs, which significantly indicate users’ characteristics.
Key Words: Sequential topics, Web mining, Topic
Extraction, Keyword Extraction, frequent patterns,
clustering.
1. INTRODUCTION
Social networking services such as Facebook, Twitter, and
LinkedIn create an environment where users spend a lot of
time and which they use for different purposes. These
interactions produce a huge amount of data for each
individual user. Documents on such services usually focus on
particular topics, and those topics reflect users’
characteristics. Text mining is the principal way to extract
topics from this information. Generally, probabilistic topic
models such as LDA [1], classical PLSI [5], and their
extensions [3], [4], [6], [7], [8], [9] are used for topic
extraction.
In the literature, most researchers concentrate on the
evolution of individual topics to identify and visualize social
events and user behaviors [10], [11], [12]. Few researchers
have studied the relation between the topics of successive
documents published by the same user, so some hidden but
important information that uncovers the personalized
behaviors of that user has been neglected.
In this paper, we concentrate mainly on the relations between
the extracted sequential topics, which we refer to as
Sequential Topic Patterns (STPs) and which indirectly reflect
user behaviors. In a document stream, some STPs may occur
frequently and thus reflect the common behaviors of the users
involved. Beyond these, there may still exist other patterns
that are infrequent for the general population but occur
relatively frequently for some specific user or some specific
group of users. We refer to them as User-aware Rare STPs
(URSTPs). Compared with frequent patterns, discovering rare
patterns is especially interesting and important. In essence,
it formulates a new kind of rare event mining problem, which
makes it possible to characterize the personalized and
abnormal behaviors of special users.
In our case, STPs can characterize the complete browsing
behaviors of readers. Compared with statistical methods,
mining URSTPs can therefore better discover the special
interests and browsing habits of users, and is thus capable of
giving effective, context-aware recommendations to them.
Our approach concentrates on published document streams.
Solving the problem of mining URSTPs in document streams
raises new technical challenges, which are addressed in this
paper. First, the input of the approach is a text stream, so
existing techniques for probabilistic databases cannot be
applied directly. A preprocessing phase is required to obtain
abstract, probabilistic descriptions of documents by topic
extraction, and then to identify complete and repeated
activities of users by session identification. Second, given the
real-time requirements of many applications, both the
accuracy and the efficiency of the mining algorithms are
important, especially for the probability computation
process. Third, unlike frequent patterns, user-aware rare
patterns can effectively characterize most personalized and
abnormal behaviors of users and can be applied to different
application scenarios. Correspondingly, unsupervised mining
algorithms for
this kind of rare pattern need to be designed in a manner
different from existing frequent pattern mining algorithms.
2. LITERATURE REVIEW
Topic mining in document collections has been extensively
studied in the literature. The Topic Detection and Tracking
(TDT) task covers the detection and tracking of topics
(events) in news based on keywords. Many probabilistic
generative models for extracting topics from documents have
also been proposed, such as LDA [1], PLSI and their
extensions [2], as well as models for short texts such as
Twitter-LDA [3].
LDA is a three-level hierarchical Bayesian model. Each item
of a collection is modeled as a finite mixture over an
underlying set of topics. Each topic is, in turn, modeled as an
infinite mixture over an underlying set of topic probabilities.
In the context of text modeling, the topic probabilities
provide an explicit representation of a document. Blei, Ng,
and Jordan presented efficient approximate inference
techniques based on variational methods and an EM
algorithm for empirical Bayes parameter estimation.
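The mixture view described above can be illustrated with a small sketch. It only shows how a word probability for a document follows from a topic mixture, p(w|d) = sum over z of p(z|d) p(w|z); it does not perform LDA inference. All topic names and probability values are hypothetical and are not taken from the paper.

```python
# Hypothetical toy values, chosen only for illustration.
topic_word = {               # p(w | z) for two topics over a 3-word vocabulary
    "sports":   {"match": 0.6, "vote": 0.1, "goal": 0.3},
    "politics": {"match": 0.1, "vote": 0.8, "goal": 0.1},
}
doc_topic = {"sports": 0.75, "politics": 0.25}   # p(z | d) for one document


def word_probability(word, doc_topic, topic_word):
    """p(w | d) = sum over topics z of p(z | d) * p(w | z)."""
    return sum(p_z * topic_word[z].get(word, 0.0) for z, p_z in doc_topic.items())


# Word distribution of the document under its topic mixture.
mixture = {w: word_probability(w, doc_topic, topic_word)
           for w in ("match", "vote", "goal")}
```

Because each p(w|z) and p(z|d) sums to one, the resulting word distribution of the document also sums to one.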
Li and McCallum proposed the pachinko allocation model
(PAM), which captures arbitrary and sparse correlations
between topics using a directed acyclic graph (DAG),
something that is not possible with LDA. Each leaf node of
the DAG represents an individual word in the vocabulary,
while each interior node represents a correlation among its
children, which may be words or other interior nodes (topics).
Zhao, Jiang, Weng, He, Lim, Yan, and Li compared tweets
with a traditional news medium using unsupervised topic
modeling. They discover topics with the Twitter-LDA model
and then compare Twitter topics with news topics using text
mining techniques. They also examine the relation between
tweets and retweets.
In real applications, the content of a document collection is
temporal, so various dynamic topic models have been
proposed to discover topics over time in document streams
and then to predict offline social events. One of these is the
dynamic topic model proposed by Blei and Lafferty [4], which
uses a state space model to represent the topics.
Approximate posterior inference over the latent topics is
carried out by variational approximations based on Kalman
filters and nonparametric wavelet regression. Dynamic topic
models provide a qualitative perspective on the contents of a
large document collection.
Sequential pattern mining is an important problem in data
mining. The frequency of a sequential pattern is evaluated by
its support. Mining algorithms such as PrefixSpan, FreeSpan,
and SPADE have been proposed based on support. These
algorithms find frequent sequential patterns whose support
values are not less than a user-defined threshold, and were
extended by SLPMiner to deal with length-decreasing
support constraints. Muzammal et al. concentrated on
sequence-level uncertainty in sequential databases and
proposed methods to calculate the frequency of a sequential
pattern based on expected support, using generate-and-test
or pattern-growth approaches. This paper is an extension of
our previous work.
3. PRELIMINARIES
At first, we define some basic concepts in the usual way.
Definition 1 (Document)
A text document d in a document collection D consists of a
number of words from a fixed vocabulary V = {w1, w2, ...,
w|V|}. A document can be represented as d = {c(d, w)}, where
w ∈ V and c(d, w) denotes the number of occurrences of the
word w in d.
Definition 2 (Topic)
A topic z in the text collection D is represented by a
probability distribution over the words in the given
vocabulary V.
Definition 3 (Topic-Level Document)
Given an original document d ∈ D and a topic set T, the
corresponding topic-level document td_d is defined as a set
of topic-probability pairs, in the form {(z, p(z|d))}, where
z ∈ T.
Definition 4 (Document Stream)
A document stream is defined as a sequence of triples
(di, ui, ti), where di is a document published by user ui at
time ti on a specific website, and ti ≤ tj for all i ≤ j.
Definition 5 (Sequential Topic Pattern)
A Sequential Topic Pattern (STP) is defined as a sequence of
topics [z1, z2, ..., zn], where each topic zi ∈ T.
Definition 6 (Session)
A session s is defined as a subsequence of the topic-level
document stream associated with the same user, i.e., a
sequence of topic-level documents, with their associated
times, for one user.
Definition 8 (Support of STP)
The support of an STP is defined as the probability that its
topics occur, in order, with respect to the sessions.
Definition 9 (User-Aware Rare STP)
An STP a is called a User-aware Rare STP (URSTP) if and
only if, for some user u, both of the following hold: its scaled
support is less than or equal to the scaled support threshold,
and its relative rarity is greater than or equal to the relative
rarity threshold.
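To make the last two definitions concrete, here is a minimal Python sketch under stated assumptions. It simplifies support to the deterministic case, where each topic-level document is reduced to a single most probable topic, so support becomes the fraction of sessions that contain the STP as an ordered subsequence; the probabilistic version would weight occurrences by p(z|d). The paper does not give formulas for scaled support and relative rarity, so is_urstp takes them as precomputed inputs. The function names and the threshold names h_ss and h_rr are hypothetical.

```python
def contains_stp(session_topics, stp):
    """Return True if stp occurs in session_topics as an ordered subsequence."""
    remaining = iter(session_topics)
    # 'topic in remaining' consumes the iterator up to the first match,
    # so later topics must appear after earlier ones.
    return all(topic in remaining for topic in stp)


def simple_support(sessions, stp):
    """Fraction of sessions that contain stp (deterministic simplification)."""
    if not sessions:
        return 0.0
    return sum(contains_stp(s, stp) for s in sessions) / len(sessions)


def is_urstp(scaled_support, relative_rarity, h_ss, h_rr):
    """Definition 9 check: scaled support <= h_ss and relative rarity >= h_rr."""
    return scaled_support <= h_ss and relative_rarity >= h_rr


# Example sessions, each already reduced to one topic label per document.
sessions = [["z1", "z2", "z3"], ["z2", "z1"], ["z1", "z3"]]
support_z1_z3 = simple_support(sessions, ["z1", "z3"])
```

In this example the pattern [z1, z3] occurs in two of the three sessions; order matters, so [z1, z2] does not occur in the session [z2, z1].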
4. MINING USERS RARE SEQUENTIAL TOPIC
PATTERNS
At first, we have a list of users’ tweets collected using some
API. The one that we used was... Topic detection over the
whole set of documents needs some initial preprocessing. In
the first step, we remove stop words and repeated posts. Stop
words are words such as “at”, “the”, and “how”. We now have
a list of cleaned Twitter posts, i.e., a list of cleaned
documents: Tweets = [list of tweets of all users]. Each user’s
entry has a list of posts; therefore, for each user we remove
repeated words and stop words. This new list of tweets is the
input to the keyword extraction algorithm. Fig.-1 shows
different keyword extraction methods.
Fig -1: Keyword Extraction Methods
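The cleaning step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the stop-word set is a small illustrative subset, and the function name clean_posts, the lowercasing, and the whitespace tokenization are assumptions.

```python
# Illustrative subset only; a real stop-word list would be much longer.
STOP_WORDS = {"at", "the", "how", "a", "an", "is", "of", "to", "and"}


def clean_posts(posts):
    """Drop repeated posts, then remove stop words and repeated words from each."""
    seen_posts = set()
    cleaned = []
    for post in posts:
        key = post.strip().lower()
        if not key or key in seen_posts:      # skip empty and repeated posts
            continue
        seen_posts.add(key)
        words, seen_words = [], set()
        for word in key.split():
            if word in STOP_WORDS or word in seen_words:
                continue                      # skip stop words and repeated words
            seen_words.add(word)
            words.append(word)
        cleaned.append(words)
    return cleaned


example = clean_posts([
    "How is the match at the stadium",
    "how is the match at the stadium",   # repeated post, dropped
    "The vote vote today",
])
```

Each surviving post becomes a list of distinct non-stop words, which can then be passed to a keyword extraction step.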
After that, we cluster the keywords using cosine similarity,
which means that we cluster posts that are similar to each
other. This approach works as unsupervised topic detection.
New tweets are then the input to our next step, topic
extraction. The output is an index into the lookup table,
which gives the list of topics, as shown in Fig.-2.
Fig -2: Overview of Keyword Extraction Process
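As an illustration of clustering by cosine similarity, the sketch below treats each post as a bag of keywords and groups posts greedily. The paper does not state which clustering procedure is used, so the single-pass strategy, the comparison against each cluster's first post, and the threshold value of 0.5 are all assumptions.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two keyword lists treated as count vectors."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def cluster_posts(posts, threshold=0.5):
    """Greedy single pass: join the first cluster whose seed post is similar enough."""
    clusters = []                      # each cluster is a list of post indices
    for i, post in enumerate(posts):
        for cluster in clusters:
            if cosine_similarity(post, posts[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:                          # no similar cluster found: start a new one
            clusters.append([i])
    return clusters


posts = [["match", "goal"], ["goal", "match", "team"], ["vote", "election"]]
clusters = cluster_posts(posts)
```

Here the first two posts share two keywords and fall into one cluster, while the third shares none and starts its own.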
A Naïve Bayes classifier is a simple probabilistic classifier
based on applying Bayes’ theorem with strong (naive)
independence assumptions between the features. We use it
as a baseline method for text categorization. This popular
method addresses the problem of judging documents as
belonging to one category or another, such as sports or
politics, using word frequencies as features. Naïve Bayes
classifiers require a number of parameters linear in the
number of variables (features/predictors).
A Naïve Bayes model assigns class labels to problem
instances, represented as vectors of feature values, where the
class labels are drawn from some finite set. All Naïve Bayes
classifiers assume that the value of a particular feature is
independent of the value of any other feature, given the class
variable. A Naïve Bayes classifier considers each of these
features to contribute independently to the probability,
regardless of any possible correlations. It needs only a small
amount of training data to estimate the parameters required
for classification. Using Bayesian probability terminology,
Posterior = (likelihood * prior) / evidence
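The posterior rule above can be sketched for text categorization as follows. This is a minimal multinomial Naïve Bayes illustration, not the authors' implementation. It works in log space, uses add-one (Laplace) smoothing, which is an assumption not mentioned in the paper, and omits the evidence term because it is the same for every class. The training documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_docs):
    """labelled_docs: list of (word_list, label). Returns priors, counts, vocabulary."""
    class_counts = Counter(label for _, label in labelled_docs)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in labelled_docs:
        word_counts[label].update(words)
        vocabulary.update(words)
    total = sum(class_counts.values())
    priors = {c: n / total for c, n in class_counts.items()}
    return priors, word_counts, vocabulary


def classify(words, priors, word_counts, vocabulary):
    """Pick the class maximising log prior + sum of smoothed log likelihoods."""
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        denom = sum(word_counts[label].values()) + len(vocabulary)
        score = math.log(prior)
        for w in words:
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data for two categories.
docs = [
    (["match", "goal", "team"], "sports"),
    (["goal", "score"], "sports"),
    (["vote", "election"], "politics"),
    (["party", "vote", "minister"], "politics"),
]
model = train_naive_bayes(docs)
predicted = classify(["goal", "match"], *model)
```

The number of stored parameters is one prior per class plus one count per (class, word) pair, which is linear in the number of features, as noted above.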
We use the Apriori algorithm, which operates on databases
containing transactions (for example, collections of items
bought by customers, or details of website visits). Each
transaction is a set of items (an itemset). Given a threshold
C, the algorithm identifies the itemsets that are subsets of at
least C transactions in the database.
In this method, frequent subsets are extended one item at a
time, and groups of candidates are tested against the data.
Candidate itemsets are counted using breadth-first search
and a hash tree structure. The algorithm generates candidate
itemsets of length k from itemsets of length k-1, and then
prunes the candidates that have an infrequent sub-pattern.
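The level-wise procedure just described can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: candidates are counted with a plain dictionary rather than a hash tree, and the function name apriori and the parameter min_count (the threshold C) are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for itemsets contained in >= min_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}      # candidates of length 1
    frequent = {}
    k = 1
    while current:
        # Count each candidate against the data (breadth-first, one level at a time).
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join step: build length k+1 candidates from frequent length-k itemsets.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent = apriori(transactions, min_count=3)
```

With a threshold of 3, every single item and every pair is frequent in this example, but the triple {a, b, c} appears in only two transactions and is discarded.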
Now we propose an approach to mining rare sequential topic
patterns in document streams. The main processing
framework is shown in Fig.-3.
Fig -3: Processing framework of URSTP mining
It consists of three main phases. First, text documents are
collected from micro-blog sites or forums (in our case, we
crawled tweets from Twitter using the Twitter API) and form
a document stream that is the input of our approach. Then,
in the preprocessing phase, we first remove stop words and
useless symbols such as “@”, “#”, and URLs from the input
tweets. The original stream is then transformed into a
topic-level document stream and divided into many sessions
to identify complete user behaviors. Last, and most
importantly, we find all the STP candidates in the document
stream for all
users, and then pick out the important URSTPs associated
with specific users by user-aware rarity analysis.
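Two parts of the preprocessing phase described above, symbol removal and session identification, can be sketched as follows. The paper does not state how sessions are delimited, so splitting a user's documents whenever the time gap exceeds a threshold is an assumption, as are the names strip_symbols, split_sessions, and max_gap and the regular expressions used.

```python
import re
from collections import defaultdict

URL_RE = re.compile(r"https?://\S+")
SYMBOL_RE = re.compile(r"[@#]")


def strip_symbols(text):
    """Remove URLs and the '@' / '#' symbols, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = SYMBOL_RE.sub("", text)
    return " ".join(text.split())


def split_sessions(stream, max_gap):
    """Group (user, time, doc) records into per-user sessions.

    A new session starts when the gap to the same user's previous
    document exceeds max_gap (assumed rule, not from the paper).
    """
    sessions = defaultdict(list)          # user -> list of sessions
    last_time = {}
    for user, time, doc in sorted(stream, key=lambda record: record[1]):
        if user not in last_time or time - last_time[user] > max_gap:
            sessions[user].append([])     # open a new session for this user
        sessions[user][-1].append(doc)
        last_time[user] = time
    return dict(sessions)


cleaned = strip_symbols("Great #goal by @player http://t.co/x now")
stream = [("u1", 0, "d1"), ("u2", 1, "d2"), ("u1", 5, "d3"), ("u1", 100, "d4")]
by_user = split_sessions(stream, max_gap=10)
```

In the example, user u1's first two documents are close in time and share a session, while the third arrives much later and opens a new one.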
5. RESULT ANALYSIS
In this section, we apply our URSTP mining mechanism to
find rare topics. To evaluate the proposed architecture, we
implemented the approach on a machine with a minimum of
1 GB RAM and a 60 GB (or larger) hard disk. The experiments
were carried out with different file sizes. As the file size
increases, the execution time increases for both the Naïve
Bayes algorithm and the Apriori algorithm, but Naïve Bayes
requires less time than Apriori. The results are shown in
Chart-1 and Chart-2.
Chart -1: Time costs of the two algorithms
Chart -2: Performance Time Difference
Chart-1 compares the time required to execute the Naïve
Bayes algorithm and the Apriori algorithm for file sizes
ranging from 1 KB to 1000 KB.
Chart-2 compares the time required to find Sequential Topic
Patterns and User-aware Rare Sequential Topic Patterns for
file sizes from 1 KB to 1000 KB.
6. CONCLUSIONS
The volume of tweets on social networking sites such as
Twitter is very high. To make use of this capability of social
networking sites, we propose some new methods. First, we
extract users’ posts through an API, and then we extract
appropriate topics based on certain keywords. We show that
creating clusters based on keywords makes the detection of
topics from users’ tweets easier. The extracted topics are
then useful for real-time monitoring of abnormal user
behaviors. We also propose an effective approach to discover
special users and interesting URSTPs from document
streams (i.e., users’ tweets), which captures users’
personalized and abnormal behaviors and characteristics.
We are also interested in the dual problem, i.e., discovering
sequential topic patterns that occur frequently on the whole
but are relatively rare for specific users.
REFERENCES
[1] D. Blei, A. Ng, and M. Jordan, “Latent dirichlet
allocation,” J. Mach. Learn. Res., vol. 3, pp. 993–
1022, 2003.
[2] W. Li and A. McCallum, “Pachinko allocation:
DAG-structured mixture models of topic correlations,”
in Proc. ACM Int. Conf. Mach. Learn., 2006, vol. 148,
pp. 577–584.
[3] W. X. Zhao, J. Jiang, J. Weng, J. He, E.-P. Lim, H. Yan,
and X. Li, “Comparing Twitter and traditional media
using topic models,” in Proc. 33rd Eur. Conf. Adv.
Inf. Retrieval, 2011, pp. 338–349.
[4] D. M. Blei and J. D. Lafferty, “Dynamic topic models,”
in Proc. ACM Int. Conf. Mach. Learn., 2006, pp. 113–
120.
[5] T. Hofmann, “Probabilistic latent semantic
indexing,” in Proc. 22nd Annu. Int. ACM SIGIR Conf.
Res. Develop. Inf. Retrieval, 1999, pp. 50–57.
[6] D. Blei and J. Lafferty, “Correlated topic models,”
Adv. Neural Inf. Process. Syst., vol. 18, pp. 147–154,
2006.
[7] L. Hong and B. D. Davison, “Empirical study of topic
modeling in Twitter,” in Proc. 1st Workshop Soc.
Media Anal., 2010, pp. 80–88.
[8] Z. Hu, H. Wang, J. Zhu, M. Li, Y. Qiao, and C. Deng,
“Discovery of rare sequential topic patterns in
document stream,” in Proc. SIAM Int. Conf. Data
Mining, 2014, pp. 533–541.
[9] A. Krause, J. Leskovec, and C. Guestrin, “Data
association for topic intensity tracking,” in Proc.
ACM Int. Conf. Mach. Learn., 2006, pp. 497–504.
[10] Q. Mei, C. Liu, H. Su, and C. Zhai, “A
probabilistic approach to spatiotemporal theme
pattern mining on weblogs,” in Proc. 15th Int. Conf.
World Wide Web, 2006, pp. 533–542.
[11] W. Dou, X. Wang, D. Skau, W. Ribarsky, and
M. X. Zhou, “LeadLine: Interactive visual analysis of
text data through event identification and
exploration,” in Proc. IEEE Conf. Vis. Anal. Sci.
Technol., 2012, pp. 93–102.
[12] G. P. C. Fung, J. X. Yu, P. S. Yu, and H. Lu, “Parameter
free bursty events detection in text streams,” in Proc.
31st Int. Conf. Very Large Data Bases, 2005, pp. 181–
192.

More Related Content

What's hot

Novelty detection via topic modeling in research articles
Novelty detection via topic modeling in research articlesNovelty detection via topic modeling in research articles
Novelty detection via topic modeling in research articlescsandit
 
Semantic Based Model for Text Document Clustering with Idioms
Semantic Based Model for Text Document Clustering with IdiomsSemantic Based Model for Text Document Clustering with Idioms
Semantic Based Model for Text Document Clustering with IdiomsWaqas Tariq
 
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...cscpconf
 
A Review on Text Mining in Data Mining
A Review on Text Mining in Data MiningA Review on Text Mining in Data Mining
A Review on Text Mining in Data Miningijsc
 
Experimental Result Analysis of Text Categorization using Clustering and Clas...
Experimental Result Analysis of Text Categorization using Clustering and Clas...Experimental Result Analysis of Text Categorization using Clustering and Clas...
Experimental Result Analysis of Text Categorization using Clustering and Clas...ijtsrd
 
Topic detecton by clustering and text mining
Topic detecton by clustering and text miningTopic detecton by clustering and text mining
Topic detecton by clustering and text miningIRJET Journal
 
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATIONONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATIONIJDKP
 
IRJET- Semantic based Automatic Text Summarization based on Soft Computing
IRJET- Semantic based Automatic Text Summarization based on Soft ComputingIRJET- Semantic based Automatic Text Summarization based on Soft Computing
IRJET- Semantic based Automatic Text Summarization based on Soft ComputingIRJET Journal
 
A systematic study of text mining techniques
A systematic study of text mining techniquesA systematic study of text mining techniques
A systematic study of text mining techniquesijnlc
 
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVALONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVALijaia
 
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACHTEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACHIJDKP
 
Seeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text ClusteringSeeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text ClusteringIJRES Journal
 
Neural Models for Document Ranking
Neural Models for Document RankingNeural Models for Document Ranking
Neural Models for Document RankingBhaskar Mitra
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information RetrievalDustin Smith
 
Basic review on topic modeling
Basic review on  topic modelingBasic review on  topic modeling
Basic review on topic modelingHiroyuki Kuromiya
 

What's hot (19)

Novelty detection via topic modeling in research articles
Novelty detection via topic modeling in research articlesNovelty detection via topic modeling in research articles
Novelty detection via topic modeling in research articles
 
[IJET-V2I3P19] Authors: Priyanka Sharma
[IJET-V2I3P19] Authors: Priyanka Sharma[IJET-V2I3P19] Authors: Priyanka Sharma
[IJET-V2I3P19] Authors: Priyanka Sharma
 
Semantic Based Model for Text Document Clustering with Idioms
Semantic Based Model for Text Document Clustering with IdiomsSemantic Based Model for Text Document Clustering with Idioms
Semantic Based Model for Text Document Clustering with Idioms
 
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...
 
A Review on Text Mining in Data Mining
A Review on Text Mining in Data MiningA Review on Text Mining in Data Mining
A Review on Text Mining in Data Mining
 
Experimental Result Analysis of Text Categorization using Clustering and Clas...
Experimental Result Analysis of Text Categorization using Clustering and Clas...Experimental Result Analysis of Text Categorization using Clustering and Clas...
Experimental Result Analysis of Text Categorization using Clustering and Clas...
 
Topic detecton by clustering and text mining
Topic detecton by clustering and text miningTopic detecton by clustering and text mining
Topic detecton by clustering and text mining
 
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATIONONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
 
Bl24409420
Bl24409420Bl24409420
Bl24409420
 
IRJET- Semantic based Automatic Text Summarization based on Soft Computing
IRJET- Semantic based Automatic Text Summarization based on Soft ComputingIRJET- Semantic based Automatic Text Summarization based on Soft Computing
IRJET- Semantic based Automatic Text Summarization based on Soft Computing
 
E43022023
E43022023E43022023
E43022023
 
A systematic study of text mining techniques
A systematic study of text mining techniquesA systematic study of text mining techniques
A systematic study of text mining techniques
 
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVALONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
 
Hc3612711275
Hc3612711275Hc3612711275
Hc3612711275
 
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACHTEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
 
Seeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text ClusteringSeeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text Clustering
 
Neural Models for Document Ranking
Neural Models for Document RankingNeural Models for Document Ranking
Neural Models for Document Ranking
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
 
Basic review on topic modeling
Basic review on  topic modelingBasic review on  topic modeling
Basic review on topic modeling
 

Similar to Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extraction

IRJET - Conversion of Unsupervised Data to Supervised Data using Topic Mo...
IRJET -  	  Conversion of Unsupervised Data to Supervised Data using Topic Mo...IRJET -  	  Conversion of Unsupervised Data to Supervised Data using Topic Mo...
IRJET - Conversion of Unsupervised Data to Supervised Data using Topic Mo...IRJET Journal
 
An in-depth review on News Classification through NLP
An in-depth review on News Classification through NLPAn in-depth review on News Classification through NLP
An in-depth review on News Classification through NLPIRJET Journal
 
Knowledge Graph and Similarity Based Retrieval Method for Query Answering System
Knowledge Graph and Similarity Based Retrieval Method for Query Answering SystemKnowledge Graph and Similarity Based Retrieval Method for Query Answering System
Knowledge Graph and Similarity Based Retrieval Method for Query Answering SystemIRJET Journal
 
Prediction of User Rare Sequential Topic Patterns of Internet Users
Prediction of User Rare Sequential Topic Patterns of Internet UsersPrediction of User Rare Sequential Topic Patterns of Internet Users
Prediction of User Rare Sequential Topic Patterns of Internet UsersIRJET Journal
 
Research on ontology based information retrieval techniques
Research on ontology based information retrieval techniquesResearch on ontology based information retrieval techniques
Research on ontology based information retrieval techniquesKausar Mukadam
 
Evolving Swings (topics) from Social Streams using Probability Model
Evolving Swings (topics) from Social Streams using Probability ModelEvolving Swings (topics) from Social Streams using Probability Model
Evolving Swings (topics) from Social Streams using Probability ModelIJERA Editor
 
Exploiting Wikipedia and Twitter for Text Mining Applications
Exploiting Wikipedia and Twitter for Text Mining ApplicationsExploiting Wikipedia and Twitter for Text Mining Applications
Exploiting Wikipedia and Twitter for Text Mining ApplicationsIRJET Journal
 
A simplified classification computational model of opinion mining using deep ...
A simplified classification computational model of opinion mining using deep ...A simplified classification computational model of opinion mining using deep ...
A simplified classification computational model of opinion mining using deep ...IJECEIAES
 
Algorithm for calculating relevance of documents in information retrieval sys...
Algorithm for calculating relevance of documents in information retrieval sys...Algorithm for calculating relevance of documents in information retrieval sys...
Algorithm for calculating relevance of documents in information retrieval sys...IRJET Journal
 
IRJET-A Review on Topic Detection and Term-Term Relation Analysis in Big Data
IRJET-A Review on Topic Detection and Term-Term Relation Analysis in Big DataIRJET-A Review on Topic Detection and Term-Term Relation Analysis in Big Data
IRJET-A Review on Topic Detection and Term-Term Relation Analysis in Big DataIRJET Journal
 
Dr31564567
Dr31564567Dr31564567
Dr31564567IJMER
 
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...IJDKP
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGdannyijwest
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGdannyijwest
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGIJwest
 
Converting UML Class Diagrams into Temporal Object Relational DataBase
Converting UML Class Diagrams into Temporal Object Relational DataBase Converting UML Class Diagrams into Temporal Object Relational DataBase
Converting UML Class Diagrams into Temporal Object Relational DataBase IJECEIAES
 
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLINGA TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLINGcscpconf
 
EXPERT OPINION AND COHERENCE BASED TOPIC MODELING
EXPERT OPINION AND COHERENCE BASED TOPIC MODELINGEXPERT OPINION AND COHERENCE BASED TOPIC MODELING
EXPERT OPINION AND COHERENCE BASED TOPIC MODELINGijnlc
 
Reviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clusteringReviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clusteringIRJET Journal
 

Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extraction

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 09 | Sep -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 680

Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extraction

Bhakti Patil1, Sachin Takmare2, Rahul Mirajkar3, Pramod Kharade4
1Student, Dept. of Computer Science & Engineering, Bharati Vidyapeeth’s College of Engg, Kolhapur, Maharashtra, India.
2,3,4 Professor, Dept. of Computer Science & Engineering, Bharati Vidyapeeth’s College of Engg, Kolhapur, Maharashtra, India.

Abstract - Twitter is an online news and social networking service where users spontaneously post and interact with messages called "tweets". Most existing work is dedicated to discovering the abstract "topics" that occur in a collection of documents and treats each topic as a discrete unit. As a result, when a specific user publishes successive documents, the sequential relation between their topics is ignored entirely. In this paper, we propose a different approach that detects users’ Sequential Topic Patterns (STPs), which characterize personalized and abnormal behaviors of users, and we formulate the problem of mining Users Rare Sequential Topic Patterns (URSTPs) from tweets. URSTPs are rare for all users but relatively frequent for some specific users, so this approach can be applied in many real-life scenarios, such as real-time monitoring of abnormal user behaviors. We present a group of algorithms that solve this mining problem in several phases: preprocessing to extract probabilistic topics, identifying sessions for different users, generating all the STP candidates, and selecting URSTPs through user-aware rarity analysis of the derived STPs.
Experiments show that our approach can find special users and interpretable URSTPs, which significantly indicate users’ characteristics.

Key Words: Sequential topics, Web mining, Topic Extraction, Keyword Extraction, frequent patterns, clustering.

1. INTRODUCTION

Social networking services such as Facebook, Twitter and LinkedIn create an environment in which users spend a lot of time and which they use for different purposes. These interactions produce a huge amount of data for each individual user. Documents on such services focus on particular topics, and those topics reveal users' characteristics. Text mining is the means by which topics are extracted from this information. Generally, probabilistic topic models such as LDA [1], classical PLSI [5] and their extensions [3], [4], [6], [7], [8], [9] are used for topic extraction. In the literature, most researchers concentrate on individual topics to identify and portray social events and user behaviors [10], [11], [12]. Only some researchers have studied the relation between the topics of successive documents published by the same user, so hidden but important information that uncovers the personalized behaviors of that user has been neglected.

In this paper we concentrate mainly on the relations between the extracted sequential topics, which we refer to as Sequential Topic Patterns (STPs) and which indirectly reflect user behaviors. For a document stream, some STPs may occur frequently and thus reflect common behaviors of the users involved. Beyond that, there may also exist other patterns that are infrequent for the general population but occur relatively frequently for some specific user or some specific group of users. We refer to them as User-aware Rare STPs (URSTPs).
Compared to frequent patterns, discovering rare patterns is both interesting and important. It formulates a new problem for rare event mining, making it possible to characterize the personalized and abnormal behaviors of special users. In our case, STPs can characterize the complete browsing behaviors of readers. Compared with statistical methods, mining URSTPs can better find the special interests and browsing habits of users, and is thus capable of giving them effective, context-aware recommendations. Our approach concentrates on published document streams.

Mining URSTPs in document streams raises new technical challenges, which are addressed in this paper. First, the input of the approach is a text stream, so existing techniques for probabilistic databases cannot be applied directly to this problem. A preprocessing phase is required to obtain conceptual and probabilistic descriptions of documents through topic extraction, and then to identify complete and repeated activities of users through session identification. Second, given the real-time requirements of many applications, both the precision and the effectiveness of the mining algorithms are important, especially for the probability computation process. Third, unlike frequent patterns, user-aware rare patterns can effectively characterize most personalized and abnormal behaviors of users and can be applied to different application scenarios. Correspondingly, unsupervised mining algorithms for
  • 2. this kind of rare patterns must be designed differently from existing frequent pattern mining algorithms.

2. LITERATURE REVIEW

Topic mining in document collections has been extensively studied in the literature. The Topic Detection and Tracking (TDT) task covers the detection and tracking of topics (events) in news based on keywords. Many probabilistic generative models for extracting topics from documents have also been proposed, such as LDA [1], PLSI and their extensions [2], as well as models for short texts such as Twitter-LDA [3].

LDA is a three-level hierarchical Bayesian model. Each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. Blei, Ng, and Jordan presented efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation.

Li and McCallum proposed the pachinko allocation model (PAM), which captures arbitrary and sparse correlations between topics using a directed acyclic graph (DAG), something that is not possible with LDA. Each leaf node of the DAG represents an individual word in the vocabulary, while each interior node represents a correlation among its children, which may be words or other interior nodes (topics).

Zhao, Jiang, Weng, He, Lim, Yan, and Li compared tweets with a traditional news medium using unsupervised topic modeling. They discover topics with the Twitter-LDA model and then compare Twitter topics with news topics using text mining techniques. They also examine the relation between tweets and retweets.
In real applications, the content of a document collection is temporal, so various dynamic topic models have been proposed to discover topics over time in document streams and then to predict offline social events. One of these is the dynamic topic model proposed by Blei and Lafferty, which uses a state space model to represent the topics. Approximate posterior inference over the latent topics is carried out by variational approximations based on Kalman filters and nonparametric wavelet regression. Dynamic topic models provide a qualitative perspective on the contents of a large document collection.

Mining sequential patterns is an important problem in data mining. The frequency of a sequential pattern is evaluated by its support. Mining algorithms such as PrefixSpan, FreeSpan and SPADE have been proposed based on support. These algorithms find frequent sequential patterns whose support values are not less than a user-defined threshold, and were later extended by SLPMiner to deal with length-decreasing support constraints. Muzammal et al. concentrated on sequence-level uncertainty in sequential databases, and proposed methods to calculate the frequency of a sequential pattern based on expected support, using generate-and-test or pattern-growth. This paper is an extension of our previous work.

3. PRELIMINARIES

At first, we define some basic concepts in the usual way.

Definition 1 (Document) A text document d in a document collection D consists of a number of words from a fixed vocabulary V = {w1, w2, ..., w|V|}. A document can be represented as d = {c(d, w)}, where w ∈ V and c(d, w) denotes the number of occurrences of the word w in d.

Definition 2 (Topic) A topic z in the text collection D is represented by a probability distribution over the words in the given vocabulary V.

Definition 3 (Topic-Level Document) Given an original document d ∈ D and a topic set T, the corresponding topic-level document td_d is defined as a set of topic-probability pairs of the form {(z, p(z|d))}, where z ∈ T.
Definition 4 (Document Stream) A document stream is defined as a sequence of triples (di, ui, ti), where di is a document published by user ui at time ti on a specific website, and ti ≤ tj for all i ≤ j.

Definition 5 (Sequential Topic Pattern) A Sequential Topic Pattern (STP) is defined as a sequence of topics [z1, z2, ..., zn], where each topic zi ∈ T.

Definition 6 (Session) A session s is defined as a subsequence of the topic-level document stream associated with the same user, i.e., it is a set of topic-level documents with their associated times, built separately for each user.

Definition 8 (Support of STP) The support of an STP is defined as the probability of its topic sequence with respect to sessions.

Definition 9 (User-Aware Rare STP) An STP α is called a User-aware Rare STP (URSTP) if and only if, for some user u, both of the following hold: its scaled support is less than or equal to the scaled support threshold, and its relative rarity is greater than or equal to the relative rarity threshold.
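The two conditions of Definition 9 can be sketched in Python under strong simplifying assumptions, all of which are hypothetical and not taken from the paper: each document is reduced to a single topic, the support of an STP is the fraction of sessions that contain its topics in order, the global support over all users stands in for the scaled support, and the relative rarity of a pattern for a user is the user's own support minus the global support.

```python
def contains_in_order(session_topics, pattern):
    """True if the pattern's topics occur in the session in the same order."""
    remaining = iter(session_topics)
    # `topic in remaining` advances the iterator, so order is respected.
    return all(topic in remaining for topic in pattern)

def support(sessions, pattern):
    """Assumed support: fraction of sessions containing the pattern."""
    if not sessions:
        return 0.0
    return sum(contains_in_order(s, pattern) for s in sessions) / len(sessions)

def is_urstp(pattern, user, sessions_by_user, max_support, min_rarity):
    """Check both conditions of Definition 9 under the stated assumptions."""
    all_sessions = [s for ss in sessions_by_user.values() for s in ss]
    global_support = support(all_sessions, pattern)          # stand-in for scaled support
    local_support = support(sessions_by_user[user], pattern)
    rarity = local_support - global_support                  # assumed relative rarity
    return global_support <= max_support and rarity >= min_rarity

# Hypothetical sessions: one list of topics per session, grouped by user.
sessions_by_user = {
    "u1": [["travel", "food"], ["travel", "food", "music"]],
    "u2": [["sports", "news"], ["news", "music"]],
    "u3": [["sports", "music"], ["news", "sports"]],
}

print(is_urstp(["travel", "food"], "u1", sessions_by_user, 0.4, 0.5))  # True
print(is_urstp(["travel", "food"], "u2", sessions_by_user, 0.4, 0.5))  # False
print(is_urstp(["news"], "u2", sessions_by_user, 0.4, 0.5))            # False
```

In this toy data the pattern [travel, food] is rare overall (2 of 6 sessions) but appears in every session of u1, so it qualifies only for u1. The pattern [news] fails the first condition because it is too frequent overall.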
  • 3. 4. MINING USERS RARE SEQUENTIAL TOPIC PATTERNS

First, we have a list of users' tweets collected using some API. The one that we used was... Topic detection over the whole set of documents needs some initial pre-processing. As the first step, we remove stop words and repeated posts. Stop words are words such as "at", "the", and "how". We now have a list of cleaned Twitter posts, i.e., a list of cleaned documents: Tweets = [list of tweets of all users]. Each entry holds a list of posts. Therefore, for each user (entry) we remove repeated words and stop words. This new list of tweets is the input of the keyword extraction algorithm. Fig. 1 shows different keyword extraction methods.

Fig -1: Keyword Extraction Methods

After that, we cluster the keywords using cosine similarity. This means that we cluster posts that are similar to each other. This approach is designed to work as unsupervised topic detection. New tweets are then the input to our next step, topic extraction. The output is the index of the lookup table, which gives the list of topics, as shown in Fig. 2.

Fig -2: Overview of Keyword Extraction Process

Naive Bayes classifiers are simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. We use this as a baseline method for text categorization. This popular method solves the problem of judging documents as belonging to one category or another (such as sports or politics) using word frequencies as features. Naive Bayes classifiers require a number of parameters linear in the number of variables, i.e., features/predictors.
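The text categorization step described above can be sketched as a small multinomial Naive Bayes classifier over word frequencies. This is a minimal illustration, not the authors' implementation: the function names and the training data are hypothetical, and add-one (Laplace) smoothing is an assumption that the paper does not mention.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary

def classify(words, model):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)            # log prior
        total_words = sum(word_counts[label].values())
        for w in words:
            if w not in vocabulary:
                continue                                  # ignore unseen words
            # Add-one smoothing (an assumption) avoids zero likelihoods.
            likelihood = (word_counts[label][w] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)                 # features treated as independent
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training documents, already tokenized and cleaned.
training = [
    (["goal", "match", "team"], "sports"),
    (["team", "win", "match"], "sports"),
    (["vote", "election", "party"], "politics"),
    (["party", "vote", "law"], "politics"),
]
model = train_naive_bayes(training)
print(classify(["match", "team", "goal"], model))  # sports
print(classify(["vote", "law"], model))            # politics
```

The model stores one count per (class, word) pair, which is why the number of parameters grows linearly with the number of features.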
A Naive Bayes model assigns class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set. All naive Bayes classifiers assume that the value of a particular feature is independent of the value of any other feature, given the class variable. A naive Bayes classifier considers each of these features to contribute independently to the probability, regardless of any possible correlations. It requires only a small amount of training data to estimate the parameters required for classification. Using Bayesian probability terminology:

Posterior = (likelihood * prior) / evidence

We use the Apriori algorithm to operate on databases containing transactions (for example, collections of items bought by customers, or details of visits to a website). Each transaction is a set of items (an itemset). Given a threshold C, the algorithm identifies the item sets that are subsets of at least C transactions in the database. In this method, frequent subsets are extended one item at a time, and groups of candidates are tested against the data. Candidate item sets are counted by using breadth-first search and a hash tree structure. The algorithm generates candidate item sets of length k from item sets of length k-1. It then prunes the candidates that have an infrequent sub-pattern.

We now propose an approach to mining rare sequential topic patterns in document streams. The main processing framework is shown in Fig. 3.

Fig -3: Processing framework of URSTP mining

It consists of three main phases. First, text documents are collected from micro-blog sites or forums (in our case, we crawled tweets from Twitter using the Twitter API) and form a document stream, which is the input of our approach. Then, in the preprocessing phase, we first remove useless symbols such as "@" and "#", URLs, and stop words from the input tweets.
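The cleaning step of the preprocessing phase can be sketched as follows. This is a minimal sketch using only the Python standard library; the stop word list, the regular expressions, and the function names are illustrative assumptions, not the ones used in our implementation.

```python
import re

# Illustrative stop word list (an assumption); a real list is much longer.
STOP_WORDS = {"at", "the", "how", "a", "an", "is", "to", "of"}

URL_PATTERN = re.compile(r"https?://\S+")
NON_ALNUM_PATTERN = re.compile(r"[^a-z0-9\s]")  # also strips "@" and "#"

def clean_tweet(text):
    """Lowercase a tweet, drop URLs, symbols, and stop words; return tokens."""
    text = URL_PATTERN.sub(" ", text.lower())
    text = NON_ALNUM_PATTERN.sub(" ", text)
    return [word for word in text.split() if word not in STOP_WORDS]

def clean_tweets(tweets):
    """Clean every tweet and drop posts that repeat after cleaning."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        tokens = clean_tweet(tweet)
        key = tuple(tokens)
        if tokens and key not in seen:
            seen.add(key)
            cleaned.append(tokens)
    return cleaned

print(clean_tweet("How is the match at #Wembley? @bob http://t.co/abc"))
print(clean_tweets(["The match at #Wembley", "the match at Wembley!", "How to vote"]))
```

Note that this sketch removes the "@" and "#" characters but keeps the word that follows them; whether mentions and hashtags should be dropped entirely is a design choice left open here.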
Then the original stream is transformed into a topic-level document stream and divided into sessions to identify complete user behaviors. Finally, and most importantly, we find all the STP candidates in the document stream for all users, and pick out the important URSTPs associated with specific users by user-aware rarity analysis.

  • 4. 5. RESULT ANALYSIS

In this section, we apply our URSTP mining mechanism to find rare topics. To simulate the proposed architecture, we implemented the approach on a machine with at least 1 GB of RAM and a 60 GB (or larger) hard disk. The experiments were carried out with different file sizes. As the file size increases, the execution time increases for both the Naive Bayes algorithm and the Apriori algorithm. However, compared to the Apriori algorithm, Naive Bayes requires less time. The results are shown in Chart 1 and Chart 2.

Chart -1: Time costs of the two algorithms

Chart -2: Performance Time Difference

In Chart 1, we compare the time required to execute the Naive Bayes algorithm and the Apriori algorithm for file sizes ranging from 1 KB to 1000 KB. In Chart 2, we compare the time required to find Sequential Topic Patterns and User-Aware Rare Sequential Topic Patterns for the same file sizes, from 1 KB to 1000 KB.

6. CONCLUSIONS

The tweeting capability of social networking sites such as Twitter is very high. To exploit this capability, we propose some new methods. First, we extract users' posts through an API, and then we extract appropriate topics based on certain keywords. We then show that creating clusters based on keywords makes it easier to detect topics from users' tweets. These extracted topics are then useful for real-time monitoring of abnormal user behaviors. We also proposed an effective approach to discover special users and interesting URSTPs from document streams, i.e., user tweets, which captures users' personalized and abnormal behaviors and characteristics.
We are interested in the dual problem, i.e., discovering sequential topic patterns occurring frequently on the whole, but relatively rare for specific users.

REFERENCES

[1] D. Blei, A. Ng, and M. Jordan, “Latent dirichlet allocation,” J. Mach. Learn. Res., vol. 3, pp. 993–1022, 2003.
[2] W. Li and A. McCallum, “Pachinko allocation: DAG-structured mixture models of topic correlations,” in Proc. ACM Int. Conf. Mach. Learn., 2006, vol. 148, pp. 577–584.
[3] W. X. Zhao, J. Jiang, J. Weng, J. He, E.-P. Lim, H. Yan, and X. Li, “Comparing Twitter and traditional media using topic models,” in Proc. 33rd Eur. Conf. Adv. Inf. Retrieval, 2011, pp. 338–349.
[4] D. M. Blei and J. D. Lafferty, “Dynamic topic models,” in Proc. ACM Int. Conf. Mach. Learn., 2006, pp. 113–120.
[5] T. Hofmann, “Probabilistic latent semantic indexing,” in Proc. 22nd Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 1999, pp. 50–57.
[6] D. Blei and J. Lafferty, “Correlated topic models,” Adv. Neural Inf. Process. Syst., vol. 18, pp. 147–154, 2006.
[7] L. Hong and B. D. Davison, “Empirical study of topic modeling in Twitter,” in Proc. 1st Workshop Soc. Media Anal., 2010, pp. 80–88.
[8] Z. Hu, H. Wang, J. Zhu, M. Li, Y. Qiao, and C. Deng, “Discovery of rare sequential topic patterns in document stream,” in Proc. SIAM Int. Conf. Data Mining, 2014, pp. 533–541.
[9] A. Krause, J. Leskovec, and C. Guestrin, “Data association for topic intensity tracking,” in Proc. ACM Int. Conf. Mach. Learn., 2006, pp. 497–504.
[10] Q. Mei, C. Liu, H. Su, and C. Zhai, “A probabilistic approach to spatiotemporal theme pattern mining on weblogs,” in Proc. 15th Int. Conf. World Wide Web, 2006, pp. 533–542.
[11] W. Dou, X. Wang, D. Skau, W. Ribarsky, and M. X. Zhou, “LeadLine: Interactive visual analysis of text data through event identification and exploration,” in Proc. IEEE Conf. Vis. Anal. Sci. Technol., 2012, pp. 93–102.
[12] G. P. C. Fung, J. X. Yu, P. S. Yu, and H. Lu, “Parameter free bursty events detection in text streams,” in Proc. 31st Int. Conf. Very Large Data Bases, 2005, pp. 181–192.