IEEE PROJECTS 2015
1 Crore Projects is a leading provider of guidance for IEEE and real-time projects.
It has provided guidance to thousands of students and trained them across all of the technologies listed below.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGIES USED AND TRAINED IN
1. DOT NET
2. C#
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRUTS
9. ORACLE
10. VB.NET
11. EMBEDDED
12. MATLAB
13. LabVIEW
14. Multisim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215, 2nd Floor,
No. 172, Raahat Plaza (Shopping Mall), Arcot Road, Vadapalani, Chennai,
Tamil Nadu, INDIA - 600 026
Email: 1croreprojects@gmail.com
Website: 1croreprojects.com
Phone: +91 97518 00789 / +91 72999 51536
Tweet segmentation and its application to named entity recognition (Shakas Technologies)
Twitter has attracted millions of users who share and disseminate the most up-to-date information, producing large volumes of data every day. However, many applications in Information Retrieval (IR) and Natural Language Processing (NLP) suffer severely from the noisy and short nature of tweets.
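The segmentation step underlying the paper can be sketched as a dynamic program that splits a tweet into the phrase sequence with the highest total "stickiness". The table-based scorer below is a toy stand-in for illustration only; the actual paper derives stickiness from external sources such as Wikipedia.

```python
def segment(tokens, stickiness, max_len=3):
    """Split tokens into segments maximizing total stickiness (toy DP sketch)."""
    n = len(tokens)
    best = [0.0] * (n + 1)   # best[i]: max score achievable for tokens[:i]
    back = [0] * (n + 1)     # back[i]: start index of the last segment
    for i in range(1, n + 1):
        best[i] = float("-inf")
        for j in range(max(0, i - max_len), i):
            phrase = " ".join(tokens[j:i])
            score = best[j] + stickiness.get(phrase, 0.1)  # small default for unknown phrases
            if score > best[i]:
                best[i], back[i] = score, j
    # walk back-pointers to reconstruct the segmentation
    segs, i = [], n
    while i > 0:
        segs.append(" ".join(tokens[back[i]:i]))
        i = back[i]
    return segs[::-1]

# toy stickiness table: multi-word named entities score higher than chance splits
sticky = {"new york": 2.0, "times square": 2.0}
print(segment("i love new york times square".split(), sticky))
# -> ['i', 'love', 'new york', 'times square']
```

Keeping "new york" and "times square" intact is exactly what makes downstream named entity recognition easier on noisy tweets.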
Tweet segmentation and its application to named entity recognition (LeMeniz Infotech)
Do your projects with technology experts.
To get this project, call: 9566355386 / 99625 88976
Visit: www.lemenizinfotech.com / www.ieeemaster.com
Mail: projects@lemenizinfotech.com
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING (IJNLC)
Nearly 70% of people are concerned about the propagation of fake news. This paper aims to detect fake news in online articles using semantic features and various machine learning techniques. We compared recurrent neural networks against naive Bayes and random forest classifiers using five groups of linguistic features. Evaluated on the "real or fake" dataset from kaggle.com, the best-performing model achieved an accuracy of 95.66% using bigram features with the random forest classifier. The fact that bigrams outperform unigrams, trigrams, and quadgrams shows that word pairs, as opposed to single words or longer phrases, best indicate the authenticity of news.
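The best-performing feature family above, word bigrams, can be extracted in a few lines of standard-library Python; in the paper these counts feed a random forest classifier, which is omitted here for brevity.

```python
from collections import Counter

def bigram_features(text):
    """Lowercased word-bigram counts -- the feature family the paper
    reports as best-performing (fed to a random forest classifier there)."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))

feats = bigram_features("Breaking news breaking news tonight")
print(feats[("breaking", "news")])   # -> 2
```

Each article becomes a sparse count vector over such bigrams; stacking those vectors yields the design matrix a classifier would train on.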
WARRANT GENERATION USING A LANGUAGE MODEL AND A MULTI-AGENT SYSTEM (IJNLC)
Each argument begins with a conclusion, which is followed by one or more premises supporting it. The warrant is a critical component of Toulmin's argument model; it explains why the premises support the claim. Despite its critical role in establishing the claim's veracity, it is frequently omitted or left implicit, leaving readers to infer it. We consider the problem of producing more diverse, high-quality warrants in response to a claim and evidence. First, we employ BART [1] as a conditional sequence-to-sequence language model to guide the output generation process, fine-tuning it on the ARCT dataset [2]. Second, we propose the Multi-Agent Network for Warrant Generation, a model that produces more diverse, high-quality warrants by combining Reinforcement Learning (RL) and Generative Adversarial Networks (GANs) with a mechanism of mutual awareness among agents. Our model generates a greater variety of warrants than the baseline models, and the experimental results validate the effectiveness of the proposed hybrid model.
In this work we built a system that pulls tweets, pre-processes each tweet to remove unwanted artifacts, and applies stemming to each token in order to compute a score that quantifies intimacy, trust, and social distance between parties interacting on Twitter. The nature of each tie is estimated by multivariate linear regression; to improve on this, we also fit a polynomial regression, which fits the Twitter tie dataset more accurately, as shown by the coefficient of determination and other statistical tests. The y-intercept values are interpreted and illustrated in the results section.
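The linear-versus-polynomial comparison above hinges on the coefficient of determination. A minimal single-variable sketch with NumPy (the study's actual features, intimacy, trust, and social distance, are multivariate, and the data here is synthetic):

```python
import numpy as np

def fit_with_r2(x, y, degree):
    """Fit a polynomial of the given degree and return (coefficients, R^2),
    the statistic used to compare linear vs. polynomial fits."""
    coeffs = np.polyfit(x, y, degree)
    pred = np.polyval(coeffs, x)
    ss_res = np.sum((y - pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)    # total sum of squares
    return coeffs, 1.0 - ss_res / ss_tot

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = x ** 2 + 1.0                  # synthetic, exactly quadratic tie scores
_, r2_lin = fit_with_r2(x, y, 1)  # linear fit: underfits curved data
_, r2_quad = fit_with_r2(x, y, 2) # quadratic fit: captures it exactly
print(r2_lin < r2_quad)           # -> True
```

A higher R-squared for the polynomial fit is precisely the evidence the study cites for preferring it over the linear model.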
Recently, fake news has been causing many problems for our society, and many researchers are working on identifying it. Most fake news detection systems rely on linguistic features of the news, but they have difficulty sensing highly ambiguous fake news, which can be detected only after identifying its meaning and the latest related information. To resolve this problem, we present a new Korean fake news detection system using a fact DB that is built and updated by direct human judgement after collecting obvious facts. Our system receives a proposition and searches the fact DB for semantically related articles, verifying whether the proposition is true by comparing it with those articles. To achieve this, we use a deep learning model, Bidirectional Multi-Perspective Matching for Natural Language Sentences (BiMPM), which has demonstrated good performance on the sentence matching task. However, BiMPM has limitations: the longer the input sentence, the lower its performance, and it struggles to make an accurate judgement when an unlearned word or relation between words appears. To overcome these limitations, we propose a new matching technique that exploits article abstraction and an entity matching set in addition to BiMPM. Our experiments show that the system improves overall fake news detection performance. Prasanth. K | Praveen. N | Vijay. S | Auxilia Osvin Nancy. V, "Fake News Detection using Machine Learning", published in the International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN: 2456-6470, Volume 4, Issue 2, February 2020.
URL: https://www.ijtsrd.com/papers/ijtsrd30014.pdf
Paper URL: https://www.ijtsrd.com/engineering/information-technology/30014/fake-news-detection-using-machine-learning/prasanth-k
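The retrieve-and-compare step of such a system can be illustrated with a deliberately simple token-overlap matcher standing in for BiMPM; this is a sketch of the pipeline shape only, not the neural matcher the paper uses.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity -- a simple stand-in for the BiMPM
    sentence matcher, just to show the fact-DB retrieval step."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def best_match(proposition, fact_db):
    """Return the fact-DB article most similar to the proposition."""
    return max(fact_db, key=lambda article: jaccard(proposition, article))

db = ["the earth orbits the sun", "the moon is made of rock"]
print(best_match("does the earth orbit the sun", db))
# -> the earth orbits the sun
```

In the full system, the retrieved article and the proposition would then be fed to the matching model for a true/false verdict.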
In the age of social media communication, it is easy to manipulate users' minds and, in some cases, instigate violent actions. There is a need for a system that can analyze the threat level of tweets from influential users and rank their Twitter handles, so that dangerous tweets can be kept from going public before fact-checking; such tweets can hurt people's sentiments and escalate into violence. This study aims to analyze and rank Twitter users according to their influence and the extremism of their tweets, to help prevent major protests and violent events. We scraped top trending topics and fetched tweets using those hashtags. We propose a custom ranking algorithm that considers source-based and content-based features, along with a knowledge graph, to generate a score and rank Twitter users accordingly. Our aim is to identify and rank extremist Twitter users with regard to their impact and influence, using a technique that takes into account both source-based and content-based features of tweets.
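A minimal sketch of a score mixing one source-based feature (follower reach) with one content-based feature (flagged-term density). The weights and features here are illustrative assumptions, not the study's actual algorithm, which also incorporates a knowledge graph.

```python
def threat_score(user, tweet_text, flagged_terms):
    # source-based feature: follower reach, normalized and capped at 1.0
    source = min(user["followers"] / 10_000, 1.0)
    # content-based feature: fraction of words that are flagged terms
    words = tweet_text.lower().split()
    content = sum(w in flagged_terms for w in words) / max(len(words), 1)
    return 0.2 * source + 0.8 * content   # illustrative weights

def rank_users(records, terms):
    """Sort (user, tweet) records by descending threat score."""
    return sorted(records, key=lambda r: threat_score(r[0], r[1], terms), reverse=True)

records = [
    ({"handle": "@a", "followers": 50_000}, "peaceful rally today"),
    ({"handle": "@b", "followers": 2_000}, "burn it all attack now"),
]
print(rank_users(records, {"attack", "burn"})[0][0]["handle"])   # -> @b
```

Note how weighting content above reach lets a low-follower account with extreme content outrank a large but benign one; tuning that trade-off is the core of any such ranking scheme.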
What Sets Verified Users apart? Insights Into, Analysis of and Prediction of ... (IIIT Hyderabad)
Social network and publishing platforms, such as Twitter, support the concept of verification. Verified accounts are deemed worthy of platform-wide public interest and are separately authenticated by the platform itself. These platforms have repeatedly asserted that verification is not tantamount to endorsement. However, a significant body of prior work suggests that possessing a verified status symbolizes enhanced credibility in the eyes of the platform audience. As a result, such a station is highly coveted among public figures and influencers. Hence, we attempt to characterize the network of verified users on Twitter and compare the results to similar analyses performed for the entire Twitter network. We extracted the whole graph of verified users on Twitter (as of July 2018) and obtained 231,246 English user profiles and 79,213,811 connections. In the network analysis, we found that the sub-graph of verified users mirrors the full Twitter graph in some aspects, such as possessing a short diameter. However, our findings contrast with earlier results on multiple fronts, such as the possession of a power-law out-degree distribution, slight dissortativity, and a significantly higher reciprocity rate, as elucidated in the paper. Moreover, we gauge the presence of salient components within this sub-graph and detect the absence of homophily with respect to popularity, which again is in stark contrast to the full Twitter graph. Finally, we demonstrate stationarity in the time series of verified user activity levels.
It is against this backdrop that we attempt to deconstruct the extent to which Twitter's verification policy mingles the notions of authenticity and authority. To this end, we seek to unravel the aspects of a user's profile that likely engender or preclude verification. The aim of the paper is two-fold: first, we test whether discerning the verification status of a handle from profile metadata and content features is feasible; second, we unravel the characteristics that have the most significant bearing on a handle's verification status. We augmented our dataset with all 494 million tweets of the aforementioned users over a one-year collection period, along with their temporal social reach and activity characteristics. Our proposed models reliably identify verification status (area under the curve, AUC > 99%). We show that the number of public list memberships, the presence of neutral sentiment in tweets, and an authoritative language style are the most pertinent predictors of verification status.
To the best of our knowledge, this work represents the first quantitative attempt at characterizing verified users on Twitter, and also the first attempt at discerning and classifying verification-worthy users on Twitter.
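One of the network statistics reported above, the reciprocity rate, is straightforward to compute on a directed edge list; a small sketch on a toy follower graph:

```python
def reciprocity(edges):
    """Fraction of directed edges whose reverse edge also exists -- the
    statistic the study finds notably higher among verified users."""
    edge_set = set(edges)
    mutual = sum((v, u) in edge_set for (u, v) in edge_set)
    return mutual / len(edge_set) if edge_set else 0.0

# tiny follower graph: a <-> b follow each other, a -> c is one-way
print(reciprocity([("a", "b"), ("b", "a"), ("a", "c")]))   # -> 0.666...
```

At the scale of the paper's 79 million connections the same computation would use a graph library rather than a Python set, but the definition is identical.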
Graph-based Analysis and Opinion Mining in Social Network (Khan Mostafa)
This is the final report for a Networks & Data Mining Techniques project focused on mining a social network to estimate public opinion about entities and their associated keywords. The project mines Twitter for recent feeds and analyzes each tweet to estimate a sentiment score, the entity discussed, and the keywords describing it. These data are then used to elicit the overall sentiment associated with each entity. The extracted entities and keywords are also used to form an entity-keyword bigraph, which in turn is used to detect entity communities and the keywords found within them. The presented implementation runs in linear time.
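The aggregation step can be sketched in a few lines: build the entity-keyword bigraph and the per-entity mean sentiment from per-tweet triples. This toy version assumes entity extraction and sentiment scoring have already happened upstream, and it is linear in the number of tweets, matching the report's complexity claim.

```python
from collections import defaultdict

def aggregate(tweets):
    """From (entity, keyword, sentiment) triples, build the entity-keyword
    bigraph adjacency and each entity's mean sentiment. One pass: O(n)."""
    graph = defaultdict(set)     # entity -> set of keywords (bigraph edges)
    scores = defaultdict(list)   # entity -> list of sentiment scores
    for entity, keyword, sentiment in tweets:
        graph[entity].add(keyword)
        scores[entity].append(sentiment)
    means = {e: sum(s) / len(s) for e, s in scores.items()}
    return dict(graph), means

graph, means = aggregate([
    ("acme", "cheap", 0.5),
    ("acme", "slow", -0.5),
    ("beta", "fast", 1.0),
])
print(means["acme"])   # -> 0.0
```

Community detection would then run over the bigraph edges collected in `graph`.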
Prediction of links in graphs based on content and information propagation.
Lecture for the M.Sc. in Data Science, Sapienza University of Rome, Spring 2016.
Evolving Swings (Topics) from Social Streams using a Probability Model (IJERA Editor)
Detecting evolving swings in social streams is receiving renewed interest, motivated by the growth of social media and social streams. Non-conventional approaches that include text, images, URLs, and videos can be appropriate. The focus here is on evolving topics revealed by the social aspects of the network: the links between users generated, intentionally or unintentionally, through replies, mentions, and retweets. A probability model of mentioning behavior is proposed, and the model detects evolving topics from the anomalies it measures. Several experiments show that the mention-anomaly-based approach detects an evolving swing as early as text-anomaly-based approaches.
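One simple way to realize a "probability model of mentioning behavior" is to score how surprising an observed mention count is under a Poisson model fitted to a user's historical mention rate; the paper's actual model differs, so treat this as an illustrative assumption.

```python
import math

def anomaly_score(count, rate):
    """Negative log-likelihood of seeing `count` mentions under a Poisson
    model with historical mean `rate`: higher means more anomalous."""
    # -log P(count | rate) = rate - count*log(rate) + log(count!)
    return rate - count * math.log(rate) + math.lgamma(count + 1)

# a burst of 30 mentions is far more anomalous than 4, given a mean of 3/hour
print(anomaly_score(30, 3.0) > anomaly_score(4, 3.0))   # -> True
```

A burst detector would flag a topic as "evolving" when the summed anomaly scores of the users mentioning it spike above a threshold.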
USING HASHTAG GRAPH-BASED TOPIC MODEL TO CONNECT SEMANTICALLY-RELATED WORDS W... (Nexgen Technology)
To get this project's complete source with execution support, please use the contact details below.
Mobile: 9791938249, 0413-2211159; Web: WWW.NEXGENPROJECT.COM, WWW.FINALYEAR-IEEEPROJECTS.COM; Email: Praveen@nexgenproject.com
NEXGEN TECHNOLOGY provides total software solutions to its customers. Apsys works closely with customers to identify their business processes for computerization and helps them implement state-of-the-art solutions. By identifying and enhancing these processes through information technology, NEXGEN TECHNOLOGY helps its customers use their resources optimally.
A Profit Maximization Scheme with Guaranteed Quality of Service in Cloud Comp... (1 Crore Projects)
Public Integrity Auditing for Shared Dynamic Cloud Data with Group User Revoc... (1 Crore Projects)
An Authenticated Trust and Reputation Calculation and Management System for C...1crore projects
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
Route-Saver: Leveraging Route APIs for Accurate and Efficient Query Processin...1crore projects
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
A Secure and Dynamic Multi-keyword Ranked Search Scheme over Encrypted Cloud ...1crore projects
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
Enabling Cloud Storage Auditing With Key-Exposure Resistance1crore projects
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
Privacy Preserving Ranked Multi-Keyword Search for Multiple Data Owners in Cl...1crore projects
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
Stealthy Denial of Service Strategy in Cloud Computing 1crore projects
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
Key-Aggregate Searchable Encryption (KASE) for Group Data Sharing via Cloud S...1crore projects
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
Privacy-Preserving Public Auditing for Regenerating-Code-Based Cloud Storage1crore projects
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
A Scalable and Reliable Matching Service for Content-Based Publish/Subscribe ...1crore projects
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
Towards Effective Bug Triage with Software Data Reduction Techniques1crore projects
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
Context-Based Diversification for Keyword Queries over XML Data1crore projects
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
Control Cloud Data Access Privilege and Anonymity with Fully Anonymous Attrib...1crore projects
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
Reverse Nearest Neighbors in Unsupervised Distance-Based Outlier Detection1crore projects
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
Provable Multicopy Dynamic Data Possession in Cloud Computing Systems1crore projects
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
IEEE PROJECTS 2015
1 crore projects is a leading Guide for ieee Projects and real time projects Works Provider.
It has been provided Lot of Guidance for Thousands of Students & made them more beneficial in all Technology Training.
Dot Net
DOTNET Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
Java Project Domain list 2015
1. IEEE based on datamining and knowledge engineering
2. IEEE based on mobile computing
3. IEEE based on networking
4. IEEE based on Image processing
5. IEEE based on Multimedia
6. IEEE based on Network security
7. IEEE based on parallel and distributed systems
ECE IEEE Projects 2015
1. Matlab project
2. Ns2 project
3. Embedded project
4. Robotics project
Eligibility
Final Year students of
1. BSc (C.S)
2. BCA/B.E(C.S)
3. B.Tech IT
4. BE (C.S)
5. MSc (C.S)
6. MSc (IT)
7. MCA
8. MS (IT)
9. ME(ALL)
10. BE(ECE)(EEE)(E&I)
TECHNOLOGY USED AND FOR TRAINING IN
1. DOT NET
2. C sharp
3. ASP
4. VB
5. SQL SERVER
6. JAVA
7. J2EE
8. STRINGS
9. ORACLE
10. VB dotNET
11. EMBEDDED
12. MAT LAB
13. LAB VIEW
14. Multi Sim
CONTACT US
1 CRORE PROJECTS
Door No: 214/215,2nd Floor,
No. 172, Raahat Plaza, (Shopping Mall) ,Arcot Road, Vadapalani, Chennai,
Tamin Nadu, INDIA - 600 026
Email id: 1croreprojects@gmail.com
website:1croreprojects.com
Phone : +91 97518 00789 / +91 72999 51536
Nowadays, Twitter provides a way to collect and understand users' opinions about many private and public organizations. These organizations create and monitor targeted Twitter streams to understand users' views about them. A targeted Twitter stream is usually constructed by filtering tweets with user-defined selection criteria. Applications built on such streams, such as early crisis detection and response, require a good Named Entity Recognition (NER) system for Twitter, one that can automatically discover emerging named entities potentially linked to the crisis. However, many such applications suffer severely from the noisy and short nature of tweets. We present a framework called HybridSeg, which extracts and preserves linguistic meaning and context information by first splitting each tweet into meaningful segments. The optimal segmentation of a tweet is the one that maximizes the sum of the stickiness scores of its candidate segments. A segment's stickiness score reflects the probability that it is a valid phrase in the global context (i.e., a well-formed English phrase) or in the local context (i.e., within a batch of tweets); the framework learns from both contexts and can also learn from pseudo feedback. In addition, building on the results of semantic analysis, the proposed system provides sentiment analysis.
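The segmentation step described above can be sketched with dynamic programming: for each prefix of the tweet, keep the best-scoring split, extending it with every candidate last segment. This is only an illustrative sketch — HybridSeg derives stickiness from global and local language statistics, whereas the `stickiness` function and the `PHRASES` dictionary below are hypothetical stand-ins.

```python
# Sketch of optimal tweet segmentation by maximizing summed stickiness.
# The toy phrase dictionary stands in for the global/local context models
# that a real HybridSeg-style system would learn.

PHRASES = {("new", "york"): 2.5, ("named", "entity"): 2.2}  # hypothetical scores

def stickiness(segment):
    """Score a candidate segment: known multiword phrases score high,
    single words get a neutral baseline, other multiword spans score low."""
    if len(segment) == 1:
        return 1.0
    return PHRASES.get(tuple(w.lower() for w in segment), 0.1)

def segment_tweet(words, max_seg_len=3):
    """Return the segmentation of `words` maximizing total stickiness."""
    n = len(words)
    best = [0.0] * (n + 1)   # best[i]: max score over splits of words[:i]
    back = [0] * (n + 1)     # back[i]: start index of the last segment
    for i in range(1, n + 1):
        best[i] = float("-inf")
        for j in range(max(0, i - max_seg_len), i):
            score = best[j] + stickiness(words[j:i])
            if score > best[i]:
                best[i], back[i] = score, j
    # Recover the segments by following the back pointers.
    segs, i = [], n
    while i > 0:
        segs.append(words[back[i]:i])
        i = back[i]
    return list(reversed(segs))

print(segment_tweet(["i", "love", "new", "york"]))
# → [['i'], ['love'], ['new', 'york']]
```

With a stronger stickiness model, the same dynamic program keeps "new york" together as one segment rather than splitting it, which is exactly what downstream NER needs.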
Lexicon Based Emotion Analysis on Twitter Dataijtsrd
This paper presents a system that extracts information from automatically annotated tweets using well known existing opinion lexicons and supervised machine learning approach. In this paper, the sentiment features are primarily extracted from novel high coverage tweet specific sentiment lexicons. These lexicons are automatically generated from tweets with sentiment word hashtags and from tweets with emoticons. The sentence level or tweet level classification is done based on these word level sentiment features by using Sequential Minimal Optimization SMO classifier. SemEval 2013 Twitter sentiment dataset is applied in this work. The ablation experiments show that this system gains in F Score of up to 6.8 absolute percentage points. Nang Noon Kham "Lexicon Based Emotion Analysis on Twitter Data" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26566.pdfPaper URL: https://www.ijtsrd.com/computer-science/data-miining/26566/lexicon-based-emotion-analysis-on-twitter-data/nang-noon-kham
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGdannyijwest
Social Networks has become one of the most popular platforms to allow users to communicate, and share their interests without being at the same geographical location. With the great and rapid growth of Social Media sites such as Facebook, LinkedIn, Twitter…etc. causes huge amount of user-generated content. Thus, the improvement in the information quality and integrity becomes a great challenge to all social media sites, which allows users to get the desired content or be linked to the best link relation using improved search / link technique. So introducing semantics to social networks will widen up the representation of the social networks. In this paper, a new model of social networks based on semantic tag ranking is introduced. This model is based on the concept of multi-agent systems. In this proposed model the representation of social links will be extended by the semantic relationships found in the vocabularies which are known as (tags) in most of social networks.The proposed model for the social media engine is based on enhanced Latent Dirichlet Allocation(E-LDA) as a semantic indexing algorithm, combined with Tag Rank as social network ranking algorithm. The improvements on (E-LDA) phase is done by optimizing (LDA) algorithm using the optimal parameters. Then a filter is introduced to enhance the final indexing output. In ranking phase, using Tag Rank based on the indexing phase has improved the output of the ranking. Simulation results of the proposed model have shown improvements in indexing and ranking output.
A Baseline Based Deep Learning Approach of Live Tweets
In this scenario, social media plays a vital role in influencing people's lives. Twitter, Facebook, and Instagram are among the major social media platforms. They act as a platform for users to raise their opinions on things and events around them. Twitter is one such microblogging site; it allows a user to post up to 6,000 tweets per day, each up to 280 characters long. Data analysts rely on this data to reach conclusions about ongoing events and to rate products. But due to the massive volume of reviews, analysts find it difficult to go through them all and reach conclusions. To solve this problem we adopt sentiment analysis, an approach to classify the sentiment of user reviews, documents, etc. as positive (good), negative (bad), or neutral (surprise). I suggest an enhanced Twitter sentiment analysis that retrieves data based on a baseline within a particular predefined time span and performs sentiment analysis using TextBlob. This scheme differs from the traditional, existing one, which performs sentiment analysis on pre-saved data, by performing sentiment analysis on real-time data fetched via the Twitter API, thereby providing a much more recent and relevant conclusion. Anjana Jimmington, "A Baseline Based Deep Learning Approach of Live Tweets", published in International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN: 2456-6470, Volume-3, Issue-4, June 2019. URL: https://www.ijtsrd.com/papers/ijtsrd23918.pdf
Paper URL: https://www.ijtsrd.com/computer-science/other/23918/a-baseline-based-deep-learning-approach-of-live-tweets/anjana-jimmington
BUS 625 Week 4 Response to Discussion 2
Guided Response: Your initial response should be a minimum of 300 words in length. Respond to at least two of your classmates by commenting on their posts. Though two replies are the basic expectation for class discussions, for deeper engagement and learning, you are encouraged to provide responses to any comments or questions others have given to you.
Below are two of my classmates' discussions that I need to respond to; their names are Umadevi Sayana and Britney Graves.
Umadevi Sayana
Tuesday, Mar 17 at 7:50am
Manage Discussion Entry
Twitter mining analyzes Twitter messages to predict, discover, or investigate causation. It includes text mining designed specifically to leverage the content and context of tweets. With text mining, the analysis can include additional information associated with tweets, such as hashtags, names, and other characteristics, and can employ counts of tweets, likes, retweets, and favorites to better understand the conversation. Text mining on Twitter has been successful in capturing and reflecting events that are also covered by conventional and other social media. In 2013, Twitter carried over 500 million messages per day, which became impossible for any human to analyze, so it became important to develop computer-based algorithms, including data mining. Twitter text mining is used to analyze the sentiment associated with tweets, based on whether keywords carry a negative, positive, or neutral sentiment (Sunmoo, Noémie & Suzanne, n.d.). Positive words such as great, beautiful, and love, and negative words such as stupid, evil, and waste, regularly appear in sentiment lexicons. Using text mining, sentiment can also be captured from many dictionary symbols, including abbreviations, emoticons, and repeated characters.
The sentiments on topics of economics, politics, and security are usually negative, and sentiments related to sports are harmful. Twitter data is also text-mined to collect and analyze topics with topic modeling techniques over time. To pull data from Twitter, the twitteR package is used. "Someone well versed in database architecture and data storage is needed to extract the relevant information in different databases and to merge them into a form that is useful for analysis" (Sharpe, De Veaux & Velleman, 2019, p. 753). It provides an interface to the Twitter web API; retweetedBy/ids is also used, combined with the RCurl package, to find out how many tweets were retweeted. Text mining is also used on Twitter to clean the text by removing hyperlinks, numbers, stop words, and punctuation, followed by stem completion, and it is likewise applied to social network analysis.
Web mining focuses on data and knowledge discovery ...
Enhancing prediction of user stance for social networks rumors
The spread of social media has led to a massive change in the way information is dispersed. It provides organizations and individuals wider opportunities for collaboration, but it also enables malicious users and attention seekers to spread rumors and fake news. Understanding user stances in rumor posts is very important for identifying the veracity of the underlying content, as news goes viral in a few seconds and can lead to mass panic and confusion. In this paper, different machine learning techniques were utilized to enhance the prediction of user stance through a conversation thread towards a given rumor on the Twitter platform. We utilized both conversation-thread features and features related to the users who participated in the conversation, in order to predict the users' stances towards the source tweet in terms of supporting, denying, querying, or commenting (SDQC). Furthermore, different datasets for the stance-prediction task were explored to handle the data imbalance problem, and data augmentation for minority classes was applied to enhance the results. The proposed framework outperforms the state-of-the-art results with a macro F1-score of 0.7233.
Sentiment analysis is context-based mining of text, which extracts and identifies subjective information from a given text or sentence. The main concept here is extracting the sentiment of the text using machine learning techniques such as LSTM (Long Short-Term Memory). This text classification method analyses the incoming text and determines whether the underlying emotion is positive or negative, along with the probability associated with that positive or negative statement. The probability depicts the strength of the statement: if the probability is close to 0, the sentiment is strongly negative, and if it is close to 1, the statement is strongly positive. A web application is created to deploy this model using a Python-based micro-framework called Flask. Many other methods, such as RNN and CNN, are inefficient when compared to LSTM. Dirash A R | Dr. S K Manju Bargavi, "LSTM Based Sentiment Analysis", published in International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN: 2456-6470, Volume-5, Issue-4, June 2021. URL: https://www.ijtsrd.com/papers/ijtsrd42345.pdf
Paper URL: https://www.ijtsrd.com/computer-science/data-processing/42345/lstm-based-sentiment-analysis/dirash-a-r
Identifying Hot Topic Trends in Streaming Text Data Using News Sequential Evolution Model Based on Distributed Representations.
Tweet Summarization and Segmentation: A Survey
The use of social media is increasing day by day. It has become an important medium for getting information about current happenings around the world. Among various social media platforms, with millions of users, Twitter is one of the most prominent social networking sites. Over the years, sentiment analysis has been performed on Twitter to understand what posted tweets mean. The purpose of this paper is to survey various tweet segmentation and summarization techniques and the importance of the Particle Swarm Optimization (PSO) algorithm for tweet summarization [1][2].
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL), a government-owned company of Bangladesh Chemical Industries Corporation under the Ministry of Industries.
Final project report on grocery store management system.pdf
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner, especially when your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website which retails various grocery products. The project allows visitors to view the various products available and enables registered users to purchase desired products instantly using the Paytm and UPI payment processors (Instant Pay), or to place an order using the Cash on Delivery (Pay Later) option. It also provides easy access for administrators and managers to view orders placed using the Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of technologies must be studied and understood. These include multi-tiered architecture, server- and client-side scripting techniques, implementation technologies, programming languages (such as PHP, HTML, CSS, JavaScript), and MySQL relational databases. The objective of this project is to develop a basic shopping-cart website for the consumer and to learn about the technologies used to develop such a website.
This document will discuss each of the underlying technologies used to create and implement an e-commerce website.
Welcome to WIPAC Monthly, the magazine brought to you by the LinkedIn group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news, to celebrate the 13 years since the group was created we have articles including:
A case study of the use of Advanced Process Control at the wastewater treatment works at Lleida in Spain
A look back at an article on smart wastewater networks, to see how the industry has measured up in the interim around the adoption of digital transformation in the water industry.
Student information management system project report ii.pdf
Our project is about student management. It mainly covers the various actions related to student details, making it easy to add, edit, and delete student records. It also provides a less time-consuming process for viewing, adding, editing, and deleting students' marks.
Tweet Segmentation and Its Application to Named Entity Recognition
Chenliang Li, Aixin Sun, Jianshu Weng, and Qi He, Member, IEEE
Abstract—Twitter has attracted millions of users to share and disseminate most up-to-date information, resulting in large volumes of data produced everyday. However, many applications in Information Retrieval (IR) and Natural Language Processing (NLP) suffer severely from the noisy and short nature of tweets. In this paper, we propose a novel framework for tweet segmentation in a batch mode, called HybridSeg. By splitting tweets into meaningful segments, the semantic or context information is well preserved and easily extracted by the downstream applications. HybridSeg finds the optimal segmentation of a tweet by maximizing the sum of the stickiness scores of its candidate segments. The stickiness score considers the probability of a segment being a phrase in English (i.e., global context) and the probability of a segment being a phrase within the batch of tweets (i.e., local context). For the latter, we propose and evaluate two models to derive local context by considering the linguistic features and term-dependency in a batch of tweets, respectively. HybridSeg is also designed to iteratively learn from confident segments as pseudo feedback. Experiments on two tweet data sets show that tweet segmentation quality is significantly improved by learning both global and local contexts compared with using global context alone. Through analysis and comparison, we show that local linguistic features are more reliable for learning local context compared with term-dependency. As an application, we show that high accuracy is achieved in named entity recognition by applying segment-based part-of-speech (POS) tagging.
Index Terms—Twitter stream, tweet segmentation, named entity recognition, linguistic processing, Wikipedia
1 INTRODUCTION
Microblogging sites such as Twitter have reshaped the way people find, share, and disseminate timely information. Many organizations have been reported to create and monitor targeted Twitter streams to collect and understand users’ opinions. A targeted Twitter stream is usually constructed by filtering tweets with predefined selection criteria (e.g., tweets published by users from a geographical region, tweets that match one or more predefined keywords). Due to the invaluable business value of the timely information in these tweets, it is imperative to understand tweets’ language for a large body of downstream applications, such as named entity recognition (NER) [1], [3], [4], event detection and summarization [5], [6], [7], opinion mining [8], [9], sentiment analysis [10], [11], and many others.
Given the limited length of a tweet (i.e., 140 characters) and the absence of restrictions on its writing style, tweets often contain grammatical errors, misspellings, and informal abbreviations. The error-prone and short nature of tweets often makes word-level language models for tweets less reliable. For example, given a tweet “I call her, no answer. Her phone in the bag, she dancin,” there is no clue to guess its true theme if word order is disregarded (i.e., the bag-of-words model). The situation is further exacerbated by the limited context provided by the tweet: more than one explanation for this tweet could be derived by different readers if the tweet is considered in isolation. On the other hand, despite the noisy nature of tweets, the core semantic information is well preserved in tweets in the form of named entities or semantic phrases. For example, the emerging phrase “she dancin” in the related tweets indicates that it is a key concept; it classifies this tweet into the family of tweets talking about the song “She Dancin”, a trending topic in the Bay Area in January 2013.
In this paper, we focus on the task of tweet segmentation. The goal of this task is to split a tweet into a sequence of consecutive n-grams (n ≥ 1), each of which is called a segment. A segment can be a named entity (e.g., a movie title “finding nemo”), a semantically meaningful information unit (e.g., “officially released”), or any other type of phrase which appears “more than by chance” [1]. Fig. 1 gives an example. In this example, a tweet “They said to spare no effort to increase traffic throughput on circle line.” is split into eight segments. Semantically meaningful segments “spare no effort”, “traffic throughput” and “circle line” are preserved. Because these segments preserve the semantic meaning of the tweet more precisely than each of its constituent words does, the topic of this tweet can be better captured in the subsequent processing of this tweet. For instance, this segment-based representation could be used to enhance the extraction of geographical location from tweets because of the segment “circle line” [12]. In fact, segment-based representation has shown its effectiveness over word-based representation in the tasks of named entity recognition and event detection [1], [2], [13]. Note that a named entity is a valid segment, but a segment may not necessarily be a
C. Li is with the State Key Lab of Software Engineering, School of Computer, Wuhan University, P.R. China. E-mail: cllee@whu.edu.cn.
A. Sun is with the School of Computer Engineering, Nanyang Technological University, Singapore. E-mail: axsun@ntu.edu.sg.
J. Weng is with the Accenture Analytics Innovation Center, Singapore. E-mail: jianshu@acm.org.
Q. He is with the Relevance Science Team, LinkedIn Inc., San Francisco, CA. E-mail: qhe@linkedin.com.
Manuscript received 4 Oct. 2013; revised 12 May 2014; accepted 15 May 2014. Date of publication 29 May 2014; date of current version 23 Dec. 2014. Recommended for acceptance by M. Sanderson.
Digital Object Identifier no. 10.1109/TKDE.2014.2327042
558 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 27, NO. 2, FEBRUARY 2015
1041-4347 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
named entity. In [6], the segment “korea versus greece” is detected for the event related to the World Cup match between Korea and Greece.
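As a concrete illustration of the segment-based representation, the example tweet from Fig. 1 can be held as a list of consecutive n-grams. The split below is a plausible eight-segment assumption: the text only names the three multi-word segments and the total count of eight, so the boundaries of the single-word segments are illustrative.

```python
# The example tweet from Fig. 1, held as a list of consecutive n-gram segments.
# The single-word segment boundaries are an assumption for illustration.
tweet = "They said to spare no effort to increase traffic throughput on circle line."
segments = ["They", "said to", "spare no effort", "to", "increase",
            "traffic throughput", "on", "circle line"]
```

Concatenating the segments in order recovers the original word sequence, which is the defining property of a segmentation.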
To achieve high-quality tweet segmentation, we propose a generic tweet segmentation framework, named HybridSeg. HybridSeg learns from both global and local contexts, and has the ability to learn from pseudo feedback.
Global context. Tweets are posted for information sharing and communication. Named entities and semantic phrases are well preserved in tweets. The global context derived from Web pages (e.g., the Microsoft Web N-Gram corpus) or Wikipedia therefore helps identify the meaningful segments in tweets. The method realizing the proposed framework that relies solely on global context is denoted by HybridSegWeb.
Local context. Tweets are highly time-sensitive, so many emerging phrases like “She Dancin” cannot be found in external knowledge bases. However, considering a large number of tweets published within a short time period (e.g., a day) containing the phrase, it is not difficult to recognize “She Dancin” as a valid and meaningful segment. We therefore investigate two local contexts, namely local linguistic features and local collocation. Observe that tweets from many official accounts of news agencies, organizations, and advertisers are likely well written. The well-preserved linguistic features in these tweets facilitate named entity recognition with high accuracy. Each named entity is a valid segment. The method utilizing local linguistic features is denoted by HybridSegNER. It obtains confident segments based on the voting results of multiple off-the-shelf NER tools. Another method utilizing local collocation knowledge, denoted by HybridSegNGram, is proposed based on the observation that many tweets published within a short time period are about the same topic. HybridSegNGram segments tweets by estimating the term-dependency within a batch of tweets.
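The voting step behind HybridSegNER can be sketched in a few lines of Python. The specific NER tools, their outputs, and the voting threshold below are illustrative assumptions; the paper only states that confident segments come from the voting results of multiple off-the-shelf NER tools.

```python
from collections import Counter

def confident_segments(ner_outputs, min_votes=2):
    """Keep entity strings proposed by at least `min_votes` of the NER tools.

    `ner_outputs` is a list of entity sets, one per off-the-shelf NER tool.
    The threshold of 2 is an illustrative choice, not the paper's setting."""
    votes = Counter(e for out in ner_outputs for e in set(out))
    return {e for e, v in votes.items() if v >= min_votes}

# Hypothetical outputs of three NER tools over the same batch of tweets.
tool_a = {"she dancin", "bay area"}
tool_b = {"she dancin"}
tool_c = {"bay area", "she dancin", "january"}
segs = confident_segments([tool_a, tool_b, tool_c])
```

Here only the entities proposed by at least two tools survive as confident segments.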
Pseudo feedback. The segments recognized from local context with high confidence serve as good feedback to extract more meaningful segments. The learning from pseudo feedback is conducted iteratively, and the method implementing the iterative learning is named HybridSegIter.
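The iterative pseudo-feedback loop can be sketched as follows. The `extract` callable and the toy "related segments" map are placeholders for the actual HybridSeg scoring step, which is not specified in this excerpt; only the fixed-point iteration structure is taken from the text.

```python
def iterative_segments(extract, seed_segments, max_iter=5):
    """Pseudo-feedback iteration: confident segments from one pass are fed
    back to `extract`, which may discover more segments, until no new
    segments appear or `max_iter` passes have run."""
    confident = set(seed_segments)
    for _ in range(max_iter):
        new = extract(confident) - confident
        if not new:            # fixed point: nothing new discovered
            break
        confident |= new
    return confident

# Toy extractor: each known segment "unlocks" related ones (illustrative only).
related = {"circle line": {"traffic throughput"}, "traffic throughput": set()}
result = iterative_segments(
    lambda known: set().union(*(related.get(s, set()) for s in known)),
    {"circle line"})
```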
We conduct extensive experimental analysis of HybridSeg (see Footnote 1) on two tweet data sets and evaluate the quality of tweet segmentation against manually annotated tweets. Our experimental results show that HybridSegNER and HybridSegNGram, the two methods incorporating local context in addition to global context, achieve significant improvement in segmentation quality over HybridSegWeb, the method using global context alone. Between the former two methods, HybridSegNER is less sensitive to parameter settings than HybridSegNGram and achieves better segmentation quality. With iterative learning from pseudo feedback, HybridSegIter further improves the segmentation quality.
As an application of tweet segmentation, we propose and evaluate two segment-based NER algorithms. Both algorithms are unsupervised in nature and take tweet segments as input. One algorithm exploits the co-occurrence of named entities in targeted Twitter streams by applying random walk (RW), with the assumption that named entities are more likely to co-occur together. The other algorithm utilizes the Part-of-Speech (POS) tags of the constituent words in segments. The segments that are likely to be a noun phrase (NP) are considered named entities. Our experimental results show that (i) the quality of tweet segmentation significantly affects the accuracy of NER, and (ii) the POS-based NER method outperforms the RW-based method on both data sets.
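The POS-based idea, treating segments that look like noun phrases as named-entity candidates, can be sketched with a simple pattern check. The Penn Treebank-style tag pattern below (optional determiners/adjectives followed by nouns) is a deliberately simplified assumption; the paper's actual segment-based POS tagging method is more refined.

```python
def is_noun_phrase(pos_tags):
    """Heuristic NP test: optional DT/JJ prefix followed only by noun tags.
    This simple pattern is an illustration, not the paper's exact rule."""
    i = 0
    while i < len(pos_tags) and pos_tags[i] in ("DT", "JJ"):
        i += 1
    return i < len(pos_tags) and all(t in ("NN", "NNS", "NNP", "NNPS")
                                     for t in pos_tags[i:])

# Hypothetical segments paired with the POS tags of their constituent words.
segments = [("circle line", ["NN", "NN"]),
            ("spare no effort", ["VB", "DT", "NN"]),
            ("finding nemo", ["NNP", "NNP"])]
entities = [s for s, tags in segments if is_noun_phrase(tags)]
```

Only the noun-phrase-shaped segments survive as named-entity candidates; the verb-initial segment is filtered out.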
The remainder of this paper is organized as follows. Section 2 surveys related work on tweet segmentation. Section 3 defines tweet segmentation and describes the proposed framework. Section 4 details how the local context is exploited in the framework. In Section 5, the segment-based NER methods are investigated. In Section 6, we evaluate the proposed HybridSeg framework and the two segment-based NER methods. Section 7 concludes this paper.
2 RELATED WORK
Both tweet segmentation and named entity recognition are
considered important subtasks in NLP. Many existing NLP
techniques heavily rely on linguistic features, such as POS
tags of the surrounding words, word capitalization, trigger
words (e.g., Mr., Dr.), and gazetteers. These linguistic fea-
tures, together with effective supervised learning algo-
rithms (e.g., hidden markov model (HMM) and conditional
random field (CRF)), achieve very good performance on for-
mal text corpus [14], [15], [16]. However, these techniques
experience severe performance deterioration on tweets
because of the noisy and short nature of the latter.
There have been many attempts to incorporate tweets’ unique characteristics into conventional NLP techniques. To improve POS tagging on tweets, Ritter et al. train a POS tagger by using a CRF model with conventional and tweet-specific features [3]. Brown clustering is applied in their work to deal with ill-formed words. Gimpel et al. incorporate tweet-specific features including at-mentions, hashtags, URLs, and emoticons [17] with the help of a new labeling scheme. In their approach, they measure the confidence of capitalized words and apply phonetic normalization to ill-formed words to address possible peculiar writings in tweets. It was reported to outperform the state-of-the-art Stanford POS tagger on tweets. Normalization of ill-formed words in tweets has established itself as an important research problem [18]. A supervised approach is employed in [18] to first identify the ill-formed words. Then, the correct normalization of each ill-formed word is selected based on a number of lexical similarity measures.
Fig. 1. Example of tweet segmentation.
1. HybridSeg refers to HybridSegWeb, HybridSegNER, HybridSegNGram, and HybridSegIter, or one of them based on the context. We do not distinguish between these when the context is clear and discriminative.
Both supervised and unsupervised approaches have been proposed for named entity recognition in tweets. T-NER, a part of the tweet-specific NLP framework in [3], first segments named entities using a CRF model with orthographic, contextual, dictionary, and tweet-specific features. It then labels the named entities by applying Labeled-LDA with the external knowledge base Freebase. The NER solution proposed in [4] is also based on a CRF model. It is a two-stage prediction aggregation model. In the first stage, a KNN-based classifier is used to conduct word-level classification, leveraging similar and recently labeled tweets. In the second stage, those predictions, along with other linguistic features, are fed into a CRF model for finer-grained classification. Chua et al. [19] propose to extract noun phrases from tweets using an unsupervised approach which is mainly based on POS tagging. Each extracted noun phrase is a candidate named entity.
Our work is also related to entity linking (EL). EL is to identify the mention of a named entity and link it to an entry in a knowledge base like Wikipedia [20], [21], [22], [23]. Conventionally, EL involves a NER system followed by a linking system [20], [21]. Recently, Sil and Yates propose to combine named entity recognition and linking into a joint model [23]. Similarly, Guo et al. propose a structural SVM solution to simultaneously recognize the mention and resolve the linking [22]. While entity linking aims to identify the boundary of a named entity and resolve its meaning based on an external knowledge base, a typical NER system identifies entity mentions only, like the work presented here. It is difficult to make a fair comparison between these two techniques.
Tweet segmentation is conceptually similar to Chinese word segmentation (CSW). Text in Chinese is a continuous sequence of characters, and segmenting the sequence into meaningful words is the first step in most applications. State-of-the-art CSW methods are mostly developed using supervised learning techniques like perceptron learning and CRF models [24], [25], [26], [27], [28]. Both linguistic and lexicon features are used in the supervised learning in CSW. Tweets are extremely noisy, with misspellings, informal abbreviations, and grammatical errors. These adverse properties would require a huge number of training samples for applying a supervised learning technique. Here, we exploit the semantic information of external knowledge bases and local contexts to recognize meaningful segments like named entities and semantic phrases in tweets. Very recently, a similar idea has also been explored for CSW by Jiang et al. [28]. They propose to prune the search space in CSW by exploiting the natural annotations on the Web. Their experimental results show significant improvement by using simple local features.
3 HYBRIDSEG FRAMEWORK
The proposed HybridSeg framework segments tweets in
batch mode. Tweets from a targeted Twitter stream are
grouped into batches by their publication time using a fixed
time interval (e.g., a day). Each batch of tweets is then segmented by HybridSeg collectively.
3.1 Tweet Segmentation
Given a tweet $t$ from batch $T$, the problem of tweet segmentation is to split the $\ell$ words in $t = w_1 w_2 \ldots w_\ell$ into $m \le \ell$ consecutive segments, $t = s_1 s_2 \ldots s_m$, where each segment $s_i$ contains one or more words. We formulate the tweet segmentation problem as an optimization problem that maximizes the sum of the stickiness scores of the $m$ segments, as shown in Fig. 2.
A high stickiness score of segment s indicates that it
is a phrase which appears “more than by chance”, and fur-
ther splitting it could break the correct word collocation or
the semantic meaning of the phrase. Formally, let $C(s)$
denote the stickiness function of segment $s$. The optimal segmentation is defined as:

$$\arg\max_{s_1,\ldots,s_m} \sum_{i=1}^{m} C(s_i). \quad (1)$$

The optimal segmentation can be derived by dynamic programming with a time complexity of $O(\ell)$ (cf. [1] for details).
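The dynamic program can be sketched as follows. This is a minimal illustration, not the authors' implementation: the stickiness scorer is passed in as an abstract function over word tuples, and a maximum segment length is assumed so the search stays linear in the tweet length.

```python
import math

def segment(words, stickiness, max_len=5):
    """Split `words` into consecutive segments maximizing total stickiness.

    best[i] holds the best total score for the prefix words[:i];
    back[i] remembers where the last segment of that prefix starts.
    """
    n = len(words)
    best = [-math.inf] * (n + 1)
    best[0] = 0.0
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            score = best[j] + stickiness(tuple(words[j:i]))
            if score > best[i]:
                best[i], back[i] = score, j
    # Recover the segmentation by following back-pointers from the end.
    segments, i = [], n
    while i > 0:
        segments.append(words[back[i]:i])
        i = back[i]
    return segments[::-1]
```

With a toy scorer that rewards a known phrase more than its parts, the phrase is kept intact as a single segment.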
As shown in Fig. 2, the stickiness function of a segment
takes in three factors: (i) length normalization $L(s)$; (ii) the
segment's presence in Wikipedia, $Q(s)$; and (iii) the
segment's phraseness, i.e., the probability of $s$ being a
phrase based on global and local contexts. The stickiness of
$s$, $C(s)$, is formally defined in Eq. (2), which captures the
three factors:

$$C(s) = L(s) \cdot e^{Q(s)} \cdot \frac{2}{1 + e^{-SCP(s)}}. \quad (2)$$
Length normalization. As the key of tweet segmentation is to
extract meaningful phrases, longer segments are preferred
for preserving more topically specific meanings. Let $|s|$ be
the number of words in segment $s$. The normalized segment
length is $L(s) = 1$ if $|s| = 1$ and $L(s) = \frac{|s|-1}{|s|}$ if $|s| > 1$, which
moderately alleviates the penalty on long segments.
Presence in Wikipedia. In our framework, Wikipedia serves
as an external dictionary of valid names and phrases. Specifically, $Q(s)$ in Eq. (2) is the probability that $s$ is an anchor
text in Wikipedia, also known as keyphraseness in [21], [29].
Let $wiki(s)$ and $wiki_a(s)$ be the number of Wikipedia entries
in which $s$ appears in any form and in which $s$ appears as
anchor text, respectively; then $Q(s) = wiki_a(s)/wiki(s)$. Each
anchor text in Wikipedia refers to a Wikipedia entry even if
the entry has not been created. A segment that is often
used as anchor text in Wikipedia is preferred in our segmentation. Note that Wikipedia here can be replaced with
any other external knowledge base by redefining $Q(s)$.
Example knowledge bases include Freebase (http://www.freebase.com/), Probase [30],
or a domain-specific knowledge base like GeoNames (see footnote 4) if the
targeted Twitter stream is domain-specific.

Fig. 2. HybridSeg framework without learning from pseudo feedback. (For clarity, the iterative learning from pseudo feedback is not shown in this figure.)

560 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 27, NO. 2, FEBRUARY 2015
Segment phraseness. The last component of Eq. (2) estimates the probability of a segment being a valid phrase
using the Symmetric Conditional Probability (SCP) measure (see footnote 5),
defined in Eq. (3):

$$SCP(s) = \log \frac{\Pr(s)^2}{\frac{1}{|s|-1} \sum_{i=1}^{|s|-1} \Pr(w_1 \ldots w_i) \Pr(w_{i+1} \ldots w_{|s|})}. \quad (3)$$

In Eq. (3), $\Pr(s)$ (or $\Pr(w_1 \ldots w_i)$) is the approximated n-gram
probability of a segment. If $s$ contains a single word $w$, then
$SCP(s) = 2 \log \Pr(w)$.
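A minimal sketch of Eqs. (2) and (3), assuming `pr` and `q` are caller-supplied functions (hypothetical stand-ins for the paper's n-gram probability and keyphraseness estimators) over word tuples:

```python
import math

def scp(seg, pr):
    """Symmetric Conditional Probability of a word tuple (Eq. (3))."""
    n = len(seg)
    if n == 1:
        return 2.0 * math.log(pr(seg))
    # Average product of probabilities over all binary splits of seg.
    avg = sum(pr(seg[:i]) * pr(seg[i:]) for i in range(1, n)) / (n - 1)
    return math.log(pr(seg) ** 2 / avg)

def stickiness(seg, pr, q):
    """Stickiness C(s) of Eq. (2): length norm * e^Q(s) * SCP sigmoid."""
    n = len(seg)
    length_norm = 1.0 if n == 1 else (n - 1) / n
    return length_norm * math.exp(q(seg)) * 2.0 / (1.0 + math.exp(-scp(seg, pr)))
```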
The estimation of $\Pr(s)$ is the key challenge in our framework. In the following, we present three observations,
which are also the rationales for why $\Pr(s)$ can be estimated
from global and local contexts.
3.2 Observations for Tweet Segmentation
Tweets are considered noisy with lots of informal abbrevia-
tions and grammatical errors. However, tweets are posted
mainly for information sharing and communication among
many purposes.
Observation 1. Word collocations of named entities and
common phrases in English are well preserved in Tweets.
Many named entities and common phrases are preserved
in tweets for information sharing and dissemination. In this
sense, $\Pr(s)$ can be estimated by counting a segment's
appearances in a very large English corpus (i.e., the global
context). In our implementation, we turn to the Microsoft Web
N-Gram corpus [31]. This corpus is derived from
all web documents indexed by Microsoft Bing in the EN-US
market. It provides a good estimate of the statistics of commonly used phrases in English.
Observation 2. Many tweets contain useful linguistic
features.
Although many tweets contain unreliable linguistic features like misspellings and inconsistent capitalization [3],
there exist tweets composed in proper English. For example,
tweets published by official accounts of news agencies,
organizations, and advertisers are often well written. The
linguistic features in these tweets enable named entity rec-
ognition with relatively high accuracy.
Observation 3. Tweets in a targeted stream are not topically
independent of each other within a time window.
Many tweets published within a short time period talk
about the same theme. These similar tweets largely share
the same segments. For example, similar tweets have been
grouped together to collectively detect events, and an event
can be represented by the common discriminative segments
across tweets [13].
The latter two observations essentially reveal the same
phenomenon: local context in a batch of tweets comple-
ments global context in segmenting tweets. For example,
person names emerging from bursty events may not be
recorded in Wikipedia. However, if the names are reported
in tweets by news agencies or mentioned in many tweets,
there is a good chance to segment these names correctly
based on local linguistic features or local word collocation
from the batch of tweets. In the next section, we detail learning from local context to estimate $\Pr(s)$.
4 LEARNING FROM LOCAL CONTEXT
Illustrated in Fig. 2, the segment phraseness PrðsÞ is com-
puted based on both global and local contexts. Based on
Observation 1, PrðsÞ is estimated using the n-gram probabil-
ity provided by Microsoft Web N-Gram service, derived
from English Web pages. We now detail the estimation of
PrðsÞ by learning from local context based on Observations
2 and 3. Specifically, we propose learning PrðsÞ from the
results of using off-the-shelf Named Entity Recognizers
(NERs), and learning PrðsÞ from local word collocation in a
batch of tweets. The two corresponding methods utilizing
the local context are denoted by HybridSegNER and
HybridSegNGram respectively.
4.1 Learning from Weak NERs
To leverage the local linguistic features of well-written
tweets, we apply multiple off-the-shelf NERs trained on for-
mal texts to detect named entities in a batch of tweets T by
voting. Voting by multiple NERs partially alleviates the
errors due to noise in tweets. Because these NERs are not
specifically trained on tweets, we also call them weak
NERs. Recall that each named entity is a valid segment; hence,
the detected named entities are valid segments.
Given a candidate segment $s$, let $f_s$ be its total frequency
in $T$. An NER $r_i$ may recognize $s$ as a named entity $f_{r_i,s}$ times.
Note that $f_{r_i,s} \le f_s$, since an NER may recognize only some of
$s$'s occurrences as named entities in all tweets of $T$. Assuming
there are $m$ off-the-shelf NERs $r_1, r_2, \ldots, r_m$, we further
denote by $f^R_s$ the number of NERs that have detected at
least one occurrence of $s$ as a named entity: $f^R_s = \sum_{i=1}^{m} I(f_{r_i,s})$, where
$I(f_{r_i,s}) = 1$ if $f_{r_i,s} > 0$ and $I(f_{r_i,s}) = 0$ otherwise.

We approximate the probability of $s$ being a valid named
entity (i.e., a valid segment) using a voting scheme defined
by Eqs. (4)-(6):

$$\hat{\Pr}_{NER}(s) = w(s,m) \cdot \frac{1}{m} \sum_{i=1}^{m} \hat{\Pr}_{r_i}(s) \quad (4)$$

$$w(s,m) = \frac{1}{1 + e^{-\beta(f^R_s - m/2)}} \quad (5)$$

$$\hat{\Pr}_{r_i}(s) = \left(1 + \frac{\alpha}{f_{r_i,s} + \epsilon}\right)^{-\frac{f_s}{f_{r_i,s} + \epsilon}}. \quad (6)$$
4. http://www.geonames.org/.
5. In our earlier work [1], we evaluated two collocation measures, SCP and Pointwise Mutual Information (PMI). Our experimental results show that SCP is much more effective than PMI for tweet segmentation.

LI ET AL.: TWEET SEGMENTATION AND ITS APPLICATION TO NAMED ENTITY RECOGNITION 561
Our approximation contains two parts. The right part of
Eq. (4) (cf. Eq. (6)) is the average confidence that a weak
NER recognizes $s$ as a named entity. A biased estimation is
simply $\frac{1}{m} \sum_{i=1}^{m} f_{r_i,s}/f_s$, because each $f_{r_i,s}/f_s$ is a noisy version
of the true probability. However, such a simple average
ignores the absolute value of $f_{r_i,s}$, which can also play an
important role here. For example, a party's name in an election event may appear hundreds of times in a tweet batch.
However, due to the free writing styles of tweets, only tens
of the party name's occurrences are recognized by weak
NERs as named entities. In this case, $f_{r_i,s}/f_s$ is relatively small
yet $f_{r_i,s}$ is relatively high. Thus, we design Eq. (6) to favor
both $f_{r_i,s}/f_s$ and $f_{r_i,s}$. The balance is controlled by a factor
$\alpha$. When $\alpha$ is large, the function is more sensitive to
changes in $f_{r_i,s}/f_s$; when $\alpha$ is small, a reasonably large $f_{r_i,s}$
leads $\hat{\Pr}_{r_i}(s)$ to be close to 1 despite a relatively small value
of $f_{r_i,s}/f_s$. In this paper we empirically set $\alpha = 0.2$ in experiments. A small constant $\epsilon$ is used to avoid division by zero.

The left part of Eq. (4), $w(s,m)$ (cf. Eq. (5)), uses a sigmoid
function to control the impact of the majority degree of the $m$
weak NERs on the segment, tuned by a factor $\beta$.
For example, in our paper we set $\beta = 10$ so that as long as
more than half of the weak NERs recognize $s$ as a named entity,
$w(s,m)$ is close to 1. With a small $\beta$, $w(s,m)$ gets closer to 1
when more weak NERs recognize $s$ as a named entity.
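The voting scheme of Eqs. (4)-(6) can be sketched as below. This is our reading of the text, not the authors' code: the power-form of Eq. (6) is a reconstruction, and the parameter names follow the paper's α, β, and ε.

```python
import math

def pr_ner(freqs, f_s, alpha=0.2, beta=10.0, eps=1e-6):
    """Voting-based probability that segment s is a named entity.

    freqs: f_{r_i,s} for each of the m weak NERs, i.e., how many of the
    f_s occurrences of s each recognizer tagged as a named entity.
    """
    m = len(freqs)
    # Eq. (6): per-NER confidence, rewarding both a high ratio
    # f_{r_i,s}/f_s and a high absolute count f_{r_i,s}.
    def pr_ri(f_ri):
        return (1.0 + alpha / (f_ri + eps)) ** (-f_s / (f_ri + eps))
    # Eq. (5): sigmoid on how many NERs detected s at least once.
    f_big_r = sum(1 for f in freqs if f > 0)
    w = 1.0 / (1.0 + math.exp(-beta * (f_big_r - m / 2.0)))
    # Eq. (4): majority weight times the average per-NER confidence.
    return w * sum(pr_ri(f) for f in freqs) / m
```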
Considering both the global context and the local context
by NER voting, we approximate $\Pr(s)$ using a linear
combination:

$$\Pr(s) = (1 - \lambda)\Pr_{MS}(s) + \lambda \hat{\Pr}_{NER}(s), \quad (7)$$

where $\hat{\Pr}_{NER}(s)$ is defined by Eq. (4), $\lambda \in [0,1)$ is a coupling factor,
and $\Pr_{MS}(\cdot)$ is the n-gram probability provided by the
Microsoft Web N-Gram service. The learning of $\lambda$ will be
detailed in Section 4.3.
4.2 Learning from Local Collocation
Collocation is defined as an arbitrary and recurrent word com-
bination in [32]. Let w1w2w3 be a valid segment, it is expected
that sub-n-grams fw1; w2; w3; w1w2; w2w3g are positively cor-
related with one another. Thus, we need a measure that cap-
tures the extent to which the sub-n-grams of a n-gram are
correlated with one another, so as to estimate the probability
of the n-gram being a valid segment.
Statistical n-gram language modeling estimates the
probability of an n-gram $w_1 w_2 \ldots w_n$; it has been extensively studied in speech recognition and text mining [33],
[34], [35], [36]. By the chain rule, we express the n-gram
probability in Eq. (8):

$$\hat{\Pr}_{NGram}(w_1 \ldots w_n) = \prod_{i=1}^{n} \hat{\Pr}(w_i \mid w_1 \ldots w_{i-1}), \quad (8)$$

where $\hat{\Pr}(w_i \mid w_1 \ldots w_{i-1})$ is the conditional probability of
word $w_i$ following the word sequence $w_1 \ldots w_{i-1}$. Here, we aim
to quantify the strength of an n-gram being a valid segment
based on the n-gram distribution in the batch of tweets.
That is, we try to capture the dependencies between the
sub-n-grams of an n-gram. In this sense, we set $\hat{\Pr}(w_1)$ to 1
in Eq. (8).
Absolute discounting smoothing. At first glance, applying maximum likelihood estimation seems straightforward. However, because $\Pr(w_1)$ is set to 1, we would then have
$\hat{\Pr}_{NGram}(w_1 \ldots w_n) = f_{w_1 \ldots w_n} / f_{w_1}$. More importantly, due to
the informal writing style and limited length of tweets, people often use a sub-n-gram to refer to an n-gram. For example, either the first name or the last name is often used in tweets to
refer to a person instead of her full name. We therefore adopt the absolute discounting smoothing method [33],
[34] to boost the likelihood of a valid segment. That is,
the conditional probability $\Pr(w_i \mid w_1 \ldots w_{i-1})$ is estimated by
Eq. (9), where $d(w_1 \ldots w_{i-1})$ is the number of distinct words
following the word sequence $w_1 \ldots w_{i-1}$, and $k$ is the discounting factor:

$$\hat{\Pr}(w_i \mid w_1 \ldots w_{i-1}) = \frac{\max\{f_{w_1 \ldots w_i} - k,\, 0\}}{f_{w_1 \ldots w_{i-1}}} + \frac{k \cdot d(w_1 \ldots w_{i-1})}{f_{w_1 \ldots w_{i-1}}} \cdot \Pr(w_i \mid w_2 \ldots w_{i-1}). \quad (9)$$
Right-to-left smoothing (RLS). Like most n-gram models,
the model in Eq. (8) follows the left-to-right writing order.
However, it is reported that the latter words in an n-gram
often carry more information [37]. For example, "justin
bieber" is a bursty segment in some days of tweet data in
our pilot study. Since "justin" is far more prominent than
the word "bieber", the n-gram probability of the segment is relatively small. However, we observe that "justin" almost
always precedes "bieber" when the latter occurs. Given
this, we introduce a right-to-left smoothing method mainly
for name detection. Using RLS, the conditional likelihood
$\hat{\Pr}(w_2 \mid w_1)$ is calculated by Eq. (10), where $f_{w_1 w_2} / f_{w_2}$ is the
conditional likelihood of $w_1$ preceding $w_2$, and $\theta$ is a coupling factor which balances the two parts ($\theta$ is empirically
set to 0.5 in our experiments). Note that RLS is only applied
when calculating the conditional probabilities of 2-grams,
because higher-order n-grams carry more specific information. For example, "social network" is more specific than the
word "social" for the estimation of the valid segment "social
network analysis".

$$\hat{\Pr}(w_2 \mid w_1) = \theta \left\{ \frac{\max\{f_{w_1 w_2} - k,\, 0\}}{f_{w_1}} + \frac{k \cdot d(w_1)}{f_{w_1}} \cdot \Pr(w_2) \right\} + (1 - \theta)\, \frac{f_{w_1 w_2}}{f_{w_2}}. \quad (10)$$
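A sketch of the smoothed bigram estimate, combining the absolute-discounting term of Eq. (9) (here backing off to a unigram probability) with the right-to-left term of Eq. (10). The `freq` and `following` statistics are hypothetical batch counts, not the paper's data structures.

```python
def bigram_prob(w1, w2, freq, following, theta=0.5, k=1.0):
    """Smoothed bigram probability: absolute discounting (Eq. (9))
    mixed with a right-to-left term (Eq. (10)).

    freq:      maps word tuples to their batch frequencies.
    following: maps a word to the number of distinct words seen after it.
    """
    f1, f2 = freq[(w1,)], freq[(w2,)]
    f12 = freq.get((w1, w2), 0)
    total = sum(v for key, v in freq.items() if len(key) == 1)
    # Left-to-right: discounted ML estimate, backing off to the unigram.
    left = max(f12 - k, 0.0) / f1 + k * following.get(w1, 0) / f1 * (f2 / total)
    # Right-to-left: how often w1 precedes w2 among w2's occurrences.
    right = f12 / f2
    return theta * left + (1.0 - theta) * right
```

On the "justin bieber" pattern, where the rare right word is almost always preceded by the frequent left word, the right-to-left term lifts the estimate well above the pure left-to-right one.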
Bursty-based weighting. Similar to Eq. (7), the estimate from local collocation can be combined with the global context using a linear combination with a coupling factor $\lambda$:

$$\Pr(s) = (1 - \lambda)\Pr_{MS}(s) + \lambda \hat{\Pr}_{NGram}(s). \quad (11)$$

However, because tweets are noisy, the estimate of an n-gram
being a valid segment is confident only when there are many
samples. Hence, we prefer the global context in tweet segmentation when the frequency of an n-gram is relatively small. We therefore introduce a bursty-based weighting scheme for
combining local collocation and global context:

$$\Pr(s) = (1 - \lambda)\Pr_{MS}(s) + \lambda B(s) \hat{\Pr}_{NGram}(s). \quad (12)$$

$B(s)$, in the range $(0,1)$, quantifies the burstiness of segment
$s$. It satisfies two constraints: (a) $B(s_1) \ge B(s_2)$ if $f_{s_1} \ge f_{s_2}$ and
$s_1$ and $s_2$ are both $i$-gram segments; (b) $B(s_1) \ge B(s_2)$ if
$f_{s_1} = f_{s_2}$ and $s_1$ is an $i$-gram segment and $s_2$ is a $j$-gram segment with $i > j$. We define $B(s)$ for an $i$-gram segment $s$ as:

$$B(s) = \frac{1}{1 + e^{-t(i)(f_s - \bar{f}(i))}}, \quad (13)$$

where $\bar{f}(i)$ is the average frequency of all $i$-grams in the
batch $T$, and $t(i) = 5/\sigma(i)$ is a scaling function, with $\sigma(i)$
the standard deviation of the frequencies of all $i$-grams in
the batch. That is, the local collocation measure is reliable only if
there are enough samples of a segment in the batch.
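Eq. (13) is a plain sigmoid in the segment frequency; a minimal sketch:

```python
import math

def burstiness(f_s, mean_f, std_f):
    """B(s) of Eq. (13): a sigmoid in how far the segment frequency f_s
    exceeds the mean frequency of same-order n-grams in the batch,
    with slope t = 5 / sigma."""
    t = 5.0 / std_f
    return 1.0 / (1.0 + math.exp(-t * (f_s - mean_f)))
```

A segment at exactly the mean frequency gets weight 0.5; one standard deviation above or below the mean pushes the weight close to 1 or 0.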
4.3 Learning from Pseudo Feedback
As shown in Fig. 2, so far in the proposed HybridSeg framework, each tweet is segmented independently from the other
tweets in a batch, though the local context is derived from all
tweets in the same batch. Recall that segmenting a tweet is
an optimization problem. The probability of phraseness of
any candidate segment in a tweet could affect its segmenta-
tion result. We therefore design an iterative process in the
HybridSeg framework to learn from the most confident seg-
ments in the batch from the previous iteration. Fig. 3 illus-
trates the iterative process where the confident named
entities voted by weak NERs are considered as the most con-
fident segments (or seed segments) in the 0th iteration. In
the subsequent iterations, the confident segments from the
previous iteration become the seed segments and the same
process repeats until the segmentation results of HybridSeg
do not change significantly. We define the stop criterion
using Jensen-Shannon divergence (JSD) of the frequency dis-
tributions of segments in two consecutive iterations.
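The Jensen-Shannon stop criterion can be computed directly from the two segment-frequency distributions; a sketch assuming they are given as count dictionaries (the threshold for "does not change significantly" is left to the caller):

```python
import math

def jsd(counts_p, counts_q):
    """Jensen-Shannon divergence between two segment-frequency
    distributions given as {segment: count} dictionaries."""
    def normalize(d):
        total = sum(d.values())
        return {k: v / total for k, v in d.items()}
    p, q = normalize(counts_p), normalize(counts_q)
    # Mixture distribution over the union of segments.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in set(p) | set(q)}
    def kl(a):
        return sum(v * math.log(v / m[k]) for k, v in a.items() if v > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)
```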
Suppose that at iteration $i$, HybridSeg outputs a set of segments $\{\langle s, f^i_s \rangle\}$, where $f^i_s$ is the number of times $s$ is a segment at iteration $i$. Then $f^i_s / f_s$ records the relative
segmentation confidence of HybridSeg about $s$ at iteration $i$
(recall that $f_s$ denotes the frequency of $s$ in batch $T$). Similar
to Eq. (6), we define

$$\hat{\Pr}^i(s) = \left(1 + \frac{\alpha}{f^i_s + \epsilon}\right)^{-\frac{f_s}{f^i_s + \epsilon}}.$$

Following the same combination strategy defined by
Eq. (7), we have the following iterative updating function:

$$\Pr^{i+1}(s) = (1 - \lambda)\Pr_{MS}(s) + \lambda \hat{\Pr}^i(s). \quad (14)$$

In the 0th iteration, $\hat{\Pr}^0(s)$ can be estimated based on the
voting results of the weak NERs or the confident n-grams
learned from the batch of tweets.
Learning the parameter $\lambda$. The coupling factor $\lambda$ in Eq. (14)
is crucial for the convergence of HybridSeg. A good $\lambda$ should
ensure that the top confident segments from the previous
iteration are detected more times in the next iteration. This
is equivalent to maximizing the sum of the detected frequencies
of the top confident segments (weighted by their stickiness
scores, cf. Eq. (2)) extracted from the previous iteration.
Accordingly, learning the parameter $\lambda$ is converted to an
optimization problem as follows:

$$\hat{\lambda} = \arg\max_{\lambda}\, m_{Iter}(\lambda) = \arg\max_{\lambda} \sum_{s \in \text{top-}k \text{ at iteration } i} C^i(s) \cdot f^{i+1}(s). \quad (15)$$

$C^i(s)$ is the stickiness score of $s$ computed by HybridSeg in
the previous iteration; based on it, the top-$k$ segments can be
retrieved. $f^{i+1}(s)$ is the detected frequency of $s$ in the current iteration, which is an unknown function of the variable $\lambda$.
Therefore, the optimal $\lambda$ is intractable. In our experiments,
we use a brute-force search strategy to find the optimal $\lambda$ for
each iteration and for each tweet batch. Since the update of
Eq. (2) with a new $\lambda$ can be easily calculated, efficiency
is not a major concern for a fixed number of $\lambda$ values.
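The brute-force search amounts to a grid scan; in this sketch, `objective` stands in for $m_{Iter}(\lambda)$, which in practice requires re-running segmentation for each candidate value:

```python
def learn_lambda(objective, step=0.05, upper=0.95):
    """Grid scan for the coupling factor over [0, upper] in `step` increments,
    returning the value that maximizes the objective."""
    grid = [round(i * step, 2) for i in range(int(round(upper / step)) + 1)]
    return max(grid, key=objective)
```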
Learning $\lambda$ for the 0th iteration. Note that for the 0th iteration, $\lambda$ is learned differently because there are no segments
detected from a previous iteration.

For HybridSegNER, a good $\lambda$ shall ensure that the confident segments voted by the $m$ weak NERs can be detected
more times in the next iteration. Let $N$ be the set of segments
that are recognized by all $m$ NER systems (i.e., $N = \{s \mid f^R_s = m\}$). For each segment $s \in N$, we consider its confident frequency to be the minimum number of times that $s$
is recognized as a named entity by one of the $m$ NERs, i.e.,
$f_{c,s} = \min_{i=1}^{m} f_{r_i,s}$. Then $\lambda$ is learned as follows in the 0th iteration:

$$\hat{\lambda} = \arg\max_{\lambda}\, m_{NER}(\lambda) = \arg\max_{\lambda} \sum_{s \in N} \hat{\Pr}^0(s) \cdot f_{c,s} \cdot f^0_s. \quad (16)$$
In this equation, $\hat{\Pr}^0(s)$ is the value computed using Eq. (4);
$\hat{\Pr}^0(s) \cdot f_{c,s}$ serves as a weighting factor to adjust the importance of $f^0_s$ in learning $\lambda$. If segment $s$ is very likely to be a
named entity (i.e., $\hat{\Pr}^0(s)$ is high) and it has been detected
many times by all NERs (i.e., $f_{c,s}$ is large), then the number
of times $s$ is successfully segmented, $f^0_s$, has a big impact on
the selection of $\lambda$. On the other hand, if $\hat{\Pr}^0(s)$ is low, or $f_{c,s}$
is small, or both, then $f^0_s$ is less important to the selection of $\lambda$.
By defining $f_{c,s} = \min_{i=1}^{m} f_{r_i,s}$, Eq. (16) conservatively considers only segments recognized by all weak NERs,
because of the noisy nature of tweets. This helps to reduce
the possible oscillations resulting from different $\lambda$ settings,
since $\lambda$ is a global factor (i.e., not per-tweet dependent). On
the other hand, we also assume that all the off-the-shelf
NERs are reasonably good, e.g., when they are applied on
formal text. If there were a large number of NERs, the definition of $f_{c,s}$ could be relaxed to reduce the impact of one or
two poor-performing NERs among them.
For HybridSegNGram, because there is no initial set of confident segments, any heuristic approach may cause the
adapted $\lambda$ to drift away from its optimal range. Given
that HybridSegNGram exploits local collocation based on an
n-gram statistical model, we argue that a common $\lambda$ range
could exist for most targeted Twitter streams. We empirically study the impact of $\lambda$ on HybridSegNGram in Section 6.
5 SEGMENT-BASED NAMED ENTITY RECOGNITION
In this paper, we select named entity recognition as a down-
stream application to demonstrate the benefit of tweet
segmentation. We investigate two segment-based NER
algorithms. The first identifies named entities from a pool
of segments (extracted by HybridSeg) by exploiting the co-occurrences of named entities. The second does so based
on the POS tags of the constituent words of the segments.

Fig. 3. The iterative process of HybridSegIter.
5.1 NER by Random Walk
The first NER algorithm is based on the observation that a
named entity often co-occurs with other named entities in a
batch of tweets (i.e., the gregarious property).
Based on this observation, we build a segment graph. A
node in this graph is a segment identified by HybridSeg. An
edge exists between two nodes if they co-occur in some
tweets, and the weight of the edge is measured by the Jaccard
coefficient between the two corresponding segments. A
random walk model is then applied to the segment graph.
Let $r_s$ be the stationary probability of segment $s$ after applying the random walk; the segment is then weighted by

$$y(s) = e^{Q(s)} \cdot r_s. \quad (17)$$

In this equation, $e^{Q(s)}$ carries the same semantics as in Eq. (2):
it indicates that a segment that frequently appears in Wikipedia as an anchor text is more likely to be a named entity.
With the weighting $y(s)$, the top $K$ segments are chosen as
named entities.
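A sketch of the random-walk ranking: build the Jaccard-weighted segment graph, run a PageRank-style walk (the damping/teleport details are our assumption; the paper does not specify them), and weight the stationary probabilities by $e^{Q(s)}$ as in Eq. (17).

```python
import math
from itertools import combinations

def rank_segments(tweet_segments, keyphraseness, damping=0.85, iters=50, top_k=10):
    """Rank segments by y(s) = e^{Q(s)} * r_s (Eq. (17)).

    tweet_segments: one set of segments per tweet.
    keyphraseness:  function mapping a segment to Q(s).
    """
    # Which tweets does each segment occur in?
    occ = {}
    for idx, segs in enumerate(tweet_segments):
        for s in segs:
            occ.setdefault(s, set()).add(idx)
    nodes = list(occ)
    # Jaccard-weighted edges between co-occurring segments.
    w = {s: {} for s in nodes}
    for a, b in combinations(nodes, 2):
        inter = len(occ[a] & occ[b])
        if inter:
            jac = inter / len(occ[a] | occ[b])
            w[a][b] = w[b][a] = jac
    # PageRank-style random walk over the weighted graph.
    n = len(nodes)
    r = {s: 1.0 / n for s in nodes}
    for _ in range(iters):
        r = {s: (1 - damping) / n
                + damping * sum(r[t] * wt / sum(w[t].values())
                                for t, wt in w[s].items())
             for s in nodes}
    ranked = sorted(nodes, key=lambda s: math.exp(keyphraseness(s)) * r[s],
                    reverse=True)
    return ranked[:top_k]
```

Segments that co-occur with many others accumulate stationary probability, and frequent Wikipedia anchors get a further boost from the $e^{Q(s)}$ factor.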
5.2 NER by POS Tagger
Due to the short nature of tweets, the gregarious property may
be weak. The second algorithm therefore explores the part-of-speech tags in tweets for NER, treating noun phrases
as named entities and using segments instead of words as units.
A segment may appear in different tweets, and its constituent words may be assigned different POS tags in these
tweets. We estimate the likelihood of a segment being a noun
phrase by considering the POS tags of its constituent words
over all its appearances. Table 1 lists the three POS tags that are considered indicators of a segment being a noun phrase.

Let $w^s_{i,j}$ be the $j$th word of segment $s$ in its $i$th occurrence.
We calculate the probability of segment $s$ being a noun
phrase as follows:

$$\hat{P}_{NP}(s) = \frac{\sum_i \sum_j [w^s_{i,j}]}{|s| \cdot f_s} \cdot \frac{1}{1 + e^{-5 (f_s - \bar{f}_s) / \sigma(f_s)}}. \quad (18)$$
This equation considers two factors. The first factor estimates the probability as the percentage of the constituent
words labeled with an NP tag over all the occurrences
of segment $s$, where $[w]$ is 1 if $w$ is labeled with one of the three
POS tags in Table 1, and 0 otherwise. For example, "chiam
see tong", the name of a Singaporean politician and lawyer (see footnote 6),
is labeled as ^^^ (66.67 percent), NVV (3.70 percent), ^V^
(7.41 percent), and ^VN (22.22 percent) (see footnote 7). By considering the
tags of all words in the segment, we obtain a high probability of 0.877 for "chiam see tong". The second factor of the
equation introduces a scaling factor to give more preference
to frequent segments, where $\bar{f}_s$ and $\sigma(f_s)$ are the mean and
standard deviation of segment frequency. The segments are
then ranked by $y(s) = e^{Q(s)} \cdot \hat{P}_{NP}(s)$, i.e., replacing $r_s$ in
Eq. (17) with $\hat{P}_{NP}(s)$.
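Eq. (18) can be sketched as follows, assuming each occurrence of a segment comes with the POS tags of its constituent words (the tag symbols follow Table 1):

```python
import math

NP_TAGS = {"N", "^", "$"}  # noun-phrase indicator tags from Table 1

def p_np(occurrence_tags, f_mean, f_std):
    """Probability of a segment being a noun phrase (Eq. (18)).

    occurrence_tags: for each occurrence of the segment, the POS tags
    of its constituent words, e.g. [["^", "^", "^"], ["N", "V", "V"]].
    """
    f_s = len(occurrence_tags)
    n_words = len(occurrence_tags[0])
    # Fraction of constituent-word labels that are NP-indicator tags.
    hits = sum(tag in NP_TAGS for tags in occurrence_tags for tag in tags)
    fraction = hits / (n_words * f_s)
    # Sigmoid scaling that prefers frequent segments.
    scale = 1.0 / (1.0 + math.exp(-5.0 * (f_s - f_mean) / f_std))
    return fraction * scale
```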
6 EXPERIMENT
We report two sets of experiments. The first set (Sections 6.1 to 6.3) aims to answer three questions: (i)
does incorporating local context improve tweet segmentation quality compared to using global context alone? (ii)
between learning from weak NERs and learning from local
collocation, which is more effective? and (iii) does iterative
learning further improve segmentation accuracy? The
second set of experiments (Section 6.4) evaluates segment-based named entity recognition.
6.1 Experiment Setting
Tweet Data Sets. We used two tweet data sets in our experi-
ments: SIN and SGE. The two data sets were used for simu-
lating two targeted Twitter streams. The former was a
stream consisting of tweets from users in a specific
geographical region (i.e., Singapore in this case), and the
latter was a stream consisting of tweets matching some
predefined keywords and hashtags for a major event (i.e.,
Singapore General Election 2011).
We randomly selected 5,000 tweets published on one
random day from each tweet collection. Named entities were
annotated using the BILOU schema [4], [14]. After discarding retweets and tweets with inconsistent annotations, 4,422
tweets from SIN and 3,328 tweets from SGE are used for
evaluation. The inter-annotator agreement at the tweet level is 81
and 62 percent for SIN and SGE, respectively. The relatively
low agreement for SGE is mainly due to the strategy for handling the concepts of GRC and SMC, which refer to different
types of electoral divisions in Singapore (see footnote 8). Annotators did
not reach a consensus on whether a GRC/SMC should be
labeled as a location name (e.g., "aljunied grc" versus
"aljunied"). Table 2 reports the statistics of the annotated
NEs in the two data sets, where $f^g_s$ denotes the number of
occurrences (or frequency) of named entity $s$ (which is also
a valid segment) in the annotated ground truth $G$. Fig. 4
plots the NEs' frequency distribution.
TABLE 1
Three POS Tags as Indicators of a Segment Being a Noun Phrase, Reproduced from [17]

Tag  Definition               Examples
N    common noun (NN, NNS)    books; someone
^    proper noun (NNP, NNPS)  lebron; usa; iPad
$    numeral (CD)             2010; four; 9:30

TABLE 2
The Annotated Named Entities in the SIN and SGE Data Sets, Where $f^g_s$ Denotes the Frequency of Named Entity $s$ in the Annotated Ground Truth

Data set  #NEs  min $f^g_s$  max $f^g_s$  $\sum f^g_s$  #NEs s.t. $f^g_s > 1$
SIN       746   1            49           1,234         136
SGE       413   1            1,644        4,073         161
6. http://en.wikipedia.org/wiki/Chiam_See_Tong.
7. V: verb, including copulas and auxiliaries; for example, might, gonna, ought, is, eats.
8. http://en.wikipedia.org/wiki/Group_Representation_Constituency.
Wikipedia dump. We use the Wikipedia dump released
on 30 January 2010 (see footnote 9). This dump contains 3,246,821 articles,
and 4,342,732 distinct entities appear as anchor
texts in these articles.

MS Web N-Gram. The Web N-Gram service provides
access to three content types: document bodies, document
titles, and anchor texts. We use the statistics derived from
document bodies as of April 2010.
Evaluation Metric. Recall that the task of tweet segmenta-
tion is to split a tweet into semantically meaningful seg-
ments. Ideally, a tweet segmentation method shall be
evaluated by comparing its segmentation result against
manually segmented tweets. However, manual segmenta-
tion of a reasonably sized data collection is extremely
expensive. We choose to evaluate a tweet segmentation
method based on whether the manually annotated named
entities are correctly split as segments [1]. Because each
named entity is a valid segment, the annotated named enti-
ties serve as partial ground truth in the evaluation.
We use the Recall measure, denoted by Re, which is the
percentage of the manually annotated named entities that
are correctly split as segments. Because a segmentation
method outputs exactly one possible segmentation for each
tweet, recall measure is the same as precision in this setting.
Methods. We evaluate four segmentation methods in the
experiments: (i) HybridSegWeb learns from global context
only, (ii) HybridSegNER learns from global context and local
context through three weak NERs, (iii) HybridSegNGram
learns from global context and local context through local
collocation, and (iv) HybridSegIter learns from pseudo feed-
back iteratively on top of HybridSegNER.
The HybridSegNER method employs three weak NERs
(i.e., $m = 3$) to detect named entities in tweets, namely
LBJ-NER [14], Stanford-NER [15], and T-NER [3] (see footnote 10). Note
that the three NERs used in our experiments are not
trained on our tweet data but downloaded from their
corresponding websites. The output of the three NERs
over the annotated tweets is used in HybridSegNER. That
is, the additional context from other unlabeled tweets published on the same day is not used, to ensure a fair comparison.
Parameter settings. HybridSegWeb is parameter-free. For
HybridSegNER, $\alpha = 0.2$ in Eq. (6) and $\beta = 10$ in Eq. (5). The
$\lambda$ value in Eq. (7) is learned using the objective function in
Eq. (16). Regarding parameter settings for HybridSegNGram,
$\theta = 0.5$ in Eq. (10) and $k = 1.0$ in Eqs. (9) and (10); different
values of $\lambda$ in Eq. (12) are evaluated. For HybridSegIter, the
top-$K$ segments in Eq. (15) for adaptation is set to $K = 50$.
The search space for $\lambda$ is $[0, 0.95]$ with a step of 0.05.
6.2 Segmentation Accuracy
Table 3 reports the segmentation accuracy achieved by the
four methods on the two data sets. The results reported for
HybridSegNGram and HybridSegNER are achieved with their
best settings for fair comparison. We make three observa-
tions from the results.
(i) Both HybridSegNGram and HybridSegNER achieve
significantly better segmentation accuracy than
HybridSegWeb. This shows that local context does help
to improve tweet segmentation quality considerably.
(ii) Learning local context through weak NERs is more
effective than learning from local word collocation in
improving segmentation accuracy; in particular,
HybridSegNER outperforms HybridSegNGram on both
data sets.
(iii) Iterative learning from pseudo feedback further
improves the segmentation accuracy. The scale of
improvement, however, is marginal. The next sub-
section presents a detailed analysis of HybridSeg for
possible reasons.
We also investigate the impact of the Web N-Gram statistics
on HybridSegWeb by using the other two content types: document titles and anchor texts. While the segmentation accuracy improves to 0.797 and 0.801 respectively on SIN, it
degrades to 0.832 and 0.821 on SGE. The significant difference in performance indicates a language
mismatch problem. Since the topics in SIN are more general
[1], specific sources like document titles and anchor texts
can be more discriminative for segmentation. For
Twitter streams that are topic-specific like SGE, the language
mismatch problem can become an important concern.
6.3 Method Analysis and Comparison
We first analyze and compare HybridSegNER and
HybridSegNgram because both learn from local context.
Following this, we analyze HybridSegIter for the possible
reasons of the marginal improvement over HybridSegNER.
HybridSegNER. This method learns $\lambda$ (cf. Eq. (7)) through the
objective function $m_{NER}(\lambda)$ (cf. Eq. (16)); $\lambda$ controls the combination of
global and local contexts. To verify that $\lambda$ can be learned
through this objective function, we plot $Re$ and $m_{NER}(\lambda)$ (cf.
Eq. (16)) in Fig. 5. For easy demonstration, we plot the normalized score of $m_{NER}(\lambda)$ in the figure. Observe that $m_{NER}(\lambda)$
is positively correlated with the performance metric $Re$ on
both data sets. In our experiments, we set the parameter $\lambda$ to
the smallest value leading to the best $m_{NER}(\lambda)$, i.e., $\lambda = 0.5$
Fig. 4. Frequency distribution of the annotated NEs.
TABLE 3
Recall of the Four Segmentation Methods

Method          SIN    SGE
HybridSegWeb    0.758  0.874
HybridSegNGram  0.806  0.907
HybridSegNER    0.857  0.942
HybridSegIter   0.858  0.946
9. http://dumps.wikimedia.org/enwiki/.
10. Due to space constraints, readers are referred to [3], [14], [15] for details of the respective NERs.
on SIN and $\lambda = 0.7$ on SGE. Note that $\lambda$ is a global factor for all
tweets in a batch, and $m_{NER}(\lambda)$ is computed based on a small
set of seed segments. A larger $\lambda$ may not affect the segmentation of the seed segments, because of their confident local
context, but it may cause some other segments to be wrongly
split due to their noisy local context. Observe that there is a minor
degradation of $Re$ on the SIN data set when $\lambda > 0.45$, although
$m_{NER}(\lambda)$ remains at its maximum.
HybridSegNGram. This method exploits local collocation using a variant of the absolute-discounting-based n-gram model with RLS smoothing (ref. Eq. (10)) and bursty-based weighting (ref. Eq. (12)). We now study the impact of RLS smoothing and bursty-based weighting under different values of the coupling factor λ for HybridSegNGram. Specifically, we investigate three methods with different settings:
HybridSegNGram: the method with RLS smoothing and bursty-based weighting.
HybridSegNGram−weight: the method with RLS smoothing but without bursty-based weighting.
HybridSegNGram−RLS: the method with bursty-based weighting but without RLS smoothing.
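The exact n-gram variant of Eq. (10) is not reproduced here; as a rough illustration of the underlying absolute discounting idea [33], a plain bigram version with a unigram back-off might look like the following, where the counts and the discount d = 0.75 are illustrative, not taken from the paper:

```python
from collections import Counter

def absolute_discounting(bigram_counts, unigram_counts, history, word, d=0.75):
    """Generic absolute-discounting bigram probability (in the style of
    Ney et al. [33]); a simplified stand-in, not the paper's Eq. (10)."""
    h_total = sum(c for (h, _), c in bigram_counts.items() if h == history)
    if h_total == 0:
        # Unseen history: back off entirely to the unigram distribution.
        return unigram_counts[word] / sum(unigram_counts.values())
    c_hw = bigram_counts.get((history, word), 0)
    # Number of distinct words observed after this history.
    n_types = sum(1 for (h, _) in bigram_counts if h == history)
    backoff = unigram_counts[word] / sum(unigram_counts.values())
    # Subtract a fixed discount d from each seen count and redistribute
    # the freed mass to the back-off distribution.
    return max(c_hw - d, 0.0) / h_total + (d * n_types / h_total) * backoff

bigrams = Counter({("new", "york"): 8, ("new", "car"): 2})
unigrams = Counter({"new": 10, "york": 8, "car": 2})
p = absolute_discounting(bigrams, unigrams, "new", "york")
```

The redistributed mass d · N(history) / c(history) guarantees the conditional probabilities still sum to one over the vocabulary.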
Fig. 6 reports Re of the three methods with different settings on both data sets. The results of HybridSegWeb are included as a baseline in the figure. Observe that with bursty-based weighting and RLS smoothing, HybridSegNGram outperforms HybridSegWeb over a much broader range of λ values than the other two alternatives. Specifically, HybridSegNGram outperforms HybridSegWeb in the ranges of [0.06, 0.20] and [0.06, 0.13] on the SIN and SGE data sets, respectively. The figure also shows that HybridSegNGram achieves more stable results than HybridSegNGram−RLS and HybridSegNGram−weight on both data sets, indicating that both RLS smoothing and bursty-based weighting are helpful in achieving better segmentation results. HybridSegNGram achieves its best performance with λ ≈ 0.1 on both data sets.
HybridSegNER versus HybridSegNGram. In a batch of tweets, named entities are usually a subset of the recurrent word combinations (or phrases). Therefore, HybridSegNGram is expected to detect more segments with local context than HybridSegNER does. However, a named entity may appear very few times in a batch. If its appearances are well formatted, there is a good chance that HybridSegNER can detect it, but not so for HybridSegNGram due to the limited number of appearances. As shown in the results reported earlier, HybridSegNER does outperform HybridSegNGram.
Furthermore, Table 4 lists the numbers of occurrences of the named entities that are correctly detected by HybridSegWeb, HybridSegNER, and HybridSegNGram, respectively, along with the percentages of change relative to HybridSegWeb. It shows that HybridSegNER detects more occurrences of n-gram named entities on both data sets for n = 1, 2, 3, 4. The performance of HybridSegNGram, however, is inconsistent across the two data sets.
To understand the reasons for the inconsistent performance of HybridSegNGram on the two data sets, we conduct a breakdown of all n-grams in terms of P̂r_NGram(s). Fig. 7 shows the distributions of P̂r_NGram(s) on the two data sets.^11 Observe that more 2-grams in the SGE data set than in SIN have P̂r_NGram(s) > 0.5, particularly in the range of [0.7, 1.0]. For n = 3, 4, almost no 3- or 4-grams have P̂r_NGram(s) > 0.4 on the SIN data set. As SIN contains tweets collected from a region while SGE is a collection of tweets on a specific topic, the tweets in SIN are more diverse in topics. This makes local collocations hard to capture due to their limited number of occurrences.
In summary, HybridSegNER demonstrates more stable
performance than HybridSegNGram across different Twitter
streams and achieves better accuracy. HybridSegNGram is
more sensitive to the topic specificity of Twitter streams.
Fig. 5. Re and normalized mNER(λ) values of HybridSegNER with λ varying in the range of [0, 0.95].
Fig. 6. The impact of λ on HybridSegNGram on the two data sets.
11. We ignore the n-grams whose P̂r_NGram(s) is below 0.1.
566 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 27, NO. 2, FEBRUARY 2015
Moreover, as observed in Table 4, more than 93 percent of the named entities detected by HybridSegNGram are also detected by HybridSegNER. Given this, we investigate the iterative learning HybridSegIter on top of HybridSegNER instead of HybridSegNGram.
Iterative Learning with HybridSegIter. As reported in Table 3, HybridSegIter achieves marginal improvements over HybridSegNER. Table 5 also shows the results of HybridSegIter in different iterations. It is also observed that HybridSegIter quickly converges after two iterations. To understand the reason behind this, we analyze the segments detected in each iteration. They fall into three categories:
Fully detected segments (FS): all occurrences of the segment are detected from the batch of tweets. Their Pr(s) is further increased by considering their local context. No more occurrences can be detected for this category of segments in the next iteration.
Missed segments (MS): not a single occurrence of the segment is detected in the previous iteration. In this case, no local context information can be derived to increase their Pr(s). They will be missed in the next iteration.
Partially detected segments (PS): some but not all occurrences of the segment are detected. For this category of segments, local context can be derived from the detected occurrences. Depending on the local context, Pr(s) will be adjusted, and more occurrences may be detected or missed in the next iteration.
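The three-way categorization above can be sketched as follows; the per-segment occurrence counts and the input names (`true_occ`, `detected_occ`) are hypothetical, invented for illustration:

```python
def categorize(true_occ, detected_occ):
    """Split segments into fully detected (FS), missed (MS),
    and partially detected (PS) categories."""
    cats = {"FS": [], "MS": [], "PS": []}
    for seg, total in true_occ.items():
        det = detected_occ.get(seg, 0)
        if det == 0:
            cats["MS"].append(seg)   # missed: no occurrence detected
        elif det == total:
            cats["FS"].append(seg)   # fully detected: all occurrences found
        else:
            cats["PS"].append(seg)   # partially detected: some occurrences found
    return cats

cats = categorize({"nsp": 5, "orchard road": 3, "mrt": 2},
                  {"nsp": 5, "orchard road": 1})
# "nsp" is fully detected, "orchard road" partially, "mrt" missed.
```

Only the PS category can change between iterations, which is why the room for improvement reported below is bounded by the missed occurrences of partially detected segments.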
Table 6 reports the number of segments and their number of occurrences in each of the three sets (FS, MS, and PS). As shown in the table, very few segments are partially detected after learning from the weak NERs in the 0th iteration (19 for SIN and 24 for SGE). The possible improvement in the 1st iteration is to further detect the 25 missed occurrences in SIN (resp. 67 in SGE), which account for 2.03 percent (resp. 1.64 percent) of all annotated NEs in the data set. That is, the room for further performance improvement by iterative learning is marginal on both data sets.
Consider the SIN data set: on average, there are about six detected occurrences providing local context for each of the 19 partially detected segments. With this local context, HybridSegIter manages to reduce the number of partially detected segments from 19 to 11 in the 1st iteration, and the total number of their missed instances is reduced from 25 to 14. No changes are observed for the remaining 11 partially detected segments in iteration 2. Interestingly, the number of fully detected instances increases by 2 in the 2nd iteration. The best segmentation of a tweet is the one that maximizes the stickiness of its member segments (ref. Eq. (1)). The change in the stickiness of other segments leads to the detection of these two new segments in the fully detected category, each occurring once in the data set.
In the SGE data set, the 24 partially detected segments reduce to 12 in the 1st iteration. No further change to these 12 partially detected segments is observed in the following iteration. A manual investigation shows that the missed occurrences are wrongly detected as parts of other longer segments. For example, “NSP”^12 becomes part of “NSP Election Rally”, and the latter is not annotated as a named entity. Probably because of its capitalization, “NSP
TABLE 4
Numbers of the Occurrences of Named Entities That Are Correctly Detected by HybridSegWeb, HybridSegNER, and HybridSegNGram, and the Percentage of Change against HybridSegWeb

                       SIN data set                                        SGE data set
n  HybridSegWeb  HybridSegNER  HybridSegNGram  #Overlap    HybridSegWeb  HybridSegNER  HybridSegNGram  #Overlap
1  694           793 (+14.3%)  820 (+18.2%)    767         2889          3006 (+4%)    2932 (+1.5%)    2895
2  232           246 (+6%)     172 (−25.9%)    158         519           580 (+11.8%)  600 (+15.6%)    524
3  7             12 (+71.4%)   5 (−28.6%)      4           149           238 (+59.7%)  161 (+8.1%)     143
4  2             6 (+200%)     1 (−50%)        0           1             4 (+300%)     0 (−100%)       N.A.

#Overlap: number of occurrences detected by both HybridSegNER and HybridSegNGram.
Fig. 7. The distributions of n-grams by Pr(s) for n = 2, 3, 4.
TABLE 5
HybridSegIter up to Four Iterations

           SIN data set       SGE data set
Iteration  Re     JSD         Re     JSD
0          0.857  –           0.942  –
1          0.857  0.0059      0.946  0.0183
2          0.858  0.0001      0.946  0.0003
3          0.858  0           0.946  0

12. http://en.wikipedia.org/wiki/National_Solidarity_Party_(Singapore).
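The JSD column in Table 5 suggests convergence is monitored via the Jensen-Shannon divergence between consecutive iterations. Assuming it is computed between discrete distributions over segments (the exact construction is not spelled out in this excerpt), a generic base-2 JSD sketch:

```python
from math import log2

def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete
    distributions given as dicts mapping outcome -> probability."""
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    def kl(a, b):
        # KL divergence, skipping zero-probability outcomes of a.
        return sum(a[k] * log2(a[k] / b[k]) for k in keys if a.get(k, 0.0) > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

prev = {"nsp": 0.5, "mrt": 0.5}
curr = {"nsp": 0.5, "mrt": 0.5}
jsd(prev, curr)  # identical distributions -> 0.0
```

With base-2 logarithms the JSD is bounded in [0, 1], so the near-zero values in Table 5 from iteration 2 onward indicate that the segment distributions have effectively stopped changing.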
Election Rally” is detected by the weak NERs with strong confidence (i.e., all its occurrences are detected). Due to this strong confidence, “NSP” cannot be separated from the longer segment in the next iteration regardless of the λ setting. Although “NSP Election Rally” is not annotated as a named entity, it is indeed a semantically meaningful phrase. On the other hand, a large portion of the occurrences of the 12 partially detected segments have been successfully detected from other tweets.
Compared to the baseline HybridSegWeb, which does not consider local context, HybridSegIter significantly reduces the number of missed segments: from 195 to 152 (a 22 percent reduction) on the SIN data set, and from 140 to 112 (a 20 percent reduction) on SGE. Many of these segments are fully detected by HybridSegIter.
6.4 Named Entity Recognition
We next evaluate the accuracy of named entity recognition
based on segments. Section 5 presents two NER methods,
namely random walk-based (RW-based) and POS-based
NER. Through experiments, we aim to answer two ques-
tions: (i) which one of the two methods is more effective, and
(ii) does better segmentation lead to better NER accuracy?
We evaluate five variations of the two methods, namely GlobalSegRW, HybridSegRW, HybridSegPOS, GlobalSegPOS, and UnigramPOS.^13 Here GlobalSeg denotes HybridSegWeb since it only uses global context, and HybridSeg refers to HybridSegIter, the best method using both global and local context. The subscripts RW and POS refer to the RW-based and POS-based NER, respectively (see Section 5).
The method UnigramPOS is the baseline, which uses words (instead of segments) and POS tagging for NER. Similar to the work in [19], we extract noun phrases from the batch of tweets as named entities using regular expressions. The confidence of a noun phrase is computed using a modified version of Eq. (18) with its first component removed.
Evaluation Metric. The accuracy of NER is evaluated by Precision (P), Recall (R),^14 and F1. P is the percentage of the recognized named entities that are truly named entities; R is the percentage of the named entities that are correctly recognized; and F1 = 2·P·R/(P + R). The type of the named entity (e.g., person, location, or organization) is ignored. Similar to the segmentation recall measure, each occurrence of a named entity at a specific position in a tweet is considered one instance.
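Under this instance definition, the occurrence-level P/R/F1 computation can be sketched as follows; the (tweet id, position, entity) triples are invented examples, not data from the paper:

```python
def ner_prf(recognized, gold):
    """Occurrence-level precision, recall, and F1.
    Each element is a (tweet_id, position, entity) triple; types ignored."""
    tp = len(recognized & gold)                       # correctly recognized instances
    p = tp / len(recognized) if recognized else 0.0   # precision
    r = tp / len(gold) if gold else 0.0               # recall
    f1 = 2 * p * r / (p + r) if p + r else 0.0        # harmonic mean
    return p, r, f1

gold = {(1, 0, "obama"), (1, 3, "white house"), (2, 5, "nsp")}
recognized = {(1, 0, "obama"), (2, 5, "nsp"), (2, 8, "rally")}
p, r, f1 = ner_prf(recognized, gold)  # p = r = f1 = 2/3
```

Because each occurrence at each position counts separately, an entity that appears five times but is recognized only once contributes four missed instances to recall.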
NER Results. Table 7 reports the NER accuracy of the five methods. Because all five methods are unsupervised and consider the top-K ranked segments as named entities, the results reported are the highest F1 each method achieves when varying K ≤ 50, following the same setting as in [1]. The results show that tweet segmentation greatly improves NER. UnigramPOS is the worst performer among all methods. For a specific NER approach, either random walk or POS based, better segmentation results lead to better NER accuracy. That is, HybridSegRW performs better than GlobalSegRW, and HybridSegPOS performs better than GlobalSegPOS. Without local context in segmentation, GlobalSegPOS is slightly worse than GlobalSegRW by F1. However, with better segmentation results, HybridSegPOS is much better than HybridSegRW. By the F1 measure, HybridSegPOS achieves the best NER result. We also observe that both segment-based approaches, HybridSegPOS and HybridSegRW, favor popular named entities. The average frequency of correctly/wrongly recognized entities is 4.65
TABLE 6
Fully Detected, Missed, and Partially Detected Segments for HybridSegIter (Three Iterations) and HybridSegWeb

                          SIN data set                                     SGE data set
Method/        Fully detected  Missed      Partially detected   Fully detected  Missed      Partially detected
Iteration      #NE    #Occ     #NE  #Occ   #NE  #Det  #Miss     #NE    #Occ     #NE  #Occ   #NE  #Det  #Miss
0              581    944      146  152    19   113   25        295    1464     94   168    24   2374  67
1              581    959      154  163    11   98    14        291    1858     110  191    12   1996  28
2              583    961      152  161    11   98    14        289    1856     112  193    12   1996  28
HybridSegWeb   504    647      195  214    47   113   85        234    708      140  336    40   2850  179

#NE: number of distinct segments; #Occ: number of occurrences; #Det: number of detected occurrences; #Miss: number of missed occurrences.
TABLE 7
Accuracy of GlobalSeg and HybridSeg with RW and POS

               SIN data set          SGE data set
Method         P      R      F1      P      R      F1
UnigramPOS     0.516  0.190  0.278*  0.845  0.333  0.478*
GlobalSegRW    0.576  0.335  0.423*  0.929  0.646  0.762*
HybridSegRW    0.618  0.343  0.441*  0.907  0.683  0.779*
GlobalSegPOS   0.647  0.306  0.415*  0.903  0.657  0.760*
HybridSegPOS   0.685  0.352  0.465   0.911  0.686  0.783

The best results are in boldface. * indicates the difference against the best F1 is statistically significant by a one-tailed paired t-test with p < 0.01.
13. GlobalSegRW is the method named TwiNER in [1].
14. Note that R and Re are different: Re, defined in Section 6.1, measures the percentage of the manually annotated named entities that are correctly split as segments.
and 1.31, respectively, based on the results of HybridSegPOS on SIN. This is reasonable since a higher frequency leads to a stronger gregarious property for the random walk approach. Also, more instances of a named entity result in a better POS estimation for the POS-based approach.
For comparison, Table 8 reports the performance of the three weak NERs on the two data sets. Compared with the results in Table 7, all three weak NERs perform poorly on both data sets.
Precision@K. Fig. 8 plots Precision@K for the five methods on the two data sets, with K varying from 20 to 100. Precision@K reports the ratio of named entities among the top-K ranked segments for each method. Note that Precision@K measures the ranking of the distinct segments detected from a batch of tweets; the individual occurrences of each segment are not considered. This is different from the measures (e.g., P) reported in Table 7, where the occurrences of the named entities are considered (i.e., whether a named entity is correctly detected at a specific position in a given tweet).
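Precision@K over distinct ranked segments can be sketched as follows; the ranked list and the gold entity set are invented examples:

```python
def precision_at_k(ranked_segments, gold_entities, k):
    """Fraction of the top-K ranked distinct segments that are named entities.
    Occurrences are not counted; each segment appears once in the ranking."""
    top_k = ranked_segments[:k]
    return sum(1 for s in top_k if s in gold_entities) / k

# Illustrative ranking of segments by confidence, and a gold entity set.
ranked = ["obama", "good morning", "nsp", "white house", "lol"]
gold = {"obama", "nsp", "white house"}
precision_at_k(ranked, gold, 4)  # -> 0.75 (3 of the top 4 are entities)
```

This is why Precision@K and the occurrence-level P in Table 7 can disagree: a segment that ranks highly but occurs many times in hard-to-detect positions helps the former much more than the latter.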
As observed in Fig. 8, on the SIN data set, all methods using POS tagging for NER enjoy much better precision. RW-based methods deliver much poorer precision due to the lack of co-occurrences in the tweets. As shown in Table 2, 82 percent of the annotated named entities appear only once in SIN. Among the three POS-based methods, HybridSegPOS achieves the best precision for all K values from 20 to 100. On the SGE data set, the differences in precision between POS-based and RW-based methods become smaller than on the SIN data set. The reason is that in the SGE data set, about 39 percent of the named entities appear more than once, which gives a higher chance of co-occurrences. Between the two best-performing methods, HybridSegPOS and GlobalSegPOS, the former outperforms the latter on six of the plotted K values between 40 and 90.
7 CONCLUSION
In this paper, we present the HybridSeg framework, which splits tweets into meaningful phrases called segments using both global and local context. Through our framework, we demonstrate that local linguistic features are more reliable than term dependency in guiding the segmentation process. This finding opens opportunities for tools developed for formal text to be applied to tweets, which are believed to be much noisier than formal text.
Tweet segmentation helps to preserve the semantic meaning of tweets, which subsequently benefits many downstream applications, e.g., named entity recognition. Through experiments, we show that segment-based named entity recognition methods achieve much better accuracy than the word-based alternative.
We identify two directions for future research. One is to further improve the segmentation quality by considering more local factors. The other is to explore the effectiveness of the segment-based representation in tasks like tweet summarization, search, and hashtag recommendation.
ACKNOWLEDGMENTS
This paper is an extended version of two SIGIR conference papers [1], [2]. Jianshu Weng’s contribution was made while he was with HP Labs Singapore.
REFERENCES
[1] C. Li, J. Weng, Q. He, Y. Yao, A. Datta, A. Sun, and B.-S. Lee,
“Twiner: Named entity recognition in targeted twitter stream,” in
Proc. 35th Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2012,
pp. 721–730.
[2] C. Li, A. Sun, J. Weng, and Q. He, “Exploiting hybrid contexts for
tweet segmentation,” in Proc. 36th Int. ACM SIGIR Conf. Res.
Develop. Inf. Retrieval, 2013, pp. 523–532.
[3] A. Ritter, S. Clark, Mausam, and O. Etzioni, “Named entity recog-
nition in tweets: An experimental study,” in Proc. Conf. Empirical
Methods Natural Language Process., 2011, pp. 1524–1534.
[4] X. Liu, S. Zhang, F. Wei, and M. Zhou, “Recognizing named enti-
ties in tweets,” in Proc. 49th Annu. Meeting Assoc. Comput. Linguis-
tics: Human Language Technol., 2011, pp. 359–367.
[5] X. Liu, X. Zhou, Z. Fu, F. Wei, and M. Zhou, “Exacting social
events for tweets using a factor graph,” in Proc. AAAI Conf. Artif.
Intell., 2012, pp. 1692–1698.
[6] A. Cui, M. Zhang, Y. Liu, S. Ma, and K. Zhang, “Discover break-
ing events with popular hashtags in twitter,” in Proc. 21st ACM
Int. Conf. Inf. Knowl. Manage., 2012, pp. 1794–1798.
[7] A. Ritter, Mausam, O. Etzioni, and S. Clark, “Open domain event
extraction from twitter,” in Proc. 18th ACM SIGKDD Int. Conf.
Knowledge Discovery Data Mining, 2012, pp. 1104–1112.
[8] X. Meng, F. Wei, X. Liu, M. Zhou, S. Li, and H. Wang, “Entity-
centric topic-oriented opinion summarization in twitter,” in Proc.
18th ACM SIGKDD Int. Conf. Knowledge Discovery Data Mining,
2012, pp. 379–387.
TABLE 8
Accuracy of the Three Weak NERs, Where * Indicates the Difference against the Best F1 Is Statistically Significant by a One-Tailed Paired t-Test with p < 0.01

               SIN data set          SGE data set
Method         P      R      F1      P      R      F1
LBJ-NER        0.335  0.357  0.346*  0.674  0.402  0.504*
T-NER          0.273  0.523  0.359*  0.696  0.341  0.458*
Stanford-NER   0.447  0.448  0.447   0.685  0.623  0.653
Fig. 8. Precision@K on two data sets.
[9] Z. Luo, M. Osborne, and T. Wang, “Opinion retrieval in twitter,” in Proc. Int. AAAI Conf. Weblogs Social Media, 2012, pp. 507–510.
[10] X. Wang, F. Wei, X. Liu, M. Zhou, and M. Zhang, “Topic senti-
ment analysis in twitter: a graph-based hashtag sentiment classifi-
cation approach,” in Proc. 20th ACM Int. Conf. Inf. Knowl. Manage.,
2011, pp. 1031–1040.
[11] K.-L. Liu, W.-J. Li, and M. Guo, “Emoticon smoothed language
models for twitter sentiment analysis,” in Proc. AAAI Conf. Artif.
Intell., 2012, pp. 1678–1684.
[12] S. Hosseini, S. Unankard, X. Zhou, and S. W. Sadiq, “Location ori-
ented phrase detection in microblogs,” in Proc. 19th Int. Conf. Data-
base Syst. Adv. Appl., 2014, pp. 495–509.
[13] C. Li, A. Sun, and A. Datta, “Twevent: segment-based event detec-
tion from tweets,” in Proc. 21st ACM Int. Conf. Inf. Knowl. Manage.,
2012, pp. 155–164.
[14] L. Ratinov and D. Roth, “Design challenges and misconcep-
tions in named entity recognition,” in Proc. 13th Conf. Comput.
Natural Language Learn., 2009, pp. 147–155.
[15] J. R. Finkel, T. Grenager, and C. Manning, “Incorporating non-
local information into information extraction systems by Gibbs
sampling,” in Proc. 43rd Annu. Meeting Assoc. Comput. Linguistics,
2005, pp. 363–370.
[16] G. Zhou and J. Su, “Named entity recognition using an hmm-
based chunk tagger,” in Proc. 40th Annu. Meeting Assoc. Comput.
Linguistics, 2002, pp. 473–480.
[17] K. Gimpel, N. Schneider, B. O’Connor, D. Das, D. Mills, J.
Eisenstein, M. Heilman, D. Yogatama, J. Flanigan, and N. A.
Smith, “Part-of-speech tagging for twitter: annotation, features,
and experiments,” in Proc. 49th Annu. Meeting. Assoc. Comput.
Linguistics: Human Language Technol., 2011, pp. 42–47.
[18] B. Han and T. Baldwin, “Lexical normalisation of short text mes-
sages: Makn sens a #twitter,” in Proc. 49th Annu. Meeting. Assoc.
Comput. Linguistics: Human Language Technol., 2011, pp. 368–378.
[19] F. C. T. Chua, W. W. Cohen, J. Betteridge, and E.-P. Lim,
“Community-based classification of noun phrases in twitter,” in
Proc. 21st ACM Int. Conf. Inf. Knowl. Manage., 2012, pp. 1702–1706.
[20] S. Cucerzan, “Large-scale named entity disambiguation based on
wikipedia data,” in Proc. Joint Conf. Empirical Methods Natural Lan-
guage Process. Comput. Natural Language Learn., 2007, pp. 708–716.
[21] D. N. Milne and I. H. Witten, “Learning to link with wikipedia,”
in Proc. 17th ACM Int. Conf. Inf. Knowl. Manage., 2008, pp. 509–518.
[22] S. Guo, M.-W. Chang, and E. Kiciman, “To link or not to link? a
study on end-to-end tweet entity linking,” in Proc. Conf. North
Amer. Chapter Assoc. Comput. Linguistics: Human Language Technol.,
2013, pp. 1020–1030.
[23] A. Sil and A. Yates, “Re-ranking for joint named-entity recognition
and linking,” in Proc. 22nd ACM Int. Conf. Inf. Knowl. Manage.,
2013, pp. 2369–2374.
[24] J. Gao, M. Li, C. Huang, and A. Wu, “Chinese word segmentation and named entity recognition: A pragmatic approach,” Comput. Linguist., vol. 31, pp. 531–574, 2005.
[25] Y. Zhang and S. Clark, “A fast decoder for joint word segmen-
tation and pos-tagging using a single discriminative model,”
in Proc. Conf. Empirical Methods Natural Language Process., 2010,
pp. 843–852.
[26] W. Jiang, L. Huang, and Q. Liu, “Automatic adaption of annota-
tion standards: Chinese word segmentation and pos tagging—A
case study,” in Proc. Joint Conf. 47th Annu. Meeting ACL 4th Int.
Joint Conf. Natural Language Process. AFNLP, 2009, pp. 522–530.
[27] X. Zeng, D. F. Wong, L. S. Chao, and I. Trancoso, “Graph-
based semi-supervised model for joint chinese word segmenta-
tion and part-of-speech tagging,” in Proc. Annu. Meeting Assoc.
Comput. Linguistics, 2013, pp. 770–779.
[28] W. Jiang, M. Sun, Y. Lü, Y. Yang, and Q. Liu, “Discriminative learning with natural annotations: Word segmentation as a case study,” in Proc. Annu. Meeting Assoc. Comput. Linguistics, 2013, pp. 761–769.
[29] R. Mihalcea and A. Csomai, “Wikify!: linking documents to ency-
clopedic knowledge,” in Proc. 16th ACM Conf. Inf. Knowl. Manage.,
2007, pp. 233–242.
[30] W. Wu, H. Li, H. Wang, and K. Q. Zhu, “Probase: A probabilistic
taxonomy for text understanding,” in Proc. ACM SIGMOD Int.
Conf. Manage. Data, 2012, pp. 481–492.
[31] K. Wang, C. Thrasher, E. Viegas, X. Li, and P. Hsu, “An over-
view of microsoft web n-gram corpus and applications,” in
Proc. HLT-NAACL Demonstration Session, 2010, pp. 45–48.
[32] F. A. Smadja, “Retrieving collocations from text: Xtract,” Comput.
Linguist., vol. 19, no. 1, pp. 143–177, 1993.
[33] H. Ney, U. Essen, and R. Kneser, “On structuring probabilistic
dependences in stochastic language modelling,” Comput. Speech
Language, vol. 8, pp. 1–38, 1994.
[34] S. F. Chen and J. Goodman, “An empirical study of smoothing
techniques for language modeling,” in Proc. 34th Annu. Meeting
Assoc. Comput. Linguistics, 1996, pp. 310–318.
[35] F. Peng, D. Schuurmans, and S. Wang, “Augmenting naive bayes
classifiers with statistical language models,” Inf. Retrieval, vol. 7,
pp. 317–345, 2004.
[36] K. Nishida, T. Hoshide, and K. Fujimura, “Improving tweet
stream classification by detecting changes in word probability,” in
Proc. 35th Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2012,
pp. 971–980.
[37] X. Wang, A. McCallum, and X. Wei, “Topical n-grams: Phrase and
topic discovery, with an application to information retrieval,” in
Proc. IEEE 7th Int. Conf. Data Mining, 2007, pp. 697–702.
Chenliang Li received the PhD degree from
Nanyang Technological University, Singapore, in
2013. He is an associate professor at the State
Key Laboratory of Software Engineering, Com-
puter School, Wuhan University, China. His
research interests include information retrieval,
text/web mining, data mining, and natural lan-
guage processing. His papers appear in SIGIR,
CIKM, and the Journal of the Association for Infor-
mation Science and Technology.
Aixin Sun received the PhD degree from the School of
Computer Engineering, Nanyang Technological
University, Singapore, where he is an associate
professor. His research interests include informa-
tion retrieval, text mining, social computing, and
multimedia. His papers appear in major interna-
tional conferences like SIGIR, KDD, WSDM, ACM
Multimedia, and journals including Data Mining and
Knowledge Discovery, IEEE Transactions on
Knowledge and Data Engineering, and the Journal
of the Association for Information Science and
Technology.
Jianshu Weng received the PhD degree from
Nanyang Technological University, Singapore, in
2008. He is a consultant with Accenture Analytics
Innovation Center, Singapore. Before joining
Accenture, he was a researcher with HP Labs,
Singapore. His research interests include social
network analysis, text mining, opinion mining, col-
laborative filtering, and trust-aware resource
selection.
Qi He received the PhD degree from Nanyang
Technological University in 2008, after which he conducted two years of postdoctoral research at Pennsylvania State University for CiteSeerX, followed by a research staff member position at IBM
Almaden Research Center till May 2013. He is a
senior researcher at Relevance Science, LinkedIn.
His research interests include information
retrieval/extraction, recommender systems, social
network analysis, data mining, and machine learn-
ing. He served as general cochair of ACM CIKM
2013. He received the IBM Almaden Service Research Best Paper
Award and the IBM Invention Achievement Award in 2012, Best Applica-
tion Paper Award at SIGKDD 2008, and Microsoft Research Fellowship
in 2006. He is a member of the IEEE and the ACM.
For more information on this or any other computing topic,
please visit our Digital Library at www.computer.org/publications/dlib.