tweet segmentation

Tweet Segmentation and Its
Application to Named Entity
Recognition
Presented by:-
1) Prashant B. Tarone

CONTENTS
Introduction
Existing system
Proposed system
Modules
Architecture
Advantages
Disadvantages
Requirements
Future scope
Conclusion
References

Introduction
Online social and news media generate rich and timely information
about real-world events of all kinds. However, the huge amount of data
available, along with the breadth of the user base, requires substantial
information-filtering effort to drill down successfully to relevant topics
and events. A social networking site is any web site that enables users to
create public profiles. On a social networking site we can follow people and
make friends. We can see their tweets and posts and comment on them. Social
media is becoming an accurate sensor of real-world events.

Existing system
Implementing summarization is not an easy task, because a large number of
tweets are meaningless or contain noise that must be discarded. Tweets are
also posted at different times, and new tweets emerge continuously, so the
time at which each tweet is posted must be recorded. The following issues must
be taken into consideration:
Efficiency: the algorithm must be very efficient.
Flexibility: the algorithm must be flexible.
Previous algorithms do not deal efficiently with these issues. They were
mainly designed for small, static data sets, so they cannot handle large,
dynamic data sets.

Proposed system
In the proposed work we perform segmentation. This matters because when
someone tweets about, for example, politics, business, or sports, the tweet
should be stored under that particular category. The system performs
multi-segmentation. The proposed system also provides a security facility for
blocking a user ID, that is, blocking a user who publicly posts irrelevant
comments or tweets on Twitter. The proposed system uses the K-Means algorithm
to filter the segmentation across a number of fields.
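
The slides do not say how K-Means is applied, so the following is only a minimal sketch under stated assumptions: tweets are turned into bag-of-words count vectors, distance is Euclidean, and the initial centroids are chosen by hand. The names `vectorize`, `k_means`, and the sample tweets are hypothetical and are not taken from the project.

```python
import math
from collections import Counter


def vectorize(text, vocabulary):
    """Turn a tweet into a word-count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def k_means(vectors, initial_centroids, max_iterations=20):
    """Plain K-Means: assign each vector to its nearest centroid, then
    move each centroid to the mean of its members, until stable."""
    centroids = [list(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            min(range(len(centroids)), key=lambda j: distance(v, centroids[j]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # no tweet changed cluster, so the result is stable
        assignment = new_assignment
        for j in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean(members)
    return assignment, centroids


# Hypothetical sample tweets: two about politics, two about sports.
tweets = [
    "election vote minister",
    "minister election campaign",
    "cricket match score",
    "match score cricket team",
]
vocabulary = sorted({word for t in tweets for word in t.lower().split()})
vectors = [vectorize(t, vocabulary) for t in tweets]
labels, _ = k_means(vectors, initial_centroids=[vectors[0], vectors[2]])
print(labels)
```

In this sketch the cluster numbers carry no meaning by themselves; mapping a cluster to a named category such as Politics or Sports would need an extra step that the slides do not describe.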

Modules
1) Registration Form:
Users register to use the social networking site. Only registered
users are allowed to use this social networking service.
2) Login Form:
Only registered users are allowed to log in to the social
networking site.
3) Data Mining (Clustering):
1)Bollywood messages
2)Business messages
3)Education messages
4)Politics messages
5)Sports messages
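
As an illustration of how a tweet could be filed under the five categories above, here is a minimal sketch. It assumes a simple keyword lookup rather than the project's actual clustering step, and the keyword lists, `categorize`, and `store` are hypothetical names introduced only for this example. A tweet that matches several categories is stored under each of them, which is one possible reading of "multi-segmentation".

```python
import re

# Hypothetical keyword lists; a real system would learn or curate these.
CATEGORY_KEYWORDS = {
    "Bollywood": {"film", "movie", "actor", "actress", "song"},
    "Business": {"market", "stock", "profit", "startup", "shares"},
    "Education": {"exam", "school", "university", "student", "results"},
    "Politics": {"election", "minister", "vote", "parliament", "party"},
    "Sports": {"cricket", "match", "goal", "team", "score"},
}


def categorize(tweet):
    """Return every category whose keywords appear in the tweet, sorted."""
    words = set(re.findall(r"[a-z]+", tweet.lower()))
    return sorted(
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if words & keywords
    )


def store(tweet, storage):
    """File the tweet under each matching category in a dict of lists."""
    for category in categorize(tweet):
        storage.setdefault(category, []).append(tweet)


storage = {}
store("The minister watched the cricket match", storage)
store("Good morning", storage)  # matches nothing, so it is not stored
print(storage)
```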

Architecture
[Figure: architecture diagram with the components Registration form, Login form, Database, and Data Mining]

Advantages
1) Twitter messages are public:-
Twitter messages are public, that is, they are directly available with no
privacy limitations. Every user has permission to read and write them, and
users can also give their views about multiple other users, which is called
Opinion Mining.
2) It performs Multi-Segmentation:-
Users tweet on different topics, and each tweet is stored by default under
its particular category.
3) We can develop a logical protocol that helps secure social
networking sites.

Disadvantages
1) Static and small size:-
Existing approaches mainly focus on static, small data sets, and hence
are not efficient or scalable for large data sets and data streams.
2) The database is small.

Requirements
Software Requirement:-
1) Operating System – Windows XP
2) Language / Front end – Java (JDK 6.0)
3) Back end / Database – MySQL
Hardware Requirement:-
1) RAM 512 MB
2) Hard Disk 80 GB
3) System

Future Scope
This software is designed using a logical protocol. If it is used in
real-time work on social media sites, illegal activity can be stopped and
legitimate activity encouraged.
Tweet segmentation helps preserve the semantic meaning of tweets, which
consequently benefits downstream applications, e.g., NER. Segment-based
named entity recognition methods achieve much better accuracy than the
word-based alternative.

Conclusion
In this paper, we present the HybridSeg framework which segments tweets
into meaningful phrases called segments using both global and local context.
Through our framework, we demonstrate that local linguistic features are more
reliable than term-dependency in guiding the segmentation process. This finding
opens opportunities for tools developed for formal text to be applied to tweets
which are believed to be much noisier than formal text. Tweet segmentation
helps to preserve the semantic meaning of tweets, which subsequently benefits
many downstream applications, e.g., named entity recognition. Through
experiments, we show that segment-based named entity recognition methods
achieve much better accuracy than the word-based alternative. We identify two
directions for our future research.
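
To make the idea of splitting a tweet into meaningful segments concrete, the following is a simplified sketch and not the HybridSeg implementation. It assumes that every candidate segment has a numeric score and picks the split with the highest total score by dynamic programming. The score table `PHRASE_SCORES`, the constants, and the function names are hypothetical; in a real system the scores would come from the global and local context mentioned above.

```python
# Hypothetical scores for multi-word phrases; higher means the words
# are more likely to belong together as one segment.
PHRASE_SCORES = {
    ("new", "york"): 3.0,
    ("new", "york", "city"): 4.5,
    ("times", "square"): 3.0,
}
DEFAULT_WORD_SCORE = 1.0  # assumed score for any single word
MAX_SEGMENT_LENGTH = 3    # assumed longest segment, in words


def score(segment):
    """Score one candidate segment; unknown phrases are ruled out."""
    if len(segment) == 1:
        return DEFAULT_WORD_SCORE
    return PHRASE_SCORES.get(tuple(segment), float("-inf"))


def segment_tweet(words):
    """Split a list of words into segments with the highest total score."""
    n = len(words)
    best = [float("-inf")] * (n + 1)  # best[i]: top score for words[:i]
    back = [0] * (n + 1)              # back[i]: start of the last segment
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_SEGMENT_LENGTH), end):
            candidate = best[start] + score(words[start:end])
            if candidate > best[end]:
                best[end] = candidate
                back[end] = start
    # Walk the back-pointers from the end to recover the segments.
    segments = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(" ".join(words[start:end]))
        end = start
    return segments[::-1]


print(segment_tweet("i love new york city".split()))
```

With these made-up scores, "new york city" is kept as one segment, so a later named entity recognition step would see the whole phrase instead of three separate words.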

References
1) A. Ritter, S. Clark, Mausam, and Etzioni, “Named entity recognition
in tweets: An experimental study,” in Proc. Conf. Empirical Methods Natural
Language Process.
2)www.google.com
3)www.Wikipedia.com
