Project report

(I)
TABLE OF CONTENTS
Chapter No. Topics
Student Declaration
Certificate from the Supervisor
Acknowledgement
Summary (Not more than 250 words)
List of Figures
List of Tables
List of Symbols and Acronyms
Chapter-1 Introduction
1.1 General Introduction
1.2 List some relevant current/open problems
1.3 Problem Statement
1.4 Overview of proposed solution approach and Novelty/benefits
1.5 Give tabular comparison of other existing approaches/solution to the problem framed
Chapter-2 Literature Survey
2.1 Summary of papers studied
2.2 Integrated summary of the literature studied
Chapter-3 Analysis, Design and Modeling
3.1 Overall description of the project
3.2 Functional requirements
3.3 Non Functional requirements
3.4 Logical database requirements
3.5 Design Diagrams
3.5.1 Use Case diagrams
3.5.2 Class diagrams / Control Flow Diagrams
3.5.3 Sequence Diagram/Activity diagrams
Chapter-4 Implementation and Testing
4.1 Implementation details and issues
4.1.1 Implementation Issues
4.1.2 Algorithms (Module wise- with respect to design)
4.2 Risk Analysis and Mitigation
Chapter-5 Testing (Focus on Quality of Robustness and Testing)
5.1 Testing Plan
5.2 Component decomposition and type of testing required
5.3 List all test cases
5.4 Error and Exception Handling
5.5 Limitations of the solution
Chapter-6 Findings & Conclusion
6.1 Findings
6.2 Conclusion
6.3 Future Work
References
Brief Bio-data (Resume)

(II)
DECLARATION
I hereby declare that this submission is my own work and that, to the best of my knowledge and
belief, it contains no material previously published or written by another person nor material which
has been accepted for the award of any other degree or diploma of the university or other institute of
higher learning, except where due acknowledgment has been made in the text.
Place: Noida Signature:
Date: 04-06-2015 Name: Utkarsh
Enrollment No: 9911103587

(III)
CERTIFICATE
This is to certify that the work titled "Sentiment Analysis of Opinions", submitted by Utkarsh in
partial fulfillment for the award of the degree of B.Tech of Jaypee Institute of Information
Technology University, Noida, has been carried out under my supervision. This work has not been
submitted partially or wholly to any other University or Institute for the award of this or any other
degree.
Signature of Supervisor
Name of Supervisor Mr. Sudhanshu Kulshrestha
Designation Asst. Prof., Deptt. of CSE
Date 04-06-2015

(IV)
ACKNOWLEDGEMENT
The satisfaction which accompanies the successful completion of any project is incomplete without
the mention of names of those who made it possible, because success is the epitome of hard work,
perseverance, undeterred courage, zeal, determination and the most encouraging guidance and
advice which serve as the beacon light and crown our effort with success.
I would like to thank Mrs. Shelly Sachdeva (Project Coordinator) for her constructive instructions
and appreciation.
I am deeply indebted to Mr. Sudhanshu Kulshrestha (Project Guide) for his constant guidance,
constructive counselling and unfailing encouragement throughout the completion of this project.
I would also like to thank the other faculty members of the CSE department for their continuous help
in the effective implementation of this project and in the finalization of this project report.
Lastly, I thank my family for their support and encouragement.
Signature of the Student
Name of Student Utkarsh
Enrollment Number 9911103587
Date 04-06-2015

(V)
SUMMARY
Sentiment analysis of opinions is one of the tasks in Natural Language Processing, and various open
problems exist in this field of study. The problem addressed in this project is to detect sentiments
and output a score for the overall sentiment of a given text. The project detects sentiments in
opinions given as text in simple English. It gives a positive score if the overall sentiment of the
given text is positive, a negative score if the overall sentiment is negative, and zero if it is
neutral. It is based on the linguistic approach and uses one of the modules of the open source Python
library NLTK. Other available methods, such as Naive Bayes classifiers, can also be used for
detecting and reporting sentiments. The project is written in Python 2.7 using IDLE as the editor,
and the tkinter module is used to take text as input and to display the output in a separate window.
__________________ __________________
Signature of Student Signature of Supervisor
Name: Utkarsh Name: Mr. Sudhanshu Kulshrestha
Date: 04-06-2015

(VI)
LIST OF FIGURES
Sentiment Analysis
Use case diagrams
Class diagrams
Activity or Flow diagrams
IG diagram

(VII)
LIST OF TABLES
Tables
1. Tabular comparison of other existing approaches/solution to the problem framed
3. Risk analysis and mitigation plan
4. Component testing
5. Top risk on the basis of IG diagram
6. Mitigation Approaches
7. Additional resources needed for mitigation
8. Testing Plan
9. Testing Team Details
10. Test Environment
11. Component decomposition and type of testing
12. List of all test cases

(VIII)
LIST OF SYMBOLS & ACRONYMS
1. NLTK: Natural Language Toolkit
2. NLP: Natural Language Processing
3. IDLE: Python editor
4. IDE: Integrated Development Environment
5. OS: Operating System
6. DOS: Disk Operating System

Chapter-1 Introduction
1.1 General Introduction
Sentiment analysis is a part of natural language processing. Natural language processing is used in
data analytics to extract meaningful information from large amounts of data. It is one of the ways to
learn about current market trends and about what people are thinking or talking about on social
media. There are many practical applications today, such as finding out which party is favoured or
gaining popularity in an election, or helping a customer who looks at reviews before actually buying
something online. These applications get harder to solve as the size of the data keeps increasing. A
big part of the effort goes into arranging this data into something meaningful before analysing it.
This arrangement of data is called text classification.
In this project, sentiment classification and analysis are performed in Python using the NLTK module,
a dedicated Python library for natural language processing tasks. It supports multiple languages such
as English, Hindi and Chinese for classifying text or data into something meaningful.
Text classification can be performed in the following ways:
1. Sentiment classification
2. Feature-based sentiment classification
3. Summarization of sentiments
Sentiment classification assigns the complete document to a class according to the sentiments or
opinions expressed in it. The feature-based approach, by contrast, classifies sentiment with respect
to the attributes of each entity (noun) mentioned in the text, and so reveals which qualities of an
entity are judged good or bad. Opinion summarization is similar to text summarization, but it gives a
clear indication of the sentiment attached to the text. Its output is not a substring of the given
text. Instead, it describes the entities in positive or negative words, so that a whole document can
be conveyed in a few words without losing its gist. These kinds of classification can be performed
before the text is actually analysed. After text classification, the words are tagged.
Sentiment classification can be performed at different levels:
1. Document level
2. Sentence level
3. Word level
English is one of the most preferred languages for natural language processing work. This project
handles opinions written in English only and does not support any other language.
Consider two examples:
"I watched the movie burger. The movie was very good and the actor did an awesome job."
"When Modi returned from U.S.A., I got my 15 lakhs as promised by PM Modi"
The first clearly talks about the movie and the actor and states a positive review. The second is
sarcastic, and a sentiment classifier is still not able to classify sarcasm. This remains a big
problem for data analytics and a topic of research, because expressing such judgments in a form a
machine can process is much harder. There are two approaches to sentiment analysis:
1. Linguistic approach
2. Machine Learning
1.2 Current Open Problems/Issues
1. Linguistic approach:
This is the basic approach to sentiment analysis. It tags the tokens and then analyses them. The
problems with this approach are:
• Negation: This approach does not deal with negation very well. At times it produces a
result opposite in sense to the intended one.
• Grammatically incorrect sentences: This approach matches words against datasets during
tagging. If a sentence with polarity is grammatically incorrect, its words may not match
the existing datasets of polar words. To be effective, the datasets must contain all the
polar words used in everyday language.
• Indirect meaning: Sometimes users say one thing but mean another. Such sentences make
sense to a human reader but cannot be analysed by a machine.
2. Machine learning approach: Methods that classify text, such as Naive Bayes or SVM, also
suffer from problems such as:
1. Sarcasm
2. Jumbling of words
3. Chat text or tweets: only a limited number of words can be typed
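The negation problem listed above can be illustrated with a very small word-counting scorer. This is
only a sketch under stated assumptions: the two word sets are made up for the illustration and are
not the project's datasets, and a regular expression stands in for the NLTK tokenizer.

```python
import re

# Hypothetical mini-lexicon, made up for this illustration only.
POSITIVE = {"good", "awesome", "great"}
NEGATIVE = {"bad", "boring", "awful"}


def naive_score(text):
    """Add 1 for every positive word and subtract 1 for every negative word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)


print(naive_score("The movie was good"))      # 1
print(naive_score("The movie was not good"))  # 1 again: "not" is ignored
```

Both sentences receive the same score because the scorer only looks at individual polar words, which
is the kind of opposite-in-sense result described under Negation.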
1.3 Problem Statement
To take a given text as input, analyse it and show the polarity score of the input text. A score
greater than zero means the sentiment is positive, a score less than zero means it is negative, and a
score of zero means it is neutral.
The input is taken from the user as a string. The approach used in this project first splits the text
into tokens. When tokenization is complete, it tags each token and then evaluates the tagged tokens.
This produces an integer score, which is converted to a string and displayed on the screen.
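The last step of the problem statement, turning the integer score into a label and a display string,
could look roughly like the following. The function name and the exact output wording are
hypothetical and are not taken from the project's source.

```python
def describe_score(score):
    """Map an integer sentiment score to a string suitable for display."""
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    # The score is converted from integer to string before it is displayed.
    return "%s (score: %s)" % (label, str(score))


print(describe_score(3))
print(describe_score(0))
```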
1.4 Overview of proposed solution approach and Novelty/benefits
The proposed solution is built in Python 2.7 using the nltk module, with tkinter used to take input
from the user and to display the output. It is based on the linguistic approach. It takes the input
string, tokenizes it, tags the tokens and matches the tagged tokens against the datasets in the
database. Finally, it evaluates the score of each polar word and calculates the score for the given
text.
The source file is divided into classes:
• splitter_class: The input is given as a string, and the whole paragraph cannot be evaluated
as it is. This class first splits the text into tokens/words using the tokenizer function of
the nltk module.
• pos_tagger_class: When splitting is done, this class adds a tag to each token so that tokens
can be classified as verbs, nouns, adverbs, adjectives etc. It does the tagging work and
returns the tagged sentences.
• dictionary_tagger_class: This class builds a dictionary from the datasets supplied with the
project and uses it to produce dictionary-tagged tokens from the tokens created by
splitter_class.
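The three classes described above might be organised along the following lines. This is a minimal
sketch rather than the project's code: the class and method names only loosely follow the report,
regular expressions stand in for NLTK's tokenizers, a tiny lookup table stands in for a trained
tagger such as nltk.pos_tag, and the lexicon is hypothetical.

```python
import re


class Splitter(object):
    """Split a paragraph into sentences, and each sentence into word tokens."""

    def split(self, text):
        # Regex stand-in for NLTK's sentence and word tokenizers.
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        return [re.findall(r"[A-Za-z']+", s) for s in sentences if s]


class POSTagger(object):
    """Attach a part-of-speech tag to every token."""

    # Toy lookup table standing in for a trained tagger.
    KNOWN = {"movie": "NN", "actor": "NN", "was": "VBD", "very": "RB",
             "good": "JJ", "awesome": "JJ"}

    def pos_tag(self, sentences):
        return [[(w, self.KNOWN.get(w.lower(), "UNK")) for w in s]
                for s in sentences]


class DictionaryTagger(object):
    """Add polarity tags taken from a word -> tags dictionary."""

    def __init__(self, dictionary):
        self.dictionary = dictionary

    def tag(self, pos_tagged_sentences):
        return [[(w, pos, self.dictionary.get(w.lower(), []))
                 for (w, pos) in s]
                for s in pos_tagged_sentences]


# Hypothetical lexicon; the project loads its word lists from dataset files.
lexicon = {"good": ["positive"], "awesome": ["positive"], "bad": ["negative"]}
sentences = Splitter().split("The movie was very good. The actor did an awesome job.")
tagged = DictionaryTagger(lexicon).tag(POSTagger().pos_tag(sentences))
print(tagged[0][-1])
```

Keeping each stage in its own class means any one of them can be swapped for a real NLTK component
without touching the others.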
1.5 Tabular comparison of other existing approaches/solutions to the problem framed
Approach | Description
Linguistic approach | Simple approach, easy to code and good results with simple texts
Naive Bayes classifier approach | Machine learning approach; works by first learning from one part of the text and then evaluating the other part of the text based on the learning outcome; outperforms the linguistic approach
Support Vector Machine approach | Machine learning approach; works by data analysis and finding patterns in the data to evaluate; gives a better classification and better results
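To make the "learn first, then evaluate" idea of the Naive Bayes row concrete, here is a minimal
multinomial Naive Bayes sketch with add-one smoothing in plain Python. It is not part of the project,
which uses the linguistic approach. The training sentences are made up, and the function names are
hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(labelled_texts):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in labelled_texts:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = set(w for counts in word_counts.values() for w in counts)
    return word_counts, doc_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(doc_counts):
        score = math.log(doc_counts[label] / float(total_docs))
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            count = word_counts[label][word] + 1
            score += math.log(count / float(total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny made-up training set, for illustration only.
training = [
    ("good movie and awesome actor", "positive"),
    ("really good story", "positive"),
    ("bad movie and boring plot", "negative"),
    ("awful acting bad script", "negative"),
]
model = train_naive_bayes(training)
print(classify("awesome story", model))
print(classify("boring script", model))
```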

Chapter-2 Literature Studied
2.1 Summary of Papers
• Paper-1: Sentiment Analysis and Opinion Mining
By: G. Vinodhini and RM. Chandrasekaran [June 6, 2012]
Department of Computer Science and Engineering, Annamalai
University, Annamalai Nagar-608002.
Summary
The large volume of data on the internet today, from constantly updated and growing social networks,
news, entertainment, reviews, blogs and discussion forums, provides a large number of opinions. Data
analytics focuses on these opinions for sentiment analysis work. Researchers are currently working to
build software that detects and classifies the texts available online. Precise information extracted
from these resources can tell us a lot about users' likes and dislikes and about what they do or do
not want to buy. Other parties can use this information to offer better deals to users, and reviews
can help users find better deals. After classification and analysis, the data available on the
internet can be very valuable to users.
This paper is a survey describing the methods used in data analytics and the problems that exist in
the area of data analytics/sentiment analysis.
Weblink- http://www.dmi.unict.it/~faro/tesi/sentiment_analysis/SA2.pdf
• Paper-2: Boost up! Sentiment Categorization with Machine
Learning Techniques
By: Andrés Cassinelli, Chih-Wei Chen [June 5, 2009]
Summary
For calculating the sentiment of a given text, opinion or review, the paper notes that its methods
give an analysis close to that of past work on sentiment analysis of reviews, while being somewhat
more precise. If these methods are applied to multi-class classification, the results could be much
the same. A classification technique first uses part of the data as a training set to train itself
and then evaluates the rest of the data. The technique described in the paper captures the
relationship between the objects efficiently.
Weblink- http://www.cs.cornell.edu/home/llee/papers/sentiment.pdf
• Paper-3: Twitter as a Corpus for Sentiment Analysis and Opinion
Mining
By: Alexander Pak, Patrick Paroubek [2010]
Université de Paris-Sud, Laboratoire LIMSI-CNRS, Bâtiment 508, F-91405 Orsay Cedex, France
Summary
Today, social networking sites such as Twitter, Facebook, Google Plus and LinkedIn are popular tools
for communicating with other people on the internet. Thousands of people share information with each
other. This information may be useful to some and worthless to others. If properly analysed, this
data could be very useful for some purposes. It may take the form of opinions or of results reported
to others. These social sites can therefore be very effective in generating useful information about
many aspects of present-day life. However, little work has been done so far, because these social
networking sites came into existence only recently. In this paper, the authors describe in detail
the use of Twitter, one of the most popular social networks today, for sentiment analysis.
Weblink: http://lrec-conf.org/proceedings/lrec2010/pdf/385_Paper.pdf
2.2 Integrated Summary of the literature studied
Sentiment analysis is currently one of the popular topics in research. Work is going on in this area
for languages that had not been studied until now, such as Arabic, Hindi and Thai. Various open
source libraries are available for programming languages such as Python and R, which make it easier
to analyse and process text. Sentiment analysis can be used for various purposes, such as reviewing
movies, a company's products or the company itself, or gauging the feelings of citizens about a
country. The most popular way to get this information is to collect it from social media and analyse
it. To turn it into something meaningful, classification techniques must be used.
The data must be in a readable format and in English. Classifiers are used to tokenize or classify
the data. Supervised learning is used with the machine learning approach: the classifier learns to
detect sentiments from one part of the text and then analyses the sentiments of the rest of the
text. The linguistic approach is unsupervised: the text is first split into tokens, and tags are
added to the tokens to evaluate the sentiments of the text.
How to get lots of data to evaluate:
• Social sites
1. Facebook.com
2. Twitter.com
3. LinkedIn.com
• News websites and comments
• Movie reviewing sites
• Product selling sites
1. Flipkart
2. Snapdeal
• Blogs etc.
Techniques used presently are:
• Machine learning
1. Naive Bayes classifier
2. Support Vector Machine
3. Decision tree
Text Structure:
• An array of sentences
• Each sentence is in turn split into tokens
• Each word or token is padded with 2 other tags in dictionary format. These added tags allow
each token to be recognized as a verb, noun, adjective, adverb etc. and to be checked for
whether it is a polar word or not.
• Separate datasets are provided so that each token can be matched against the words present in
the datasets.
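One possible shape for the text structure listed above is a list of sentences, each holding one
dictionary per token. The key names ("word", "pos", "polarity") and the sample tags are assumptions
made for this illustration; the report does not specify them.

```python
# Hypothetical example data: one inner list per sentence, one dict per token.
tagged_text = [
    [
        {"word": "The", "pos": "DT", "polarity": []},
        {"word": "movie", "pos": "NN", "polarity": []},
        {"word": "was", "pos": "VBD", "polarity": []},
        {"word": "good", "pos": "JJ", "polarity": ["positive"]},
    ],
    [
        {"word": "The", "pos": "DT", "polarity": []},
        {"word": "ending", "pos": "NN", "polarity": []},
        {"word": "was", "pos": "VBD", "polarity": []},
        {"word": "boring", "pos": "JJ", "polarity": ["negative"]},
    ],
]


def polar_words(sentences):
    """Collect (word, polarity) pairs for tokens that matched a dataset entry."""
    return [(tok["word"], tok["polarity"][0])
            for sentence in sentences
            for tok in sentence
            if tok["polarity"]]


print(polar_words(tagged_text))
```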
First, collecting data is a concern. Useful data is required before any analysis can be done.
Sentiment analysis is performed on data about a product or a review, where the user wants to know
whether it is good or not. Sentiments can carry various types of polarity or emotion about a
particular thing.
Summarizing opinions is also one of the major concerns of today's researchers. Summarizing sentiments
does not mean printing a subset or one part of the text. It means expressing the data precisely in
fewer words while retaining the subject of the text.

[Fig. 1: Sentiment classification and analysis. Boxes shown: Product Reviews, Sentiment
identification, Feature selection, Sentiment classification, Sentiment polarity. Labels shown:
opinionative words or phrases, Features.]

Chapter-3 Analysis, Design and Modeling
3.1 Overall description of the project
3.1.1 Introduction
This software was built on a 64-bit Windows 8 platform with the 32-bit version of Python 2.7.
It uses the "nltk" module, which can be downloaded from nltk.org. The input section uses
the tkinter module to get the input and to display the output, so tkinter must be available
before the software can run on any system. All the set-ups listed above are available free
of charge.
• Purpose
This software can be used by any user who wants to classify movie reviews, product
reviews or any other opinions as positive or negative.
• Scope
The opinions must be in English and use simple words. Other languages are not
supported. Sarcasm and negation may not be handled well, so in those cases the result
may vary or be unexpected.
• Product perspective
This software does not depend on any hardware or software other than the resources
provided by the system. A Python set-up with the nltk and tkinter modules does all the
work required.
• Product functions
This software takes a string typed by the user and produces the sentiment score. The
user needs to type a string and wait for the output. Processing time depends on the
size of the text typed by the user.
• User characteristics
The user can be anyone who wants to analyse data on the basis of the polarity of its
sentences. The software works in the same way for every user, and the execution time of
text processing depends on the size of the data given to the software.
• Constraints
The user must know English and know how to install the Python set-ups. Once these are
installed, no prerequisite knowledge is required to use this software. The hardware
configuration requirements must be met.
• Assumptions and dependencies
The system must support 32-bit Python 2.7. The code may not work with Python 3.x
because of syntax changes, including the renaming of the Tkinter module to tkinter.
The Windows platform must be XP, Vista, 7 or 8. At least 512 MB of memory is required.
The system handles text files.
3.2 Functional Requirements
Sentiment analysis has to be performed on text in English, and it gives one of the following outputs:
a. Positive
b. Negative
c. Neutral (zero)
3.3 Non Functional Requirements
• Data selection: Data can be downloaded from the Stanford site, from various user review
sites or from social networks. Reviews of movies and reviews of products must be
checked against the separate datasets listed in the database.
• Accessibility: To access the data provided with nltk, run nltk.download() in IDLE.
• Documentation: Proper comments are included in each file for explanation.
• Maintainability: The code does not need to be maintained unless it is altered.
• Portability: The user just needs to run the .py file on any system to analyse
reviews/opinions.
• Reliability: It depends on the language structure of the opinions.
• Response time: Long reviews take more time to pre-process and tokenize.
3.4 Logical database requirements
For the database, separate files with the .yaml extension are kept in a separate folder alongside the
source code. YAML maps easily onto data structures that are common to many programming languages,
such as arrays and dictionaries. No SQL or other database concepts are used in the project. The
dataset files are attached to the source code through their directory/file paths using Python file
handling.
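As an illustration of how such a dataset could map onto a Python dictionary, the sketch below parses
a few "word: [tag]" lines. The sample entries and tag names are hypothetical, and the small parser is
only a standard-library stand-in: with the PyYAML package installed, yaml.safe_load would normally be
used instead. The report does not say which YAML library the project relies on.

```python
# Hypothetical sample of what one dataset file might contain.
SAMPLE_YAML = """\
good: [positive]
awesome: [positive]
bad: [negative]
very: [intensifier]
"""


def load_simple_yaml(text):
    """Parse 'key: [a, b]' lines into a dict of lists (stand-in for a YAML loader)."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        items = value.strip().strip("[]").split(",")
        result[key.strip()] = [item.strip() for item in items if item.strip()]
    return result


dictionary = load_simple_yaml(SAMPLE_YAML)
print(dictionary["good"])
```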
3.5 Design Diagrams
3.5.1 Use Case diagrams
[Use case diagram. Actors: user, background processing. Steps shown: 1. Input string; 2. Takes
input; 3. Press Enter; 4. Start processing the data, tokenization; 5. Waiting for the output;
6. Output screen appears with the sentiment score.]
3.5.2 Class diagrams / Control Flow Diagrams
[Class diagram: Splitter_class (+init(), +split()), Pos_Tagger_Class (+init(), +pos_tag()) and
Dictionary_tagger_Class (+init(), +tag(), +tag_sentence()), shown together with the Python object
class.]
3.5.3 Sequence Diagram/Activity diagrams [3]
Chapter-4 Implementation details and issues
4.1 Implementation details and issues
The implementation is done in Python 2.7 using the nltk and tkinter modules. The open source NLTK
module is used for text processing. nltk provides many corpora for data analytics. These corpora can
be used to build grammars or taggers, which in turn can be used to tag tokens and generate
well-classified data.
To download the corpora used for data analytics, such as chats, books or novels, import the module
with "import nltk" and run nltk.download() in the Python editor. This fetches the documents required
for sentiment analysis.
The code uses Python file handling, so check the paths carefully first. All the dataset files must be
put in place first and their path names must be given to dictionary_tagger_class.
Python 3.x may not be compatible with this code, as many functions, including those of tkinter,
changed in the 3.x versions of Python. The code contains 3 classes:
• splitter_class: splits text into tokens
• pos_tagger_class: tags the tokens
• dictionary_tagger_class: turns the tagged tokens into a dictionary data type
4.1.1 Implementation Issues
Finding compatible functions in the nltk module and HTML parsing functions were a few of
the issues with the project. There are many changes between Python 2.7 and the 3.x versions,
so keeping the syntax compatible with one version was also an issue. Tkinter also differs
between Python 2.7 and Python 3.x, because the module was renamed and some syntax changed in
Python 3.x.
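One common way to cope with the Tkinter naming difference is to try both module names. The helper
below is a sketch and is not taken from the project; the function name is hypothetical. It only
imports the module and does not open a window.

```python
def import_tk():
    """Return the Tk binding module under Python 2 or 3, or None if unavailable."""
    try:
        import Tkinter as tk      # module name in Python 2
    except ImportError:
        try:
            import tkinter as tk  # module name in Python 3
        except ImportError:
            return None           # Tk support is not installed on this system
    return tk


tk = import_tk()
print("Tk available" if tk is not None else "Tk not available")
```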
4.1.2 Algorithms (Module wise- with respect to design)
The first module copies content from the web in order to download the reviews.
The second module tokenizes the text and converts it into lists of strings.
The third module tags the tokens with accurate tags.
The fourth module handles files, attaching the dataset files to the source code.
The fifth module builds dictionary-tagged data members from the text tokens.
The sixth module displays the text together with its polar words and the result.
For input and output, Tkinter is used with Python. It takes the input and passes it to the
sentiment analysis code. After processing, the sentiment analysis code returns the sentiment
score, which is displayed on the screen using Tkinter. Tkinter is a separate Python module.
The approach is the linguistic approach. First, the text is tokenized using the tokenizer
function and tags are added to the tokens. These tokens are then matched against the
existing datasets, which are stored separately in .yaml files. If a token is found, the tag
attached to it is compared. On the basis of the attached tag, the token is evaluated as
positive or negative. If the token is not found in the datasets, it is treated as neutral.
Adjectives or adverbs increase the score in the direction of the polarity of the words.
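The scoring rule described above can be sketched as follows. This is one reading of the description,
not the project's code: the tag names, the function name and the doubling factor for intensifying
words are all assumptions made for the illustration.

```python
def sentence_score(tagged_tokens):
    """Score one sentence given (word, polarity_tags) pairs.

    Positive words add to the score, negative words subtract from it, and
    unmatched words are neutral. An intensifying word immediately before a
    polar word doubles that word's contribution (the factor 2 is an assumption).
    """
    score = 0
    multiplier = 1
    for word, tags in tagged_tokens:
        if "intensifier" in tags:
            multiplier = 2   # strengthen the next polar word
            continue
        if "positive" in tags:
            score += multiplier
        elif "negative" in tags:
            score -= multiplier
        multiplier = 1       # the boost applies to the next token only
    return score


example = [("the", []), ("movie", []), ("was", []),
           ("very", ["intensifier"]), ("good", ["positive"])]
print(sentence_score(example))
```

A whole text would then be scored by summing the scores of its sentences.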

4.2 Risk Analysis and Mitigation
Risk Id | Description of risk | Risk area | Probability (P) | Impact (I) | RE (P*I) | Risk selected for mitigation (Y/N) | Mitigation plan | Classification
1 | Memory overflow/underflow | Memory | 0.001 | L | 0.001 | Y | Try/catch block | Code and unit test
2 | Invalid input (not string) | Conversion of data type problem, too large numbers, passing string of greater size than allowed | 0.3 | L | 0.3 | N | - | Code and unit test
7 | Improper use of function (not passing required parameters) | Prototyping | 0.3 | M | 0.9 | N | - | Coding implementation
3 | Performance | Time of execution | 0.3 | M | 0.9 | N | - | Development process
4 | Compiler not working | Compiler problem | 0.001 | L | 0.001 | Y | Re-install/re-open | Environment and test
5 | Code not working | Code altered | 0.3 | M | 0.9 | N | - | Engineering specialities
6 | Unwanted output | Code altered | 0.1 | L | 0.1 | N | - | Engineering specialities
Interrelationship Graph
[IG diagram. Nodes shown: Memory (wt: 0.001), Code not working (wt: 0.9), Performance (wt: 0.9),
Unwanted output (wt: 0.1), Prototyping (wt: 0.9), Compiler problem (wt: 0.001), Data type/range
(wt: 0.3).]
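The exposure values in the risk table are consistent with multiplying the probability by a numeric
impact weight of 1 for L and 3 for M. The report does not state this scale, so the mapping below is
an inference from the table values, and the function name is hypothetical.

```python
# Assumed impact scale, inferred from the table values (not stated in the report).
IMPACT_WEIGHT = {"L": 1, "M": 3}

# (description, probability, impact) rows copied from the risk table.
risks = [
    ("Memory overflow/underflow", 0.001, "L"),
    ("Invalid input (not string)", 0.3, "L"),
    ("Performance", 0.3, "M"),
    ("Compiler not working", 0.001, "L"),
    ("Code not working", 0.3, "M"),
    ("Unwanted output", 0.1, "L"),
    ("Improper use of function", 0.3, "M"),
]


def risk_exposure(probability, impact):
    """Risk exposure = probability * impact weight, rounded to 3 decimals."""
    return round(probability * IMPACT_WEIGHT[impact], 3)


for description, probability, impact in risks:
    print("%-28s %s" % (description, risk_exposure(probability, impact)))
```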

S.No | Risk area | # of risk statements | Weights (in+out) | Total weight | Priority
1 | Code altered | 4 | 0.1+0.1+0.1+0.9 | 1.2 | High
2 | Memory | 2 | 0.001+0.3 | 0.301 | Low
3 | Data type/range | 2 | 0.3+0.9 | 1.2 | High
4 | Performance | 1 | 0.1 | 0.1 | Low
5 | Prototyping | 2 | 0.3+0.9 | 1.2 | High
6 | Compiler problem | 1 | 0.9 | 0.9 | Medium
Top risks, taken as the ones with maximum total weight from the graph:
Risk Id | Risk statement | Risk area | Priority of risk area in IG
1 | Code not working/unwanted output | Code altered | 1
Mitigation Approaches
Use a try/except block (Python's equivalent of try/catch) for invalid input constraints.
Make function definitions private.
For a compiler problem, re-install or re-open it, or check the Python path in the environment
variable.
For unwanted output, check the range of input values or the prototypes of functions.
Date started | Date to complete | Owner
1-May-2015 | 15-May-2015 | Utkarsh
Additional resources needed for mitigation
Copy the source code for backup.

Chapter-5 Testing (Focus on Quality of Robustness and Testing)
5.1 Testing Plan
The source code for sentiment analysis is checked against different reviews taken from different
sites. A test file is also maintained for this purpose in a separate folder, and its output is saved
as well. The types of testing performed are listed here:
Type of test | Will test be performed? | Comments/explanation | Software component
Requirement testing | Yes | - | -
Unit | Yes | Listed in first program | Source files
Integration | Yes | Linked with source file using file handling | Database files
Performance | Yes | Depends on the execution of text input | Length of text in tkinter
Stress | Yes | - | Compiled py files
Compliance | No | - | -
Security | No | Not hidden | .py file for implementation
Load | No | - | -
Volume | No | - | -
Example test cases | Yes | A number of test cases are written in the main file and added with the datasets | Main files and datasets
Compilation | Yes | For syntactical errors | Python source files
Test Team Details
Tester | Utkarsh | Performed all the test cases
Test Schedule
Activity | Start date | Completion date | Hours | Comments
Obtain input data | 01/05/2015 | 10/05/2015 | 3 hours/day | Input taken from various sources on internet
Test region setup | 11/05/2015 | 15/05/2015 | 3 hours/day | Input taken from various sources on internet
TEST ENVIRONMENT - Description of test platforms
Software items:
Operating system: Windows 8
Notepad
Python editor and compiler
tkinter and nltk
Hardware items:
A complete system with pre-installed software for running Python programs, and the nltk and tkinter
modules
5.2 Component decomposition and type of testing required
S.No | List of various components | Type of testing required | Technique of writing test cases
1 | TEST1 | Integration | White box
2 | TEST2 | Performance | Black box
3 | TEST3 | Example test cases | Black box
4 | TEST4 | Compilation | White box
5.3 List all test cases in prescribed format
Test cases for component
Test case Id | Input | Output | Status
TEST1 | Linked with file | Console output score | Pass
TEST2 | Datasets | Console output score | Pass
TEST3 | Numbers | Integral | Fail
TEST3 | String | Score | Pass
TEST3 | Review from online site | Score | Pass
TEST4 | Example test cases linked with separate files | Console output | Pass
5.4 Error and Exception Handling (mention debugging techniques with which you have corrected errors)
Test case id | Test case for component | Debugging technique
1 | Tkinter | Print or tracing
2 | Source code | Backtracking
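Invalid input, such as a number where a string is expected, can be caught before scoring with a
try/except block. The sketch below is an assumption about how this could be done and is not the
project's code; the function names and the max_chars limit are hypothetical. Under Python 2.7,
unicode input would need to be accepted alongside str.

```python
def validate_review(text, max_chars=10000):
    """Return the stripped review text, or raise if the input is unusable."""
    if not isinstance(text, str):
        raise TypeError("review must be a string, got %s" % type(text).__name__)
    if not text.strip():
        raise ValueError("review is empty")
    if len(text) > max_chars:
        raise ValueError("review is longer than %d characters" % max_chars)
    return text.strip()


def safe_validate(text):
    """Wrap validation in try/except and return a (value, error_message) pair."""
    try:
        return validate_review(text), None
    except (TypeError, ValueError) as error:
        return None, str(error)


print(safe_validate("  The movie was good.  "))
print(safe_validate(12345))
```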
5.5 Limitations of the solution
The source code does not work well for the following cases:
• Grammatically ill-formed sentences.
• Sentences containing sarcasm.
• Negation, which may not be handled well by the source code.
• Very large texts (text files of several MB). Python takes a lot of time to process this much data.
• Jumbling of words in sentences.

Chapter-6 Findings & Conclusion
6.1 Findings
The sentiment analysis works well for simple English, but not for any other language. Sentence
formation must be simple and straightforward, because the approach does not handle cases such as
jumbled words or sarcastic sentences. Input can be taken from tkinter in text format and the output
is displayed in the same way. The nltk module works really well for natural language processing. It
also provides other techniques for classifying text, such as the Naive Bayes classifier or SVM. nltk
includes different kinds of tagging functions for adding tags to tokens.
6.2 Conclusion
The approach used in the project works efficiently with plain English text. It is easy to code and
simple to understand, and it does not require the construction of regular expressions. Built-in
taggers are available which can be used directly on the texts. To make it more efficient, different
techniques can be combined. A Naive Bayes classifier or SVM can work better in the case of complex
sentences.
6.3 Future Work
• Using different techniques such as machine learning and supervised learning to train on one part
of the text and use this training to analyse the rest of the text.
• Combining different techniques to see the result of a combined approach of algorithms.
• Extending this work to other languages such as Hindi.
• Constructing a regular grammar and generating custom regular expressions to make the tagging
part more efficient.

References
[1] http://en.wikipedia.org/wiki/sentiment-analysis1
[2] http://inltk.org
[3] http://marl.gi2mo.org/img/class_diagram_v0.2.png
[4] http://www.nltk.org/books
[5] http://nlp.stanford.edu/IR-book/html/htmledition/edit-distance-1.html
[6] https://wiki.python.org/moin/TkInter
[7] www.tutorialspoint.com/python/python_sending_email.htm
Appendix
A. Time Line
Dates (DD-MM): 01-02, 04-03, 20-03, 25-04, 10-05, 25-05, 04-06
Activities, in order: Synopsis; Study research papers and Implementation; Midterm report;
Implementation; Testing; Report

Resume
Utkarsh
Date of Birth: 15-08-1993
E-Mail: soniutkarsh@ymail.com
Phone No.: +91-8468088422 Codechef Profile:Utkarsh3587
Interests:
• Data Structures
• Algorithms
• Operating Systems
• Object Oriented Programming
Education:
• B.Tech., Computer Science & Engineering-2015
Jaypee Institute of Information Technology, Noida
4th year (7th Semester), Current CGPA: 6.2/10.
• Senior Secondary-2010
Sardar Patel Public Senior Secondary School, Delhi
CBSE with 74.6%.
• Secondary-2008
Sardar Patel Public Senior Secondary School, Delhi
CBSE with 83.8%.
Skillset:
Programming Languages: C, C++
Operating Systems: Ubuntu, Windows
Web Technologies: HTML, CSS, JavaScript
Projects:
• Hybrid Cross Platform Application
This project was done on the PhoneGap platform using web technologies such as HTML, CSS and
JavaScript. Under this project I implemented functionalities such as downloading study
material, playing quizzes, reading newspapers and a few other functions.
• Face Recognition Application using OpenCV for Android
It was an Android application project based on image processing using OpenCV libraries. It
detects faces and recognizes them on the basis of stored images.
