Summarization and opinion detection in product reviews


Team:

Suman Papanaboina (p.suman@students.iiit.ac.in)
Swapnil Patil (swapnil.patil@students.iiit.ac.in)
Shubham Srivastava (shubham.srivastava@students.iiit.ac.in)
Spandana Otra (otra.spandana@students.iiit.ac.in)

Project Mentor:

Aditya Joshi (aditya.joshi@research.iiit.ac.in)

Project Motivation

•  As e-commerce becomes more and more popular, the number of customer reviews that a product receives grows rapidly.

•  For a popular product, the number of reviews can be in the hundreds or even more.

•  This makes it difficult for a potential customer to read them all and make an informed decision on whether to purchase the product.

•  It also makes it difficult for the manufacturer of the product to keep track of and manage customer opinions.

Project Objective

•  Provide a structured, feature-based summary for new customers by mining reviews.

How is it different from traditional summarization?

•  We only mine the features of the product on which customers have expressed opinions, and whether those opinions are positive or negative.

•  We do not summarize the reviews by selecting a subset of the original sentences, or by rewriting some of them, to capture the main points as in classic text summarization.

End-to-End Architecture

•  Crawler
•  UI
•  REST Service
•  Sentence Splitter/Preprocessor
•  Feature/Opinion Extractor
•  Frequent Feature Identifier
•  Feature Pruner
•  Sentiment Analyzer
•  Persistence
•  Summarizer
•  MySQL

Crawler Module

Flipkart → Jsoup Scraping Tool → Persister → MySQL

•  The following information is crawled:

– Product Name
– Rating
– Review Comment
– Commented User
– Commented Date/Time
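
To make the crawled record concrete, here is a minimal sketch of how the five fields above might be stored. It is an assumption for illustration, not the project's schema: the table name, column names and sample row are hypothetical, and Python's built-in `sqlite3` is swapped in for MySQL so the example is self-contained.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class Review:
    # Field names are hypothetical; they mirror the five crawled items.
    product_name: str
    rating: float
    comment: str
    reviewer: str
    commented_at: str  # timestamp kept as ISO 8601 text


def create_schema(conn):
    # sqlite3 stands in for MySQL here (assumption for a runnable sketch).
    conn.execute(
        """CREATE TABLE IF NOT EXISTS review (
               id INTEGER PRIMARY KEY,
               product_name TEXT NOT NULL,
               rating REAL,
               comment TEXT NOT NULL,
               reviewer TEXT,
               commented_at TEXT
           )"""
    )


def save_reviews(conn, reviews):
    # Parameterized insert; astuple() yields fields in declaration order.
    conn.executemany(
        "INSERT INTO review (product_name, rating, comment, reviewer, commented_at) "
        "VALUES (?, ?, ?, ?, ?)",
        [astuple(r) for r in reviews],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
create_schema(conn)
save_reviews(conn, [Review("Example Phone", 4.0, "Battery life is good.",
                           "user1", "2014-01-01T10:00:00")])
stored = conn.execute("SELECT product_name, rating FROM review").fetchall()
```

Keeping the raw comment alongside the product name lets later stages re-split and re-parse reviews without crawling again.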

Sentence Splitter/Preprocessor

Review → Sentence Splitter (OpenNLP) → Sentence Preprocessor → Persister → MySQL

•  Sentence Preprocessor steps:

– Stop words filter
– Stemming
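
The preprocessing steps can be sketched roughly as follows. This is only an illustration under stated assumptions: a regular expression stands in for the OpenNLP sentence splitter, the stop-word list is a tiny hypothetical sample, and `stem` is a crude suffix stripper rather than a real stemming algorithm.

```python
import re

# Tiny illustrative stop-word list (assumption, not the project's list).
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "it", "this", "but"}


def split_sentences(review):
    # Naive rule: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]


def stem(word):
    # Crude suffix stripping; a stand-in for a real stemmer.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(sentence):
    # Lowercase, keep alphabetic tokens, drop stop words, then stem.
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


sentences = split_sentences("The battery life is great. The screen scratches easily!")
tokens = [preprocess(s) for s in sentences]
```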

Feature/Opinion Extractor Module

Sentence → Stanford Dependency Parser → Extract nsubj, amod, nn → Find any negations → Persister → MySQL

•  We used the Stanford dependency parser.

•  We extract only nsubj, amod and nn pairs. These pairs turn out to be the required feature/opinion pairs.

•  We identify any negations expressed and adjust the opinion accordingly.
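
The extraction rules above can be sketched on top of parser output. The code below does not call the Stanford parser; it assumes the dependencies for one sentence are already available as hypothetical `(relation, governor, dependent)` tuples, and it uses the `neg` relation to detect negation, which is an assumption about how negations were found.

```python
def extract_pairs(triples):
    """Return (feature, opinion, negated) tuples for one sentence.

    triples: list of (relation, governor, dependent), e.g. ("amod", "screen", "sharp").
    """
    # nn(life, battery) joins a noun modifier to its head noun: "battery life".
    compounds = {gov: f"{dep} {gov}" for rel, gov, dep in triples if rel == "nn"}
    # neg(good, not) marks the governor word as negated.
    negated = {gov for rel, gov, dep in triples if rel == "neg"}

    pairs = []
    for rel, gov, dep in triples:
        if rel == "nsubj":    # nsubj(good, life): the subject is the feature
            feature, opinion = dep, gov
        elif rel == "amod":   # amod(screen, sharp): the modified noun is the feature
            feature, opinion = gov, dep
        else:
            continue
        feature = compounds.get(feature, feature)
        pairs.append((feature, opinion, opinion in negated))
    return pairs


# Dependencies for "The battery life is not good." (written by hand for illustration).
triples = [("nn", "life", "battery"), ("nsubj", "good", "life"),
           ("cop", "good", "is"), ("neg", "good", "not")]
pairs = extract_pairs(triples)
```

Carrying the negation flag with each pair lets the sentiment stage flip the polarity later instead of rewriting the opinion word.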

Frequent Feature Identification

•  We define a frequent feature as a feature that appears in more than 3 sentences (this parameter can be configured).

•  We used the Apache Mahout library to find frequent patterns.

Features, Sentences → Mahout Frequent Pattern Miner (FP-Growth/FP-tree) → Frequent Features → Persister → MySQL
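
The frequency threshold can be illustrated with plain support counting. This is a simplified stand-in, not Mahout's FP-Growth: it only counts how many sentences mention each single feature, and the function and parameter names are hypothetical.

```python
from collections import Counter


def frequent_features(sentence_features, min_support=3):
    """Keep features that occur in more than `min_support` sentences."""
    support = Counter()
    for features in sentence_features:
        support.update(set(features))  # count each feature once per sentence
    return {f: n for f, n in support.items() if n > min_support}


# One list of extracted features per sentence (made-up data).
sentence_features = [
    ["battery life"],
    ["battery life", "screen"],
    ["battery life"],
    ["battery life", "camera"],
    ["screen"],
]
frequent = frequent_features(sentence_features)
```

With the default threshold, "battery life" (4 sentences) is kept while "screen" (2) and "camera" (1) are dropped.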

Redundancy Pruning

•  We define a feature X as redundant if:

– X is part of another feature, and
– X does not appear on its own in at least 3 sentences (the threshold is configurable; our system currently uses 3).

•  With this technique we are able to eliminate redundant features: from "battery", "life" and "battery life", only "battery life" is kept.

Battery, life, battery life → Redundancy Pruner → Battery life
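
A minimal sketch of the redundancy rule, under the assumption that "part of another feature" means the words of X occur contiguously inside a longer feature phrase, and that "on its own" means X was extracted as a feature by itself. All names are hypothetical.

```python
from collections import Counter


def is_part_of(short, long):
    # True when the words of `short` appear contiguously inside `long`.
    s, l = short.split(), long.split()
    return len(s) < len(l) and any(
        l[i:i + len(s)] == s for i in range(len(l) - len(s) + 1)
    )


def prune_redundant(sentence_features, min_standalone=3):
    # Number of sentences in which each feature was extracted on its own.
    standalone = Counter()
    for features in sentence_features:
        standalone.update(set(features))

    kept = set()
    for x in standalone:
        part_of_other = any(is_part_of(x, y) for y in standalone)
        if part_of_other and standalone[x] < min_standalone:
            continue  # redundant: covered by a longer feature, rarely seen alone
        kept.add(x)
    return kept


# Made-up data: "battery life" in 4 sentences, "battery" and "life" once each.
sentence_features = [["battery life"]] * 4 + [["battery"], ["life"], ["screen"]]
kept = prune_redundant(sentence_features)
```

Note that this sketch only applies the redundancy rule; "screen" survives here because frequency filtering is a separate stage.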

Junk Features

•  Some reviews contain sentences like "Flipkart services are awesome". In this case our system extracts "service" as the feature and "awesome" as the opinion, although "service" is not a feature of the product.

Frequent Features → Junk Feature Pruner (Junk Feature File) → Output Features
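
The junk feature pruner can be sketched as a simple filter against a list of terms. The file format is an assumption (one term per line, `#` for comments), and the sample terms are hypothetical.

```python
def load_junk_features(lines):
    # One junk term per line; blank lines and '#' comments are skipped.
    return {
        line.strip().lower()
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def prune_junk(features, junk):
    # Case-insensitive match against the junk list; order is preserved.
    return [f for f in features if f.lower() not in junk]


# In practice the lines would come from the junk feature file.
junk = load_junk_features([
    "# seller and delivery terms, not product features",
    "service",
    "delivery",
    "flipkart",
])
output = prune_junk(["battery life", "service", "screen", "Delivery"], junk)
```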

Sentiment Analysis

Opinion Words → Sentiment Analyzer (SentiWordNet, Positive Seed List, Negative Seed List)
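
As a rough illustration of seed-list lookup, the sketch below scores an opinion word as +1, -1 or 0 and flips the sign when the opinion was negated. The seed words are made up, and SentiWordNet is not queried here; the comment only marks where such a lexicon lookup could go.

```python
# Hypothetical seed lists for illustration only.
POSITIVE_SEEDS = {"good", "great", "awesome", "excellent"}
NEGATIVE_SEEDS = {"bad", "poor", "terrible", "slow"}


def polarity(opinion, negated=False):
    """Return 1 (positive), -1 (negative) or 0 (unknown) for an opinion word."""
    word = opinion.lower()
    if word in POSITIVE_SEEDS:
        score = 1
    elif word in NEGATIVE_SEEDS:
        score = -1
    else:
        score = 0  # unknown word; a lexicon such as SentiWordNet could be consulted here
    return -score if negated else score


scores = [polarity("awesome"), polarity("good", negated=True), polarity("purple")]
```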

Summarizer

•  The Summarizer generates a feature-based structured summary, as shown below.

[Figure: Feature Summary]
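
One plausible shape for the feature-based summary is a grouping of sentences by feature and polarity. This sketch is an assumption about the output structure, not a reproduction of the figure; the input tuples and the text layout are hypothetical.

```python
from collections import defaultdict


def summarize(scored_pairs):
    """scored_pairs: iterable of (feature, sentence, score), score in {-1, 0, 1}."""
    summary = defaultdict(lambda: {"positive": [], "negative": []})
    for feature, sentence, score in scored_pairs:
        if score > 0:
            summary[feature]["positive"].append(sentence)
        elif score < 0:
            summary[feature]["negative"].append(sentence)
    return dict(summary)


def render(summary):
    # Plain-text rendering: one block per feature with sentence counts.
    lines = []
    for feature in sorted(summary):
        entry = summary[feature]
        lines.append(f"Feature: {feature}")
        lines.append(f"  Positive: {len(entry['positive'])}")
        lines.append(f"  Negative: {len(entry['negative'])}")
    return "\n".join(lines)


scored = [
    ("battery life", "Battery life is great.", 1),
    ("battery life", "Battery life is not good.", -1),
    ("screen", "Screen is sharp.", 1),
]
summary = summarize(scored)
text = render(summary)
```

Keeping the sentences, not just the counts, lets a UI drill down from a feature to the individual sentences behind each count.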
REST Service

•  We implemented a REST service to provide the following functionality to the UI:

– Find the list of categories in the system
– Find the list of products for a given category
– Find the feature-based summary for a given product

•  We used the Grizzly embedded container to implement the REST service.
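
The three operations could map onto resource paths along the following lines. This is a sketch only: the URL paths, the in-memory data and the `handle_get` function are hypothetical, and no Grizzly or HTTP server code is involved.

```python
import json

# Hypothetical in-memory data standing in for the MySQL-backed store.
CATEGORIES = {"mobiles": ["example-phone"]}
SUMMARIES = {"example-phone": {"battery life": {"positive": 3, "negative": 1}}}


def handle_get(path):
    """Map a GET path to (status code, JSON body). All paths are illustrative."""
    parts = [p for p in path.split("/") if p]
    # GET /categories
    if parts == ["categories"]:
        return 200, json.dumps(sorted(CATEGORIES))
    # GET /categories/<category>/products
    if len(parts) == 3 and parts[0] == "categories" and parts[2] == "products":
        if parts[1] in CATEGORIES:
            return 200, json.dumps(CATEGORIES[parts[1]])
    # GET /products/<product>/summary
    if len(parts) == 3 and parts[0] == "products" and parts[2] == "summary":
        if parts[1] in SUMMARIES:
            return 200, json.dumps(SUMMARIES[parts[1]])
    return 404, json.dumps({"error": "not found"})


status, body = handle_get("/products/example-phone/summary")
```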

Screenshots/Home Page

Screenshots/Feature-based summary

Screenshots/Individual sentences

Screenshots/Complete review

Evaluation

No. of feature-opinion pairs extracted manually                  20
No. of initial feature-opinion pairs extracted by our system     40
After frequent pattern mining                                    25
After pruning (final stage)                                      18
No. of correct feature-opinion pairs                             15
No. of incorrect feature-opinion pairs                           3
Precision                                                        15/20 (75%)
Recall                                                           18/20 (90%)
F1-Measure, (2*precision*recall)/(precision+recall)              0.81
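
The F1 formula can be checked with a short calculation. The first value uses the precision and recall exactly as reported above. The second part is an assumption about definitions, not a result from the slides: it applies the textbook definitions (precision = correct / extracted, recall = correct / manually identified) to the counts in the table, for comparison.

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Values as reported in the evaluation table.
reported_f1 = f1(0.75, 0.90)

# Textbook definitions applied to the table's counts (assumption for comparison).
manual, extracted, correct = 20, 18, 15
std_precision = correct / extracted   # 15/18
std_recall = correct / manual         # 15/20
std_f1 = f1(std_precision, std_recall)
```

With the reported inputs the formula gives roughly 0.818, in line with the 0.81 shown; under the textbook definitions the same counts give an F1 of roughly 0.79.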

Conclusion

•  It has been a great learning experience for all of us. We are really excited about applying data mining and natural language processing techniques to implement the system.

•  We believe this system can help users quickly identify what is good or bad in a product based on other users' comments. It also gives manufacturers a better perspective on users' comments, which can aid in providing business intelligence.

Future Enhancements

•  Add more rules to improve the overall accuracy of feature/opinion identification.

•  Migrate the entire system to run on Hadoop YARN, using HBase instead of MySQL.

•  Try unsupervised/supervised machine learning approaches for feature/opinion identification.

•  Replace our home-grown crawler with a more robust, open-source crawler, Apache Nutch (https://nutch.apache.org/).
