Summarization and opinion detection in product reviews
1. Summarization and Opinion Detection in Product Reviews
Team:
Suman Papanaboina (p.suman@students.iiit.ac.in)
Swapnil Patil (swapnil.patil@students.iiit.ac.in)
Shubham Srivastava (shubham.srivastava@students.iiit.ac.in)
Spandana Otra (otra.spandana@students.iiit.ac.in)
Project Mentor: Aditya Joshi (aditya.joshi@research.iiit.ac.in)
2. Project Motivation
• As e-commerce becomes more and more popular, the number of customer reviews that a product receives grows rapidly.
• For a popular product, the number of reviews can be in the hundreds or even thousands.
3. Project Motivation
• This makes it difficult for a potential customer to read them all and make an informed decision on whether to purchase the product.
• It also makes it difficult for the manufacturer of the product to keep track of and manage customer opinions.
4. Project Objective
• Provide a structured, feature-based summary for new customers by mining reviews.
5. How is it different from Traditional Summarization?
• We only mine the features of the product on which customers have expressed opinions, and whether those opinions are positive or negative.
• We do not summarize the reviews by selecting a subset of the original sentences, or by rewriting some of them, to capture the main points as in classic text summarization.
10. Feature/Opinion Extractor Module
• Used the Stanford dependency parser.
• Extract only nsubj, amod, and nn pairs. These pairs turn out to be the required feature/opinion pairs.
• Identify any negations expressed and adjust the opinion accordingly.
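The extraction rules above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the parser output has already been flattened into (relation, governor, dependent) word triples, and the function name `extract_pairs` and the choice to mark negation by prefixing "not" are hypothetical.

```python
from typing import List, Tuple

# One dependency is (relation, governor, dependent),
# e.g. ("amod", "screen", "great") for "great screen".
Dependency = Tuple[str, str, str]


def extract_pairs(deps: List[Dependency]) -> List[Tuple[str, str]]:
    """Return (feature, opinion) pairs for one parsed sentence."""
    # Words that have a "neg" dependent, e.g. ("neg", "good", "not").
    negated = {gov for rel, gov, _ in deps if rel == "neg"}
    # nn links a head noun to its modifier noun:
    # ("nn", "life", "battery") -> "battery life".
    compound = {gov: f"{dep} {gov}" for rel, gov, dep in deps if rel == "nn"}

    pairs = []
    for rel, gov, dep in deps:
        if rel == "amod":      # governor is the noun, dependent the adjective
            feature, opinion = gov, dep
        elif rel == "nsubj":   # governor is the predicate, dependent the noun
            feature, opinion = dep, gov
        else:
            continue
        feature = compound.get(feature, feature)
        if opinion in negated:
            # Assumption: negation is recorded by prefixing "not".
            opinion = "not " + opinion
        pairs.append((feature, opinion))
    return pairs


# "The battery life is not good."
sentence = [
    ("nn", "life", "battery"),
    ("nsubj", "good", "life"),
    ("cop", "good", "is"),
    ("neg", "good", "not"),
]
print(extract_pairs(sentence))  # [('battery life', 'not good')]
```

Keying on word strings instead of token positions keeps the sketch short, but it would confuse repeated words within one sentence.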
11. Frequent Feature Identification
• We define a frequent feature as a feature that appears in more than 3 sentences (this parameter is configurable).
• We used the Apache Mahout library to find frequent patterns.
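The frequency threshold can be illustrated with plain counting. This sketch is a stand-in for the idea only: it does not use Mahout or FP-Growth, and the names `frequent_features` and `min_sentences` are hypothetical. It assumes each sentence has already been reduced to the set of features found in it.

```python
from collections import Counter
from typing import Iterable, List, Set


def frequent_features(sentence_features: Iterable[Set[str]],
                      min_sentences: int = 3) -> List[str]:
    """Return features that occur in more than `min_sentences` sentences."""
    support = Counter()
    for features in sentence_features:
        # Each sentence contributes at most one count per feature.
        support.update(set(features))
    return sorted(f for f, n in support.items() if n > min_sentences)


sentences = [{"battery"}] * 4 + [{"screen"}] * 3
print(frequent_features(sentences))  # ['battery']
```

With the default threshold, "screen" is dropped because it appears in exactly 3 sentences, not more than 3.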
12. Frequent Feature Identification
[Pipeline diagram: Features and Sentences -> Mahout Frequent Pattern Miner (FP-Growth / FP-tree) -> Frequent Features -> Persister -> MySQL]
13. Redundancy Pruning
• We define a feature X as redundant if
– X is part of another feature, and
– X does not appear on its own in at least 3 sentences (the threshold is configurable; our system currently uses 3).
• After implementing this technique, we are able to eliminate redundant features in cases like battery, life, battery life.
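The redundancy rule can be sketched as below. This is an illustration under stated assumptions, not the project's implementation: "part of another feature" is read as a whole-word substring match, "on its own" is read as "in a sentence where no longer feature containing X was extracted", and the names `prune_redundant` and `min_standalone` are hypothetical.

```python
from typing import List, Set


def prune_redundant(features: List[str],
                    sentence_features: List[Set[str]],
                    min_standalone: int = 3) -> List[str]:
    """Drop features that are part of a longer feature and rarely stand alone."""
    kept = []
    for x in features:
        # Longer features that contain x as a whole-word phrase.
        supersets = [y for y in features if y != x and f" {x} " in f" {y} "]
        # Sentences where x was extracted without any of its supersets.
        standalone = sum(
            1 for found in sentence_features
            if x in found and not any(y in found for y in supersets)
        )
        if supersets and standalone < min_standalone:
            continue  # redundant under both conditions
        kept.append(x)
    return kept


features = ["battery", "life", "battery life"]
sentences = [{"battery life"}] * 4 + [{"battery"}] * 3 + [{"life"}]
print(prune_redundant(features, sentences))  # ['battery', 'battery life']
```

Here "life" is pruned because it is part of "battery life" and stands alone in only one sentence, while "battery" survives with three standalone sentences.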
15. Junk Features
• Some reviews contain sentences like "Flipkart services are awesome". In this case our system extracts service as the feature and awesome as the opinion, although the sentence is about the retailer and not the product.
[Pipeline diagram: Frequent Features and Junk Feature File -> Junk Feature Pruner -> Output Features]
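A junk-feature filter of this kind can be sketched in a few lines. The file format (one junk feature per line) and the names `load_junk` and `prune_junk` are assumptions made for illustration; the slides do not describe the actual file layout.

```python
from typing import Iterable, List, Set


def load_junk(lines: Iterable[str]) -> Set[str]:
    """Read junk features, assuming one feature per line."""
    return {line.strip().lower() for line in lines if line.strip()}


def prune_junk(features: List[str], junk: Set[str]) -> List[str]:
    """Keep only features that are not listed as junk."""
    return [f for f in features if f.lower() not in junk]


# In practice the lines would come from open("junk_features.txt"),
# a hypothetical file name.
junk = load_junk(["service\n", "delivery\n", "\n"])
print(prune_junk(["battery life", "service", "screen"], junk))
# ['battery life', 'screen']
```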
18. Feature Summary REST Service
• We implemented a REST service to provide the following functionality to the UI:
– Find the list of categories in the system
– Find the list of products for a given category
– Find the feature-based summary for a given product
• We used the Grizzly embedded container to implement the REST service.
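The three operations can be pictured as a small routing function. This sketch is not the Grizzly-based service: it is plain Python with no HTTP server, and the URL paths, the `handle` function, and the sample data are all hypothetical, chosen only to show the shape of the three lookups.

```python
import json
from typing import Tuple

# Hypothetical sample data standing in for the database.
CATALOG = {"phones": ["phone-a", "phone-b"]}
SUMMARIES = {"phone-a": {"battery life": {"positive": 2, "negative": 1}}}


def handle(path: str) -> Tuple[int, str]:
    """Map a request path to (status code, JSON body)."""
    parts = [p for p in path.split("/") if p]
    if parts == ["categories"]:
        # List of categories in the system.
        return 200, json.dumps(sorted(CATALOG))
    if len(parts) == 3 and parts[0] == "categories" and parts[2] == "products":
        # List of products for a given category.
        if parts[1] in CATALOG:
            return 200, json.dumps(CATALOG[parts[1]])
    if len(parts) == 3 and parts[0] == "products" and parts[2] == "summary":
        # Feature-based summary for a given product.
        if parts[1] in SUMMARIES:
            return 200, json.dumps(SUMMARIES[parts[1]])
    return 404, json.dumps({"error": "not found"})


print(handle("/categories"))                 # (200, '["phones"]')
print(handle("/categories/phones/products"))  # (200, '["phone-a", "phone-b"]')
```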
24. Evaluation
No. of manually extracted feature-opinion pairs: 20
No. of initial feature-opinion pairs extracted by our system: 40
After frequent pattern mining: 25
After pruning (final stage): 18
No. of correct feature-opinion pairs: 15
No. of incorrect feature-opinion pairs: 3
Precision: 15/20 (75%)
Recall: 18/20 (90%)
F1-Measure, (2 * precision * recall) / (precision + recall): 0.81
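The F1 arithmetic can be checked directly from the counts in the table. The first call uses the precision and recall exactly as reported on the slide. The second is shown only for comparison and is an assumption on our part: it applies the conventional definitions (precision = correct / extracted, recall = correct / manually identified), which the slide does not use.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values as reported on the slide: precision 15/20, recall 18/20.
reported = f1(15 / 20, 18 / 20)
print(round(reported, 3))  # 0.818

# Conventional definitions, for comparison only (not the slide's numbers):
# precision = correct / final extracted = 15/18,
# recall = correct / manually extracted = 15/20.
conventional = f1(15 / 18, 15 / 20)
print(round(conventional, 3))  # 0.789
```

The reported 0.81 matches the first value truncated to two decimals.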
25. Conclusion
• It was a great learning experience for all of us. We were really excited to apply data mining and natural language processing techniques to implement the system.
• We believe this system can help users quickly identify what is good or bad in a product based on other users' comments. It also gives manufacturers a better perspective on users' comments, which can aid in providing business intelligence.
26. Future Enhancements
• Add more rules to improve the overall accuracy of feature/opinion identification.
• Migrate the entire system to run on Hadoop YARN, using HBase instead of MySQL.
• Try unsupervised/supervised machine learning approaches for feature/opinion identification.
• Replace our home-grown crawler with a more robust, open-source crawler, Apache Nutch (https://nutch.apache.org/).