Summarization and Opinion Detection in Product Reviews

Team:
Suman Papanaboina (p.suman@students.iiit.ac.in)
Swapnil Patil (swapnil.patil@students.iiit.ac.in)
Shubham Srivastava (shubham.srivastava@students.iiit.ac.in)
Spandana Otra (otra.spandana@students.iiit.ac.in)

Project Mentor:
Aditya Joshi (aditya.joshi@research.iiit.ac.in)
  
Project Motivation
•  As e-commerce becomes more and more popular, the number of customer reviews that a product receives grows rapidly.
•  For a popular product, the number of reviews can run into the hundreds or even thousands.
  
This makes it difficult for a potential customer to read them all and make an informed decision on whether to purchase the product.

It also makes it difficult for the manufacturer of the product to keep track of and manage customer opinions.
  
Project Objective
•  Provide new customers with a structured, feature-based summary obtained by mining existing reviews.
  
How Is It Different from Traditional Summarization?
•  We mine only the product features on which customers have expressed opinions, and determine whether those opinions are positive or negative.
•  We do not summarize the reviews by selecting a subset of the original sentences, or by rewriting some of them, to capture the main points as classic text summarization does.
  
End-to-End Architecture
[Diagram components: Crawler, UI, REST Service, Sentence Splitter/Preprocessor, Feature/Opinion Extractor, Frequent Feature Identifier, Feature Pruner, Sentiment Analyzer, Persistence, Summarizer, MySQL]
  
Crawler Module
[Diagram: Flipkart → Jsoup Scraping Tool → Persister → MySQL]
The crawler collects the following information for each review:
•  Product name
•  Rating
•  Review comment
•  Commenting user
•  Comment date/time
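The persistence step of the crawler can be sketched as follows. The original system scraped Flipkart with Jsoup (Java) and stored results in MySQL; this minimal sketch covers only storage and uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL. The `Review` record, the `reviews` table and its column names are hypothetical; the fields simply mirror the list above.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Review:
    """One crawled review (hypothetical record type mirroring the crawled fields)."""
    product_name: str
    rating: int
    comment: str
    reviewer: str
    commented_at: str  # ISO-8601 date/time string


# Hypothetical schema; SQLite stands in for the MySQL database used by the project.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reviews (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    product_name TEXT NOT NULL,
    rating INTEGER,
    comment TEXT NOT NULL,
    reviewer TEXT,
    commented_at TEXT
)
"""


def persist(conn: sqlite3.Connection, reviews: list[Review]) -> int:
    """Insert crawled reviews and return the number of rows written."""
    conn.execute(SCHEMA)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO reviews (product_name, rating, comment, reviewer, commented_at) "
            "VALUES (?, ?, ?, ?, ?)",
            [(r.product_name, r.rating, r.comment, r.reviewer, r.commented_at) for r in reviews],
        )
    return len(reviews)
```

Parameterized `?` placeholders keep scraped text (which may contain quotes) from breaking the SQL; a MySQL driver would use its own placeholder style.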
  
Sentence Splitter/Preprocessor
[Diagram: Review → Sentence Splitter (OpenNLP) → Persister → MySQL; Sentence → Sentence Preprocessor (stop-word filter, stemming)]
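The splitting and preprocessing stages can be illustrated with a small pure-Python sketch. It is not OpenNLP: sentence boundaries are found with a regular expression rather than a trained model, the stop-word list is a tiny hypothetical one, and the suffix rules are a crude stand-in for a real stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list (a real system loads a full list).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "it", "this", "and", "of", "very"}

# Crude (suffix, replacement) rules, a stand-in for a real stemmer such as Porter's.
SUFFIX_RULES = (("ies", "y"), ("ing", ""), ("ed", ""), ("s", ""))


def split_sentences(review: str) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace (OpenNLP uses a trained model)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", review.strip()) if s]


def stem(word: str) -> str:
    """Apply the first matching suffix rule, keeping at least three leading characters."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + replacement
    return word


def preprocess(sentence: str) -> list[str]:
    """Lower-case and tokenize, drop stop words, then stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]
```

For example, `split_sentences("The battery lasts long! Screens are bright.")` yields two sentences, and `preprocess` turns the first into `["battery", "last", "long"]`.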
  
Feature/Opinion Extractor Module
[Diagram: Sentence → Stanford Dependency Parser → Extract nsubj, amod, nn → Find any negations → Persister → MySQL]
  
•  We used the Stanford dependency parser.
•  We extract only nsubj (nominal subject), amod (adjectival modifier) and nn (noun compound modifier) pairs; these pairs turn out to be the required feature/opinion pairs.
•  We identify any negations expressed and adjust the opinion accordingly.
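A minimal sketch of the pair-extraction rules, assuming the parser output has already been converted into `(relation, governor, dependent)` triples with the word indices dropped (the real Stanford output looks like `amod(camera-3, great-2)`). The `Dep` type and the `"not "` prefix used to mark negation are illustrative choices, not the project's code.

```python
from typing import NamedTuple


class Dep(NamedTuple):
    rel: str        # Stanford typed-dependency relation name, e.g. "amod"
    governor: str   # head word
    dependent: str  # modifier word


def extract_pairs(deps: list[Dep]) -> list[tuple[str, str]]:
    """Return (feature, opinion) pairs; negated opinions get a "not " prefix."""
    # nn(head, modifier) builds compound features such as "battery life"
    # (one modifier per head, for brevity).
    compounds = {d.governor: f"{d.dependent} {d.governor}" for d in deps if d.rel == "nn"}
    # neg(word, "not") marks the governor as a negated opinion word.
    negated = {d.governor for d in deps if d.rel == "neg"}

    pairs = []
    for d in deps:
        if d.rel == "amod":      # amod(noun, adjective): "great camera"
            noun, opinion = d.governor, d.dependent
        elif d.rel == "nsubj":   # nsubj(adjective, noun): "the camera is great"
            opinion, noun = d.governor, d.dependent
        else:
            continue
        # Simplification: no POS check, so nsubj(verb, noun) would also yield a pair.
        feature = compounds.get(noun, noun)
        if opinion in negated:
            opinion = "not " + opinion
        pairs.append((feature, opinion))
    return pairs
```

For "The battery life is not good", the triples `nn(life, battery)`, `nsubj(good, life)` and `neg(good, not)` produce the single pair `("battery life", "not good")`.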
  
Frequent Feature Identification
•  We define a frequent feature as a feature that appears in more than 3 sentences (this threshold is configurable).
•  We used the Apache Mahout library to find frequent patterns.
  
[Diagram: Sentences, Features → Mahout Frequent Pattern Miner (FP-Growth/FP-tree) → Frequent Features → Persister → MySQL]
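To make the "more than 3 sentences" rule concrete, here is a brute-force frequent itemset miner. It is not Mahout's API and not FP-Growth: it enumerates every itemset of up to `max_size` words per sentence and counts sentence support, which yields the same frequent itemsets as FP-Growth (up to that size) but scales far worse. The function name and the per-sentence word-list input are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support=3, max_size=2):
    """Brute-force frequent itemset mining over sentences.

    transactions: one collection of candidate feature words (e.g. nouns) per sentence.
    Returns {itemset: support} for itemsets appearing in MORE than `min_support`
    sentences, matching the threshold on the slide.
    """
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))  # count each word once per sentence, in canonical order
        for size in range(1, max_size + 1):
            counts.update(combinations(unique, size))
    return {itemset: n for itemset, n in counts.items() if n > min_support}
```

Itemsets are unordered, so `("battery", "life")` does not record word order; a follow-up check against the sentences is needed to recover the phrase "battery life".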
  
Redundancy Pruning
•  We define a feature X as redundant if:
– X is part of another feature, and
– X does not appear on its own in at least 3 sentences (the threshold is configurable; our system currently uses 3).
•  With this technique we can eliminate redundant features: for example, given "battery", "life" and "battery life", only "battery life" is kept.
  
[Diagram: Battery, life, battery life → Redundancy Pruner → Battery life]
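The two-part redundancy rule can be sketched directly. This is a minimal illustration, assuming each sentence is represented by the list of feature phrases extracted from it, so that a sentence mentioning "battery life" contributes the phrase "battery life" and not a standalone "battery". The function name and input shape are hypothetical.

```python
def prune_redundant(features, sentence_features, min_pure_support=3):
    """Return `features` without the redundant ones.

    A feature X is redundant if (a) its words are a proper subset of another
    feature's words and (b) X was extracted on its own (as the whole feature
    phrase) in fewer than `min_pure_support` sentences.
    """
    kept = []
    for x in features:
        x_words = set(x.split())
        is_part_of_longer = any(
            x_words < set(other.split()) for other in features if other != x
        )
        if not is_part_of_longer:
            kept.append(x)
            continue
        # "Pure" support: sentences where X itself, not a longer phrase, was extracted.
        pure_support = sum(1 for feats in sentence_features if x in feats)
        if pure_support >= min_pure_support:
            kept.append(x)
    return kept
```

With "battery" extracted alone in only two sentences and "life" in one, both are dropped and only "battery life" survives, reproducing the example on the slide.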
  
Junk Features
•  Some reviews contain sentences such as "Flipkart services are awesome". In such cases our system extracts "service" as a feature and "awesome" as an opinion, even though the sentence is about the retailer rather than the product.
[Diagram: Frequent Features → Junk Feature Pruner (Junk Feature File) → Output Features]
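A junk-feature filter of the kind shown in the diagram might look like this. The file format (one junk feature per line, `#` comments allowed) and the function names are assumptions; `io.StringIO` stands in for the real junk feature file.

```python
import io


def load_junk_features(fh) -> set[str]:
    """Read one junk feature per line; blank lines and '#' comments are ignored."""
    return {
        line.strip().lower()
        for line in fh
        if line.strip() and not line.lstrip().startswith("#")
    }


def prune_junk(features: list[str], junk: set[str]) -> list[str]:
    """Remove features (case-insensitively) that appear in the junk list."""
    return [f for f in features if f.lower() not in junk]


# Hypothetical junk list: retailer/site terms that are not product features.
junk = load_junk_features(io.StringIO("# retailer terms\nservice\ndelivery\n"))
print(prune_junk(["service", "battery life", "Delivery", "camera"], junk))
# ['battery life', 'camera']
```

Keeping the list in a file means new junk terms can be added without changing code.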
  
Sentiment Analysis
[Diagram: Opinion Words → Sentiment Analyzer (SentiWordNet, Positive Seed List, Negative Seed List)]
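One plausible way to combine seed lists with a SentiWordNet-style lexicon is sketched below: seed lists are consulted first, then the sign of (positive score minus negative score) from the lexicon, and a negation flips the result. The seed words, the `LEXICON` scores and the precedence order are all hypothetical; the slide does not specify them, and a real SentiWordNet lookup returns scores per word sense rather than one pair per word. It assumes negated opinions arrive as `"not <word>"`.

```python
# Hypothetical seed lists; the slide does not give the actual entries.
POSITIVE_SEEDS = {"good", "great", "awesome", "excellent"}
NEGATIVE_SEEDS = {"bad", "poor", "terrible", "slow"}

# Hypothetical (positive, negative) scores standing in for SentiWordNet lookups.
LEXICON = {"bright": (0.625, 0.0), "dim": (0.0, 0.5), "heavy": (0.0, 0.125)}


def polarity(opinion: str) -> int:
    """Return +1, -1 or 0 for an opinion word, optionally prefixed with 'not '."""
    negated = opinion.startswith("not ")
    word = opinion[4:] if negated else opinion
    if word in POSITIVE_SEEDS:
        score = 1
    elif word in NEGATIVE_SEEDS:
        score = -1
    else:
        pos, neg = LEXICON.get(word, (0.0, 0.0))
        score = (pos > neg) - (pos < neg)  # sign of pos - neg; 0 if unknown or tied
    return -score if negated else score
```

Unknown words score 0, so they can be reported as neutral or dropped by later stages.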
  
Summarizer
•  The Summarizer generates a feature-based structured summary.
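A feature-based structured summary can be as simple as per-feature counts of positive and negative opinions. The sketch below assumes the earlier stages produce `(feature, polarity)` pairs with polarity +1, -1 or 0; the output shape and the ordering by number of opinions are illustrative choices, not taken from the project.

```python
from collections import defaultdict


def summarize(scored_pairs):
    """Aggregate (feature, polarity) pairs into per-feature positive/negative counts.

    polarity is +1 or -1; neutral (0) pairs are skipped.
    """
    summary = defaultdict(lambda: {"positive": 0, "negative": 0})
    for feature, pol in scored_pairs:
        if pol > 0:
            summary[feature]["positive"] += 1
        elif pol < 0:
            summary[feature]["negative"] += 1
    # Most-discussed features first (sorted() is stable for ties).
    return dict(
        sorted(summary.items(), key=lambda kv: -(kv[1]["positive"] + kv[1]["negative"]))
    )
```

For instance, two positive and one negative mention of "battery life" become `{"battery life": {"positive": 2, "negative": 1}}`.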
  
Feature Summary REST Service
•  We implemented a REST service that provides the following functionality to the UI:
– Find the list of categories in the system
– Find the list of products for a given category
– Find the feature-based summary for a given product
•  We used the Grizzly embedded container to implement the REST service.
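The three read-only operations map naturally onto GET routes. The project used Grizzly (Java); this sketch instead uses Python's standard-library `http.server`, with a pure `dispatch` function so the routing can be tested without starting a server. The URL layout, the in-memory `CATEGORIES`/`SUMMARIES` data and the JSON shapes are hypothetical.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory data standing in for the MySQL-backed store.
CATEGORIES = {"mobiles": ["phone-x", "phone-y"], "laptops": ["laptop-z"]}
SUMMARIES = {"phone-x": {"battery life": {"positive": 2, "negative": 1}}}

# Hypothetical URL layout for the three operations listed above.
ROUTES = [
    (re.compile(r"^/categories$"), lambda: sorted(CATEGORIES)),
    (re.compile(r"^/categories/([\w-]+)/products$"), lambda c: CATEGORIES.get(c)),
    (re.compile(r"^/products/([\w-]+)/summary$"), lambda p: SUMMARIES.get(p)),
]


def dispatch(path: str) -> tuple[int, str]:
    """Return (HTTP status, JSON body) for a GET request path."""
    for pattern, handler in ROUTES:
        m = pattern.match(path)
        if m:
            result = handler(*m.groups())
            if result is None:
                return 404, json.dumps({"error": "not found"})
            return 200, json.dumps(result)
    return 404, json.dumps({"error": "no such route"})


class Handler(BaseHTTPRequestHandler):
    """Thin HTTP adapter around dispatch()."""

    def do_GET(self):
        status, body = dispatch(self.path)
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


# To serve: HTTPServer(("localhost", 8080), Handler).serve_forever()
```

Keeping routing separate from the HTTP layer mirrors what an embedded container such as Grizzly gives you: the handlers stay plain functions over the data store.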
  
UI
[Screenshot: Home page]
[Screenshot: Feature-based summary]
[Screenshot: Individual sentences]
[Screenshot: Complete review]
  
Evaluation
Feature-opinion pairs extracted manually (reference set)       20
Initial feature-opinion pairs extracted by our system          40
After frequent pattern mining                                  25
After pruning (final stage)                                    18
Correct feature-opinion pairs                                  15
Incorrect feature-opinion pairs                                 3
Precision (correct / extracted)                                15/18 (≈83%)
Recall (correct / manually extracted)                          15/20 (75%)
F1-measure: 2 × precision × recall / (precision + recall)      ≈0.79
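The metrics follow from the counts in the table using the standard definitions: precision is correct pairs over pairs the system output (15 of 18), recall is correct pairs over manually extracted pairs (15 of 20), and F1 is their harmonic mean. A small check of the arithmetic:

```python
def prf(correct: int, extracted: int, gold: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from raw counts."""
    precision = correct / extracted  # fraction of system output that is correct
    recall = correct / gold          # fraction of the reference pairs recovered
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f1 = prf(correct=15, extracted=18, gold=20)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
# P=0.833 R=0.750 F1=0.789
```

Equivalently, F1 = 2 × correct / (extracted + gold) = 30/38.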
  	
  
Conclusion
•  This project was a great learning experience for all of us. We were excited to apply data mining and natural language processing techniques to implement the system.
•  We believe this system can help users quickly identify what is good or bad about a product based on other users' comments. It also gives manufacturers a better perspective on users' comments, which can aid in providing business intelligence.
  
Future Enhancements
•  Add more rules to improve the overall accuracy of feature/opinion identification.
•  Migrate the entire system to run on Hadoop YARN, using HBase instead of MySQL.
•  Try unsupervised/supervised machine learning approaches for feature/opinion identification.
•  Replace our home-grown crawler with a more robust, open-source crawler, Apache Nutch (https://nutch.apache.org/).

