Assessing The Quality Of Opinion Retrieval Systems

G. Amati (1), G. Amodeo (2), V. Capozio (3), C. Gaibisso (4), G. Gambosi (3)

(1) Ugo Bordoni Foundation, Rome, Italy
(2) Dept. of Computer Science, University of L'Aquila, L'Aquila, Italy
(3) Dept. of Mathematics, University of Rome “Tor Vergata”, Rome, Italy
(4) IASI-CNR, Rome, Italy

The First International Workshop on Opinion Mining for Business Intelligence
August 31, 2010


Summary

 Objectives of the work

       Topical Opinion Retrieval (TOR) is evaluated by classical IR evaluation
       measures, e.g. Mean Average Precision (MAP) or Precision at 10 (P@10).
       The effectiveness of the topical-only retrieval (the baseline) boosts
       the TOR performance.
       How can we assess the accuracy (or precision, etc.) of the opinion-only
       classification?
       How can we separate the contribution of the opinion component from
       that of retrieval?

 Methodological Framework

       We build artificial opinion-only classifiers from relevance and opinion
       data, at different rates of opinion accuracy and precision.
       We then study the effect of such classifiers on the MAP of the TOR
       system.
       We can assess the quality of the opinion-only component of a given TOR
       system by comparing it with these artificial TOR systems.

 Results & Conclusions
Topical opinion retrieval (TOR)

  TOR systems have two phases (sketched in code below):
  Topic Retrieval : ranking documents by content only;
  Opinion Mining : filtering or re-ranking these documents by their opinion
               content.

  In the actual TREC submitted runs, filtering or re-ranking relevant
  documents by opinion always hurts the initial performance of topical
  retrieval, even though MAP always increases with a perfect opinion
  classifier!

  To assess the effectiveness of an opinion mining strategy, it should be
  sufficient to observe the MAP of relevance and opinion (MAP_{R,O}) with
  respect to the MAP of the baseline.

  Unfortunately, different baselines yield different increment rates for the
  same opinion mining technique.
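  A schematic view of the two-phase pipeline, as a minimal sketch of ours
  (not the paper's code); `retrieve` and `classify_opinion` are hypothetical
  hooks standing in for any topic-retrieval model and opinion classifier:

```python
def topical_opinion_retrieval(query, retrieve, classify_opinion):
    """Two-phase TOR sketch: content-only ranking, then opinion filtering."""
    ranking = retrieve(query)  # phase 1: topic retrieval
    # phase 2: opinion mining, here as filtering; re-ranking by a combined
    # relevance/opinion score is the usual alternative
    return [doc for doc in ranking if classify_opinion(doc)]
```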
To sum up

  The aim of our work is to introduce a methodological evaluation
  framework to:
       provide the best achievable MAP_{R,O} for a given baseline;
       assess opinion mining effectiveness from the overall topical opinion
       retrieval performance;
       study the best filtering strategies on top of topical retrieval.




Artificial opinion classifiers

  Let A be a complete set of assessments (by topic-relevance and
  opinion-only) for the collection. A binary opinion classifier is a function
  that maps documents into C_O, the category of opinionated documents,
  and C_Ō, the category of non-opinionated documents.

                         O                     Ō
       C_O           K_O · |O|          (1 − K_Ō) · |Ō|
       C_Ō        (1 − K_O) · |O|          K_Ō · |Ō|

  We define a class of artificial binary opinion classifiers, C^A_{K_O,K_Ō}(·),
  where
       K_O is the detection rate of true positive documents according to A;
       K_Ō is the detection rate of true negative documents according to A;
       (1 − K_Ō) · |Ō| is the number of type I errors (false positives);
       (1 − K_O) · |O| is the number of type II errors (false negatives).
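
  To make the definition concrete, here is a minimal sketch of such an
  artificial classifier (our code, not from the paper), approximating the
  exact counts in the table above with a per-document Bernoulli draw; the
  function name and data layout are hypothetical:

```python
import random

def artificial_classifier(is_opinionated, k_o, k_obar, rng):
    """Simulate C^A_{K_O, K_Obar} on one document.

    is_opinionated -- the true opinion label of the document according to A
    k_o            -- detection rate of true positives (on the set O)
    k_obar         -- detection rate of true negatives (on the set Obar)
    """
    if is_opinionated:
        # detected with probability k_o, otherwise a type II error
        return rng.random() < k_o
    # rejected with probability k_obar, otherwise a type I error
    return rng.random() >= k_obar

# Example: a classifier with K_O = 0.8 and K_Obar = 0.9.
rng = random.Random(42)
labels = [artificial_classifier(True, 0.8, 0.9, rng) for _ in range(5)]
```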
How to use the framework

  Given a topical opinion retrieval run and its MAP_{R,O} = r value, we
  obtain the set of all (K_O, K_Ō) pairs such that the artificial opinion
  classifiers C^A_{K_O,K_Ō}(·) achieve r.
  We then compute the accuracy, precision, recall and F-score of the
  opinion-only component as follows:
       Acc = (K_O · |O| + K_Ō · |Ō|) / (|O| + |Ō|)
       Prec = K_O · |O| / (K_O · |O| + (1 − K_Ō) · |Ō|)
       Rec = K_O
       F-score = 2 · (Prec · Rec) / (Prec + Rec)   (β = 1)
  Any approach must improve on the performance of the random classifier
  C^A_{P(O),1−P(O)}(·), where P(O) = |O| / |C| is the a priori probability of
  opinionated documents in the collection.
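
  As a sanity check of these formulas, the sketch below (our code, with
  hypothetical names) computes the four measures from K_O, K_Ō and the set
  sizes, including the random-classifier baseline C^A_{P(O),1−P(O)}:

```python
def opinion_metrics(k_o, k_obar, n_o, n_obar):
    """Accuracy, precision, recall and F-score of C^A_{K_O, K_Obar}.

    n_o and n_obar are the sizes |O| and |Obar| of the opinionated and
    non-opinionated document sets in the assessments A.
    """
    tp = k_o * n_o              # true positives
    tn = k_obar * n_obar        # true negatives
    fp = (1 - k_obar) * n_obar  # type I errors
    acc = (tp + tn) / (n_o + n_obar)
    prec = tp / (tp + fp)
    rec = k_o
    f_score = 2 * prec * rec / (prec + rec)
    return acc, prec, rec, f_score

# Random-classifier baseline on a collection with (say) 10,000 opinionated
# and 40,000 non-opinionated assessed documents: K_O = P(O), K_Obar = 1 - P(O).
p_o = 10_000 / 50_000
print(opinion_metrics(p_o, 1 - p_o, 10_000, 40_000))
```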


TREC Blog2008 collection

  The Blog2008 collection consists of 3.2 million web pages containing blog
  posts, a test suite of 150 topics and a set of relevance/opinion
  assessments (QRELs).

  Topics and QRELs are provided by NIST.

  NIST also provided the 5 best runs, called baselines, produced by some of
  the participants. Each baseline consists of 150 rankings, one for each
  topic.




Completing the data

  Unfortunately, the 150 topics are only a sample of the topics covered by
  the collection, and most documents are not assessed with respect to their
  opinion content.

  To fill this lack of information on the opinions expressed by documents,
  we need to “complete” the data.

  To complete the data, we assume that each document is relevant for some
  topic t. Qrels_t is completed by assigning each document that is
  non-relevant for t to the set of non-relevant and opinionated documents
  with probability

       P(OR̄_t) = |OR − OR_t| / |R − R_t|.

  Analogously, P(ŌR̄_t) can be defined as:

       P(ŌR̄_t) = |ŌR − ŌR_t| / |R − R_t| = 1 − P(OR̄_t).
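
  A minimal sketch of this completion step, under our reading of the formula
  above (treating |OR − OR_t| as a difference of counts, since OR_t ⊆ OR);
  the variable names and qrels layout are hypothetical:

```python
import random

def complete_topic_qrels(nonrel_docs, n_or, n_or_t, n_r, n_r_t, rng):
    """Complete Qrels_t: label each document non-relevant for topic t as
    opinionated with probability (|OR| - |OR_t|) / (|R| - |R_t|).

    n_or, n_r     -- opinionated-relevant and relevant counts over all topics
    n_or_t, n_r_t -- the same counts restricted to topic t
    """
    p = (n_or - n_or_t) / (n_r - n_r_t)
    return {doc: rng.random() < p for doc in nonrel_docs}

# Example with made-up counts: 60,000 relevant documents overall, of which
# 30,000 opinionated; topic t contributes 400 relevant, 180 opinionated.
rng = random.Random(0)
opinion_labels = complete_topic_qrels(["d1", "d2", "d3"],
                                      30_000, 180, 60_000, 400, rng)
```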
The Monte Carlo approach

  We use a Monte Carlo approach to randomly generate different opinion
  assessments for the non-relevant documents, in order to complete the data.

  We iterate the previous step to randomly generate different values of
  precision, recall, F-score and accuracy, and average them.

  Far fewer than 20 iterations are enough to obtain stable results.
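
  A sketch of the iteration loop (ours; `evaluate_one_completion` is a
  hypothetical hook that builds one random completion of the assessments and
  returns the resulting measures):

```python
def monte_carlo_average(evaluate_one_completion, n_iter=20):
    """Average evaluation measures over repeated random completions.

    evaluate_one_completion -- callable returning a dict that maps a measure
    name (e.g. "precision", "recall") to its value for one randomly
    completed set of assessments.
    """
    totals = {}
    for _ in range(n_iter):
        for name, value in evaluate_one_completion().items():
            totals[name] = totals.get(name, 0.0) + value
    return {name: total / n_iter for name, total in totals.items()}
```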




How to use the framework to predict opinion performance

  Setting K_O = K_Ō = 1, the framework acts as an oracle and provides the
  best achievable MAP_{R,O} for each baseline.

              MAP_R     MAP_{R,O}   MAP*_{R,O}    Δ%
       BL1    0.3540     0.2639      0.4999      89%
       BL2    0.3382     0.2657      0.4737      78%
       BL3    0.4079     0.3201      0.5580      74%
       BL4    0.4776     0.3543      0.6294      78%
       BL5    0.4247     0.3147      0.5839      86%

  Mean Average Precision of relevance MAP_R, of relevance and opinion
  MAP_{R,O}, optimal relevance and opinion MAP*_{R,O}, and variation Δ%
  between MAP*_{R,O} and MAP_{R,O}.
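
  The oracle run itself takes only a few lines to reproduce: with
  K_O = K_Ō = 1 the classifier keeps exactly the assessed opinionated
  documents, so MAP*_{R,O} is the MAP of the filtered baseline against the
  relevant-and-opinionated qrels. A sketch (our code; the document sets are
  placeholders):

```python
def average_precision(ranking, relevant):
    """Standard average precision of a ranked list of document ids."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def oracle_filter(ranking, opinionated):
    """K_O = K_Obar = 1: keep only the truly opinionated documents."""
    return [doc for doc in ranking if doc in opinionated]

# Per-topic AP*_{R,O}; averaging over the 150 topics gives MAP*_{R,O}.
ranking = ["d3", "d7", "d1", "d9"]   # baseline run for one topic
opinionated = {"d3", "d1", "d9"}     # assessed opinionated documents
rel_and_op = {"d1", "d9"}            # relevant and opinionated qrels
ap_star = average_precision(oracle_filter(ranking, opinionated), rel_and_op)
```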




Mean percentage variations of MAP_{R,O} filtering the baselines through
C^{Qrels*}_{K_O,K_Ō}(·)

       K_Ō \ K_O     1.0     0.9     0.8     0.7     0.6     0.5
          1.0        81%     63%     45%     27%     10%     -9%
          0.9        63%     46%     28%     11%     -7%    -24%
          0.8        50%     33%     17%      0%    -17%    -33%
          0.7        40%     24%      7%     -8%    -24%    -39%
          0.6        32%     16%      0%    -15%    -30%    -44%
          0.5        24%      9%     -6%    -20%    -35%    -48%

  K_O contributes to improving MAP_{R,O} more than K_Ō does. This is evident
  when comparing the values of MAP_{R,O} in the column and in the row
  corresponding to K_O = K_Ō = 0.7.


Using the framework to compare the best three TREC approaches

  The best three approaches at the TREC Blog Track 2008 achieve, over the
  five baselines, the following performance:
    1   MAP_{R,O} = 0.3614, a percentage improvement of +12%;
    2   MAP_{R,O} = 0.3565, a percentage improvement of +10%;
    3   MAP_{R,O} = 0.3412, a percentage improvement of +5%.

  Although these improvements look clearly different, the three approaches
  do not significantly differ in terms of opinion mining effectiveness.




Conclusions

       Our evaluation framework assesses the effectiveness of opinion
       mining techniques.
       The framework makes it possible to provide the best achievable
       MAP_{R,O} for a given baseline.
       We determine the minimum values of accuracy, precision, recall and
       F-score that make it possible to improve a baseline. These values
       show that it is a hard task to improve a baseline by filtering its
       documents according to the opinions they express.
       We show how to compare different opinion mining techniques and how to
       understand whether they really improve on the state of the art.




                                        Thanks!




