Eloisa Vargiu (evargiu@bdigital.org) – Cagliari, 6 September 2012
OUTLINE OF THE TALK
 • Introduction
 • Online Advertising
 • A Modern Contextual Advertising System
        o Syntactic Textual Analysis
        o Semantic Textual Analysis
        o Matching
        o An Example: ConCA
        o Experimental Results
 • Conclusions
 • References



INTRODUCTION
OUTERNET & INTERNET




   • In Atkinson’s view something is missing…




ONLINE ADVERTISING
ONLINE ADVERTISING




   • Sponsored Search
   • Banner Advertising
   • Contextual Advertising

CONTEXTUAL ADVERTISING
               Webpage                                              Ad




ONLINE ADVERTISING
   • Is it always a good thing?




A MODERN CONTEXTUAL ADVERTISING SYSTEM
A MODERN CONTEXTUAL ADVERTISING SYSTEM




SYNTACTIC TEXTUAL ANALYSIS
 • Text Summarization
 • Bag of Words Representation




SYNTACTIC TEXTUAL ANALYSIS
   • Text summarization
        o State-of-the-art techniques
            ▪ First and Last Paragraph (FLP)
            ▪ Title, First and Last Paragraph (TFLP)
            ▪ Snippet (S)
            ▪ Title and Snippet (TS)
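The FLP and TFLP techniques can be sketched in a few lines. This is a minimal illustration, not the system's implementation: it assumes the page is already plain text with paragraphs separated by blank lines, and the function names are made up here. The snippet-based variants (S, TS) are left out because they depend on an external source for the snippet.

```python
import re


def split_paragraphs(text):
    """Split plain text into non-empty paragraphs (assumed blank-line separated)."""
    blocks = re.split(r"\n\s*\n", text)
    return [" ".join(block.split()) for block in blocks if block.strip()]


def flp(text):
    """First and Last Paragraph: keep only the first and the last paragraph."""
    paragraphs = split_paragraphs(text)
    if not paragraphs:
        return ""
    if len(paragraphs) == 1:
        return paragraphs[0]
    return paragraphs[0] + " " + paragraphs[-1]


def tflp(title, text):
    """Title, First and Last Paragraph: prepend the page title to FLP."""
    return (title.strip() + " " + flp(text)).strip()


# Hypothetical page used only for illustration.
page_title = "Ten ways to celebrate"
page_text = "First paragraph here.\n\nMiddle paragraph.\n\nLast paragraph here."

print(flp(page_text))
print(tflp(page_title, page_text))
```

The middle paragraph is dropped in both cases; TFLP differs from FLP only by the title.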
SYNTACTIC TEXTUAL ANALYSIS
      ▪ First and Last Paragraph (FLP)
           You don’t need to shell out thousands,
           survive various ballots, or swap a family
           member for a ticket to enjoy the 2012
           Summer Olympic Games this year. There's
           all manner of free events and associated
           shenanigans taking place in London and
           across the UK to mark the occasion. Here
           are ten ways to join in without spending any
           money.

           Indulge in a family feast
           Volunteer chefs at 24 Sure Start Centres
           across the UK are preparing to dish up free
           delights throughout the period. Details,
           along with all the other events that make up
           the Cultural Olympiad, are available on the
           site.
http://www.roughguides.com/website/Travel/SpotLight/ViewSpotLight.aspx?spotLightID=575
SYNTACTIC TEXTUAL ANALYSIS
      ▪ Title, First and Last Paragraph (TFLP)
           London 2012 – Ten ways to celebrate the Olympics for free

           You don’t need to shell out thousands,
           survive various ballots, or swap a family
           member for a ticket to enjoy the 2012
           Summer Olympic Games this year. There's
           all manner of free events and associated
           shenanigans taking place in London and
           across the UK to mark the occasion. Here
           are ten ways to join in without spending any
           money.

           Indulge in a family feast
           Volunteer chefs at 24 Sure Start Centres
           across the UK are preparing to dish up free
           delights throughout the period. Details,
           along with all the other events that make up
           the Cultural Olympiad, are available on the
           site.
http://www.roughguides.com/website/Travel/SpotLight/ViewSpotLight.aspx?spotLightID=575
SYNTACTIC TEXTUAL ANALYSIS
      ▪ Snippet (S)
http://www.roughguides.com/website/Travel/SpotLight/ViewSpotLight.aspx?spotLightID=575
SYNTACTIC TEXTUAL ANALYSIS
      ▪ Title and Snippet (TS)
http://www.roughguides.com/website/Travel/SpotLight/ViewSpotLight.aspx?spotLightID=575
SYNTACTIC TEXTUAL ANALYSIS
   • Bag of Words (BoW) representation
        o Dimensionality reduction
            ▪ Stop-words removal
            ▪ Stemming
        o Vector representation
            ▪ Set of pairs <word, occurrences>
SYNTACTIC TEXTUAL ANALYSIS
   • Stop-words removal
     You don’t need to shell out thousands,
     survive various ballots, or swap a
     family member for a ticket to enjoy the
     2012 Summer Olympic Games this
     year. There's all manner of free events
     and associated shenanigans taking
     place in London and across the UK to
     mark the occasion. Here are ten ways
     to join in without spending any money.




SYNTACTIC TEXTUAL ANALYSIS
   • Stop-words removal: result
     Shell thousands, survive various
     ballots, swap family member ticket
     enjoy 2012 Summer Olympic Games
     year. Manner free events associated
     shenanigans taking place London
     across UK mark occasion. ten ways
     join spending money.
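A minimal sketch of stop-word removal on the last sentence of the example. The stop-word list below is a small illustrative assumption; the talk does not say which list was actually used.

```python
import re

# Tiny illustrative stop-word list (an assumption, not the list used in the talk).
STOP_WORDS = {
    "a", "all", "and", "any", "are", "for", "here", "in", "of", "or",
    "out", "the", "this", "to", "without", "you",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop-word list."""
    return [token for token in tokens if token not in stop_words]


sentence = "Here are ten ways to join in without spending any money."
print(remove_stop_words(tokenize(sentence)))
```

The surviving tokens are the ones shown in the slide for this sentence: ten, ways, join, spending, money.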
SYNTACTIC TEXTUAL ANALYSIS
   • Stemming
     Shell thousands, survive various
     ballots, swap family member ticket
     enjoy 2012 Summer Olympic Games
     year. Manner free events associated
     shenanigans taking place London
     across UK mark occasion. ten ways
     join spending money.




SYNTACTIC TEXTUAL ANALYSIS
   • Stemming: result
     Shell thousand, surviv various ballot,
     swap famil member ticket enjoy 2012
     Summer Olymp Game year. Manner
     free event associat shenanigan tak
     place London across UK mark
     occasion. ten way join spend money.
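To give a feel for what stemming does, here is a deliberately crude suffix stripper. It is not Porter's algorithm (listed in the references), which applies a much more careful set of rules; the suffix list and the minimum stem length below are assumptions chosen only to reproduce a few of the stems shown in the slide.

```python
# Crude illustrative rules (assumptions); a real system would use a proper stemmer.
SUFFIXES = ("ing", "ed", "s")
MIN_STEM_LENGTH = 3


def crude_stem(word):
    """Strip the first matching suffix, unless the remaining stem would be too short."""
    w = word.lower()
    if w.endswith("ss"):          # e.g. "across" should not lose its final "s"
        return w
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM_LENGTH:
            return w[: len(w) - len(suffix)]
    return w


tokens = ["ten", "ways", "join", "spending", "money"]
print([crude_stem(t) for t in tokens])
```

On these tokens the sketch agrees with the slide (way, spend), but it will mis-stem many other words, which is exactly why dedicated algorithms exist.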
SYNTACTIC TEXTUAL ANALYSIS
   • Vector representation
        o TF-IDF

          <free, 0.0116>
          <olymp, 0.0235>
          <event, 0.0012>
          <way, 0.0125>
          <london, 0.0421>
          <celebrat, 0.0005>
          <chef, 0.0127>
                  …
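The weights in such a vector can be computed as in the sketch below. The talk does not state which TF-IDF variant was used, so this assumes one common choice: term frequency relative to document length, multiplied by the natural logarithm of N divided by the document frequency. The three token lists are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    document_frequency = Counter()
    for tokens in docs:
        document_frequency.update(set(tokens))   # count each term once per document

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / document_frequency[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical stemmed documents.
docs = [
    ["olymp", "free", "event", "london"],
    ["london", "chef", "free"],
    ["olymp", "game", "ticket"],
]
vectors = tf_idf(docs)
for term, weight in sorted(vectors[0].items()):
    print(f"<{term}, {weight:.4f}>")
```

Terms that occur in fewer documents receive a higher weight: in the first document, "event" (one document) outweighs "free" (two documents).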
IS THE SYNTACTIC APPROACH ALONE ENOUGH?
   • Polysemy…
        “BASS” (a fish or a musical instrument)
   • Synonymy…
        Vehicle, Machine, Car, Auto, Automobile
SEMANTIC TEXTUAL ANALYSIS
 • Taxonomy-based Classification
 • Word Disambiguation
SEMANTIC TEXTUAL ANALYSIS
   • Taxonomy-based Classification
        o Classification Features (CF) representation
        o Adopted classifiers
            ▪ Rocchio
            ▪ SVM
SEMANTIC TEXTUAL ANALYSIS
   • Rocchio
        o Each centroid is defined as the sum of the TF-IDF values of each term, normalized by the number of webpages in the class
        o The classification is based on the cosine of the angle between the webpage and the centroid of each class
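The two statements above translate fairly directly into code. The sketch below follows the centroid definition given in the slide (term-wise sum divided by the number of pages in the class) and scores a page by cosine; the training vectors and class labels are invented, and sparse dictionaries are just one convenient representation.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Term-wise sum of the TF-IDF vectors, divided by the number of pages in the class."""
    total = defaultdict(float)
    for vector in vectors:
        for term, weight in vector.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(page, centroids):
    """Return the best class and the cosine score of the page against every class."""
    scores = {label: cosine(page, c) for label, c in centroids.items()}
    return max(scores, key=scores.get), scores


# Hypothetical training pages per class.
training = {
    "sport": [{"olymp": 0.4, "game": 0.3}, {"olymp": 0.2, "ticket": 0.1}],
    "food": [{"chef": 0.5, "feast": 0.2}, {"chef": 0.3, "free": 0.1}],
}
centroids = {label: centroid(pages) for label, pages in training.items()}
best, scores = classify({"olymp": 0.3, "free": 0.1}, centroids)
print(best, scores)
```

The per-class scores, not only the winning label, are what a Classification Features (CF) representation would keep.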
SEMANTIC TEXTUAL ANALYSIS
   • SVM
        o The score is related to the distance of the webpage from a separating hyperplane
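For a linear SVM, the distance mentioned above has a closed form. The sketch below only evaluates it for a hyperplane that is assumed to be already trained: the weights `w` and bias `b` are hypothetical numbers, and the talk does not say how the distance is turned into the final score.

```python
import math


def signed_distance(x, w, b):
    """Signed distance of point x from the hyperplane w . x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w


# Hypothetical hyperplane parameters, as if produced by a trained linear SVM.
w = [3.0, 4.0]
b = -5.0

print(signed_distance([3.0, 4.0], w, b))   # positive side of the hyperplane
print(signed_distance([0.0, 0.0], w, b))   # negative side of the hyperplane
```

The sign tells on which side of the hyperplane the page falls, and the magnitude grows as the page moves away from the boundary.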
SEMANTIC TEXTUAL ANALYSIS
   • Word Disambiguation
        o Bag of Concepts (BoC) representation
        o Adopted lexical supports
            ▪ WordNet
            ▪ YAGO
            ▪ ConceptNet
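A toy illustration of the Bag of Concepts idea: words are replaced by concept identifiers before counting, so synonyms collapse onto the same feature. The lexicon and the concept identifiers below are hypothetical stand-ins for what a resource such as WordNet would provide, and the sketch only covers synonymy; resolving polysemy would also require looking at the context of each word.

```python
from collections import Counter

# Hypothetical word -> concept mapping (a stand-in for a real lexical resource).
TOY_LEXICON = {
    "car": "C_MOTOR_VEHICLE",
    "auto": "C_MOTOR_VEHICLE",
    "automobile": "C_MOTOR_VEHICLE",
    "london": "C_LONDON",
}


def bag_of_concepts(tokens, lexicon=TOY_LEXICON):
    """Count concepts instead of words; words missing from the lexicon are kept as they are."""
    return Counter(lexicon.get(token, token) for token in tokens)


page = bag_of_concepts(["car", "london"])
ad = bag_of_concepts(["automobile", "rental"])

print(set(page) & set(ad))
```

With plain words the page and the ad share nothing; with concepts they share the vehicle concept.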
SEMANTIC TEXTUAL ANALYSIS
   • WordNet
        o A large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.

SEMANTIC TEXTUAL ANALYSIS
   • YAGO
        o A semantic knowledge base, derived from Wikipedia, WordNet and GeoNames

SEMANTIC TEXTUAL ANALYSIS
   • ConceptNet
        o A network of concepts connected by several semantic relations (e.g., “IsA”, “PartOf”)
MATCHING
 • Similarity calculation
 • Ranking
MATCHING
   • Similarity calculation
        o Adopted approaches
            ▪ Cosine similarity
            ▪ Jaccard index
MATCHING
     o   Cosine similarity
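Cosine similarity is the dot product of two vectors divided by the product of their norms. A minimal sketch on sparse term-weight dictionaries follows; the page and ad vectors are invented for illustration.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0   # convention chosen here for empty vectors
    return dot / (norm_u * norm_v)


# Hypothetical page and ad vectors sharing one of their two terms.
page = {"olymp": 1.0, "london": 1.0}
ad = {"olymp": 1.0, "ticket": 1.0}

print(round(cosine_similarity(page, ad), 3))
```

Identical directions give 1, vectors with no term in common give 0, and the example above sits halfway.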




MATCHING
     o   Jaccard index
              The Jaccard coefficient measures similarity between sample sets,
              and is defined as the size of the intersection divided by the size of
              the union of the sample sets
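The definition above is a one-liner on Python sets. The only choice made here is the value returned when both sets are empty, where the ratio is undefined; the term sets are invented for illustration.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0   # convention chosen here; the ratio is undefined for two empty sets
    return len(a & b) / len(a | b)


# Hypothetical term sets for a page and an ad.
page_terms = {"olymp", "london", "free"}
ad_terms = {"olymp", "london", "ticket"}

print(jaccard(page_terms, ad_terms))
```

Two of the four distinct terms are shared, so the index is 0.5. Unlike cosine similarity, the index ignores term weights.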




MATCHING
   • Ranking
        o Adopted approaches
            ▪ Simple ranking according to the calculated scores
            ▪ Learning to rank model
MATCHING
     o   Learning to rank model
              ▪ Pointwise approach
               o   Each query-document pair in the training data has a numerical or ordinal score
               o   Regression problem: given a single query-document pair, predict its score
              ▪ Pairwise approach
               o   Classification problem: learn a binary classifier that tells which document is better in a given pair of documents
              ▪ Listwise approach
               o   Optimization problem: directly optimize the value of an evaluation measure, averaged over all queries in the training data
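As an illustration of the pairwise approach, the sketch below turns graded examples into "which one is better" pairs and fits a linear scorer with a simple perceptron on the feature differences. The perceptron is a stand-in chosen for brevity, not the model used in the talk, and the feature vectors and relevance grades are invented.

```python
def make_pairs(items):
    """items: list of (features, relevance). Returns (feature difference, +1 or -1) pairs."""
    pairs = []
    for i, (features_i, relevance_i) in enumerate(items):
        for features_j, relevance_j in items[i + 1:]:
            if relevance_i == relevance_j:
                continue   # ties carry no preference
            diff = [a - b for a, b in zip(features_i, features_j)]
            pairs.append((diff, 1 if relevance_i > relevance_j else -1))
    return pairs


def train_perceptron(pairs, epochs=20):
    """Learn weights w such that sign(w . diff) matches the preference label."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for diff, label in pairs:
            margin = label * sum(wi * di for wi, di in zip(w, diff))
            if margin <= 0:   # misranked pair: move w towards the preferred item
                w = [wi + label * di for wi, di in zip(w, diff)]
    return w


def rank(feature_vectors, w):
    """Sort feature vectors by their learned linear score, best first."""
    return sorted(feature_vectors,
                  key=lambda f: sum(wi * xi for wi, xi in zip(w, f)),
                  reverse=True)


# Hypothetical ads for one page: ([similarity feature 1, similarity feature 2], relevance grade).
training = [([0.9, 0.8], 2), ([0.4, 0.7], 1), ([0.8, 0.1], 0)]
w = train_perceptron(make_pairs(training))
print(rank([[0.8, 0.1], [0.4, 0.7], [0.9, 0.8]], w))
```

A pointwise variant would instead regress the grade from each feature vector on its own, while a listwise variant would optimize a ranking measure over the whole list.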
AN EXAMPLE: CONCA
CONCEPTS IN CONTEXTUAL ADVERTISING
CONCA




RESULTS
SYNTAX VS SEMANTICS
SYNTACTICAL ANALYSIS
   • Text summarization techniques comparison
        o FLP vs TFLP vs S vs TS
   • Comparison metrics

        $\pi = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|} = \frac{TP}{TP + FP}$

        $\rho = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|} = \frac{TP}{TP + FN}$

        $F_1 = 2 \cdot \frac{\pi \cdot \rho}{\pi + \rho}$

   • Taxonomy
        o BankSearch Dataset
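The three comparison metrics (precision π, recall ρ and F1, expressed through TP, FP and FN) can be computed from a set of relevant documents and a set of retrieved ones. The document identifiers below are invented for illustration.

```python
def precision_recall_f1(relevant, retrieved):
    """Return (precision, recall, F1) for a set of relevant and a set of retrieved items."""
    relevant, retrieved = set(relevant), set(retrieved)
    tp = len(relevant & retrieved)    # retrieved and relevant
    fp = len(retrieved - relevant)    # retrieved but not relevant
    fn = len(relevant - retrieved)    # relevant but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical document identifiers.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d5"}

precision, recall, f1 = precision_recall_f1(relevant, retrieved)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here TP = 2, FP = 1 and FN = 2, so precision is 2/3, recall is 1/2 and F1 is 4/7.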
SYNTACTICAL ANALYSIS
   • Results
                FLP      TFLP     S        TS
      π         0.745    0.832    0.734    0.806
      ρ         0.719    0.801    0.730    0.804
      F1        0.732    0.816    0.732    0.805
      #t        24       26       12       14

      ▪ Adding information about the title improves performance
      ▪ TFLP has the best performance
SEMANTIC ANALYSIS
   • Semantic approaches comparison
        o Anagnostopoulos et al. (2007) system vs Armano et al. (2011-TIR) vs ConCA
   • Matching function

        $\phi(p, a) = \alpha \cdot sim_{BoC}(p, a) + (1 - \alpha) \cdot sim_{CF}(p, a)$

   • Comparison metric

        $\pi@k = \frac{\sum_{i=1}^{N} \sum_{j=1}^{k} TP_{ij}}{\sum_{i=1}^{N} \sum_{j=1}^{k} (TP_{ij} + FP_{ij})}$
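Both the matching function and the comparison metric are short to express in code. The sketch below assumes that the two similarities have already been computed, and it represents the relevance judgments as one list of booleans per page, in ranked order; that layout and the sample values are assumptions made for illustration.

```python
def match_score(sim_concepts, sim_cf, alpha):
    """Linear combination of the two similarities, weighted by alpha."""
    return alpha * sim_concepts + (1 - alpha) * sim_cf


def precision_at_k(judgments, k):
    """judgments: one list per page with True/False relevance flags for its ranked ads.

    Sums the relevant ads (TP) over the top k of every page and divides by the
    number of ads shown in those positions (TP + FP).
    """
    true_positives = sum(sum(1 for relevant in page[:k] if relevant) for page in judgments)
    shown = sum(len(page[:k]) for page in judgments)
    return true_positives / shown if shown else 0.0


# Hypothetical judgments for two pages with three ranked ads each.
judgments = [
    [True, False, True],
    [True, True, False],
]

print(match_score(0.6, 0.8, alpha=0.1))
for k in (1, 2, 3):
    print(k, precision_at_k(judgments, k))
```

With alpha set to 0 the score depends only on the classification features, and with alpha set to 1 only on the other similarity.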
SEMANTIC ANALYSIS
   • Ad repository
        o Built by hand by a domain expert
   • Taxonomy
        o BankSearch Dataset
SEMANTIC ANALYSIS
   • Results

            k    Anagnostopoulos et al.    Armano et al.       ConCA
                 π        α                π        α          π        α
            1    0.674    0                0.768    0.2        0.773    0.1
            2    0.653    0                0.750    0.2        0.752    0.1
            3    0.617    0.2              0.729    0.3        0.728    0.1
            4    0.582    0.2              0.701    0.3        0.701    0.1
            5    0.546    0.1              0.663    0.0        0.668    0.1
SEMANTIC ANALYSIS
   • Results
        o Slight improvement by using concepts
        o Low values of α → CF has more impact than BoC
SYNTACTICAL ANALYSIS VS SEMANTIC ANALYSIS
   • Contextual Advertising System
        o Armano et al. (2011-TIR)
   • Matching function

        $\phi(p, a) = \alpha \cdot sim_{BoW}(p, a) + (1 - \alpha) \cdot sim_{CF}(p, a)$

   • Comparisons varying α
        o α = 1 → pure syntax
        o α = 0 → pure semantics
   • Comparison metric

        $\pi@k = \frac{\sum_{i=1}^{N} \sum_{j=1}^{k} TP_{ij}}{\sum_{i=1}^{N} \sum_{j=1}^{k} (TP_{ij} + FP_{ij})}$
SYNTACTICAL ANALYSIS VS SEMANTIC ANALYSIS
   • Ad repository
        o Built by hand by a domain expert
   • Taxonomy
        o BankSearch Dataset
SYNTACTICAL ANALYSIS VS SEMANTIC ANALYSIS
   • Results
           α      π@1      π@2      π@3      π@4      π@5
           0      0.765    0.746    0.719    0.696    0.663
           0.1    0.767    0.749    0.724    0.698    0.663
           0.2    0.768    0.750    0.729    0.699    0.662
           0.3    0.766    0.749    0.729    0.701    0.661
           0.4    0.756    0.747    0.729    0.698    0.658
           0.5    0.744    0.735    0.721    0.693    0.651
           0.6    0.722    0.717    0.703    0.681    0.640
           0.7    0.685    0.687    0.680    0.658    0.625
           0.8    0.632    0.637    0.635    0.614    0.586
           0.9    0.557    0.552    0.548    0.534    0.512
           1      0.408    0.421    0.372    0.388    0.640
CONCLUSIONS
CONCLUSIONS
   • Online advertising
        o represents one of the major sources of income for a large number of websites
        o is aimed at suggesting products and services to the population of Internet users
   • Modern contextual advertising systems
        o place ads within the content of a generic, third-party webpage
        o adopt both syntactic and semantic textual analyses to select the most relevant ads for a given webpage
        o an example is ConCA
CONCLUSIONS
   • Results show that
        o the impact of semantics is stronger than that of syntax
        o adopting more advanced semantic techniques, such as concepts, improves performance
        o the more ads are suggested (larger k), the lower the precision
REFERENCES
REFERENCES
   • Syntactical Textual Analysis
        o Armano G., Giuliani A. & Vargiu E. Experimenting text summarization techniques for contextual advertising. 2nd Italian Information Retrieval Workshop (IIR’11), 2011.
        o Armano G., Giuliani A. & Vargiu E. Using snippets in text summarization: a comparative study and an application. 3rd Italian Information Retrieval Workshop (IIR’12), 2012.
        o Kolcz A., Prabakarmurthi V. & Kalita J. Summarization as feature selection for text categorization. 10th International Conference on Information and Knowledge Management (CIKM’01). ACM, New York, NY, USA, pp. 365–370, 2001.
        o Porter M. An algorithm for suffix stripping. Program, 14, 3, pp. 130–137, 1980.
        o Salton G., Wong A. & Yang C.S. A vector space model for automatic indexing. Communications of the ACM, 18, 11, pp. 613–620, 1975.
REFERENCES
   • Semantic Textual Analysis
        o Cortes C. & Vapnik V.N. Support-vector networks. Machine Learning, 20, 1995.
        o Fellbaum C. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, 1998.
        o Liu H. & Singh P. ConceptNet: A practical commonsense reasoning tool-kit. BT Technology Journal, 22, pp. 211–226, 2004.
        o Miller G.A. WordNet: A Lexical Database for English. Communications of the ACM, 38, 11, pp. 39–41, 1995.
        o Rocchio J. The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice-Hall, Chapter: Relevance feedback in information retrieval, pp. 313–323, 1971.
        o Suchanek F.M., Kasneci G. & Weikum G. Yago - A Core of Semantic Knowledge. 16th International World Wide Web Conference (WWW 2007), 2007.
REFERENCES
   • Matching
        o Liu T.Y. Learning to rank for information retrieval. Found. Trends Inf. Retr., 3, 3, pp. 225–331, 2009.
        o Radomski P.J. & Goeman T.J. The homogenizing of Minnesota lake fish assemblages. Fisheries, 20, pp. 20–23, 1995.
   • Comparison Systems
        o Anagnostopoulos A., Broder A.Z., Gabrilovich E., Josifovski V. & Riedel L. Just-in-time contextual advertising. 16th ACM Conference on Information and Knowledge Management (CIKM’07). ACM, New York, NY, USA, pp. 331–340, 2007.
        o Armano G., Giuliani A. & Vargiu E. Studying the impact of text summarization on contextual advertising. 8th International Workshop on Text-based Information Retrieval (TIR’11), 2011.
        o Armano G., Giuliani A. & Vargiu E. Semantic enrichment of contextual advertising by using concepts. International Conference on Knowledge Discovery and Information Retrieval, 2011.
Contact: Eloisa Vargiu – evargiu@bdigital.org

Seminario Eloisa Vargiu, 06-09-2012

  • 1. Eloisa Vargiu (evargiu@bdigital.org) – Cagliari, 6 September 2012
  • 2. OUTLINE OF THE TALK: Introduction; Online Advertising; A Modern Contextual Advertising System (Syntactic Textual Analysis, Semantic Textual Analysis, Matching, An Example: ConCA, Experimental Results); Conclusions; References
  • 4. OUTERNET & INTERNET
  • 5. OUTERNET & INTERNET: In Atkinson’s view something is missing…
  • 11. ONLINE ADVERTISING
  • 12. ONLINE ADVERTISING: Sponsored Search
  • 13. ONLINE ADVERTISING: Banner Advertising
  • 14. ONLINE ADVERTISING: Contextual Advertising
  • 15. CONTEXTUAL ADVERTISING [figure panels: Webpage, Ad]
  • 16. ONLINE ADVERTISING: Is it always a good thing?
  • 19. A MODERN CONTEXTUAL ADVERTISING SYSTEM
  • 20. SYNTACTIC TEXTUAL ANALYSIS: Text Summarization; Bag of Words Representation
  • 21. SYNTACTIC TEXTUAL ANALYSIS: Text summarization. State-of-the-art techniques: First and Last Paragraph (FLP); Title, First and Last Paragraph (TFLP); Snippet (S); Title and Snippet (TS)
  • 22. SYNTACTIC TEXTUAL ANALYSIS: First and Last Paragraph (FLP)
      You don’t need to shell out thousands, survive various ballots, or swap a family member for a ticket to enjoy the 2012 Summer Olympic Games this year. There's all manner of free events and associated shenanigans taking place in London and across the UK to mark the occasion. Here are ten ways to join in without spending any money.
      Indulge in a family feast
      Volunteer chefs at 24 Sure Start Centres across the UK are preparing to dish up free delights throughout the period. Details, along with all the other events that make up the Cultural Olympiad, are available on the site.
      Source: http://www.roughguides.com/website/Travel/SpotLight/ViewSpotLight.aspx?spotLightID=575
  • 24. SYNTACTIC TEXTUAL ANALYSIS: Title, First and Last Paragraph (TFLP)
      Title: London 2012 – Ten ways to celebrate the Olympics for free
      First and last paragraphs: the same text as in the FLP example.
      Source: http://www.roughguides.com/website/Travel/SpotLight/ViewSpotLight.aspx?spotLightID=575
  • 25. SYNTACTIC TEXTUAL ANALYSIS: Snippet (S). Source: http://www.roughguides.com/website/Travel/SpotLight/ViewSpotLight.aspx?spotLightID=575
  • 26. SYNTACTIC TEXTUAL ANALYSIS: Title and Snippet (TS). Source: http://www.roughguides.com/website/Travel/SpotLight/ViewSpotLight.aspx?spotLightID=575
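The two paragraph-based techniques can be illustrated with a minimal sketch. It assumes the page has already been split into a title and a list of paragraphs; the function name `summarize` and its parameters are hypothetical and not taken from the talk. The snippet-based techniques (S, TS) are left out because they depend on a snippet supplied by an external source.

```python
def summarize(title, paragraphs, technique="TFLP"):
    """Keep only the parts of a page selected by FLP or TFLP.

    FLP  -> first and last paragraph
    TFLP -> title, first and last paragraph
    """
    if technique not in ("FLP", "TFLP"):
        raise ValueError("only FLP and TFLP are sketched here")

    # Select the first and last paragraph (a single paragraph is kept once).
    if not paragraphs:
        selected = []
    elif len(paragraphs) == 1:
        selected = [paragraphs[0]]
    else:
        selected = [paragraphs[0], paragraphs[-1]]

    # TFLP additionally prepends the page title.
    if technique == "TFLP":
        selected = [title] + selected
    return "\n".join(selected)
```

For a page with paragraphs `["a", "b", "c"]` and title `"T"`, FLP keeps `a` and `c`, while TFLP keeps `T`, `a` and `c`.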
  • 27. SYNTACTIC TEXTUAL ANALYSIS: Bag of Words (BoW) representation. Dimensionality reduction: stop-words removal, stemming. Vector representation: set of pairs <word, occurrences>
  • 28. SYNTACTIC TEXTUAL ANALYSIS: Stop-words removal
      Before: You don’t need to shell out thousands, survive various ballots, or swap a family member for a ticket to enjoy the 2012 Summer Olympic Games this year. There's all manner of free events and associated shenanigans taking place in London and across the UK to mark the occasion. Here are ten ways to join in without spending any money.
      After: Shell thousands, survive various ballots, swap family member ticket enjoy 2012 Summer Olympic Games year. Manner free events associated shenanigans taking place London across UK mark occasion. ten ways join spending money.
  • 31. SYNTACTIC TEXTUAL ANALYSIS: Stemming
      Before: Shell thousands, survive various ballots, swap family member ticket enjoy 2012 Summer Olympic Games year. Manner free events associated shenanigans taking place London across UK mark occasion. ten ways join spending money.
      After: Shell thousand, surviv various ballot, swap famil member ticket enjoy 2012 Summer Olymp Game year. Manner free event associat shenanigan tak place London across UK mark occasion. ten way join spend money.
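The stop-word removal and stemming steps, followed by the <word, occurrences> representation, can be sketched as below. This is an illustration under stated assumptions: the stop-word set is a small hand-picked sample, and `crude_stem` is a toy suffix stripper that stands in for a real stemmer such as Porter's algorithm cited in the references; it will not reproduce the stems shown on the slides exactly.

```python
import re
from collections import Counter

# Illustrative stop-word sample (assumption); real systems use a full list.
STOP_WORDS = {"a", "and", "any", "are", "for", "in", "of", "or",
              "the", "this", "to", "you", "without"}

def crude_stem(word):
    """Toy suffix stripping; NOT the Porter algorithm."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def bag_of_words(text):
    """Return a Counter mapping each stemmed, non-stop word to its occurrences."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return Counter(crude_stem(t) for t in kept)
```

Calling `bag_of_words("Free events and free games in London")` yields the pairs `<free, 2>`, `<event, 1>`, `<game, 1>` and `<london, 1>`.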
  • 34. SYNTACTIC TEXTUAL ANALYSIS: Vector representation (TF-IDF): <free, 0.0116> <olymp, 0.0235> <event, 0.0012> <way, 0.0125> <london, 0.0421> <celebrat, 0.0005> <chef, 0.0127> …
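How such weights might be computed is sketched below. The slides do not state which TF-IDF variant was used, so this assumes one common form: term frequency normalized by document length, multiplied by log(N / document frequency). The function name `tfidf` is hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors
```

With this variant a term that occurs in every document gets weight 0, which is why very common words contribute little even if they survive stop-word removal.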
  • 35. IS THE SYNTACTIC APPROACH ALONE ENOUGH?
  • 36. IS THE SYNTACTIC APPROACH ALONE ENOUGH? Polysemy… “BASS”
  • 37. IS THE SYNTACTIC APPROACH ALONE ENOUGH? Synonymy… Vehicle, Machine, Car, Auto, Automobile
  • 38. SEMANTIC TEXTUAL ANALYSIS: Taxonomy-based Classification; Word Disambiguation
  • 39. SEMANTIC TEXTUAL ANALYSIS: Taxonomy-based Classification. Classification Features (CF) representation. Adopted classifiers: Rocchio, SVM
  • 40. SEMANTIC TEXTUAL ANALYSIS: Rocchio. Each centroid is defined as the sum of the TF-IDF values of each term, normalized by the number of webpages in the class. Classification is based on the cosine of the angle between the webpage and the centroid of each class.
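A minimal sketch of this centroid-based classification follows, assuming pages are already represented as sparse TF-IDF dictionaries. The names `centroid`, `cosine` and `classify` are hypothetical, and the sketch omits refinements (such as negative examples) that Rocchio-style classifiers often include.

```python
import math
from collections import defaultdict

def centroid(vectors):
    """Sum the TF-IDF weights per term and divide by the number of pages."""
    total = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}

def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def classify(page, training):
    """training: {class_label: [page vectors]}. Returns the closest class."""
    centroids = {label: centroid(vecs) for label, vecs in training.items()}
    return max(centroids, key=lambda label: cosine(page, centroids[label]))
```

The per-class cosine values can also be kept as a vector of scores instead of taking only the maximum, which is one way to obtain a feature vector over the taxonomy classes.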
  • 41. SEMANTIC TEXTUAL ANALYSIS: SVM. The score is related to the distance of the webpage from a separating hyperplane.
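For a linear SVM, that distance can be computed as below. This is only a sketch: it assumes the weight vector `w` and bias `b` have already been learned, and the slides do not say how the distance is turned into the final score.

```python
import math

def svm_score(x, w, b):
    """Signed distance of point x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
```

A positive value places the page on the side of the class, a negative value on the other side, and the magnitude indicates how far it lies from the boundary.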
  • 42. SEMANTIC TEXTUAL ANALYSIS: Word Disambiguation. Bag of Concepts (BoC) representation. Adopted lexical supports: WordNet, YAGO, ConceptNet
  • 43. SEMANTIC TEXTUAL ANALYSIS: WordNet. A large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.
  • 44. SEMANTIC TEXTUAL ANALYSIS: YAGO. A semantic knowledge base derived from Wikipedia, WordNet and GeoNames.
  • 45. SEMANTIC TEXTUAL ANALYSIS: ConceptNet. A network of concepts connected by several semantic relations (e.g., “IsA”, “PartOf”).
  • 46. MATCHING: Similarity calculation; Ranking
  • 47. MATCHING: Similarity calculation. Adopted approaches: cosine similarity, Jaccard index
  • 48. MATCHING: Cosine similarity
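The formula on this slide did not survive in the transcript. The standard definition of cosine similarity is given here as a reference; the symbols $p$ and $a$ for the page and ad vectors, and $n$ for the number of features, are notational assumptions.

```latex
\cos(p, a) = \frac{p \cdot a}{\lVert p \rVert \, \lVert a \rVert}
           = \frac{\sum_{i=1}^{n} p_i a_i}{\sqrt{\sum_{i=1}^{n} p_i^2} \, \sqrt{\sum_{i=1}^{n} a_i^2}}
```

For non-negative weights such as TF-IDF values, the result lies between 0 (no shared terms) and 1 (same direction).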
  • 49. MATCHING: Jaccard index. The Jaccard coefficient measures the similarity between sample sets and is defined as the size of the intersection divided by the size of the union of the sample sets.
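The definition translates directly into code. In this sketch the function name `jaccard` is hypothetical, and returning 0.0 for two empty sets is a convention chosen here, not something stated in the talk.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention for two empty sets (assumption)
    return len(a & b) / len(union)
```

For example, the sets {car, auto, vehicle} and {car, vehicle, machine} share two of four distinct elements, giving a Jaccard index of 0.5.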
  • 50. MATCHING: Ranking. Adopted approaches: simple ranking according to the calculated scores; learning to rank model
  • 51. MATCHING: Learning to rank model
      Pointwise approach: each query-document pair in the training data has a numerical or ordinal score. This is a regression problem: given a single query-document pair, predict its score.
      Pairwise approach: a classification problem: learn a binary classifier that tells which document is better in a given pair of documents.
      Listwise approach: an optimization problem: directly optimize the value of a ranking evaluation measure, averaged over all queries in the training data.
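To make the pairwise approach concrete, the sketch below turns the scored documents of one query into training examples for a binary classifier. It shows only the data preparation, under the assumption that each document is a feature list with a relevance grade; the function name and the use of feature differences are illustrative choices, not the method used in the talk.

```python
from itertools import combinations

def pairwise_examples(scored_docs):
    """scored_docs: list of (features, relevance) pairs for a single query.

    Returns (feature_difference, label) examples, where label is 1 if the
    first document of the pair is more relevant and 0 otherwise.
    Pairs with equal relevance carry no preference and are skipped.
    """
    examples = []
    for (feat_a, rel_a), (feat_b, rel_b) in combinations(scored_docs, 2):
        if rel_a == rel_b:
            continue
        diff = [x - y for x, y in zip(feat_a, feat_b)]
        examples.append((diff, 1 if rel_a > rel_b else 0))
    return examples
```

Any binary classifier trained on these examples can then be used to decide which of two candidates should be ranked higher.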
  • 52. AN EXAMPLE: CONCA (CONCEPTS IN CONTEXTUAL ADVERTISING)
  • 53. CONCA
  • 56. SYNTACTICAL ANALYSIS: Text summarization techniques comparison: FLP vs TFLP vs S vs TS
      Comparison metrics:
      $\pi = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|} = \frac{TP}{TP + FP}$
      $\rho = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|} = \frac{TP}{TP + FN}$
      $F_1 = \frac{2 \pi \rho}{\pi + \rho}$
      Taxonomy: BankSearch Dataset
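The three metrics can be computed from the sets of relevant and retrieved documents as in this sketch. The function name is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
def precision_recall_f1(relevant, retrieved):
    """Return (precision, recall, F1) for two collections of document ids."""
    relevant, retrieved = set(relevant), set(retrieved)
    tp = len(relevant & retrieved)   # relevant and retrieved
    fp = len(retrieved - relevant)   # retrieved but not relevant
    fn = len(relevant - retrieved)   # relevant but not retrieved
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

As a check on the F1 definition, the TFLP values reported in the results (π = 0.832, ρ = 0.801) give 2 · 0.832 · 0.801 / 1.633 ≈ 0.816, which matches the reported F1.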
  • 57. SYNTACTICAL ANALYSIS: Results
             FLP     TFLP    S       TS
        π    0.745   0.832   0.734   0.806
        ρ    0.719   0.801   0.730   0.804
        F1   0.732   0.816   0.732   0.805
        #t   24      26      12      14
      Adding information about the title improves performance. TFLP has the best performance.
  • 58. SEMANTIC ANALYSIS: Semantic approaches comparison: Anagnostopoulos et al. (2007) system vs Armano et al. (2011-TIR) vs ConCA
      Matching function for a page $p$ and an ad $a$:
      $\mathrm{score}(p, a) = \alpha \cdot \mathrm{sim}_{BoC} + (1 - \alpha) \cdot \mathrm{sim}_{CF}$
      Comparison metric:
      $\pi@k = \frac{\sum_{i=1}^{N} \sum_{j=1}^{k} TP_{ij}}{\sum_{i=1}^{N} \sum_{j=1}^{k} (TP_{ij} + FP_{ij})}$
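The matching function is a convex combination of two similarities, which can then drive a simple score-based ranking. The sketch below assumes the two similarities have already been computed for each candidate ad; `match_score`, `rank_ads` and the input layout are hypothetical names chosen for illustration.

```python
def match_score(sim_boc, sim_cf, alpha):
    """alpha * sim_BoC + (1 - alpha) * sim_CF, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * sim_boc + (1 - alpha) * sim_cf

def rank_ads(similarities, alpha, k):
    """similarities: {ad_id: (sim_boc, sim_cf)}. Returns the top-k ad ids."""
    ranked = sorted(
        similarities.items(),
        key=lambda item: match_score(item[1][0], item[1][1], alpha),
        reverse=True,
    )
    return [ad_id for ad_id, _ in ranked[:k]]
```

With α = 0 only the classification-feature similarity decides the ranking, and with α = 1 only the concept similarity does.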
  • 59. SEMANTIC ANALYSIS: Ad repository: built by hand by a domain expert. Taxonomy: BankSearch Dataset
  • 60. SEMANTIC ANALYSIS: Results
        k    Anagnostopoulos et al.    Armano et al.     ConCA
             π       α                 π       α         π       α
        1    0.674   0                 0.768   0.2       0.773   0.1
        2    0.653   0                 0.750   0.2       0.752   0.1
        3    0.617   0.2               0.729   0.3       0.728   0.1
        4    0.582   0.2               0.701   0.3       0.701   0.1
        5    0.546   0.1               0.663   0.0       0.668   0.1
  • 61. SEMANTIC ANALYSIS: Results
  • 62. SEMANTIC ANALYSIS: Results. Slight improvement from using concepts. Low values of α → CF has more impact than BoC.
  • 63. SYNTACTICAL ANALYSIS VS SEMANTIC ANALYSIS: Contextual Advertising System: Armano et al. (2011-TIR)
      Matching function for a page $p$ and an ad $a$:
      $\mathrm{score}(p, a) = \alpha \cdot \mathrm{sim}_{BoW} + (1 - \alpha) \cdot \mathrm{sim}_{CF}$
      Comparisons varying α: α = 1 → pure syntax; α = 0 → pure semantics
      Comparison metric:
      $\pi@k = \frac{\sum_{i=1}^{N} \sum_{j=1}^{k} TP_{ij}}{\sum_{i=1}^{N} \sum_{j=1}^{k} (TP_{ij} + FP_{ij})}$
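The comparison metric pools true and false positives over all pages and over the top k ads of each page. A minimal sketch, assuming each page comes with a ranked list of relevance judgements for its suggested ads; the function name and input layout are hypothetical.

```python
def precision_at_k(judgements, k):
    """judgements: one ranked list of booleans per page (True = relevant ad).

    Returns the pooled precision over the top-k ads of all pages.
    """
    true_positives = sum(
        sum(1 for relevant in ranked[:k] if relevant) for ranked in judgements
    )
    shown = sum(len(ranked[:k]) for ranked in judgements)  # TP + FP
    return true_positives / shown if shown else 0.0
```

For two pages judged `[True, False, True]` and `[True, True, False]`, the value is 1.0 at k = 1 and 0.75 at k = 2.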
  • 64. SYNTACTICAL ANALYSIS VS SEMANTIC ANALYSIS: Ad repository: built by hand by a domain expert. Taxonomy: BankSearch Dataset
  • 65. SYNTACTICAL ANALYSIS VS SEMANTIC ANALYSIS: Results
        α     π@1     π@2     π@3     π@4     π@5
        0     0.765   0.746   0.719   0.696   0.663
        0.1   0.767   0.749   0.724   0.698   0.663
        0.2   0.768   0.750   0.729   0.699   0.662
        0.3   0.766   0.749   0.729   0.701   0.661
        0.4   0.756   0.747   0.729   0.698   0.658
        0.5   0.744   0.735   0.721   0.693   0.651
        0.6   0.722   0.717   0.703   0.681   0.640
        0.7   0.685   0.687   0.680   0.658   0.625
        0.8   0.632   0.637   0.635   0.614   0.586
        0.9   0.557   0.552   0.548   0.534   0.512
        1     0.408   0.421   0.372   0.388   0.640
  • 67. CONCLUSIONS
      Online advertising: represents one of the major sources of income for a large number of websites; is aimed at suggesting products and services to the population of Internet users.
      Modern contextual advertising systems: put ads within the content of a generic, third-party webpage; adopt both syntactical and semantic textual analyses to select the most relevant ads for a given webpage; an example is ConCA.
  • 68. CONCLUSIONS
      Results show that: the impact of semantics is stronger than that of syntax; adopting more advanced semantic techniques, such as concepts, improves performance; the more ads are suggested, the worse the performance.
  • 70. REFERENCES: Syntactical Textual Analysis
      Armano G., Giuliani A. & Vargiu E. Experimenting text summarization techniques for contextual advertising. 2nd Italian Information Retrieval Workshop (IIR’11), 2011.
      Armano G., Giuliani A. & Vargiu E. Using snippets in text summarization: a comparative study and an application. 3rd Italian Information Retrieval Workshop (IIR’12), 2012.
      Kolcz A., Prabakarmurthi V. & Kalita J. Summarization as feature selection for text categorization. 10th International Conference on Information and Knowledge Management (CIKM’01). ACM, New York, NY, USA, pp. 365–370, 2001.
      Porter M. An algorithm for suffix stripping. Program, 14, 3, pp. 130–137, 1980.
      Salton G., Wong A. & Yang C.S. A vector space model for automatic indexing. Communications of the ACM, 18, 11, pp. 613–620, 1975.
  • 71. REFERENCES: Semantic Textual Analysis
      Cortes C. & Vapnik V.N. Support-Vector Networks. Machine Learning, 20, 1995.
      Fellbaum C. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, 1998.
      Liu H. & Singh P. ConceptNet: A practical commonsense reasoning tool-kit. BT Technology Journal, 22, pp. 211–226, 2004.
      Miller G.A. WordNet: A Lexical Database for English. Communications of the ACM, 38, 11, pp. 39–41, 1995.
      Rocchio J. Relevance feedback in information retrieval. In: The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice-Hall, pp. 313–323, 1971.
      Suchanek F.M., Kasneci G. & Weikum G. Yago - A Core of Semantic Knowledge. 16th International World Wide Web Conference (WWW 2007), 2007.
  • 72. REFERENCES: Matching
      Liu T.Y. Learning to rank for information retrieval. Found. Trends Inf. Retr., 3, 3, pp. 225–331, 2009.
      Radomski P.J. & Goeman T.J. The homogenizing of Minnesota lake fish assemblages. Fisheries, 20, pp. 20–23, 1995.
    REFERENCES: Comparison Systems
      Anagnostopoulos A., Broder A.Z., Gabrilovich E., Josifovski V. & Riedel L. Just-in-time contextual advertising. 16th ACM Conference on Information and Knowledge Management (CIKM’07). ACM, New York, NY, USA, pp. 331–340, 2007.
      Armano G., Giuliani A. & Vargiu E. Studying the impact of text summarization on contextual advertising. 8th International Workshop on Text-based Information Retrieval (TIR’11), 2011.
      Armano G., Giuliani A. & Vargiu E. Semantic enrichment of contextual advertising by using concepts. International Conference on Knowledge Discovery and Information Retrieval, 2011.
  • 73. Contact: Eloisa Vargiu – evargiu@bdigital.org