Eloisa Vargiu (evargiu@bdigital.org) – Cagliari, 6 September 2012

Seminar by Eloisa Vargiu, 06-09-2012

After an overview of the issues involved in placing advertising messages on Web pages, the seminar presents some innovative text analysis techniques. In particular, these techniques focus on suggesting ads that are relevant to the context of the page being viewed.

  1. Eloisa Vargiu (evargiu@bdigital.org) – Cagliari, 6 September 2012
  2. OUTLINE OF THE TALK
       • Introduction
       • Online Advertising
       • A Modern Contextual Advertising System
           o Syntactic Textual Analysis
           o Semantic Textual Analysis
           o Matching
           o An Example: ConCA
           o Experimental Results
       • Conclusions
       • References
  3. INTRODUCTION
  4. OUTERNET & INTERNET
  5. OUTERNET & INTERNET
       • In Atkinson’s view something is missing…
  10. ONLINE ADVERTISING
  11. ONLINE ADVERTISING
  12. ONLINE ADVERTISING
       • Sponsored Search
  13. ONLINE ADVERTISING
       • Banner Advertising
  14. ONLINE ADVERTISING
       • Contextual Advertising
  15. CONTEXTUAL ADVERTISING
       • Figure labels: Webpage, Ad
  16. ONLINE ADVERTISING
       • Is it always a good thing?
  18. A MODERN CONTEXTUAL ADVERTISING SYSTEM
  19. A MODERN CONTEXTUAL ADVERTISING SYSTEM
  20. SYNTACTIC TEXTUAL ANALYSIS
       • Text Summarization
       • Bag of Words Representation
  21. SYNTACTIC TEXTUAL ANALYSIS
       • Text summarization
           o State-of-the-art techniques
               - First and Last Paragraph (FLP)
               - Title, First and Last Paragraph (TFLP)
               - Snippet (S)
               - Title and Snippet (TS)
  22. SYNTACTIC TEXTUAL ANALYSIS
       • First and Last Paragraph (FLP)
         You don’t need to shell out thousands, survive various ballots, or swap a family member for a ticket to enjoy the 2012 Summer Olympic Games this year. There’s all manner of free events and associated shenanigans taking place in London and across the UK to mark the occasion. Here are ten ways to join in without spending any money.
         Indulge in a family feast
         Volunteer chefs at 24 Sure Start Centres across the UK are preparing to dish up free delights throughout the period. Details, along with all the other events that make up the Cultural Olympiad, are available on the site.
         Source: http://www.roughguides.com/website/Travel/SpotLight/ViewSpotLight.aspx?spotLightID=575
  24. SYNTACTIC TEXTUAL ANALYSIS
       • Title, First and Last Paragraph (TFLP)
         London 2012 – Ten ways to celebrate the Olympics for free
         You don’t need to shell out thousands, survive various ballots, or swap a family member for a ticket to enjoy the 2012 Summer Olympic Games this year. There’s all manner of free events and associated shenanigans taking place in London and across the UK to mark the occasion. Here are ten ways to join in without spending any money.
         Indulge in a family feast
         Volunteer chefs at 24 Sure Start Centres across the UK are preparing to dish up free delights throughout the period. Details, along with all the other events that make up the Cultural Olympiad, are available on the site.
         Source: http://www.roughguides.com/website/Travel/SpotLight/ViewSpotLight.aspx?spotLightID=575
  25. SYNTACTIC TEXTUAL ANALYSIS
       • Snippet (S)
         Source: http://www.roughguides.com/website/Travel/SpotLight/ViewSpotLight.aspx?spotLightID=575
  26. SYNTACTIC TEXTUAL ANALYSIS
       • Title and Snippet (TS)
         Source: http://www.roughguides.com/website/Travel/SpotLight/ViewSpotLight.aspx?spotLightID=575
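
The FLP and TFLP summaries shown on the slides above could be sketched roughly as follows. This is only an illustration: it assumes paragraphs are separated by blank lines, and the names `split_paragraphs`, `flp` and `tflp` are hypothetical, not taken from the talk. The snippet-based techniques (S and TS) rely on a snippet obtained elsewhere, so they are not sketched here.

```python
def split_paragraphs(text):
    """Split a page body on blank lines, dropping empty paragraphs (assumed page format)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def flp(body):
    """First and Last Paragraph: keep only the first and the last paragraph."""
    paragraphs = split_paragraphs(body)
    if not paragraphs:
        return ""
    if len(paragraphs) == 1:
        return paragraphs[0]
    return paragraphs[0] + "\n\n" + paragraphs[-1]


def tflp(title, body):
    """Title, First and Last Paragraph: prepend the page title to the FLP summary."""
    summary = flp(body)
    return title.strip() + ("\n\n" + summary if summary else "")


# Toy page used only for illustration.
page_title = "London 2012 – Ten ways to celebrate the Olympics for free"
page_body = "First paragraph.\n\nMiddle paragraph.\n\nLast paragraph."

print(flp(page_body))
print(tflp(page_title, page_body))
```
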
  27. SYNTACTIC TEXTUAL ANALYSIS
       • Bag of Words (BoW) representation
           o Dimensionality reduction
               - Stop-words removal
               - Stemming
           o Vector representation
               - Set of pairs <word, occurrences>
  28. SYNTACTIC TEXTUAL ANALYSIS
       • Stop-words removal
         Input: You don’t need to shell out thousands, survive various ballots, or swap a family member for a ticket to enjoy the 2012 Summer Olympic Games this year. There’s all manner of free events and associated shenanigans taking place in London and across the UK to mark the occasion. Here are ten ways to join in without spending any money.
  30. SYNTACTIC TEXTUAL ANALYSIS
       • Stop-words removal
         Output: Shell thousands, survive various ballots, swap family member ticket enjoy 2012 Summer Olympic Games year. Manner free events associated shenanigans taking place London across UK mark occasion. ten ways join spending money.
  31. SYNTACTIC TEXTUAL ANALYSIS
       • Stemming
         Input: Shell thousands, survive various ballots, swap family member ticket enjoy 2012 Summer Olympic Games year. Manner free events associated shenanigans taking place London across UK mark occasion. ten ways join spending money.
  33. SYNTACTIC TEXTUAL ANALYSIS
       • Stemming
         Output: Shell thousand, surviv various ballot, swap famil member ticket enjoy 2012 Summer Olymp Game year. Manner free event associat shenanigan tak place London across UK mark occasion. ten way join spend money.
  34. SYNTACTIC TEXTUAL ANALYSIS
       • Vector representation
           o TF-IDF
               <free, 0.0116>
               <olymp, 0.0235>
               <event, 0.0012>
               <way, 0.0125>
               <london, 0.0421>
               <celebrat, 0.0005>
               <chef, 0.0127>
               …
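
The syntactic pipeline of the preceding slides (stop-words removal, stemming, TF-IDF weighting) can be illustrated with a minimal sketch. Everything here is an assumption made for the example: the stop-word list is a toy one, `naive_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's (cited in the references), and the TF-IDF variant (relative term frequency times log inverse document frequency) is one common choice, not necessarily the one used in the talk.

```python
import math
import re
from collections import Counter

# Toy stop-word list, for illustration only.
STOP_WORDS = {"you", "to", "a", "the", "in", "for", "of", "and", "are", "or"}


def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer, not Porter's algorithm."""
    for suffix in ("ing", "s"):
        if (word.endswith(suffix) and not word.endswith("ss")
                and len(word) - len(suffix) >= 3):
            return word[: -len(suffix)]
    return word


def bag_of_words(text):
    """Lowercase, tokenize, drop stop words, stem, and count occurrences."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(naive_stem(t) for t in tokens if t not in STOP_WORDS)


def tf_idf(bags):
    """Weight each term by relative frequency times log(N / document frequency)."""
    n_docs = len(bags)
    doc_freq = Counter(term for bag in bags for term in bag)
    vectors = []
    for bag in bags:
        total = sum(bag.values())
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return vectors


docs = ["Free events in London", "Ten ways to join the free Olympic Games"]
bags = [bag_of_words(d) for d in docs]
vectors = tf_idf(bags)
print(vectors[0])
```

With only two toy documents, a term that appears in both (here "free") receives weight zero, which is the usual effect of the IDF factor.
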
  35. IS THE SYNTACTIC APPROACH ALONE ENOUGH?
  36. IS THE SYNTACTIC APPROACH ALONE ENOUGH?
       • Polysemy… “BASS”
  37. IS THE SYNTACTIC APPROACH ALONE ENOUGH?
       • Synonymy… Vehicle, Machine, Car, Auto, Automobile
  38. SEMANTIC TEXTUAL ANALYSIS
       • Taxonomy-based Classification
       • Word Disambiguation
  39. SEMANTIC TEXTUAL ANALYSIS
       • Taxonomy-based Classification
           o Classification Features (CF) representation
           o Adopted classifiers
               - Rocchio
               - SVM
  40. SEMANTIC TEXTUAL ANALYSIS
       • Rocchio
           o Each centroid is defined as a sum of TF-IDF values of each term, normalized by the number of webpages in the class
           o The classification is based on the cosine of the angle between the webpage and the centroid of each class
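
A minimal sketch of the Rocchio classification described on this slide: each class centroid is the per-term sum of TF-IDF weights divided by the number of pages in the class, and a page is assigned to the class whose centroid has the highest cosine with it. The class labels, weights and function names below are made up for illustration.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Per-term sum of TF-IDF weights, normalized by the number of pages in the class."""
    sums = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            sums[term] += weight
    return {term: total / len(vectors) for term, total in sums.items()}


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(page, centroids):
    """Return the label of the centroid closest (by cosine) to the page."""
    return max(centroids, key=lambda label: cosine(page, centroids[label]))


# Hypothetical training pages, already in TF-IDF form.
training = {
    "sport": [{"olymp": 0.4, "game": 0.2}, {"olymp": 0.2, "ticket": 0.2}],
    "travel": [{"london": 0.5, "hotel": 0.3}],
}
centroids = {label: centroid(pages) for label, pages in training.items()}
page = {"olymp": 0.3, "london": 0.1}
print(classify(page, centroids))
```
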
  41. SEMANTIC TEXTUAL ANALYSIS
       • SVM
           o The score is related to the distance of the webpage from a separation hyperplane
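
The slide only says that the SVM score is related to the distance from the separating hyperplane. One plausible reading, sketched here under that assumption, is the signed distance (w·x + b) / ||w|| for a linear SVM. The weights `w`, the bias `b` and the sample points are hypothetical values, not a trained model.

```python
import math


def signed_distance(x, w, b):
    """Signed distance of point x from the hyperplane w·x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (dot + b) / norm


# Hypothetical hyperplane parameters, for illustration only.
w = [3.0, 4.0]
b = -5.0

inside = signed_distance([3.0, 4.0], w, b)   # positive side of the hyperplane
outside = signed_distance([0.0, 0.0], w, b)  # negative side of the hyperplane
print(inside, outside)
```

A larger positive value would then be read as stronger evidence that the webpage belongs to the class.
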
  42. SEMANTIC TEXTUAL ANALYSIS
       • Word Disambiguation
           o Bag of Concepts (BoC) representation
           o Adopted lexical supports
               - WordNet
               - YAGO
               - ConceptNet
  43. SEMANTIC TEXTUAL ANALYSIS
       • WordNet
           o A large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.
  44. SEMANTIC TEXTUAL ANALYSIS
       • YAGO
           o A semantic knowledge base, derived from Wikipedia, WordNet and GeoNames
  45. SEMANTIC TEXTUAL ANALYSIS
       • ConceptNet
           o A network of concepts connected by several semantic relations (e.g., “IsA”, “PartOf”)
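
To make the Bag of Concepts idea concrete, the sketch below maps synonymous words to a single concept before counting, which addresses the synonymy problem raised earlier. The `CONCEPT_OF` table is a hand-made, hypothetical stand-in for a lexical resource such as WordNet; it does not use any real WordNet API, and it does not handle polysemy, which requires disambiguating each word from its context.

```python
from collections import Counter

# Hypothetical word-to-concept table standing in for a lexical resource.
CONCEPT_OF = {
    "car": "motor_vehicle",
    "auto": "motor_vehicle",
    "automobile": "motor_vehicle",
}


def bag_of_concepts(tokens):
    """Count concepts instead of words; unknown words are kept as they are."""
    return Counter(CONCEPT_OF.get(token, token) for token in tokens)


boc = bag_of_concepts(["car", "auto", "automobile", "london"])
print(boc)
```
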
  46. MATCHING
       • Similarity calculation
       • Ranking
  47. MATCHING
       • Similarity calculation
           o Adopted approaches
               - Cosine similarity
               - Jaccard index
  48. MATCHING
       • Cosine similarity
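
The formula on this slide did not survive extraction. For reference, the standard definition of cosine similarity between a page vector and an ad vector is given below; the symbols p, a and the term index t are notation assumed here, not recovered from the slide.

```latex
\cos(\mathbf{p}, \mathbf{a})
  = \frac{\mathbf{p} \cdot \mathbf{a}}{\lVert \mathbf{p} \rVert \, \lVert \mathbf{a} \rVert}
  = \frac{\sum_{t} p_t \, a_t}{\sqrt{\sum_{t} p_t^{2}} \, \sqrt{\sum_{t} a_t^{2}}}
```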
  49. MATCHING
       • Jaccard index
           o The Jaccard coefficient measures similarity between sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets
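
The definition above translates directly into a few lines of code. In this sketch the two sets are the terms of a page and of an ad; returning 0.0 when both sets are empty is a convention chosen here, not something stated on the slide.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


page_terms = {"free", "olymp", "london"}
ad_terms = {"olymp", "london", "ticket", "hotel"}
similarity = jaccard(page_terms, ad_terms)  # 2 shared terms out of 5 distinct ones
print(similarity)
```
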
  50. MATCHING
       • Ranking
           o Adopted approaches
               - Simple ranking according to the calculated scores
               - Learning to rank model
  51. MATCHING
       • Learning to rank model
           o Pointwise approach
               - Each query-document pair in the training data has a numerical or ordinal score
               - Regression problem approach: given a single query-document pair, predict its score
           o Pairwise approach
               - Classification problem approach: learning a binary classifier which can tell which document is better in a given pair of documents
           o Listwise approach
               - Optimization problem approach: try to directly optimize the value of one of the above evaluation measures, averaged over all queries in the training data
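
As an illustration of the pairwise approach, the sketch below turns the scored documents of a single query into training examples for a binary classifier: each example is a feature difference labelled +1 if the first document is better and -1 otherwise. The feature vectors, relevance grades and the function name are hypothetical. In the advertising setting one would presumably treat the webpage as the query and the ads as documents, which is an assumption made here.

```python
from itertools import combinations


def pairwise_examples(scored_docs):
    """Build (feature_difference, label) pairs from (features, relevance) tuples.

    Pairs with equal relevance carry no preference and are skipped.
    """
    examples = []
    for (xa, ya), (xb, yb) in combinations(scored_docs, 2):
        if ya == yb:
            continue
        diff = [a - b for a, b in zip(xa, xb)]
        examples.append((diff, 1 if ya > yb else -1))
    return examples


# Hypothetical documents for one query: (feature vector, relevance grade).
docs_for_query = [([1.0, 0.0], 2), ([0.5, 0.5], 1), ([0.0, 1.0], 1)]
pairs = pairwise_examples(docs_for_query)
print(pairs)
```

Any binary classifier can then be trained on these pairs; the pointwise approach would instead regress directly on the relevance grades.
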
  52. AN EXAMPLE: CONCA (CONCEPTS IN CONTEXTUAL ADVERTISING)
  53. CONCA
  55. RESULTS: SYNTAX VS SEMANTICS
  56. SYNTACTICAL ANALYSIS
       • Text summarization techniques comparison
           o FLP vs TFLP vs S vs TS
       • Comparison metrics
           o $\pi = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|} = \frac{TP}{TP + FP}$
           o $\rho = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|} = \frac{TP}{TP + FN}$
           o $F_1 = \frac{2 \cdot \pi \cdot \rho}{\pi + \rho}$
       • Taxonomy
           o BankSearch Dataset
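
The three comparison metrics can be computed from the sets of relevant and retrieved documents as in the following sketch. The document identifiers are made up, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def precision_recall_f1(relevant, retrieved):
    """Return (precision, recall, F1) for two collections of document ids."""
    relevant, retrieved = set(relevant), set(retrieved)
    tp = len(relevant & retrieved)  # true positives: relevant and retrieved
    precision = tp / len(retrieved) if retrieved else 0.0  # TP / (TP + FP)
    recall = tp / len(relevant) if relevant else 0.0       # TP / (TP + FN)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = precision_recall_f1(
    relevant={"a", "b", "c", "d"},
    retrieved={"a", "b", "e"},
)
print(precision, recall, f1)
```
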
  57. SYNTACTICAL ANALYSIS
       • Results
                   FLP      TFLP     S        TS
           π       0.745    0.832    0.734    0.806
           ρ       0.719    0.801    0.730    0.804
           F1      0.732    0.816    0.732    0.805
           #t      24       26       12       14
           o Adding information about the title improves performance
           o TFLP has the best performance
  58. SEMANTIC ANALYSIS
       • Semantic approaches comparison
           o Anagnostopoulos et al. (2007) system vs Armano et al. (2011-TIR) vs ConCA
       • Matching function
           o $\mathrm{score}(p, a) = \alpha \cdot sim_{BoC} + (1 - \alpha) \cdot sim_{CF}$
       • Comparison metric
           o $\pi@k = \frac{\sum_{i=1}^{N} \sum_{j=1}^{k} TP_{ij}}{\sum_{i=1}^{N} \sum_{j=1}^{k} (TP_{ij} + FP_{ij})}$
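
The matching function is a convex combination of the two similarity scores, so ranking ads for a page reduces to sorting by that combined value. The sketch below assumes the two similarities have already been computed for each candidate ad; the ad identifiers, similarity values and function names are hypothetical.

```python
def match_score(sim_boc, sim_cf, alpha):
    """alpha * sim_BoC + (1 - alpha) * sim_CF, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * sim_boc + (1 - alpha) * sim_cf


def top_k_ads(similarities, alpha, k):
    """Rank ads by combined score; similarities maps ad id -> (sim_BoC, sim_CF)."""
    ranked = sorted(
        similarities,
        key=lambda ad: match_score(*similarities[ad], alpha),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical precomputed similarities between one page and three ads.
similarities = {"ad1": (0.9, 0.1), "ad2": (0.2, 0.8), "ad3": (0.5, 0.5)}
print(top_k_ads(similarities, alpha=0.1, k=2))
```

With a low alpha the CF similarity dominates the ranking; with alpha equal to 1 only the BoC similarity counts.
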
  59. SEMANTIC ANALYSIS
       • Ad repository
           o Built by hand by a domain expert
       • Taxonomy
           o BankSearch Dataset
  60. SEMANTIC ANALYSIS
       • Results
                Anagnostopoulos et al.    Armano et al.        ConCA
           k    π        α                π        α           π        α
           1    0.674    0                0.768    0.2         0.773    0.1
           2    0.653    0                0.750    0.2         0.752    0.1
           3    0.617    0.2              0.729    0.3         0.728    0.1
           4    0.582    0.2              0.701    0.3         0.701    0.1
           5    0.546    0.1              0.663    0.0         0.668    0.1
  61. SEMANTIC ANALYSIS
       • Results
  62. SEMANTIC ANALYSIS
       • Results
           o Slight improvement by using concepts
           o Low values of α → CF has more impact than BoC
  63. SYNTACTICAL ANALYSIS VS SEMANTIC ANALYSIS
       • Contextual Advertising System
           o Armano et al. (2011-TIR)
       • Matching function
           o $\mathrm{score}(p, a) = \alpha \cdot sim_{BoW} + (1 - \alpha) \cdot sim_{CF}$
       • Comparisons varying α
           o α = 1 → pure syntax
           o α = 0 → pure semantics
       • Comparison metric
           o $\pi@k = \frac{\sum_{i=1}^{N} \sum_{j=1}^{k} TP_{ij}}{\sum_{i=1}^{N} \sum_{j=1}^{k} (TP_{ij} + FP_{ij})}$
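
The comparison metric π@k can be computed as sketched below, reading the index i as running over the N evaluated pages and j over the top k ads suggested for each page; that reading is an assumption, since the slide does not define N. Each page is represented by a ranked list of booleans, True for a relevant suggested ad (a true positive) and False for an irrelevant one (a false positive). The judgments are made up for illustration.

```python
def precision_at_k(judgments, k):
    """Sum of true positives over the top-k ads of every page, divided by the ads suggested."""
    true_positives = sum(
        sum(1 for relevant in page[:k] if relevant) for page in judgments
    )
    suggested = sum(len(page[:k]) for page in judgments)  # TP + FP
    return true_positives / suggested if suggested else 0.0


# Hypothetical relevance judgments for two pages, three ranked ads each.
judgments = [[True, False, True], [True, True, False]]
for k in (1, 2, 3):
    print(k, precision_at_k(judgments, k))
```
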
  64. SYNTACTICAL ANALYSIS VS SEMANTIC ANALYSIS
       • Ad repository
           o Built by hand by a domain expert
       • Taxonomy
           o BankSearch Dataset
  65. SYNTACTICAL ANALYSIS VS SEMANTIC ANALYSIS
       • Results
           α      π@1      π@2      π@3      π@4      π@5
           0      0.765    0.746    0.719    0.696    0.663
           0.1    0.767    0.749    0.724    0.698    0.663
           0.2    0.768    0.750    0.729    0.699    0.662
           0.3    0.766    0.749    0.729    0.701    0.661
           0.4    0.756    0.747    0.729    0.698    0.658
           0.5    0.744    0.735    0.721    0.693    0.651
           0.6    0.722    0.717    0.703    0.681    0.640
           0.7    0.685    0.687    0.680    0.658    0.625
           0.8    0.632    0.637    0.635    0.614    0.586
           0.9    0.557    0.552    0.548    0.534    0.512
           1      0.408    0.421    0.372    0.388    0.640
  66. CONCLUSIONS
  67. CONCLUSIONS
       • Online advertising
           o represents one of the major sources of income for a large number of websites
           o is aimed at suggesting products and services to the population of Internet users
       • Modern contextual advertising systems
           o put ads within the content of a generic, third-party webpage
           o adopt both syntactical and semantic textual analyses to select the most relevant ads for a given webpage
           o an example is ConCA
  68. CONCLUSIONS
       • Results show that
           o the impact of semantics is stronger than that of syntax
           o adopting more advanced semantic techniques, such as concepts, improves performance
           o the more ads are suggested, the worse the performance
  69. REFERENCES
  70. REFERENCES
       • Syntactical Textual Analysis
           o Armano G., Giuliani A. & Vargiu E. Experimenting text summarization techniques for contextual advertising. 2nd Italian Information Retrieval Workshop (IIR’11), 2011.
           o Armano G., Giuliani A. & Vargiu E. Using snippets in text summarization: a comparative study and an application. 3rd Italian Information Retrieval Workshop (IIR’12), 2012.
           o Kolcz A., Prabakarmurthi V. & Kalita J. Summarization as feature selection for text categorization. 10th International Conference on Information and Knowledge Management (CIKM’01). ACM, New York, NY, USA, pp. 365–370, 2001.
           o Porter M. An algorithm for suffix stripping. Program, 14, 3, pp. 130–137, 1980.
           o Salton G., Wong A. & Yang C.S. A vector space model for automatic indexing. Communications of the ACM, 18, 11, pp. 613–620, 1975.
  71. REFERENCES
       • Semantic Textual Analysis
           o Cortes C. & Vapnik V.N. Support-Vector Networks. Machine Learning, 20, 1995.
           o Fellbaum C. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, 1998.
           o Liu H. & Singh P. ConceptNet: A practical commonsense reasoning tool-kit. BT Technology Journal, 22, pp. 211–226, 2004.
           o Miller G.A. WordNet: A Lexical Database for English. Communications of the ACM, 38, 11, pp. 39–41, 1995.
           o Rocchio J. The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice Hall, Chapter: Relevance feedback in information retrieval, pp. 313–323, 1971.
           o Suchanek F.M., Kasneci G. & Weikum G. Yago - A Core of Semantic Knowledge. 16th International World Wide Web Conference (WWW 2007), 2007.
  72. REFERENCES
       • Matching
           o Liu T.Y. Learning to rank for information retrieval. Found. Trends Inf. Retr., 3, 3, pp. 225–331, 2009.
           o Radomski P.J. & Goeman T.J. The homogenizing of Minnesota lake fish assemblages. Fisheries, 20, pp. 20–23, 1995.
       • Comparison Systems
           o Anagnostopoulos A., Broder A.Z., Gabrilovich E., Josifovski V. & Riedel L. Just-in-time contextual advertising. 16th ACM Conference on Information and Knowledge Management (CIKM’07). ACM, New York, NY, USA, pp. 331–340, 2007.
           o Armano G., Giuliani A. & Vargiu E. Studying the impact of text summarization on contextual advertising. 8th International Workshop on Text-based Information Retrieval (TIR’11), 2011.
           o Armano G., Giuliani A. & Vargiu E. Semantic enrichment of contextual advertising by using concepts. International Conference on Knowledge Discovery and Information Retrieval, 2011.
  73. Contact: Eloisa Vargiu – evargiu@bdigital.org