Chances and Challenges in Comparing
Cross-Language Retrieval Tools

             Giovanna Roda
             Vienna, Austria

      IRF Symposium 2010 / June 3, 2010

CLEF-IP: the Intellectual Property track at CLEF

  CLEF-IP is an evaluation track within the Cross Language
  Evaluation Forum (CLEF). [1]

    organized by the IRF
    first track ran in 2009
    running this year for the second time

  [1] http://www.clef-campaign.org

What is an evaluation track?

  An evaluation track in Information Retrieval is a cooperative
  action aimed at comparing different techniques on a common
  retrieval task. An evaluation track:

    produces experimental data that can be analyzed and used to
    improve existing systems
    fosters the exchange of ideas and cooperation
    produces a reusable test collection and sets milestones

  Test collection
  A test collection traditionally consists of target data, a set of
  queries, and relevance assessments for each query.

CLEF-IP 2009: the task

  The main task in the CLEF-IP track was to find prior art for a
  given patent.

  Prior art search
  Prior art search consists in identifying all information (including
  non-patent literature) that might be relevant to a patent's claim
  of novelty.

Participants - 2009 track

    1 Tech. Univ. Darmstadt, Dept. of CS,
      Ubiquitous Knowledge Processing Lab (DE)
    2 Univ. Neuchatel - Computer Science (CH)
    3 Santiago de Compostela Univ. - Dept. Electronica y
      Computacion (ES)
    4 University of Tampere - Info Studies (FI)
    5 Interactive Media and Swedish Institute of Computer
      Science (SE)
    6 Geneva Univ. - Centre Universitaire d’Informatique (CH)
    7 Glasgow Univ. - IR Group Keith (UK)
    8 Centrum Wiskunde & Informatica - Interactive Information
      Access (NL)
    9 Geneva Univ. Hospitals - Service of Medical
      Informatics (CH)
   10 Humboldt Univ. - Dept. of German Language and
      Linguistics (DE)
   11 Dublin City Univ. - School of Computing (IE)
   12 Radboud Univ. Nijmegen - Centre for Language Studies &
      Speech Technologies (NL)
   13 Hildesheim Univ. - Information Systems & Machine Learning
      Lab (DE)
   14 Technical Univ. Valencia - Natural Language Engineering (ES)
   15 Al. I. Cuza University of Iasi - Natural Language
      Processing (RO)

  In total: 15 participants, 48 experiments submitted for the main
  task, and 10 experiments submitted for the language tasks.

2009-2010: evolution of the CLEF-IP track

   2009                           2010

   1 task: prior art search       prior art candidate search and
                                  classification task
   targeting granted patents      patent applications
   15 participants                20 participants
   all from academia              4 industrial participants
   families and citations         include forward citations
   manual assessments             expanded lists of relevant docs
   standard evaluation measures   new measure: PRES, more
                                  recall-oriented

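  The table's last row mentions PRES without defining it. As
  background: PRES is the recall-oriented score proposed by Magdy
  and Jones (SIGIR 2010). The sketch below is one reading of that
  published formula, not code used by the track; the function name
  and signature are ours.

    # A sketch of PRES, following our reading of Magdy & Jones
    # (SIGIR 2010). Not code from the track itself.

    def pres(relevant: set, ranking: list, n_max: int) -> float:
        """PRES = 1 - (sum(ranks) - n(n+1)/2) / (n * n_max)."""
        n = len(relevant)
        found = [i + 1 for i, doc in enumerate(ranking[:n_max])
                 if doc in relevant]
        # Relevant docs not retrieved within n_max are assumed to
        # rank immediately after the cut-off (worst plausible case).
        ranks = found + [n_max + j + 1 for j in range(n - len(found))]
        return 1.0 - (sum(ranks) - n * (n + 1) / 2) / (n * n_max)

  With all relevant documents at the top of the ranking PRES is 1;
  with none retrieved within the cut-off it is 0.
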
What are relevance assessments?

  A test collection (also known as a gold standard) consists of a
  target dataset, a set of queries, and relevance assessments
  corresponding to each query.

  The CLEF-IP test collection:

    target data: 2 million EP patents
    queries: full-text patents (without images)
    relevance assessments: extended citations

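  To make the structure concrete, here is a minimal sketch of how
  such a collection is commonly represented in code, with a simple
  recall computation over it. All patent identifiers are invented
  placeholders, not real CLEF-IP topic IDs.

    # Illustrative only: a qrels-style mapping from query patents to
    # the set of patents judged relevant (here, extended citations).
    qrels = {
        "EP-0000001": {"EP-0000123", "EP-0000456"},
        "EP-0000002": {"EP-0000789"},
    }

    def recall(run: list, query: str, cutoff: int = 100) -> float:
        """Fraction of a query's relevant patents found in the top results."""
        relevant = qrels[query]
        return len(relevant & set(run[:cutoff])) / len(relevant)
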
Relevance assessments

  We used patents cited as prior art as relevance assessments.

  Sources of citations:

    1 applicant's disclosure: the USPTO requires applicants to
      disclose all known relevant publications
    2 patent office search report: each patent office will do a
      search for prior art to judge the novelty of a patent
    3 opposition procedures: patents cited to prove that a granted
      patent is not novel

Extended citations as relevance assessments

  The extended citations of a patent comprise:

    direct citations and their families
    direct citations of family members ... and their families

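  A minimal sketch of this construction, assuming two hypothetical
  lookups: family_of(), taken to return a patent's simple family
  including the patent itself, and citations_of(), returning its
  direct citations.

    # family_of() and citations_of() are hypothetical lookups into
    # family and citation tables; they are not CLEF-IP APIs.

    def extended_citations(patent: str) -> set:
        extended = set()
        for member in family_of(patent):        # the patent and its family
            for cited in citations_of(member):  # direct citations of members
                extended |= family_of(cited)    # ... and the citations' families
        return extended
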
Patent families

  A patent family consists of patents granted by different patent
  authorities but related to the same invention. Patent documents
  are linked by priorities.

  simple family
  all family members share the same priority number

  extended family
  there are several definitions; in the INPADOC database, all
  documents which are directly or indirectly linked via a priority
  number belong to the same family

  CLEF-IP uses simple families.

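  A sketch of simple-family grouping under the definition above;
  priority_of() is a hypothetical metadata lookup returning a
  document's priority number.

    # All documents sharing a priority number form one simple family.
    from collections import defaultdict

    def simple_families(documents: list) -> dict:
        families = defaultdict(set)
        for doc in documents:
            families[priority_of(doc)].add(doc)  # hypothetical lookup
        return dict(families)
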
Relevance assessments 2010

  Expanding the 2009 extended citations:

    1 include citations of forward citations ...
    2 ... and their families

  This is apparently a well-known method among patent searchers
  (zig-zag search?).

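  Continuing the sketch above, the 2010 expansion adds one more
  hypothetical lookup, forward_citations_of(), returning the patents
  that cite a given one.

    def extended_citations_2010(patent: str) -> set:
        extended = extended_citations(patent)        # the 2009 extended set
        for citing in forward_citations_of(patent):  # patents that cite ours
            for cited in citations_of(citing):       # their backward citations
                extended |= family_of(cited)         # ... and those families
        return extended
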
How good are the CLEF-IP relevance assessments?

  CLEF-IP uses families + citations:

    how complete are extended citations as relevance assessments?
    will every prior art patent be included in this set?
    and if not, what percentage of prior art items is captured by
    extended citations?
    when considering forward citations, how good are extended
    citations as a prior art candidate set?

Feedback from patent experts needed

  The quality of prior art candidate sets has to be assessed, and
  for this the know-how of patent search experts is needed.

  At CLEF-IP 2009, 7 patent search professionals assessed 12 search
  results:

    the task was not well defined and there were misunderstandings
    about the concept of relevance
    the amount of data was not sufficient to draw conclusions

Some initiatives associated with CLEF-IP

  The results of evaluation tracks are mostly useful for the
  research community, and this community often produces prototypes
  that are of little interest to the end user.

  Next I'd like to present two concrete outcomes - not of CLEF-IP
  directly, but arising from work in patent retrieval evaluation.

Soire

    developed at Matrixware
    service-oriented architecture - available as a Web service
    allows replication of IR experiments based on the classical
    evaluation model
    tested on the CLEF-IP data
    customized for the evaluation of machine translation

Spinque

    a spin-off (2010) from CWI (the Dutch National Research Center
    in Computer Science and Mathematics)
    introduces search-by-strategy
    provides optimized strategies for patent search - tested on
    CLEF-IP data
    transparency: understand your search results in order to improve
    your strategy

CLEF-IP 2009 learnings

  Humboldt University implemented a model for patent search that
  produced the best results. The model combined several strategies:

    using metadata (IPC, ECLA)
    indexes built at lemma level
    an additional phrase index for English
    a crosslingual concept index (multilingual terminological
    database)

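  The slides do not say how these strategies were combined; a common
  way to merge evidence from several indexes is a weighted sum of
  per-index scores, sketched below with purely illustrative index
  names and weights.

    # Illustrative score fusion across several indexes; the names and
    # weights are invented, not the Humboldt system's actual settings.
    INDEX_WEIGHTS = {"metadata": 0.3, "lemma": 0.3,
                     "phrase": 0.2, "concept": 0.2}

    def fused_score(doc_scores: dict) -> float:
        """doc_scores maps index name -> retrieval score for one document."""
        return sum(INDEX_WEIGHTS[name] * doc_scores.get(name, 0.0)
                   for name in INDEX_WEIGHTS)
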
Some additional investigations

  Some citations were hard to find. Citations were classified by the
  percentage of runs that retrieved them:

         % runs         class
         x ≤ 5          hard
     5 < x ≤ 10         very difficult
    10 < x ≤ 50         difficult
    50 < x ≤ 75         medium
    75 < x ≤ 100        easy

  We looked at the content of citations and citing patents. These
  investigations are ongoing.

Thank you for your attention.
