
II-SDV 2015, 20 - 21 April, in Nice


  1. Future Challenges in (automated) Patent Search. Alexander G. Klenner-Bajaja, PhD. aklenner@epo.org
  2. Why Search – European Patent Convention. Information Management's Task: Support Search
  3. Introduction – What do we want, where are we?
  4. The current Search System
     • A Boolean search system; documents are returned as sets
     • Search is dominated by meta-data search as well as keywords
     [Diagram labels: Search Space, boolean query]
  5. The current Search System
     • A Lucene/Elasticsearch-based system; documents are returned as ranked lists (pilot – fully available, but no extensive training)
     • Moving away from a meta-data dominated search...?
     [Diagram labels: Search Space, k, Lucene query, 1]
  6. Patent Gold Standards
     • We have "manually" curated search reports for about 40 million simple patent families
     • The relevant documents are mentioned in the search report as X (I, N), A, Y, ... documents
     [Figure label: median: 5 citations in search reports]
  7. Citation temporal distribution
     • 50% of all citations are younger than 10 years (2005-now); 80% of all citations are younger than 20 years; only 5% of citations date from before 1974.
  8. Setting up a benchmarking environment
     • We need to move away from anecdotal evidence to statistically meaningful facts
     • TAPAS
     [Diagram labels: Search Index; Applications; Patent Corpus; Method 1 (MAP: 0.4); Method 2 (MAP: 0.2); steps 1 to 4; * Exploiting real queries]
  9. Setting up a prototyping environment - KNIME
  10. Evaluating the results
  11. Graph Databases are valid tools - if we have a good starting document (seed)
  12. Graph Databases are valid tools - if we have a good starting document (seed)
  13. Graph Databases are valid tools - if we have a good starting document (seed)
  14. Graph Databases are valid tools - if we have a good starting document (seed). Again Meta-Data based!
  15. But where do we start with an incoming patent application?
      [Diagram labels: ?, Patent Application]
  16. This has been implemented during the last 1-3 years, but
      • The literature suggests that we have hit a ceiling with parameter optimization strategies that apply classic IR methods
      • We ignore the huge NPL part of the citations
      • The problem becomes worse every day (~3000 applications per week)
  17. A searcher tries to work around "meaning" by:
      • Proximity queries that simulate or approximate "meaning"
      • Assumption: certain distances transport more meaning than others (e.g. 3w or p).
      • We want to ask "Give me all documents that are relevant with regard to the treatment of migraine pain with Aspirin"
      • But we actually ask "Migraine AND Pain AND Aspirin" or many variants of that.
      • Classification is a very strong aid, representing a meaningful relation <belongs to>
  18. What does search actually mean?
      • Claim 1: A composition comprising a combination of paracetamol and aspirin for use in the treatment of migraine pain in a human subject.
      • Claim 2: A composition according to claim 1 where the composition further comprises caffeine
  19. A Knowledge Map of Claims 1 & 2
      • Claim 1: A method for treating migraine pain comprising administering to a human subject a composition comprising a combination of paracetamol and aspirin.
      • Claim 2: A method according to claim 1 where the composition further comprises caffeine
  20. What is the Δ of Prior Art and the Application?
  21. We use meta-data knowledge maps with simple relations already
  22. Moving towards real knowledge maps
      • Normalized annotations are one step towards semantic search: they connect mentions in patents with normalized entities
      • Good coverage for biomedical domains
      • Lack of good terminologies for everything else
  23. Approaches do exist
  24. Patents have multi-modal information content: Images
      • Images
        – Chemical Formulas
        – Flow Diagrams
        – Circuits
        – Technical Drawings
  25. Google Image Search
  26. Image Search
      [Diagram labels: Search Space; Query; State of the art Image processing; Filtering and Visualisation]
  27. Image Search using S&K prototype
  28. Modelling Search – which direction do we go?
      • Is modelling the Examiner the best choice?
      [Diagram labels: PA, X]
  29. Technologies that can guide us
      • Enrichment and Annotations
      • Natural Language Processing
      • Topic Modelling
      • Information Extraction
      • Knowledge Bases
      • Visualisation Techniques
      • Workflow Management
      • Information Retrieval
      • Modelling the Search Process
      • Knowledge Organisation Systems
  30. Future Search Ecosystem bringing together many technologies
      • Captured domain knowledge allows merging and retrieving relevant third-party documents/results
      • "Machine" understanding of the application allows for "Auto-Query" generation
      • IR system retrieves relevant documents from the query
      • Enrichment allows "semantic" search
      • Examiner is "Search Pilot"
  31. Thank you for your attention. aklenner@epo.org
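
The benchmarking slide (slide 8) compares two methods by a "MAP" score, and slide 6 describes search-report citations as a gold standard. Assuming MAP is used in its usual information-retrieval sense (mean average precision), the score could be computed roughly as sketched below. The function names, document IDs, and toy data are hypothetical illustrations and are not taken from the slides or from the TAPAS environment.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one query (one patent application).

    Precision is taken at every rank where a relevant document appears,
    then divided by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], gold: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all gold-standard queries."""
    if not gold:
        return 0.0
    total = sum(average_precision(runs.get(q, []), rel) for q, rel in gold.items())
    return total / len(gold)


# Hypothetical toy data: citations from search reports act as the gold standard.
gold = {"app1": {"D1", "D3"}, "app2": {"D7"}}
method_1 = {"app1": ["D1", "D2", "D3"], "app2": ["D7", "D8"]}
method_2 = {"app1": ["D2", "D1", "D4"], "app2": ["D8", "D7"]}

print(f"Method 1 MAP: {mean_average_precision(method_1, gold):.3f}")
print(f"Method 2 MAP: {mean_average_precision(method_2, gold):.3f}")
```

Because the score is averaged over many real applications, a difference between two methods reflects the whole benchmark rather than a single anecdotal query.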

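Slide 17 contrasts the Boolean query "Migraine AND Pain AND Aspirin" with proximity queries that approximate meaning through word distance. The sketch below illustrates that difference on two made-up sentences. The helper names and the distance definition (token positions at most n apart, in either order) are assumptions made for illustration; the exact semantics of operators such as 3w or p in the EPO's search system are not specified in the slides.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def boolean_and(text: str, terms: List[str]) -> bool:
    """True if every term occurs somewhere in the text."""
    tokens = set(tokenize(text))
    return all(term.lower() in tokens for term in terms)


def within_n_words(text: str, a: str, b: str, n: int) -> bool:
    """True if a and b occur at token positions at most n apart (either order).

    This distance definition is an assumption for illustration only.
    """
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == b.lower()]
    return any(i != j and abs(i - j) <= n for i in pos_a for j in pos_b)


# Hypothetical documents: both contain all three keywords.
doc_a = "Aspirin is administered to treat migraine pain in adults."
doc_b = (
    "Aspirin thins the blood. In an unrelated study spanning several years, "
    "patients kept a diary of migraine episodes and back pain."
)

terms = ["migraine", "pain", "aspirin"]
for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name, boolean_and(doc, terms), within_n_words(doc, "aspirin", "migraine", 5))
```

Both documents satisfy the Boolean query, but only the first passes the proximity check, which is the sense in which distance stands in for "meaning" without actually capturing the relation between the terms.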