II-SDV 2014 Automated Relevancy Check of Patents and Scientific Literature (Katrin Tomanek and Philipp Daumke - Averbis, Germany)



  1. 1. Dr. Philipp Daumke Analyze Text, Gain Answers
  2. 2. ABOUT AVERBIS Founded: 2007 Location: Freiburg im Breisgau Team: Domain- & IT-Experts Focus: Leverage structured & unstructured information Current Sectors: Pharma, Health, Automotive, Publishers & Libraries
  3. 3. PORTFOLIO PRODUCTS: CORE TECHNOLOGIES:
  4. 4. CHALLENGE Exponential growth of data • need for data-driven decisions • limited human resources for analysis New analytics tools needed for • Semantic search and discovery • Competitor analysis • Identification of market trends • IP landscaping • Portfolio analysis • … Patent applications: Medline articles:
  5. 5. (Semi-)Automate patent categorization with high precision Learning system imitates the behavior of IP professionals Semantic search Search for meanings, not just keywords
  6. 6. PATENT ANALYTICS
  7. 7. PATENT ANALYTICS Terminologies Text Mining Rules Text Mining Machine Learning Patent Collection
  8. 8. TERMINOLOGY MANAGEMENT Define the ‘semantic space’ of your technology fields • Keywords • Categories • Hierarchies • … Include relevant word lists from your company • Products • Devices • Companies • Components • Indications • … Reuse already existing terminologies on the market
  9. 9. TEXT MINING Lung metastasis lung metastasis lung metastases metastases in the lung metastases in the lower lobe of the lung pulmonal metastates pulmonal relapse of a metastasis pulmonal filia pulmonal filiae lung filiae lower lobe filiae
  10. 10. TEXT MINING tumors tumour cancer carcinoma lymphoma endometrioma astrocytoma glioblastoma seminoma ALL leukemia
  11. 11. TEXT MINING
  12. 12. PATENT CLASSIFICATION – MACHINE LEARNING System learns how to fine-classify patents Observes and imitates human decision making Advantages • No explicit externalization of knowledge needed • No rule-writing • Better results • System generalizes (higher recall) • Statistical model can handle „noise“ better than rules • Ambiguity and textual variations better handled
  13. 13. THE PROCESS OF MACHINE LEARNING Labeling • Up to 100 categories • ~10-50 patents per category • Hierarchical categories • Multi-labeling Learning • Learn characteristic patterns in labeled data • Lots of different classification algorithms Prediction & Review • Automatically map new patents to categories • Confidence value for each category • Different selection criteria
  14. 14. POWERFUL FRONTEND Linguistic full text search Linguistic Filters Patent Summary Additional info, e.g. picture Multilabel Classification
  15. 15. USE CASE 1: LARGE-SCALE PATENT LANDSCAPING • Goal: to semi-automatically categorize patents to the company’s technology landscape • Technology Landscape: 35 classes (8 main classes, 27 sub-classes) • 7,000 patents, 10 competitors • Evaluation – automated judgement compared with expert judgement – two expert judgements compared with each other (inter-rater agreement)
  16. 16. USE CASE 1: LARGE-SCALE PATENT LANDSCAPING
  17. 17. CONFUSION MATRIX
  18. 18. USE CASE 1: LARGE-SCALE PATENT LANDSCAPING Results (accuracy / time savings) • Automated, Scenario I: 85% / 70% • Automated, Scenario II: 82% / 80% • Manual (2 expert judges): 80% accuracy Averbis Patent Analytics saves up to 80% of the time, with accuracy on par with manual judges!
  19. 19. USE CASE 2: RESEARCH LITERATURE RELEVANCY • Goal: to automatically identify literature relevant to the company • Rule set: – Mentions of the company’s indications, products, etc. – Competitor products and indications – „Testosterone, but only given externally“ – „Products shall not be found in an enumeration“ – …
  20. 20. PATENT ANALYTICS Rule Set Text Mining, Machine Learning Search, Analysis Medline, Embase
  21. 21. VERAPAMIL
  22. 22. USE CASE 2: RESEARCH LITERATURE RELEVANCY Rule: Testosterone, but only given externally
  23. 23. USE CASE 2: RESEARCH LITERATURE RELEVANCY Rule: Ignore products listed in enumerations
  24. 24. USE CASE 3: SOCIAL MEDIA ANALYTICS
  25. 25. USE CASE 3: SOCIAL MEDIA ANALYTICS
  26. 26. USE CASE 3: SOCIAL MEDIA ANALYTICS Main Challenge: what is positive, what is negative? – „Could somebody please remove the dead bird from the balcony?“ – „From the breadcrumbs lying under the bed one could live for ages“ – „The hotel is situated in the most crowded party district of the town“ – „The toilets were so big that I couldn’t sit down for …“
  27. 27. USE CASE 4: PATIENT RECRUITMENT/ DIAGNOSIS SUPPORT Disease Profiles Inclusion/Exclusion Criteria Categorization Visualization Electronic Health Records
  28. 28. USE CASE 4: PATIENT RECRUITMENT/ DIAGNOSIS SUPPORT
  29. 29. USE CASE 4: PATIENT RECRUITMENT/ DIAGNOSIS SUPPORT
  30. 30. For further questions, please contact Dr. Philipp Daumke philipp.daumke@averbis.com +49 761 - 203 9769 0
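
The "Prediction & Review" step on slide 13 mentions a confidence value for each category and "different selection criteria", without saying which criteria are used. The following is a minimal sketch of two common criteria (a fixed threshold and top-k), plus a simple flag for borderline cases that a reviewer might want to check. The function names, the category names, the scores, and the default threshold and margin are all hypothetical and are not taken from the Averbis product.

```python
from typing import Dict, List


def select_by_threshold(confidences: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label selection: keep every category whose confidence reaches the threshold."""
    chosen = [cat for cat, score in confidences.items() if score >= threshold]
    # Return the most confident category first.
    return sorted(chosen, key=lambda cat: confidences[cat], reverse=True)


def select_top_k(confidences: Dict[str, float], k: int = 1) -> List[str]:
    """Alternative criterion: keep the k most confident categories, whatever their score."""
    ranked = sorted(confidences, key=lambda cat: confidences[cat], reverse=True)
    return ranked[:k]


def needs_review(confidences: Dict[str, float], threshold: float = 0.5, margin: float = 0.1) -> bool:
    """Flag a patent for manual review when any score lies close to the threshold."""
    return any(abs(score - threshold) < margin for score in confidences.values())


# Hypothetical confidence values for one patent (assumed, for illustration only).
example = {"engine": 0.91, "sensors": 0.62, "materials": 0.08}

if __name__ == "__main__":
    print(select_by_threshold(example))
    print(select_top_k(example, k=1))
    print(needs_review(example))
```

A threshold naturally supports multi-labeling, since any number of categories can pass it, while top-k always returns a fixed number of categories. Which criterion fits depends on how much manual review is acceptable.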

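
Slides 15 to 18 compare the automated judgement with expert judgement, compare two experts with each other, and show a confusion matrix. The sketch below illustrates how such figures can be computed, assuming one label per patent for simplicity (the use case itself allows multiple labels). The slides report accuracy only. Cohen's kappa is added here as a common chance-corrected agreement measure and is not stated in the source. The label lists are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def agreement(a: List[str], b: List[str]) -> float:
    """Share of items on which two judges (or judge and system) assign the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def confusion_counts(reference: List[str], predicted: List[str]) -> Dict[Tuple[str, str], int]:
    """Count (reference label, predicted label) pairs, i.e. the cells of a confusion matrix."""
    return dict(Counter(zip(reference, predicted)))


def cohen_kappa(a: List[str], b: List[str]) -> float:
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(a)
    p_observed = agreement(a, b)
    freq_a, freq_b = Counter(a), Counter(b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both judges used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Invented labels for five patents (assumed, for illustration only).
expert = ["A", "A", "B", "B", "C"]
system = ["A", "B", "B", "B", "C"]

if __name__ == "__main__":
    print(agreement(expert, system))
    print(confusion_counts(expert, system))
    print(cohen_kappa(expert, system))
```

Applying `agreement` to the labels of two experts gives a human baseline against which the accuracy of the system can be read, in the same way slide 18 places the 80% manual figure next to the automated results.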