SlideShare a Scribd company logo
1 of 34
Download to read offline
David Baehrens
Large-Scale Patent Classification
at the European Patent Office
ABOUT AVERBIS
Founded: 2007
Location: Freiburg im Breisgau
Team: Domain & IT-Experts
Focus: Leverage structured & unstructured information
Current Sectors: Pharma, Health, Automotive, Publishers & Libraries
PORTFOLIO
Solutions
Libraries PharmaPatentsHealthcare Social Media
Terminology
Management Text Mining
Search &
Analytics NoSQL
Categorization
& Clustering
Automotive
TERMINOLOGY MANAGEMENT
Terminology management
software
Provision of terminologies
Mappings between
terminologies
Building terminology-based
applications
Synonyms: dimethyl sulfoxide, dimethylsulfoxide, Domoso, Infiltrina
Hierarchies: cancer, carcinoma, melanoma, lymphoma, glioblastoma…
Patterns: dates, citations, mail addresses…
Rule-based extraction of all different kinds of complex information
Persons, Locations, Genes, ….
Coocurrences, Typed Relations, e.g. Genes / Diseases / Modification Type
TEXT MINING
Term Detection
Regular
Expressions
Rule Engine
Named Entities
Relations
Sentences, Tokens, POS-Tags, Chunks, Paragraphs, Sections, Stemming, Decompounding…Syntax Detection
RULE ENGINE
1. NAME OF THE MEDICINAL PRODUCT
Desloratadine ratiopharm 5 mg film-coated tablets
Primary Field Name Secondary Field Name Field Value
MedicalProductName coveredText Desloratadine ratiopharm 5 mg film-coated tablets
inventedPartName DESLORATADINE
strengthPart 5 mg
pharmaceuticalDoseFormPart FILM-COATED TABLET
TextRegelErgebnis
SEARCH & NOSQL
Free text + concept based
search
Text mining integration
Guided navigation / facets
NoSQL functionalities
Multi- & cross lingual search
Related documents
Based on Apache Solr
• Extended Query Syntax
• JSON-API
• Scalability
…
DOCUMENT CLASSIFICATION
Hotel Reviews
Patents
SEARCH & NOSQL
INFORMATION DISCOVERY
Terminology
Management Text Mining
Search &
Analytics NoSQL
Categorization
& Clustering
Delivery / Deployment / Runtime Environment
Integration Tests / Continuous Integration
Extensive Documentation
Common Architecture / Application Design
User & Role Management, Security
Communication Bus
Project Management
PATENT CLASSIFICATION AT EPO
Tender No. 1585
1) Pre-Classification of
unpublished patents into departments
2) Re-Classification on
published patents, if category system changes
ABOUT EPO
• The European Patent Office (EPO)
grants European patents for the
Contracting States to the European
Patent Convention
• Second largest intergovernmental
institution in Europe
• Not an EU institution
• Self-financing, i.e. revenue
from fees covers operating
and capital expenditure
NUMBER OF STAFF
Status: December 2008
PATENT APPLICATIONS
http://www.epo.org/about-us/annual-reports-statistics/annual-report/2014.html
COOPERATIVE PATENT CLASSIFICATION
• Patent Classification System based on ECLA / IPC
• jointly developed by the European Patent Office (EPO)
and the United States Patent and Trademark Office
(USPTO)
• used by both the EPO and USPTO since 1 January 2013
• currently contains about 250.000 classes
EXAMPLE CPC CLASS
GRANTED PATENT
EARLY PATENT
EARLY PATENT
EARLY PATENT
PATENT CLASSIFICATION AT EPO
Tender No. 1585
1) Pre-Classification of
unpublished patents into departments
Our Motivation:
• Great Classification Use-Case
– Big Data (80 Mio. patents available)
– Large Scale Category System >250.000 CPC codes
– Tough classification quality and response time
constraints
• Text Mining Success Story
OLD CLASSIFICATION PROCESS
PATENTS CLA SSIFICATION DEPARTMENTS
CLASSIFICATION COMPLEXITY
~250.000
CPC Codes
~1.500
Ranges
250
Departments
CLASSIFICATION PROCESS
PATENTS CLA SSIFICATION DEPARTMENTS
NEW CLASSIFICATION PROCESS
PATENTS CLA SSIFICATION DEPARTMENTS
SOME FACTS
• about 650k training documents from 2005-2013
• supervised learning: light-weight and fast linear support
vector machine
• Training time (16 Cores, 128 GB RAM)
– Feature Extraction: ~1 hour
– Training of Classifiers: ~1 hour
– 90/10 tests with a look-a-head of 3 levels
and reporting 3 best candidates: ~1 hour
• Prediction: 5 docs in 5 sec
HIERARCHICAL CLASSIFICATION
STATUS & OUTLOOK
 Range-specific quality
evaluation
 Going live with best
ranges
• Continuous optimization
PATENT CLASSIFICATION AT EPO
Tender No. 1585
1) Re-Classification on
published patents, if category system changes
Challenges and Facts:
– 250.000 CPC codes, regular changes/refinements
– Several re-classification projects at any one time, great
variation in size, a class is split into 5-20(?) subclasses
– No training material available
NEW RE-CLASSIFICATION PROCESS
Training Data
• Human Annotator starts labeling about 20% of
the documents with new subclasses
Statistical Models
• are generated on-the-fly, and
• Cross-validation test are carried out
Threshold
• If cross-validation achieves certain threshold
(e.g. 90%), the remaining documents are
classified fully automatically without further
review
• Otherwise, more training data is being generated
STATUS & OUTLOOK
 Currently in evaluation
phase
• Going live in the next
weeks
…NOT ONLY PATENTS
Solutions
Libraries PharmaPatentsHealthcare Social Media
Terminology
Management Text Mining
Search &
Analytics NoSQL
Categorization
& Clustering
Automotive
For further questions, please contact:
David Baehrens
 + 49 (0)761 203 97690
 info@averbis.com

More Related Content

Viewers also liked

(Open) Data Activities in the City of Vienna
(Open) Data Activities in the City of Vienna(Open) Data Activities in the City of Vienna
(Open) Data Activities in the City of ViennaSemantic Web Company
 
Data Strategies: Metadata, Open Data, Linked Data
Data Strategies: Metadata, Open Data, Linked DataData Strategies: Metadata, Open Data, Linked Data
Data Strategies: Metadata, Open Data, Linked DataSemantic Web Company
 
Florian Bauer: Using open data thesauri to connect climate platforms
Florian Bauer: Using open data thesauri to connect climate platformsFlorian Bauer: Using open data thesauri to connect climate platforms
Florian Bauer: Using open data thesauri to connect climate platformsSemantic Web Company
 
Vincenzo Orabona (Raffaele Palmieri): Semantic Web technologies to increase W...
Vincenzo Orabona (Raffaele Palmieri): Semantic Web technologies to increase W...Vincenzo Orabona (Raffaele Palmieri): Semantic Web technologies to increase W...
Vincenzo Orabona (Raffaele Palmieri): Semantic Web technologies to increase W...Semantic Web Company
 
BigDataEurope - Empowering Communities with Data Technologies
BigDataEurope - Empowering Communities with Data TechnologiesBigDataEurope - Empowering Communities with Data Technologies
BigDataEurope - Empowering Communities with Data TechnologiesSemantic Web Company
 
Heiner Oberkampf: Semantics for Integrated Analytical Laboratory Processes – ...
Heiner Oberkampf: Semantics for Integrated Analytical Laboratory Processes – ...Heiner Oberkampf: Semantics for Integrated Analytical Laboratory Processes – ...
Heiner Oberkampf: Semantics for Integrated Analytical Laboratory Processes – ...Semantic Web Company
 
SKOS as a key element in Enterprise Linked Data Strategies
SKOS as a key element in Enterprise Linked Data StrategiesSKOS as a key element in Enterprise Linked Data Strategies
SKOS as a key element in Enterprise Linked Data StrategiesSemantic Web Company
 
Lieke Verhelst: Ontology Development ..the Lean way
Lieke Verhelst: Ontology Development ..the Lean wayLieke Verhelst: Ontology Development ..the Lean way
Lieke Verhelst: Ontology Development ..the Lean waySemantic Web Company
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Hortonworks
 
Achim Steinacker: Technical Documentation in the age of Industry 4.0
Achim Steinacker: Technical Documentation in the age of Industry 4.0Achim Steinacker: Technical Documentation in the age of Industry 4.0
Achim Steinacker: Technical Documentation in the age of Industry 4.0Semantic Web Company
 
Julien Gonçalves: Named entity recognition and disambiguation using an iterat...
Julien Gonçalves: Named entity recognition and disambiguation using an iterat...Julien Gonçalves: Named entity recognition and disambiguation using an iterat...
Julien Gonçalves: Named entity recognition and disambiguation using an iterat...Semantic Web Company
 
Taxonomies and Ontologies – The Yin and Yang of Knowledge Modelling
Taxonomies and Ontologies – The Yin and Yang of Knowledge ModellingTaxonomies and Ontologies – The Yin and Yang of Knowledge Modelling
Taxonomies and Ontologies – The Yin and Yang of Knowledge ModellingSemantic Web Company
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesJames Serra
 
第10讲顺序控制设计法与顺序功能图
第10讲顺序控制设计法与顺序功能图 第10讲顺序控制设计法与顺序功能图
第10讲顺序控制设计法与顺序功能图 qtsharp
 

Viewers also liked (20)

(Open) Data Activities in the City of Vienna
(Open) Data Activities in the City of Vienna(Open) Data Activities in the City of Vienna
(Open) Data Activities in the City of Vienna
 
The Healthdirect Australia Story
The Healthdirect Australia StoryThe Healthdirect Australia Story
The Healthdirect Australia Story
 
Data Strategies: Metadata, Open Data, Linked Data
Data Strategies: Metadata, Open Data, Linked DataData Strategies: Metadata, Open Data, Linked Data
Data Strategies: Metadata, Open Data, Linked Data
 
SKOS - An Overview
SKOS - An OverviewSKOS - An Overview
SKOS - An Overview
 
Florian Bauer: Using open data thesauri to connect climate platforms
Florian Bauer: Using open data thesauri to connect climate platformsFlorian Bauer: Using open data thesauri to connect climate platforms
Florian Bauer: Using open data thesauri to connect climate platforms
 
Vincenzo Orabona (Raffaele Palmieri): Semantic Web technologies to increase W...
Vincenzo Orabona (Raffaele Palmieri): Semantic Web technologies to increase W...Vincenzo Orabona (Raffaele Palmieri): Semantic Web technologies to increase W...
Vincenzo Orabona (Raffaele Palmieri): Semantic Web technologies to increase W...
 
BigDataEurope - Empowering Communities with Data Technologies
BigDataEurope - Empowering Communities with Data TechnologiesBigDataEurope - Empowering Communities with Data Technologies
BigDataEurope - Empowering Communities with Data Technologies
 
Data Activities in Austria
Data Activities in AustriaData Activities in Austria
Data Activities in Austria
 
Study: #Big Data in #Austria
Study: #Big Data in #AustriaStudy: #Big Data in #Austria
Study: #Big Data in #Austria
 
Heiner Oberkampf: Semantics for Integrated Analytical Laboratory Processes – ...
Heiner Oberkampf: Semantics for Integrated Analytical Laboratory Processes – ...Heiner Oberkampf: Semantics for Integrated Analytical Laboratory Processes – ...
Heiner Oberkampf: Semantics for Integrated Analytical Laboratory Processes – ...
 
SKOS as a key element in Enterprise Linked Data Strategies
SKOS as a key element in Enterprise Linked Data StrategiesSKOS as a key element in Enterprise Linked Data Strategies
SKOS as a key element in Enterprise Linked Data Strategies
 
Lieke Verhelst: Ontology Development ..the Lean way
Lieke Verhelst: Ontology Development ..the Lean wayLieke Verhelst: Ontology Development ..the Lean way
Lieke Verhelst: Ontology Development ..the Lean way
 
ODINE - Open Data Incubator Europe
ODINE - Open Data Incubator EuropeODINE - Open Data Incubator Europe
ODINE - Open Data Incubator Europe
 
SKOS - Some Use Cases
SKOS - Some Use CasesSKOS - Some Use Cases
SKOS - Some Use Cases
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
 
Achim Steinacker: Technical Documentation in the age of Industry 4.0
Achim Steinacker: Technical Documentation in the age of Industry 4.0Achim Steinacker: Technical Documentation in the age of Industry 4.0
Achim Steinacker: Technical Documentation in the age of Industry 4.0
 
Julien Gonçalves: Named entity recognition and disambiguation using an iterat...
Julien Gonçalves: Named entity recognition and disambiguation using an iterat...Julien Gonçalves: Named entity recognition and disambiguation using an iterat...
Julien Gonçalves: Named entity recognition and disambiguation using an iterat...
 
Taxonomies and Ontologies – The Yin and Yang of Knowledge Modelling
Taxonomies and Ontologies – The Yin and Yang of Knowledge ModellingTaxonomies and Ontologies – The Yin and Yang of Knowledge Modelling
Taxonomies and Ontologies – The Yin and Yang of Knowledge Modelling
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
 
第10讲顺序控制设计法与顺序功能图
第10讲顺序控制设计法与顺序功能图 第10讲顺序控制设计法与顺序功能图
第10讲顺序控制设计法与顺序功能图
 

Similar to Large-Scale Patent Classification at the European Patent Office

II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceDr. Haxel Consult
 
II-SDV 2017: Localizing International Content for Search, Data Mining and Ana...
II-SDV 2017: Localizing International Content for Search, Data Mining and Ana...II-SDV 2017: Localizing International Content for Search, Data Mining and Ana...
II-SDV 2017: Localizing International Content for Search, Data Mining and Ana...Dr. Haxel Consult
 
Metadata and Terminology Registries
Metadata and Terminology RegistriesMetadata and Terminology Registries
Metadata and Terminology RegistriesMarcia Zeng
 
II-SDV 2012 Automatic Query Re-Ranking in a Patent Database by Local Frequenc...
II-SDV 2012 Automatic Query Re-Ranking in a Patent Database by Local Frequenc...II-SDV 2012 Automatic Query Re-Ranking in a Patent Database by Local Frequenc...
II-SDV 2012 Automatic Query Re-Ranking in a Patent Database by Local Frequenc...Dr. Haxel Consult
 
Semantic Web in Action: Ontology-driven information search, integration and a...
Semantic Web in Action: Ontology-driven information search, integration and a...Semantic Web in Action: Ontology-driven information search, integration and a...
Semantic Web in Action: Ontology-driven information search, integration and a...Amit Sheth
 
FAIR and metadata standards - FAIRsharing and Neuroscience
FAIR and metadata standards - FAIRsharing and NeuroscienceFAIR and metadata standards - FAIRsharing and Neuroscience
FAIR and metadata standards - FAIRsharing and NeuroscienceSusanna-Assunta Sansone
 
Nectar cloud workshop ndj 20110331.2
Nectar cloud workshop ndj 20110331.2Nectar cloud workshop ndj 20110331.2
Nectar cloud workshop ndj 20110331.2Nick Jones
 
Autonomous medical coding with discriminative transformers
Autonomous medical coding with discriminative transformersAutonomous medical coding with discriminative transformers
Autonomous medical coding with discriminative transformersPatrick Nicolas
 
Standards: awareness, information, education
Standards: awareness, information, educationStandards: awareness, information, education
Standards: awareness, information, educationSusanna-Assunta Sansone
 
de theory and practice of digital preservation
de theory and practice of digital preservationde theory and practice of digital preservation
de theory and practice of digital preservationFIAT/IFTA
 
Enhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort DataEnhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort DataBarry Smith
 
Cosylab | codeBeamer ALM as a Swiss Army Knife on a Particle Therapy Project
Cosylab | codeBeamer ALM as a Swiss Army Knife on a Particle Therapy ProjectCosylab | codeBeamer ALM as a Swiss Army Knife on a Particle Therapy Project
Cosylab | codeBeamer ALM as a Swiss Army Knife on a Particle Therapy ProjectIntland Software GmbH
 
Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...
Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...
Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...ICZN
 
TSSG Security research unit May11_zdooly
TSSG Security research unit May11_zdoolyTSSG Security research unit May11_zdooly
TSSG Security research unit May11_zdoolyzdooly
 

Similar to Large-Scale Patent Classification at the European Patent Office (20)

II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in Nice
 
II-SDV 2017: Localizing International Content for Search, Data Mining and Ana...
II-SDV 2017: Localizing International Content for Search, Data Mining and Ana...II-SDV 2017: Localizing International Content for Search, Data Mining and Ana...
II-SDV 2017: Localizing International Content for Search, Data Mining and Ana...
 
FAIR: standards and services
FAIR: standards and servicesFAIR: standards and services
FAIR: standards and services
 
Metadata and Terminology Registries
Metadata and Terminology RegistriesMetadata and Terminology Registries
Metadata and Terminology Registries
 
II-SDV 2012 Automatic Query Re-Ranking in a Patent Database by Local Frequenc...
II-SDV 2012 Automatic Query Re-Ranking in a Patent Database by Local Frequenc...II-SDV 2012 Automatic Query Re-Ranking in a Patent Database by Local Frequenc...
II-SDV 2012 Automatic Query Re-Ranking in a Patent Database by Local Frequenc...
 
Semantic Web in Action: Ontology-driven information search, integration and a...
Semantic Web in Action: Ontology-driven information search, integration and a...Semantic Web in Action: Ontology-driven information search, integration and a...
Semantic Web in Action: Ontology-driven information search, integration and a...
 
FAIR and metadata standards - FAIRsharing and Neuroscience
FAIR and metadata standards - FAIRsharing and NeuroscienceFAIR and metadata standards - FAIRsharing and Neuroscience
FAIR and metadata standards - FAIRsharing and Neuroscience
 
Nectar cloud workshop ndj 20110331.2
Nectar cloud workshop ndj 20110331.2Nectar cloud workshop ndj 20110331.2
Nectar cloud workshop ndj 20110331.2
 
PoolParty Platform 2013
PoolParty Platform 2013PoolParty Platform 2013
PoolParty Platform 2013
 
Autonomous medical coding with discriminative transformers
Autonomous medical coding with discriminative transformersAutonomous medical coding with discriminative transformers
Autonomous medical coding with discriminative transformers
 
Standards: awareness, information, education
Standards: awareness, information, educationStandards: awareness, information, education
Standards: awareness, information, education
 
Taxonomy Fundamentals - SLA 2014
Taxonomy Fundamentals - SLA 2014Taxonomy Fundamentals - SLA 2014
Taxonomy Fundamentals - SLA 2014
 
agriopenlink - summary
agriopenlink  - summary agriopenlink  - summary
agriopenlink - summary
 
de theory and practice of digital preservation
de theory and practice of digital preservationde theory and practice of digital preservation
de theory and practice of digital preservation
 
Taxonomy Fundamentals Workshop 2013
Taxonomy Fundamentals Workshop 2013Taxonomy Fundamentals Workshop 2013
Taxonomy Fundamentals Workshop 2013
 
Enhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort DataEnhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort Data
 
Document repositories-and-metadata
Document repositories-and-metadataDocument repositories-and-metadata
Document repositories-and-metadata
 
Cosylab | codeBeamer ALM as a Swiss Army Knife on a Particle Therapy Project
Cosylab | codeBeamer ALM as a Swiss Army Knife on a Particle Therapy ProjectCosylab | codeBeamer ALM as a Swiss Army Knife on a Particle Therapy Project
Cosylab | codeBeamer ALM as a Swiss Army Knife on a Particle Therapy Project
 
Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...
Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...
Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...
 
TSSG Security research unit May11_zdooly
TSSG Security research unit May11_zdoolyTSSG Security research unit May11_zdooly
TSSG Security research unit May11_zdooly
 

More from Semantic Web Company

How Enterprise Architecture & Knowledge Graph Technologies Can Scale Business...
How Enterprise Architecture & Knowledge Graph Technologies Can Scale Business...How Enterprise Architecture & Knowledge Graph Technologies Can Scale Business...
How Enterprise Architecture & Knowledge Graph Technologies Can Scale Business...Semantic Web Company
 
Introduction to Knowledge Graphs and Semantic AI
Introduction to Knowledge Graphs and Semantic AIIntroduction to Knowledge Graphs and Semantic AI
Introduction to Knowledge Graphs and Semantic AISemantic Web Company
 
Deep Text Analytics - How to extract hidden information and aboutness from text
Deep Text Analytics - How to extract hidden information and aboutness from textDeep Text Analytics - How to extract hidden information and aboutness from text
Deep Text Analytics - How to extract hidden information and aboutness from textSemantic Web Company
 
Leveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management SystemLeveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management SystemSemantic Web Company
 
Linking SharePoint Documents with Structured Data
Linking SharePoint Documents with Structured DataLinking SharePoint Documents with Structured Data
Linking SharePoint Documents with Structured DataSemantic Web Company
 
The Fast Track to Knowledge Engineering
The Fast Track to Knowledge EngineeringThe Fast Track to Knowledge Engineering
The Fast Track to Knowledge EngineeringSemantic Web Company
 
Leveraging Taxonomy Management with Machine Learning
Leveraging Taxonomy Management with Machine LearningLeveraging Taxonomy Management with Machine Learning
Leveraging Taxonomy Management with Machine LearningSemantic Web Company
 
PoolParty GraphSearch - The Fusion of Search, Recommendation and Analytics
PoolParty GraphSearch - The Fusion of Search, Recommendation and AnalyticsPoolParty GraphSearch - The Fusion of Search, Recommendation and Analytics
PoolParty GraphSearch - The Fusion of Search, Recommendation and AnalyticsSemantic Web Company
 
Semantics as the Basis of Advanced Cognitive Computing
Semantics as the Basis of Advanced Cognitive ComputingSemantics as the Basis of Advanced Cognitive Computing
Semantics as the Basis of Advanced Cognitive ComputingSemantic Web Company
 
PoolParty 6.0 - Climbing the Semantic Ladder
PoolParty 6.0 - Climbing the Semantic LadderPoolParty 6.0 - Climbing the Semantic Ladder
PoolParty 6.0 - Climbing the Semantic LadderSemantic Web Company
 
PoolParty Semantic Suite - Release 6.0 (Technical Overview)
PoolParty Semantic Suite - Release 6.0 (Technical Overview)PoolParty Semantic Suite - Release 6.0 (Technical Overview)
PoolParty Semantic Suite - Release 6.0 (Technical Overview)Semantic Web Company
 
PROPEL . Austrian's Roadmap for Enterprise Linked Data
PROPEL . Austrian's Roadmap for Enterprise Linked DataPROPEL . Austrian's Roadmap for Enterprise Linked Data
PROPEL . Austrian's Roadmap for Enterprise Linked DataSemantic Web Company
 
PoolParty Semantic Suite - Release 5.5
PoolParty Semantic Suite - Release 5.5PoolParty Semantic Suite - Release 5.5
PoolParty Semantic Suite - Release 5.5Semantic Web Company
 

More from Semantic Web Company (20)

How Enterprise Architecture & Knowledge Graph Technologies Can Scale Business...
How Enterprise Architecture & Knowledge Graph Technologies Can Scale Business...How Enterprise Architecture & Knowledge Graph Technologies Can Scale Business...
How Enterprise Architecture & Knowledge Graph Technologies Can Scale Business...
 
Introduction to Knowledge Graphs and Semantic AI
Introduction to Knowledge Graphs and Semantic AIIntroduction to Knowledge Graphs and Semantic AI
Introduction to Knowledge Graphs and Semantic AI
 
Deep Text Analytics - How to extract hidden information and aboutness from text
Deep Text Analytics - How to extract hidden information and aboutness from textDeep Text Analytics - How to extract hidden information and aboutness from text
Deep Text Analytics - How to extract hidden information and aboutness from text
 
Leveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management SystemLeveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management System
 
Linking SharePoint Documents with Structured Data
Linking SharePoint Documents with Structured DataLinking SharePoint Documents with Structured Data
Linking SharePoint Documents with Structured Data
 
The Fast Track to Knowledge Engineering
The Fast Track to Knowledge EngineeringThe Fast Track to Knowledge Engineering
The Fast Track to Knowledge Engineering
 
Semantic AI
Semantic AISemantic AI
Semantic AI
 
BrightTALK - Semantic AI
BrightTALK - Semantic AI BrightTALK - Semantic AI
BrightTALK - Semantic AI
 
PoolParty Semantic Classifier
PoolParty Semantic ClassifierPoolParty Semantic Classifier
PoolParty Semantic Classifier
 
Leveraging Taxonomy Management with Machine Learning
Leveraging Taxonomy Management with Machine LearningLeveraging Taxonomy Management with Machine Learning
Leveraging Taxonomy Management with Machine Learning
 
Taxonomies put in the right place
Taxonomies put in the right placeTaxonomies put in the right place
Taxonomies put in the right place
 
PoolParty GraphSearch - The Fusion of Search, Recommendation and Analytics
PoolParty GraphSearch - The Fusion of Search, Recommendation and AnalyticsPoolParty GraphSearch - The Fusion of Search, Recommendation and Analytics
PoolParty GraphSearch - The Fusion of Search, Recommendation and Analytics
 
Semantics as the Basis of Advanced Cognitive Computing
Semantics as the Basis of Advanced Cognitive ComputingSemantics as the Basis of Advanced Cognitive Computing
Semantics as the Basis of Advanced Cognitive Computing
 
Structured Content Meets Taxonomy
Structured Content Meets TaxonomyStructured Content Meets Taxonomy
Structured Content Meets Taxonomy
 
PoolParty 6.0 - Climbing the Semantic Ladder
PoolParty 6.0 - Climbing the Semantic LadderPoolParty 6.0 - Climbing the Semantic Ladder
PoolParty 6.0 - Climbing the Semantic Ladder
 
PoolParty Semantic Suite - Release 6.0 (Technical Overview)
PoolParty Semantic Suite - Release 6.0 (Technical Overview)PoolParty Semantic Suite - Release 6.0 (Technical Overview)
PoolParty Semantic Suite - Release 6.0 (Technical Overview)
 
PROPEL . Austrian's Roadmap for Enterprise Linked Data
PROPEL . Austrian's Roadmap for Enterprise Linked DataPROPEL . Austrian's Roadmap for Enterprise Linked Data
PROPEL . Austrian's Roadmap for Enterprise Linked Data
 
Taxonomy Quality Assessment
Taxonomy Quality AssessmentTaxonomy Quality Assessment
Taxonomy Quality Assessment
 
Taxonomy-Driven UX
Taxonomy-Driven UXTaxonomy-Driven UX
Taxonomy-Driven UX
 
PoolParty Semantic Suite - Release 5.5
PoolParty Semantic Suite - Release 5.5PoolParty Semantic Suite - Release 5.5
PoolParty Semantic Suite - Release 5.5
 

Recently uploaded

Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 

Recently uploaded (20)

Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 

Large-Scale Patent Classification at the European Patent Office

  • 1. David Baehrens Large-Scale Patent Classification at the European Patent Office
  • 2. ABOUT AVERBIS Founded: 2007 Location: Freiburg im Breisgau Team: Domain & IT-Experts Focus: Leverage structured & unstructured information Current Sectors: Pharma, Health, Automotive, Publishers & Libraries
  • 3. PORTFOLIO Solutions Libraries PharmaPatentsHealthcare Social Media Terminology Management Text Mining Search & Analytics NoSQL Categorization & Clustering Automotive
  • 4. TERMINOLOGY MANAGEMENT Terminology management software Provision of terminologies Mappings between terminologies Building terminology-based applications
  • 5. Synonyms: dimethyl sulfoxide, dimethylsulfoxide, Domoso, Infiltrina Hierarchies: cancer, carcinoma, melanoma, lymphoma, glioblastoma… Patterns: dates, citations, mail addresses… Rule-based extraction of all different kinds of complex information Persons, Locations, Genes, …. Coocurrences, Typed Relations, e.g. Genes / Diseases / Modification Type TEXT MINING Term Detection Regular Expressions Rule Engine Named Entities Relations Sentences, Tokens, POS-Tags, Chunks, Paragraphs, Sections, Stemming, Decompounding…Syntax Detection
  • 6. RULE ENGINE 1. NAME OF THE MEDICINAL PRODUCT Desloratadine ratiopharm 5 mg film-coated tablets Primary Field Name Secondary Field Name Field Value MedicalProductName coveredText Desloratadine ratiopharm 5 mg film-coated tablets inventedPartName DESLORATADINE strengthPart 5 mg pharmaceuticalDoseFormPart FILM-COATED TABLET TextRegelErgebnis
  • 7. SEARCH & NOSQL Free text + concept based search Text mining integration Guided navigation / facets NoSQL functionalities Multi- & cross lingual search Related documents Based on Apache Solr • Extended Query Syntax • JSON-API • Scalability …
  • 10. INFORMATION DISCOVERY Terminology Management Text Mining Search & Analytics NoSQL Categorization & Clustering Delivery / Deployment / Runtime Environment Integration Tests / Continuous Integration Extensive Documentation Common Architecture / Application Design User & Role Management, Security Communication Bus Project Management
  • 11. PATENT CLASSIFICATION AT EPO Tender No. 1585 1) Pre-Classification of unpublished patents into departments 2) Re-Classification on published patents, if category system changes
  • 12. ABOUT EPO • The European Patent Office (EPO) grants European patents for the Contracting States to the European Patent Convention • Second largest intergovernmental institution in Europe • Not an EU institution • Self-financing, i.e. revenue from fees covers operating and capital expenditure
  • 13. NUMBER OF STAFF Status: December 2008
  • 16. COOPERATIVE PATENT CLASSIFICATION • Patent Classification System based on ECLA / IPC • jointly developed by the European Patent Office (EPO) and the United States Patent and Trademark Office (USPTO) • used by both the EPO and USPTO since 1 January 2013 • currently contains about 250.000 classes
  • 22. PATENT CLASSIFICATION AT EPO Tender No. 1585 1) Pre-Classification of unpublished patents into departments Our Motivation: • Great Classification Use-Case – Big Data (80 Mio. patents available) – Large Scale Category System >250.000 CPC codes – Tough classification quality and response time constraints • Text Mining Success Story
  • 23. OLD CLASSIFICATION PROCESS PATENTS CLA SSIFICATION DEPARTMENTS
  • 25. CLASSIFICATION PROCESS PATENTS CLA SSIFICATION DEPARTMENTS
  • 26. NEW CLASSIFICATION PROCESS PATENTS CLA SSIFICATION DEPARTMENTS
  • 27. SOME FACTS • about 650k training documents from 2005-2013 • supervised learning: light-weight and fast linear support vector machine • Training time (16 Cores, 128 GB RAM) – Feature Extraction: ~1 hour – Training of Classifiers: ~1 hour – 90/10 tests with a look-a-head of 3 levels and reporting 3 best candidates: ~1 hour • Prediction: 5 docs in 5 sec
  • 29. STATUS & OUTLOOK  Range-specific quality evaluation  Going live with best ranges • Continuous optimization
  • 30. PATENT CLASSIFICATION AT EPO Tender No. 1585 1) Re-Classification on published patents, if category system changes Challenges and Facts: – 250.000 CPC codes, regular changes/refinements – Several re-classification projects at any one time, great variation in size, a class is split into 5-20(?) subclasses – No training material available
  • 31. NEW RE-CLASSIFICATION PROCESS Training Data • Human Annotator starts labeling about 20% of the documents with new subclasses Statistical Models • are generated on-the-fly, and • Cross-validation test are carried out Threshold • If cross-validation achieves certain threshold (e.g. 90%), the remaining documents are classified fully automatically without further review • Otherwise, more training data is being generated
  • 32. STATUS & OUTLOOK  Currently in evaluation phase • Going live in the next weeks
  • 33. …NOT ONLY PATENTS Solutions Libraries PharmaPatentsHealthcare Social Media Terminology Management Text Mining Search & Analytics NoSQL Categorization & Clustering Automotive
  • 34. For further questions, please contact: David Baehrens  + 49 (0)761 203 97690  info@averbis.com