The EPO document collection: A technical treasure chest

Presentation by Georg Schiwy, Documentation Information Manager at the European Patent Office.
The EPO holds one of the largest digital repositories of public knowledge in the world. This vast store is accessed daily by thousands of users, and its usage is constantly increasing. Each year about 40 terabytes, the equivalent of 40 million books, are downloaded from the EPO search collection by internal and external users. This figure illustrates the EPO's unique contribution to the knowledge economy. The presentation gives an overview of the patent and non-patent collection that examiners use for prior-art searches. The second part outlines the move from a paper documentation collection to an electronic one and the particular challenges of this process.

  • Transcript of "The EPO document collection: A technical treasure chest"

    1. The EPO document collection: A technical treasure chest. Georg Schiwy, Directorate Information Acquisition, 06 June 2008
    2. Patents are granted on objective grounds (section: Patents and documentation)
        • Inventive Step / Novelty criteria: based on the "State of the Art"
        • A patent office must have access to State of the Art information to fulfil its mission.
        • For the EPO this means:
        • Collect and manage the best documentation for State of the Art searches
        • Enrich both documents and collection
        • Easy access to the documentation: tools, products and services
    3. Documentation at EPO - Blessing or Curse?
        > 371 million records in 117 databases
        > 78.8 million Patent and NPL facsimile
        > 66 million unique Patent documents
        > 57 million Patent abstracts
        > 21.7 million full text Patent documents
        > 3.5 million full text NPL documents
        > 6,000 NPL titles
        and growing daily...
    4. Patent acquisition approach (section: Documentation overview)
        • Bibliographic data: extended bibliographic data
        • Full text: searchable full text, one patent document in an official language
        • Title: searchable
        • Drawing
        • Abstract: original language, English language
        • Image: facsimile
    5. Managing and maintaining our data
    6. Patent data quality requirements
        • "Global Patent Data Coverage", on the Internet at the following address: http://www.epo.org/gpdc
        • Data Quality: Timeliness, Correctness, Completeness
    7. The Quality of our data
        • Create and maintain a World Wide Patent database serving our users.
        • Receive data from 81 Offices world wide.
        • Process incoming data (Quality: validation, formatting, etc.)
        • In 2006, 4.2 million documents were added to the patent bibliographic database.
        • About 30% of these had to be corrected!
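To put the "about 30%" on slide 7 into absolute terms, the arithmetic can be sketched as follows. Both input figures come from the slide; the variable names are only illustrative, and since the share is approximate the result is an order of magnitude rather than an exact count.

```python
# Figures quoted on slide 7.
documents_added_2006 = 4_200_000
percent_corrected = 30  # "about 30%", so the result is only approximate

# Integer arithmetic avoids floating-point rounding noise.
corrected = documents_added_2006 * percent_corrected // 100

print(f"Roughly {corrected:,} of {documents_added_2006:,} documents needed correction")
```
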
    8. EPO Non-Patent Literature (NPL) Resources
        • Databases of secondary publishers: INSPEC, COMPDX, BIOSIS, MEDLINE, IHS...
        • Standards
        • Books, theses, technical reports, monographs
        • Journals
        • Conference proceedings
        • Company disclosures
        • Encyclopaedias, dictionaries
    9. Classification at the EPO
    10. Why do we need a classification system?
        • Patent Number US2001051944
        • 1. A method of maintaining a database having a central database and a plurality of individual partially replicated databases, wherein updates made to the central database or to one of the individual, partially replicated databases are selectively propagated to a recipient partially replicated database if the owner of the recipient partially replicated database has visibility to the data being selectively transmitted, said method comprising:
        • (a) replicating a group of records as a single logical docking object, which is composed of one or more physical database tables;
        • (b) applying a single set of visibility rules to the data content of the entire logical docking unit; and
        • (c) propagating the docking object to the recipient individual partially replicated database if the owner thereof has visibility to the data being transmitted in the single logical docking object.
    11. Patent Classification
        • C: Section (8 Sections, A…H)
        • C12: Class
        • C12N: Subclass
        • C12N15: Main group
        • C12N15/82: Subgroup (71 000 subgroups in IPC)
        • C12N15/82 A: Further divided in 30 fields in ECLA
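The hierarchy on slide 11 can be read directly off the symbol. The following is a minimal sketch, not an EPO tool: the function name `parse_ipc`, the `IpcSymbol` type and the regular expression are assumptions made for illustration, and the pattern only covers plain IPC-style symbols such as C12N15/82, not the additional ECLA subdivisions mentioned on the slide.

```python
import re
from typing import NamedTuple


class IpcSymbol(NamedTuple):
    section: str     # e.g. "C"
    ipc_class: str   # e.g. "C12"
    subclass: str    # e.g. "C12N"
    main_group: str  # e.g. "C12N15"
    subgroup: str    # e.g. "C12N15/82"


# Assumed layout: section letter A-H, two-digit class, subclass letter,
# main group number, "/", subgroup number.
_IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])(\d{1,4})/(\d{1,6})$")


def parse_ipc(symbol: str) -> IpcSymbol:
    """Split an IPC-style symbol into the levels named on the slide."""
    match = _IPC_PATTERN.match(symbol.replace(" ", "").upper())
    if match is None:
        raise ValueError(f"not a recognised IPC symbol: {symbol!r}")
    section, class_digits, subclass_letter, group, subgroup = match.groups()
    ipc_class = section + class_digits
    subclass = ipc_class + subclass_letter
    main_group = subclass + group
    return IpcSymbol(section, ipc_class, subclass, main_group, f"{main_group}/{subgroup}")


for level, value in parse_ipc("C12N15/82")._asdict().items():
    print(f"{level:>10}: {value}")
```

A symbol carrying an ECLA suffix is rejected by this pattern with a `ValueError`; handling those finer subdivisions would need the ECLA scheme itself, which the slides do not spell out.
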
    12. Patent classification - Example: Parking space problem
    13. Classification Systems
        • Library Classification
        • Dewey Decimal Classification: 1 000 classes
        • Patent Classification
        • International Patent Classification (IPC): 71 000 classes (WIPO, 177 members)
        • EPO system: ECLA, 136 000 classes
        • JPO system: F-terms, 180 000 classes
        • USPTO system: US Class, 130 000 classes
    14. The Patent Granting Workbench: Tools at the EPO
    15. Access and Tools: The Patent Granting Workbench. Yesterday
    16. Access and Tools: The Patent Granting Workbench. Today: SEA
    17. Access and Tools: The Patent Granting Workbench. Today: Chemical formulas
    18. Access and Tools: The Patent Granting Workbench. Today: Sequence data capture
    19. Access and Tools: The Patent Granting Workbench. Today: Early OCR
        • Input: New scanned applications
        • Early OCR steps: OCR, Conversion, Quality Control, Storage
        • Output: Structured text, XML / PDF
        • Further diagram labels: EPOQUE, Machine translation, Pre-classification, Examination
    20. Access and Tools: The Patent Granting Workbench. Today: Machine Translation (German - English - French - Spanish)
    21. Access and Tools: The Patent Granting Workbench. Future?
        • Chemical drawings -> analysing & recognizing formula
        • Handling of chemical formula & sequence data
        • Flow chart searching
        • Image similarity search
        • Searching scientific units -> numerical values extraction
        • Ranking of search results (customized sorting)
        • Capturing of queries
        • Synonyms
        • Targeted routing of applications
        • Global machine translation
    22. Added value: "intelligent" patent document collection
        • Bibliographic data
        • Original title, Title EN
        • Original abstract, Abstract EN
        • Facsimile Images
        • Full Text
        • Classification
        • Citations
        • Sequences
        • Numerical Values extraction
        • Machine Translation
        • Text summariser
        • Chemical formulae
        • Flow Chart searches
        • Targeted routing
    23. The Paperless Project
    24. How to get from A to B?
    25. Objectives
        • To increase the quality of the search by improving completeness and quality of the EPO databases
        • To offer the same search collection Office-wide
        • To reduce paper handling and consequently enable more digital workflows and reduce costs
    26. History - Paper groups
        • The documentation available to the examiners used to be accessible only on paper.
        • Paper documentation was organised in paper groups according to the classification information given to the documents by examiners/classifiers.
        • Paper groups contained folders for patent publications and non-patent literature (NPL).
        • Paper groups were sorted by country and publication number.
        • Only one family member was in the paper group.
    27. The challenges
        • Cost efficient operation (avoid duplication of work).
        • Convince the users.
        • Knowing the gaps: The Missing List ...
        • Improve paper group quality before scanning...
        • Ensure quality of scanning operation.
    28. Scanning problems
        • Document format
        • Quality of the paper (glossy, brittle, etc.)
        • Documents with a black background
        • Photos
        • Documents stapled together
        • Documents with handwritten comments
    29. Project progress: 22 million documents in 6 years
    30. Status - Individual scanned documents
    31. Status: Main project finished end of 2007
        • In-file documents: 22.1 million documents (100%)
        • 14700 columns of 1500 documents, almost 1000 tons of paper
        • => 3700 m², about 19 tennis courts
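The figures on slide 31 can be cross-checked against each other with a little arithmetic. This is only a sanity check under stated assumptions: the slide does not say which court size it has in mind, so the sketch assumes a standard singles court of 23.77 m by 8.23 m, and it reads "1000 tons" as metric tonnes.

```python
# Figures quoted on slide 31.
columns = 14_700
documents_per_column = 1_500
floor_area_m2 = 3_700
paper_mass_kg = 1_000_000  # "almost 1000 tons", assumed to be metric tonnes

# Assumption: a singles tennis court, 23.77 m long and 8.23 m wide.
court_area_m2 = 23.77 * 8.23

total_documents = columns * documents_per_column
courts = floor_area_m2 / court_area_m2
grams_per_document = paper_mass_kg * 1000 / total_documents

print(f"{total_documents:,} documents in total")
print(f"{courts:.1f} tennis courts of floor space")
print(f"about {grams_per_document:.0f} g of paper per document")
```

Under these assumptions the column count reproduces the roughly 22 million documents and the floor area comes out close to the 19 courts quoted on the slide.
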
    32. The Procedure - Knowing the gaps
        • Generate a list of patent documents present in our main database EPODOC but without facsimile image: the "Missing List".
        • Keep all patent documents with handwritten family information for future correction.
        • Documents not marked: check for facsimile availability in BNS and for completeness of the red underlined class in the classification tool DocTool.
        • All documents appearing on the Missing List AND those without a marking that are not yet available in facsimile will be scanned and entered into BNS, our facsimile repository.
        • Adapted procedures for patents and non-patent literature.
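The selection rule on slide 32 is essentially a set difference followed by a filter. The sketch below is a guess at the logic, not the EPO's actual procedure: `PaperDocument`, `build_missing_list`, `select_for_scanning` and the sample publication numbers are all hypothetical, and the slide does not define what a "marking" is, so it is modelled here as a plain boolean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaperDocument:
    number: str   # publication number as found on the paper copy
    marked: bool  # whether the copy carries a marking (meaning assumed)


def build_missing_list(database_numbers, facsimile_numbers):
    """Documents known to the database but lacking a facsimile image."""
    return sorted(set(database_numbers) - set(facsimile_numbers))


def select_for_scanning(paper_docs, missing_list, facsimile_numbers):
    """Scan documents on the missing list, plus unmarked ones without a facsimile."""
    missing = set(missing_list)
    have_facsimile = set(facsimile_numbers)
    return [
        doc.number
        for doc in paper_docs
        if doc.number in missing
        or (not doc.marked and doc.number not in have_facsimile)
    ]


# Illustrative sample data only; these are not real holdings.
database = ["EP0000001", "EP0000002", "EP0000003"]
facsimiles = ["EP0000001"]
paper = [
    PaperDocument("EP0000001", marked=False),  # facsimile exists: skip
    PaperDocument("EP0000002", marked=True),   # on the missing list: scan
    PaperDocument("XX0000009", marked=False),  # unmarked, no facsimile: scan
]

missing_list = build_missing_list(database, facsimiles)
to_scan = select_for_scanning(paper, missing_list, facsimiles)
print("Missing list:", missing_list)
print("To scan:", to_scan)
```

Keeping the missing list as a separate step mirrors the slide, where the list is generated first and then consulted while the paper groups are worked through.
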
    33. Corrections - Example: GB (should be GB191222653)
    34. Summary
        • Once a classification code has been treated, Paperless:
        • Enables the completion of the databases.
        • No information or document is lost.
        • Documentation is made available to all sites and examiners.
        • Externally, patents are made accessible through Esp@cenet.
    35. "The EPO is promoting a knowledge-based society in Europe as one of the world's leading providers of technical information." Thank you for your attention! Any questions? Georg Schiwy, Information Acquisition, [email_address]
