
The EPO document collection: A technical treasure chest

Presentation by Georg Schiwy, Documentation Information Manager at the European Patent Office.
The EPO holds one of the largest digital repositories of public knowledge in the world. This vast store is accessed daily by thousands of users, and its usage is constantly increasing. Each year about 40 terabytes, the equivalent of 40 million books, are downloaded from the EPO search collection by both internal and external users. This figure is a perfect illustration of the EPO's unique contribution to the knowledge economy. The presentation gives an overview of the patent and non-patent collection that examiners use for prior-art searches. The second part outlines the move from a paper documentation collection to an electronic one and the particular challenges in this process.

Usage Rights

© All Rights Reserved

Presentation Transcript

  • The EPO document collection: A technical treasure chest. Georg Schiwy, Directorate Information Acquisition, 06 June 2008
  • Patents are granted on objective grounds
    • A patent office must have access to State of the Art information to fulfil its mission.
    • For the EPO this means:
    • Collect and manage the best documentation for State of the Art searches
    • Enrich both documents and collection
    • Easy access to the documentation: tools, products and services
    Inventive Step / Novelty criteria: based on the "State of the Art"
    The EPO document collection: Patents and documentation
  • Documentation at EPO - Blessing or Curse?
    • > 371 million records in 117 databases
    • > 78.8 million Patent and NPL facsimiles
    • > 66 million unique Patent documents
    • > 57 million Patent abstracts
    • > 21.7 million full text Patent documents
    • > 3.5 million full text NPL documents
    • > 6,000 NPL titles and growing daily...
  • Patent acquisition approach
    • Bibliographic data: extended bibliographic data
    • Full text: searchable full text, one patent document in an official language
    • Title: searchable
    • Drawing
    • Abstract: original language and English language
    • Image: facsimile
    The EPO document collection: Documentation overview
  • Managing and maintaining our data
  • Patent data quality requirements
    • Data quality: timeliness, correctness, completeness
    • "Global Patent Data Coverage" is available on the Internet at http://www.epo.org/gpdc
  • The Quality of our data
    • Receive data from 81 offices worldwide.
    • Process incoming data (Quality: validation, formatting, etc.)
    • In 2006, 4.2 million documents were added to the patent bibliographic database.
    • About 30% of these had to be corrected!
    Create and maintain a worldwide patent database serving our users.
  • EPO Non-Patent Literature (NPL) Resources
    • Databases of secondary publishers: INSPEC, COMPDX, BIOSIS, MEDLINE, IHS...
    • Standards
    • Books, theses, technical reports, monographs
    • Journals
    • Conference proceedings
    • Company disclosures
    • Encyclopaedias, dictionaries
  • Classification at the EPO 
  • Why do we need a classification system?
    • Patent Number US2001051944
    • 1. A method of maintaining a database having a central database and a plurality of individual partially replicated databases, wherein updates made to the central database or to one of the individual, partially replicated databases are selectively propagated to a recipient partially replicated database if the owner of the recipient partially replicated database has visibility to the data being selectively transmitted, said method comprising:
    • (a) replicating a group of records as a single logical docking object, which is composed of one or more physical database tables;
    • (b) applying a single set of visibility rules to the data content of the entire logical docking unit; and
    • (c) propagating the docking object to the recipient individual partially replicated database if the owner thereof has visibility to the data being transmitted in the single logical docking object.
  • Patent Classification: example C12N15/82
    • C: Section, one of 8 sections (A…H)
    • C12: Class
    • C12N: Subclass
    • C12N15: Main group
    • C12N15/82: Subgroup, one of 71 000 subgroups (IPC)
    • C12N15/82 A: further divided into 30 fields in ECLA
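The hierarchy on this slide can be made concrete with a short sketch that splits a symbol such as C12N15/82 into its levels. This is an illustration only: the `IpcSymbol` type, the `parse_ipc` function and the regular expression are assumptions made for this example, and the pattern is a simplified reading of the symbol layout, not an official validation rule. ECLA extensions such as the trailing "A" are not handled.

```python
import re
from typing import NamedTuple


class IpcSymbol(NamedTuple):
    """Levels of a classification symbol, from broadest to narrowest."""
    section: str     # one of the 8 sections, A to H, e.g. "C"
    ipc_class: str   # section plus two digits, e.g. "C12"
    subclass: str    # class plus one letter, e.g. "C12N"
    main_group: str  # subclass plus group number, e.g. "C12N15"
    subgroup: str    # full symbol, e.g. "C12N15/82"


# Simplified pattern (an assumption for this sketch, not an official rule).
_SYMBOL = re.compile(r"^([A-H])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


def parse_ipc(symbol: str) -> IpcSymbol:
    """Split a symbol like 'C12N15/82' into its hierarchy levels."""
    compact = symbol.replace(" ", "").upper()
    match = _SYMBOL.match(compact)
    if match is None:
        raise ValueError(f"not a recognisable classification symbol: {symbol!r}")
    section, class_digits, subclass_letter, group, sub = match.groups()
    ipc_class = section + class_digits
    subclass = ipc_class + subclass_letter
    main_group = subclass + group
    return IpcSymbol(section, ipc_class, subclass, main_group, f"{main_group}/{sub}")


if __name__ == "__main__":
    for level, value in parse_ipc("C12N15/82")._asdict().items():
        print(f"{level:<11}{value}")
```

Each level is a prefix of the next, which is what lets a search be narrowed or widened by moving up or down the hierarchy.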
  • Patent classification - Example: parking space problem
  • Classification Systems
    • Library Classification
    • Dewey Decimal Classification 1 000 classes
    • Patent Classification
    • International Patent Classification (IPC) 71 000 classes
    • (WIPO, 177 members)
    • EPO system: ECLA 136 000 classes
    • JPO system: F-terms 180 000 classes
    • USPTO system: US Class 130 000 classes
  • The Patent Granting Workbench: Tools at the EPO 
  • Access and Tools: The Patent Granting Workbench - Yesterday
  • Access and Tools: The Patent Granting Workbench - Today: SEA
  • Access and Tools: The Patent Granting Workbench - Today: Chemical formulas
  • Access and Tools: The Patent Granting Workbench - Today: Sequence data capture
  • Access and Tools: The Patent Granting Workbench - Today: Early OCR
    • Input: new scanned applications
    • Steps: OCR conversion, quality control, storage
    • Output: structured text (XML / PDF)
    • EPOQUE, machine translation, pre-classification, examination
  • Access and Tools: The Patent Granting Workbench - Today: Machine Translation (German - English - French - Spanish)
  • Access and Tools: The Patent Granting Workbench
    • Chemical drawings -> analysing and recognising formulae
    • Handling of chemical formulae and sequence data
    • Flow chart searching
    • Image similarity search
    • Searching scientific units -> numerical values extraction
    • Ranking of search results (customized sorting)
    • Capturing of queries
    • Synonyms
    • Targeted routing of applications
    • Global machine translation
    Future?
  • Added value: "intelligent" patent document collection
    • Bibliographic data, original abstract, facsimile images, full text, original title, classification, abstract EN, title EN, citations, sequences, numerical values extraction, machine translation, text summariser, chemical formulae, flow chart searches, targeted routing
  • The Paperless Project
  • How to get from A to B?
  • Objectives
      • To increase the quality of the search by improving completeness and quality of the EPO databases
      • To offer the same search collection Office-wide
      • To reduce paper handling and consequently enable more digital workflows and reduce costs
  • History - Paper groups
    • The documentation available to the examiners used to be accessible only on paper.
    • Paper documentation was organised in paper groups according to the classification information given by examiners/classifiers to the documents.
    • Paper groups contained folders for
      • patent publications and
      • non-patent literature (NPL)
    • Paper groups were sorted by country and publication number.
    • Only one family member was in the paper group.
  • The challenges
    • Cost-efficient operation (avoid duplication of work).
    • Convince the users.
    • Knowing the gaps: The Missing List ...
    • Improve paper group quality before scanning...
    • Ensure quality of scanning operation.
  • Scanning problems
    • Document format
    • Quality of the paper (glossy, brittle, etc.)
    • Documents with a black background
    • Photos
    • Documents stapled together
    • Documents with handwritten comments
  • Project progress: 22 million documents in 6 years
  • Status - Individual scanned documents
  • Status: Main project finished at the end of 2007
    • In-file documents: 22.1 million documents (100%)
    • 14,700 columns of 1,500 documents, almost 1,000 tons of paper
    • => 3,700 m², about 19 tennis courts
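The figures on this status slide can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted on the slide; treating "tons" as metric tonnes is an assumption, and the variable names are illustrative.

```python
# Figures quoted on the slide.
columns = 14_700
documents_per_column = 1_500
reported_documents = 22_100_000   # "22,1 Mio documents"
paper_mass_kg = 1_000 * 1_000     # "almost 1000 tons", assumed to be metric tonnes
floor_area_m2 = 3_700
tennis_courts = 19

# Derived values.
stacked_documents = columns * documents_per_column
grams_per_document = paper_mass_kg * 1_000 / reported_documents
area_per_court_m2 = floor_area_m2 / tennis_courts

print(f"{stacked_documents:,} documents in columns")
print(f"about {grams_per_document:.0f} g of paper per document")
print(f"about {area_per_court_m2:.0f} m2 per tennis court")
```

The column count gives 22,050,000 documents, in line with the reported 22.1 million. The area per court comes out at roughly 195 m², which is close to the size of a singles court (23.77 m × 8.23 m), so the comparison on the slide is consistent.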
  • The Procedure - Knowing the gaps
      • Generate a list of patent documents that are present in our main database EPODOC but have no facsimile image: the "Missing List".
      • Keep all patent documents with handwritten family information for future correction.
      • Documents not marked: check for facsimile availability in BNS and for completeness of the red underlined class in the classification tool DocTool.
      • All documents appearing on the Missing List AND those without a marking that are not yet available in facsimile will be scanned and entered into BNS, our facsimile repository.
      • Adapted procedures for patents and non-patent literature.
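The selection logic of this procedure amounts to set operations on document identifiers. The following is a minimal sketch under stated assumptions: the function names and the sample identifiers are hypothetical, and the real EPODOC and BNS systems are represented here by plain Python collections, not by their actual interfaces.

```python
def build_missing_list(bibliographic_ids, facsimile_ids):
    """Documents known to the bibliographic database but lacking a facsimile image."""
    return sorted(set(bibliographic_ids) - set(facsimile_ids))


def select_for_scanning(missing_list, unmarked_ids, facsimile_ids):
    """Union of the Missing List and unmarked paper documents without a facsimile."""
    unmarked_without_image = set(unmarked_ids) - set(facsimile_ids)
    return sorted(set(missing_list) | unmarked_without_image)


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    in_bibliographic_db = ["XX1000001", "XX1000002", "XX1000003"]
    in_facsimile_store = ["XX1000001", "XX1000004"]
    unmarked_on_paper = ["XX1000004", "XX1000005"]

    missing = build_missing_list(in_bibliographic_db, in_facsimile_store)
    to_scan = select_for_scanning(missing, unmarked_on_paper, in_facsimile_store)
    print("Missing List:", missing)
    print("To be scanned:", to_scan)
```

Computing the list before scanning supports the cost-efficiency goal named among the challenges: documents that already have a facsimile are not scanned a second time.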
  • Corrections - Example: GB, should be GB191222653
  • Summary
    • Once a classification code has been treated Paperless:
    • Enables the completion of the databases.
    • No information or document is lost.
    • Documentation is made available to all sites and examiners.
    • Externally, patents are made accessible through Esp@cenet.
  • "The EPO is promoting a knowledge-based society in Europe as one of the world’s leading providers of technical information." Thank you for your attention! Any questions? Georg Schiwy Information Acquisition [email_address]