The EPO document collection: A technical treasure chest
Presentation by Georg Schiwy, Documentation Information Manager at the European Patent Office.
The EPO holds one of the largest digital repositories of public knowledge in the world. This vast store is accessed daily by thousands of users, and its usage is constantly increasing. Each year about 40 terabytes, the equivalent of 40 million books, are downloaded from the EPO search collection by both internal and external users. This figure illustrates the EPO's unique contribution to the knowledge economy. The presentation gives an overview of the patent and non-patent collection that examiners use for prior-art searches. The second part outlines the move from a paper documentation collection to an electronic one and the particular challenges of that process.
Slideshow Transcript
- Slide 1: The EPO document collection:
A technical treasure chest
Georg Schiwy 06 June 2008
Directorate Information Acquisition
- Slide 2: Patents are granted on objective grounds
Inventive Step / Novelty criteria
Based on the "State of the Art"
A patent office must have access to State of the Art information to fulfil its mission.
For the EPO this means:
Collect and manage the best documentation for State of the Art searches
Enrich both documents and collection
Easy access to the documentation Tools, products and services
Patents
The EPO document collection and documentation
- Slide 3: Documentation at EPO - Blessing or Curse?
> 371 million records in 117 databases
> 78.8 million Patent and NPL facsimile
> 66 million unique Patent documents
> 57 million Patent abstracts
> 21.7 million full text Patent documents
> 3.5 million full text NPL documents
> 6,000 NPL titles
and growing daily....
- Slide 4: Patent acquisition approach
Bibliographic data:
extended bibliographic data
Full text: searchable full text, one patent document in an official language
Title: searchable
Drawing
Abstract: original language and English language
Image: facsimile
The EPO document collection
Documentation overview
- Slide 5: Managing and maintaining
our data
- Slide 6: Patent data quality requirements
Data quality: Timeliness, Correctness, Completeness
"Global Patent Data Coverage"
On the Internet at the following address: http://www.epo.org/gpdc
The EPO document collection
Managing and maintaining our data
- Slide 7: The Quality of our data
Create and maintain a World Wide Patent database serving our users.
Receive data from 81 offices worldwide.
Process incoming data (quality: validation, formatting, etc.)
In 2006, 4.2 million documents were added to the patent bibliographic database.
About 30% of these had to be corrected!
- Slide 8: EPO Non-Patent Literature (NPL) Resources
Journals
Conference Proceedings
Databases of secondary publishers: INSPEC, COMPDX, BIOSIS, MEDLINE, IHS...
Books, Theses, Monographs
Technical reports, Standards
Company Disclosures
Encyclopaedias, Dictionaries
- Slide 9: Classification at the EPO
- Slide 10: Why do we need a classification system?
Patent Number US2001051944
1. A method of maintaining a database having a central database and
a plurality of individual partially replicated databases, wherein
updates made to the central database or to one of the individual,
partially replicated databases are selectively propagated to a
recipient partially replicated database if the owner of the recipient
partially replicated database has visibility to the data being
selectively transmitted, said method comprising:
(a) replicating a group of records as a single logical docking object,
which is composed of one or more physical database tables;
(b) applying a single set of visibility rules to the data content of the
entire logical docking unit; and
(c) propagating the docking object to the recipient individual partially
replicated database if the owner thereof has visibility to the data
being transmitted in the single logical docking object.
The EPO document collection
Classification
- Slide 11: Patent Classification
8 Sections (A…H)
Section: C
Class: C12
Subclass: C12N
Main group: C12N15
71 000 subgroups (IPC)
Further divided in 30 fields in ECLA, e.g. C12N15/82A
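The hierarchy on this slide can be illustrated with a small Python sketch that splits the example symbol C12N15/82A into its levels. The regular expression is a simplified assumption made for illustration and does not cover every form a real IPC or ECLA symbol can take; the function name `split_symbol` is hypothetical.

```python
import re

# Simplified pattern (an assumption, not the official symbol grammar):
# section letter, two-digit class, subclass letter, main group digits,
# "/" plus subgroup digits, and an optional trailing ECLA refinement.
SYMBOL = re.compile(r"^([A-H])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})([A-Z0-9]*)$")


def split_symbol(symbol):
    """Return the hierarchy levels of a classification symbol as a dict."""
    compact = symbol.replace(" ", "")
    match = SYMBOL.match(compact)
    if match is None:
        raise ValueError(f"unrecognised classification symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup, refinement = match.groups()
    ipc_subgroup = f"{section}{cls}{subclass}{main_group}/{subgroup}"
    return {
        "section": section,
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": section + cls + subclass + main_group,
        "ipc_subgroup": ipc_subgroup,
        # Only set when the symbol carries a refinement beyond the subgroup.
        "ecla": ipc_subgroup + refinement if refinement else None,
    }


levels = split_symbol("C12N15/82A")
for name, value in levels.items():
    print(f"{name:12} {value}")
```

Each level is a prefix of the next, which is what lets a search be narrowed or broadened by moving up or down the hierarchy.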
- Slide 12: Patent classification - Example
Parking space problem:
- Slide 13: Classification Systems
Library Classification
• Dewey Decimal Classification 1 000 classes
Patent Classification
• International Patent Classification (IPC) 71 000 classes
(WIPO, 177 members)
• EPO system: ECLA 136 000 classes
• JPO system: F-terms 180 000 classes
• USPTO system: US Class 130 000 classes
- Slide 14: The Patent Granting
Workbench: Tools at the EPO
- Slide 15: Access and Tools: The Patent Granting Workbench
Yesterday
The EPO document collection
The patent granting workbench
- Slide 16: Access and Tools: The Patent Granting Workbench
Today: SEA
- Slide 17: Access and Tools: The Patent Granting Workbench
Today: Chemical formulas
- Slide 18: Access and Tools: The Patent Granting Workbench
Today: Sequence data capture
- Slide 19: Access and Tools: The Patent Granting Workbench
Today: Early OCR
Input: New scanned applications
OCR Conversion
Quality Control
Storage: Structured text (XML / PDF)
Output: Pre-classification, EPOQUE, Examination, Machine translation
- Slide 20: Access and Tools: The Patent Granting Workbench
Today: Machine Translation
German - English - French - Spanish
- Slide 21: Access and Tools: The Patent Granting Workbench
Future?
• Chemical drawings → analysing & recognizing formula
• Handling of chemical formula & sequence data
• Flow chart searching
• Image similarity search
• Searching scientific units → numerical values extraction
• Ranking of search results (customized sorting)
• Capturing of queries
• Synonyms
• Targeted routing of applications
• Global machine translation
- Slide 22: Added value: "intelligent" patent document collection
Text summariser
Chemical formulae
Numerical values extraction
Sequences
Citations
Machine translation
Full text
Flow chart searches
Facsimile images
Targeted routing
Classification
Original abstract
Abstract EN
Original title
Title EN
Bibliographic data
- Slide 23: The Paperless Project
- Slide 24: How to get from A to B?
A B
The EPO document collection
The Paperless project
- Slide 25: Objectives
To increase the quality of the search by improving the completeness and quality of the EPO databases
To offer the same search collection Office-wide
To reduce paper handling and consequently enable more digital workflows and reduce costs
- Slide 26: History - Paper groups
The documentation available to the examiners used to be accessible only on paper.
Paper documentation was organised in paper groups according to the classification information given to the documents by examiners/classifiers.
Paper groups contained folders for patent publications and non-patent literature (NPL).
Paper groups were sorted by country and publication number.
Only one family member was in the paper group.
- Slide 27: The challenges
Cost efficient operation (avoid duplication of work).
Convince the users.
Knowing the gaps: The Missing List...
Improve paper group quality before scanning...
Ensure quality of scanning operation.
- Slide 28: Scanning problems
Documents format
Quality of the paper (glossy, brittle, etc.)
Documents with a black background
Photos
Documents stapled together
Document with handwritten comments
- Slide 29: Project progress: 22 million documents in 6 years
[Chart: documents requested and destroyed, 01-01-2002 to 01-01-2008; vertical axis 0 to 25,000,000]
- Slide 30: Status - Individual scanned documents
Scanned documents (2005-2008*)
*until 31.05.2008
[Chart: scanned documents per year, 2005 to 2008, series: Patents, Literature; vertical axis 0 to 100,000]
- Slide 31: Status: Main project finished at the end of 2007
• In-file documents: 22.1 million documents (100%)
14,700 columns of 1,500 documents, almost 1,000 tons of paper
=> 3,700 m², about 19 tennis courts
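The figures on this slide can be cross-checked with a few lines of Python. The court size is an assumption not stated on the slide: a singles court of 23.77 m by 8.23 m. Reading "tons" as metric tons is likewise an assumption.

```python
# Figures taken from the slide.
columns = 14_700
docs_per_column = 1_500
floor_area_m2 = 3_700
paper_tons = 1_000  # "almost 1000 tons"; metric tons assumed

# Assumption: a singles tennis court, 23.77 m long and 8.23 m wide.
singles_court_m2 = 23.77 * 8.23

total_docs = columns * docs_per_column
courts = floor_area_m2 / singles_court_m2
grams_per_doc = paper_tons * 1_000_000 / total_docs

print(f"documents:       {total_docs:,}")
print(f"tennis courts:   {courts:.1f}")
print(f"grams per doc:   {grams_per_doc:.0f}")
```

Under these assumptions the column count reproduces the roughly 22.1 million documents, and the floor area comes out close to 19 courts. Using a doubles court (10.97 m wide) would give a noticeably smaller number of courts.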
- Slide 32: The Procedure - Knowing the gaps
Generate a list of patent documents present in our main database EPODOC but without facsimile image: the "Missing List".
Keep all patent documents with handwritten family information for future correction.
Documents not marked: check for facsimile availability in BNS and for completeness of the red underlined class in the classification tool DocTool.
All documents appearing on the Missing List AND those without a marking that are not yet available in facsimile will be scanned and entered into BNS, our facsimile repository.
Adapted procedures for patents and non-patent literature.
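The selection logic described on this slide can be sketched in Python as a set difference followed by a per-document check. Everything here is hypothetical: the names `PaperDocument`, `build_missing_list` and `triage`, the boolean `marked` flag (the slide does not spell out what a marking is), and the made-up document numbers. It is an illustration of the stated rules, not the EPO's actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaperDocument:
    number: str                            # publication number (made up below)
    marked: bool = True                    # hypothetical flag for "marked"
    handwritten_family_info: bool = False  # handwritten family notes present


def build_missing_list(epodoc_numbers, bns_numbers):
    """Documents known to the main database but lacking a facsimile image."""
    return set(epodoc_numbers) - set(bns_numbers)


def triage(paper_docs, missing_list, bns_numbers):
    """Split paper documents into those to scan and those to keep."""
    to_scan, to_keep = [], []
    for doc in paper_docs:
        if doc.handwritten_family_info:
            to_keep.append(doc.number)  # kept for future correction
        on_missing_list = doc.number in missing_list
        unmarked_without_facsimile = (
            not doc.marked and doc.number not in bns_numbers
        )
        if on_missing_list or unmarked_without_facsimile:
            to_scan.append(doc.number)
    return to_scan, to_keep


# Made-up sample data.
epodoc = {"XX1", "XX2", "XX3"}
bns = {"XX1"}
missing = build_missing_list(epodoc, bns)
paper = [
    PaperDocument("XX1"),
    PaperDocument("XX2"),
    PaperDocument("XX4", marked=False),
    PaperDocument("XX3", handwritten_family_info=True),
]
to_scan, to_keep = triage(paper, missing, bns)
print("scan:", to_scan)
print("keep:", to_keep)
```

In this sketch a document can be both scanned and kept, since the slide lists the two rules independently.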
- Slide 33: Corrections - Example: GB
should be GB191222653
- Slide 34: Summary
Once a classification code has been treated Paperless:
Enables the completion of the databases.
No information or document is lost.
Documentation is made available to all sites and examiners.
Externally, patents are made accessible through Esp@cenet.
- Slide 35: "The EPO is promoting a knowledge-based society in Europe as one of the world's leading providers of technical information."
Thank you for your attention!
Any questions?
Georg Schiwy
Information Acquisition
gschiwy@epo.org