
The EPO document collection: A technical treasure chest

From stgo, 5 months ago

Presentation by Georg Schiwy, Documentation Information Manager at the European Patent Office.
The EPO holds one of the largest digital repositories of public knowledge in the world. This vast store is accessed daily by thousands of users, and its usage is constantly increasing. Each year about 40 terabytes, the equivalent of 40 million books, are downloaded from the EPO search collection by both internal and external users. This figure illustrates the EPO's unique contribution to the knowledge economy. The presentation gives an overview of the patent and non-patent collection that examiners use for prior-art searches. The second part outlines the move from a paper documentation collection to an electronic one and the particular challenges of this process.

Slideshow Transcript

  1. Slide 1: The EPO document collection: A technical treasure chest. Georg Schiwy, 06 June 2008, Directorate Information Acquisition
  2. Slide 2: Patents are granted on objective grounds: Inventive Step / Novelty criteria, based on the "State of the Art"
     • A patent office must have access to State of the Art information to fulfil its mission. For the EPO this means:
     • Collect and manage the best documentation for State of the Art searches
     • Enrich both documents and collection
     • Easy access to the documentation
     • Tools, products and services
     The EPO document collection: Patents and documentation
  3. Slide 3: Documentation at EPO - Blessing or Curse?
     • > 371 million records in 117 databases
     • > 78.8 million Patent and NPL facsimiles
     • > 66 million unique Patent documents
     • > 57 million Patent abstracts
     • > 21.7 million full text Patent documents
     • > 3.5 million full text NPL documents
     • > 6,000 NPL titles
     ... and growing daily
  4. Slide 4: Patent acquisition approach
     • Bibliographic data: extended bibliographic data
     • Full text: searchable full text, one patent document in an official language
     • Title: searchable
     • Drawing
     • Abstract: original language, English language
     • Image: facsimile
     The EPO document collection: Documentation overview
  5. Slide 5: Managing and maintaining our data
  6. Slide 6: Patent data quality requirements. Data Quality: Timeliness, Correctness, Completeness. "Global Patent Data Coverage" is available on the Internet at the following address: http://www.epo.org/gpdc
  7. Slide 7: The Quality of our data. Create and maintain a World Wide Patent database serving our users.
     • Receive data from 81 Offices worldwide.
     • Process incoming data (Quality: validation, formatting, etc.)
     • In 2006, 4.2 million documents were added to the patent bibliographic database.
     • About 30% of these had to be corrected!
  8. Slide 8: EPO Non-Patent Literature (NPL) Resources: Journals; Conference Proceedings; Databases of Secondary publishers (INSPEC, COMPDX, BIOSIS, MEDLINE, IHS...); Books, Monographs, Theses, Technical reports, Standards; Company Disclosures; Encyclopaedias, Dictionaries
  9. Slide 9: Classification at the EPO
  10. Slide 10: Why do we need a classification system? Patent Number US2001051944: "1. A method of maintaining a database having a central database and a plurality of individual partially replicated databases, wherein updates made to the central database or to one of the individual, partially replicated databases are selectively propagated to a recipient partially replicated database if the owner of the recipient partially replicated database has visibility to the data being selectively transmitted, said method comprising: (a) replicating a group of records as a single logical docking object, which is composed of one or more physical database tables; (b) applying a single set of visibility rules to the data content of the entire logical docking unit; and (c) propagating the docking object to the recipient individual partially replicated database if the owner thereof has visibility to the data being transmitted in the single logical docking object."
  11. Slide 11: Patent Classification
      • Section: C (8 Sections, A…H)
      • Class: C12
      • Subclass: C12N
      • Main group: C12N15
      • 71 000 subgroups (IPC)
      • C12N15/82A: further divided in 30 fields in ECLA
  12. Slide 12: Patent classification - Example: Parking space problem
  13. Slide 13: Classification Systems
      Library Classification
      • Dewey Decimal Classification: 1 000 classes
      Patent Classification
      • International Patent Classification (IPC): 71 000 classes (WIPO, 177 members)
      • EPO system: ECLA, 136 000 classes
      • JPO system: F-terms, 180 000 classes
      • USPTO system: US Class, 130 000 classes
  14. Slide 14: The Patent Granting Workbench: Tools at the EPO 
  15. Slide 15: Access and Tools: The Patent Granting Workbench - Yesterday
  16. Slide 16: Access and Tools: The Patent Granting Workbench - Today: SEA
  17. Slide 17: Access and Tools: The Patent Granting Workbench - Today: Chemical formulas
  18. Slide 18: Access and Tools: The Patent Granting Workbench - Today: Sequence data capture
  19. Slide 19: Access and Tools: The Patent Granting Workbench - Today: Early OCR [diagram with the stages Input, OCR Conversion, Quality Control, Storage and Output; further labels: new scanned applications, structured text, XML / PDF, pre-classification, EPOQUE, examination, machine translation]
  20. Slide 20: Access and Tools: The Patent Granting Workbench - Today: Machine Translation (German - English - French - Spanish)
  21. Slide 21: Access and Tools: The Patent Granting Workbench - Future?
      • Chemical drawings → analysing & recognizing formula
      • Handling of chemical formula & sequence data
      • Flow chart searching
      • Image similarity search
      • Searching scientific units → numerical values extraction
      • Ranking of search results (customized sorting)
      • Capturing of queries
      • Synonyms
      • Targeted routing of applications
      • Global machine translation
  22. Slide 22: Added value: "intelligent" patent document collection: Text summariser, Chemical formulae, Numerical Values extraction, Sequences, Citations, Machine Translation, Full Text, Flow Chart searches, Facsimile Images, targeted routing, Classification, Original abstract, Abstract EN, Original title, Title EN, Bibliographic data
  23. Slide 23: The Paperless Project
  24. Slide 24: How to get from A to B?
  25. Slide 25: Objectives
      • To increase the quality of the search by improving completeness and quality of the EPO databases
      • To offer the same search collection Office-wide
      • To reduce paper handling and consequently enable more digital workflows and reduce costs
  26. Slide 26: History - Paper groups
      • The documentation available to the examiners used to be accessible only on paper.
      • Paper documentation was organised in paper groups according to the classification information given to the documents by examiners/classifiers.
      • Paper groups contained folders for patent publications and non-patent literature (NPL).
      • Paper groups were sorted by country and publication number.
      • Only one family member was in the paper group.
  27. Slide 27: The challenges
      • Cost-efficient operation (avoid duplication of work).
      • Convince the users.
      • Knowing the gaps: The Missing List...
      • Improve paper group quality before scanning...
      • Ensure quality of scanning operation.
  28. Slide 28: Scanning problems
      • Document format
      • Quality of the paper (glossy, brittle, etc.)
      • Documents with a black background
      • Photos
      • Documents stapled together
      • Documents with handwritten comments
  29. Slide 29: Project progress: 22 million documents in 6 years [chart: documents Requested and Destroyed, axis 0 to 25 000 000, half-yearly from 01-01-2002 to 01-01-2008]
  30. Slide 30: Status - Individual scanned documents. Scanned documents (2005-2008*), *until 31.05.2008 [chart: Patents and Literature per year, 2005 to 2008, axis 0 to 100 000]
  31. Slide 31: Status: Main project finished end of 2007
      • In-file documents: 22.1 million documents (100%)
      • 14 700 columns of 1 500 documents, almost 1000 tons of paper => 3700 m², about 19 tennis courts
  32. Slide 32: The Procedure - Knowing the gaps
      • Generate a list of patent documents present in our main database EPODOC but without a facsimile image: the "Missing List".
      • Keep all patent documents with handwritten family information for future correction.
      • Documents not marked: check for facsimile availability in BNS and for completeness of the red underlined class in the classification tool DocTool.
      • All documents appearing on the Missing List AND those without a marking that are not yet available in facsimile will be scanned and entered into BNS, our facsimile repository.
      • Adapted procedures for patents and non-patent literature.
  33. Slide 33: Corrections - Example: GB should be GB191222653
  34. Slide 34: Summary. Once a classification code has been treated, Paperless:
      • Enables the completion of the databases.
      • No information or document is lost.
      • Documentation is made available to all sites and examiners.
      • Externally, patents are made accessible through Esp@cenet.
  35. Slide 35: "The EPO is promoting a knowledge-based society in Europe as one of the world’s leading providers of technical information." Thank you for your attention! Any questions? Georg Schiwy, Information Acquisition, gschiwy@epo.org
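
The classification hierarchy on slide 11 can be illustrated with a small parser. This is a minimal sketch, not an EPO tool: the function name `split_symbol` and the regular expression are assumptions, and they cover only symbols shaped like the slide's example C12N15/82A (section letter A to H, two-digit class, subclass letter, main group number, then a subgroup after the slash).

```python
import re

# Assumed shape, modelled on the slide's example "C12N15/82A":
# section (A-H), 2-digit class, subclass letter, main group, "/", subgroup.
SYMBOL = re.compile(r"^([A-H])(\d{2})([A-Z])(\d{1,4})/([0-9A-Z]+)$")


def split_symbol(symbol):
    """Return each hierarchy level of a classification symbol as a prefix."""
    compact = symbol.replace(" ", "").upper()
    match = SYMBOL.match(compact)
    if match is None:
        raise ValueError(f"unrecognised classification symbol: {symbol!r}")
    section, cls, subclass, main_group, _subgroup = match.groups()
    return {
        "section": section,
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": section + cls + subclass + main_group,
        "subgroup": compact,
    }


if __name__ == "__main__":
    for level, value in split_symbol("C12N15/82A").items():
        print(f"{level:>10}: {value}")
```

Each level is returned as a prefix of the full symbol, which mirrors how the slide walks from C to C12, C12N, C12N15 and finally C12N15/82A.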
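
The "knowing the gaps" procedure on slide 32 amounts to two set comparisons. The sketch below is illustrative only: the function names, the plain lists of publication numbers and the sample numbers are hypothetical, and the slides do not describe how EPODOC, BNS or DocTool are actually queried.

```python
def build_missing_list(epodoc_numbers, facsimile_numbers):
    """Documents present in the main database but without a facsimile image."""
    have_image = set(facsimile_numbers)
    return sorted(n for n in set(epodoc_numbers) if n not in have_image)


def scan_queue(missing_list, unmarked_paper_docs, facsimile_numbers):
    """Missing-list documents plus unmarked paper documents lacking a facsimile."""
    have_image = set(facsimile_numbers)
    unmarked_without_image = {d for d in unmarked_paper_docs if d not in have_image}
    return sorted(set(missing_list) | unmarked_without_image)


if __name__ == "__main__":
    # Hypothetical publication numbers, for illustration only.
    in_database = ["XX1", "XX2", "XX3"]
    in_facsimile_store = ["XX2"]
    unmarked_on_paper = ["XX2", "XX4"]

    missing = build_missing_list(in_database, in_facsimile_store)
    print(missing)
    print(scan_queue(missing, unmarked_on_paper, in_facsimile_store))
```

Using sets keeps both checks independent of the order in which documents are listed, and duplicates in the paper groups do not inflate the scanning queue.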
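
The figures quoted on slides 7 and 31 can be cross-checked with simple arithmetic. The variable names are illustrative, and the per-court area is only what the slide's own numbers imply.

```python
# Slide 7: about 30% of the 4.2 million documents added in 2006 were corrected.
added_2006 = 4_200_000
corrected_2006 = added_2006 * 30 // 100

# Slide 31: 14700 columns of 1500 documents each.
in_file_documents = 14_700 * 1_500

# Slide 31: 3700 m2 described as about 19 tennis courts.
area_per_court_m2 = 3_700 / 19

print(corrected_2006)
print(in_file_documents)
print(round(area_per_court_m2, 1))
```

The column count gives 22.05 million documents, which is consistent with the 22.1 million stated on slide 31 if the column figures are rounded.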