Querying Patent Data for Empirical Scholarship: Tools and Strategies



Published in: Technology, Economy & Finance


  1. © Jon R. Cavicchi, Professor of Research & IP Librarian
     IP Professor Bootcamp, On Golden Pond 2013
  2. • Plot competitors' product strategies, as well as ways to "patent-block" them
     • Gain patent-protected entry into lucrative but hotly contested markets
     • Acquire exclusive rights to emerging market-leading technologies
     • Increase R&D effectiveness and avoid infringement minefields
     • Detect possible infringers, as well as likely sources of licensing income
  3. • National, regional, and PCT patent documents
     • Bibliographic data from patent data (+?)
     • Prosecution history
     • Post-issuance activity
     • Not included but ripe
       – Dockets, reported cases, verdicts
       – License & royalty data, security interests & other patent transaction data
  4. • Evergreening and Drug Patents: Bark or Bite
       – Bhaven Sampat, Columbia University, Mailman School of Public Health
     • Do fixed patent terms distort innovation? Evidence from cancer clinical trials
       – Heidi Williams, MIT Department of Economics
     • From PI to IP: Yet Another Unexpected Effect of Tort Reform
       – John Golden, University of Texas School of Law
     • Rush to Judgment? Trial Length and Outcomes in Patent Cases
       – Mark Lemley, Stanford Law School
     • The Direct Costs from NPE Disputes
       – Michael Meurer, Boston University School of Law
     • Poisoning the Next Apple? How the America Invents Act Harms Inventors
       – David Abrams, University of Pennsylvania Law School
  5. Bronwyn H. Hall is Professor in the Graduate School at the University of California at Berkeley and Professor of Economics of Technology and Innovation at the University of Maastricht, Netherlands. She is a Research Associate of the National Bureau of Economic Research and the Institute for Fiscal Studies, London. She is also the founder and partner of TSP International, an econometric software firm. She received a B.A. in physics from Wellesley College in 1966 and a Ph.D. in economics from Stanford University in 1988.
  6. Research Challenges
     • Literature is highly interdisciplinary and dispersed
     • Comprehensive searching challenges
       – Many studies do not use the term "empirical" in the title
       – Legal and non-legal indexing was not developed to capture empirical scholarship
       – Classifying requires human intervention
  7. • Raw data using statistical software
       – STATA, SAS, Excel & other database applications
     • Open web platforms
       – National & regional offices, EPO, WIPO…
     • Proprietary patent platforms
       – Thomson Innovation, Lexis Total Patent & many others
     • Sources of existing statistical data
  8. GTP for patent data experts in a business setting
  9. • [TA Program addresses] access to quality patent data in terms of comprehensive and up-to-date data, i.e. not just a notification in a Gazette but full publication of all parts of applications and granted patents, which is indeed often a deficient situation in developing countries. [email from Lutz Mailänder, Head, Patent Information Section, WIPO Global IP Infrastructure Sector, 5/31/13]
     • WIPO's technical assistance program for Industrial Property Offices falls within Strategic Goal IV - Coordination and Development of Global IP Infrastructure.
     • The program aims to assist offices of all sizes and from all regions to participate effectively in the global IP system. The activities range from the provision of software systems for administration of IP rights to the setting up of platforms to facilitate exchange of data and information related to IP rights between regional and international groups of offices.
     • Stakeholders of IP Offices (applicants, agents, researchers, local industry, policy makers, etc.) are increasingly demanding online services such as search systems, online registries and online filing systems.
       – WIPO responds to this need by assisting IP offices with the digitization of their IP records and with preparing data for online publication and for electronic data exchange. WIPO also provides the Patentscope search service, through which offices can provide high-quality online patent search to local and international users.
  10. Committee on Development and Intellectual Property (CDIP)
      • Eleventh Session, Geneva, May 13 to 17, 2013
        – Establishment of National Patent Register Databases
        – In some 40 countries, access to legal status information is mostly sufficient.
        – Availability of legal status data for some 50 countries is limited, since many of them do not have the legal status data in digital form or national online registers.
        – Availability of the data does not necessarily mean that there is easy access to data for the identification of inventions available in the public domain.
        – The availability of licensing information is limited in most countries.
        – Reliability of data needs to be improved, e.g. by increasing the frequency of updates and synchronizing their publication…
        – Laborious processing of EPO INPADOC data causes delays in availability that vary from 2 days to 3 months, depending on the primary source.
        – Reliability of such data is greatly influenced by the correctness of the raw data obtained from the primary sources, their completeness and their publication frequency.
        – WIPO PATENTSCOPE information is provided only on a voluntary basis from selected PCT Member States and with varying regularity, since there is no obligation to provide such information to WIPO.
  11. • Raw data using statistical software
        – STATA, SAS, Excel & other database applications
      • Open web platforms
        – National & regional offices, EPO, WIPO…
      • Proprietary patent platforms
        – Thomson Innovation, Lexis Total Patent & many others
  12. • Consider working with interdisciplinary colleagues
        – Economists
        – Statisticians
        – Information Retrieval
      "We are open to exploring research possibilities related to search with a wide range of people, including law professors, as I think our record indicates." W. Bruce Croft, Ph.D. (5/13/13 email)
  13. • National offices
      • Regional offices
      • Other governmental agencies
      • NGOs
  14. • Statistics Home Page & throughout site
        – Home page
        – USPTO Data Visualization Center Patents Dashboard
        – Calendar & Fiscal Year Statistics
        – Miscellaneous Patent Statistics, Other Web Pages
  15. Electronic Data Products
      • The USPTO makes patent public data available in bulk form, which can be loaded into databases or other analytical tools for research and analysis.
      • Bulk data is generally provided in the form of ZIP files containing TIFF or PDF images, structured ASCII files or concatenated XML documents.
        – EIPD Order Form
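Because a bulk XML file is several complete XML documents concatenated back to back, a standard XML parser will reject the file as a whole. A minimal sketch of one common workaround, assuming each document begins with its own `<?xml` declaration and that this string never appears inside a document's content: split on the declaration and parse each piece separately. The sample data and element names (`us-patent-grant`, `doc-number`, `invention-title`) are illustrative assumptions, not a description of any specific USPTO schema version; real files also carry DOCTYPE lines that this toy sample omits.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature bulk file: two XML documents concatenated back to back.
# Element names are illustrative assumptions, not a specific USPTO schema.
BULK = """<?xml version="1.0" encoding="UTF-8"?>
<us-patent-grant><doc-number>0000001</doc-number><invention-title>Levitating disk</invention-title></us-patent-grant>
<?xml version="1.0" encoding="UTF-8"?>
<us-patent-grant><doc-number>0000002</doc-number><invention-title>Hook and loop fastener</invention-title></us-patent-grant>
"""

DECLARATION = "<?xml"


def split_concatenated(text):
    """Split text holding several concatenated XML documents into one string
    per document, treating each XML declaration as a document boundary."""
    pieces = text.split(DECLARATION)
    # pieces[0] is whatever precedes the first declaration (normally empty).
    return [DECLARATION + piece for piece in pieces[1:] if piece.strip()]


def parse_records(text):
    """Parse each document on its own and pull out two bibliographic fields."""
    records = []
    for document in split_concatenated(text):
        root = ET.fromstring(document)
        records.append({
            "doc_number": root.findtext("doc-number"),
            "title": root.findtext("invention-title"),
        })
    return records


for record in parse_records(BULK):
    print(record["doc_number"], record["title"])
```

For multi-gigabyte weekly files, the same idea would usually be applied while streaming the file line by line rather than reading it into one string.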
  16. • Patent Technology Monitoring Team (PTMT)
        – PTMT Custom Reports
      • These costs may vary widely: from as low as $50.00, plus $10.00 for every 30 single-sided report pages and $25.00 per one and a half megabytes of uncompressed electronic file output.
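The fee components quoted above combine additively, so a rough budget can be worked out before placing a request. The sketch below assumes that a partial block of pages or megabytes is billed as a full block; the slide does not say so, and the 2013 figures may no longer be current. The function name `estimate_ptmt_cost` is hypothetical.

```python
import math
from fractions import Fraction

# Fee components as quoted on the slide (2013); illustrative only.
BASE_FEE = 50                          # minimum charge, in dollars
PAGE_BLOCK_FEE = 10                    # per block of single-sided report pages
PAGES_PER_BLOCK = 30
FILE_BLOCK_FEE = 25                    # per block of uncompressed electronic output
MEGABYTES_PER_BLOCK = Fraction(3, 2)   # one and a half megabytes


def estimate_ptmt_cost(pages=0, megabytes=0):
    """Rough cost in dollars.
    ASSUMPTION: a partial block is billed as a full block (rounded up)."""
    page_blocks = math.ceil(Fraction(pages, PAGES_PER_BLOCK))
    # Fraction(str(...)) keeps decimal sizes such as 1.6 exact.
    file_blocks = math.ceil(Fraction(str(megabytes)) / MEGABYTES_PER_BLOCK)
    return BASE_FEE + page_blocks * PAGE_BLOCK_FEE + file_blocks * FILE_BLOCK_FEE


# 90 pages = 3 page blocks; 3 MB = 2 file blocks.
print(estimate_ptmt_cost(pages=90, megabytes=3))
```

Under these assumptions, a 90-page report with a 3 MB file works out to $50 + 3 × $10 + 2 × $25 = $130.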
  17. Dear Jon-
      The PTMT custom reports are pretty much limited to the standard PTMT reports that you can see on the USPTO Web Site. Our custom reports generally consist of those reports, limited to select groups of patents that a requester identifies. We also will produce some very simple reports and/or data extractions (e.g., lists of inventors and their patents) at reasonable cost.
      Our staff is quite small, consisting of me and my colleague, Paul Harrison, and a part time programmer; our work schedule is pretty much fully committed. As a result of our limited staff resources and our workload, we aren't able to act as a research arm for researchers wanting to run a multitude of reports (as much as we might like to be able to do so). However, we try to help researchers with their questions when we can and to provide guidance to the researchers when they work with the patent data such as the data obtained from the USPTO Web Site and the PTMT Custom Patent Data DVD.
      For law professors who lack the technical expertise to work with large data sets in a database, the best options are likely to be for them to work with a private patent data provider, which can be expensive, or to find a colleague with technical expertise who can work with them in a joint research project.
      Just as an additional comment, professors interested in patent data relating to the patenting process (e.g., number of first-action issues having a particular characteristic, number of patent applications subject to restriction requirements, etc.) will probably have to submit a pointed request for the data/statistics and may need to file a FOIA if those data aren't already available on the USPTO Web Site and aren't otherwise readily available.
      Jim
  18. • FAQ - Patent statistics and patent mapping
      • Be aware that simply counting patents is often not enough, since the value of patents varies so much from case to case; you need to assess the importance of the invention.
        – Significant indicators include: patent family size, the length of time the patent is in force and citation information.
      • Some sources of patent statistics are limited to data from a particular geographical area; ESPACE Bulletin, for example, contains only European publication data.
      • You should also always compare the resulting information with other sources, such as market information and expert opinions. You should also be familiar with the patent grant procedure.
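The three indicators named above can be computed from fairly simple tabular data. A minimal sketch, assuming hypothetical toy records with made-up field names (`family`, `granted`, `lapsed`, `cites`) rather than any real database schema. Years in force are measured here from the grant year, and patents not yet lapsed are counted up to an `as_of_year`; both are assumptions, and other studies measure from filing instead. Forward citations are counted only within the sample, so they understate citations in the full literature.

```python
from collections import Counter

# Hypothetical toy records; field names are illustrative, not a real schema.
PATENTS = [
    {"id": "US-A", "family": "F1", "granted": 2001, "lapsed": 2009, "cites": []},
    {"id": "EP-A", "family": "F1", "granted": 2002, "lapsed": None, "cites": []},
    {"id": "US-B", "family": "F2", "granted": 2004, "lapsed": None, "cites": ["US-A"]},
    {"id": "US-C", "family": "F3", "granted": 2008, "lapsed": 2012, "cites": ["US-A", "US-B"]},
]


def indicators(patents, as_of_year):
    """Family size, years in force, and in-sample forward citations per patent."""
    family_size = Counter(p["family"] for p in patents)
    # Forward citations: how often each patent is cited by others in the sample.
    forward = Counter(cited for p in patents for cited in p["cites"])
    result = {}
    for p in patents:
        # ASSUMPTION: patents with no lapse year are still in force at as_of_year.
        end = p["lapsed"] if p["lapsed"] is not None else as_of_year
        result[p["id"]] = {
            "family_size": family_size[p["family"]],
            "years_in_force": end - p["granted"],
            "forward_citations": forward[p["id"]],
        }
    return result


for patent_id, values in indicators(PATENTS, as_of_year=2013).items():
    print(patent_id, values)
```

The indicators are reported side by side rather than combined into a single score; how to weight them is a judgment call for the study at hand.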
  19. • Policy makers need empirical evidence of how different IP strategies can affect innovation and GDP growth.
      • WIPO is helping to address the lack of reliable economic research on IP by developing methodologies and commissioning economic studies to assist policy makers in their decision-making.
  20. • IP Outreach Research - Surveys Database
        – WIPO's Research Database contains hundreds of summaries of empirical research studies which examine the awareness, attitudes and behavior of different groups towards the creation, use and respect of intellectual property. The continuously updated database is searchable by subject, country, year and more.
  21. • The mission of the Organisation for Economic Co-operation and Development (OECD) is to promote policies that will improve the economic and social well-being of people around the world.
      • Indicators on patents
        – The OECD Patent Database was set up to develop patent indicators that are suitable for statistical analysis and that can help address S&T policy issues.
        – The Patent Database covers data on patent applications to the European Patent Office (EPO), the US Patent and Trademark Office (USPTO), patent applications filed under the Patent Co-operation Treaty (PCT) that designate the EPO, as well as Triadic patent families.
        – Data mainly derives from the latest version of the EPO's Worldwide Patent Statistical Database (PATSTAT).
  22. More OECD Tools
      • OECD Compendium of Patent Statistics
      • Raw data on patents
        – OECD Triadic Patent Families Database, July 2011: set of patents filed at the EPO and the Japan Patent Office (JPO) and granted by the USPTO that share one or more priority applications.
        – OECD REGPAT Database, July 2011: patent applications to the EPO and PCT filings linked to more than 5,500 regions using the inventors'/applicants' addresses (covering regions from selected countries outside the OECD area).
        – OECD Citations Database, July 2011: citations from patents published by the EPO and the WIPO (PCT).
      • OECD "Harmonised Applicants Names" database
      • OECD's Core Data
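The triadic family definition above can be made concrete. In the sketch below, applications that share at least one priority are grouped with a union-find structure, and a group counts as triadic if it includes an EPO filing, a JPO filing and a USPTO grant. This follows the slide's one-sentence definition only; the OECD's actual methodology handles priorities, dates and family extension in more detail. The toy records and field layout are hypothetical.

```python
from collections import defaultdict

# Hypothetical application records: (application id, office, granted?, priority numbers).
APPLICATIONS = [
    ("EP1", "EP", False, {"P1"}),
    ("JP1", "JP", False, {"P1"}),
    ("US1", "US", True, {"P1", "P2"}),
    ("EP2", "EP", True, {"P3"}),
    ("JP2", "JP", True, {"P3"}),
    ("US2", "US", False, {"P3"}),  # filed at the USPTO but not granted
]


def group_by_shared_priority(applications):
    """Union-find: applications sharing at least one priority land in one group."""
    parent = {app[0]: app[0] for app in applications}

    def find(app_id):
        while parent[app_id] != app_id:
            parent[app_id] = parent[parent[app_id]]  # path halving
            app_id = parent[app_id]
        return app_id

    first_claimant = {}  # priority number -> first application seen claiming it
    for app_id, _office, _granted, priorities in applications:
        for priority in sorted(priorities):
            if priority in first_claimant:
                parent[find(app_id)] = find(first_claimant[priority])
            else:
                first_claimant[priority] = app_id

    groups = defaultdict(list)
    for application in applications:
        groups[find(application[0])].append(application)
    return list(groups.values())


def is_triadic(group):
    """Filed at the EPO, filed at the JPO, and granted by the USPTO."""
    filed_ep = any(office == "EP" for _, office, _, _ in group)
    filed_jp = any(office == "JP" for _, office, _, _ in group)
    granted_us = any(office == "US" and granted for _, office, granted, _ in group)
    return filed_ep and filed_jp and granted_us


def triadic_families(applications):
    return [sorted(app[0] for app in group)
            for group in group_by_shared_priority(applications)
            if is_triadic(group)]


print(triadic_families(APPLICATIONS))
```

The second group has EP and JP filings but no US grant, so it is excluded. This is the kind of asymmetry that makes triadic counts smaller than simple multi-office counts.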
  23. • Conference on Patent Statistics for Decision Makers
      • Methodological information helps to design and interpret patent statistics
      • OECD Patent Statistics Manual
      • Patent Statistics Task Force
  24. Growth of EPO & WIPO Collections
      "You have meddled with the primal forces of nature, Mr. Beale, and I won't have it! Is that clear?"
      • "Having been in this industry for over 10 years, my sense is that the free services have become an impediment to the growth and development of more robust offerings and analytic capabilities from the private sector."
        – Peter Vanderheyden, Former Vice President, Global Intellectual Property, LexisNexis
  25. • Bibliographic platforms since the 1970s
      • Followed by full-text issued U.S. patents
      • Followed by European and PCT documents
      • Followed by Asian bibliographic and translated collections
      • Followed by bibliographic and translated collections from 90 other countries
  26. • Search
      • Analysis
      • Workflow and project tools
      • Specialty searches
        – Chemical structures and DNA sequences
  27. Tools to Compare Platforms
  28. Data on a given platform may:
      • be limited in chronological scope
      • be bibliographic data only
      • lack post-issuance activity
      • contain data errors
      • suffer from keyword obfuscation of the invention
      • lack assignee normalization
      • not be readily found using any classification scheme
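Missing assignee normalization means that one company can appear under several spellings, which splits its patent counts. The sketch below shows a rule-based way to map name variants to one comparison key: fold accents, uppercase, drop punctuation and strip trailing legal-form suffixes. The suffix list is a short hypothetical sample, and this is not the method behind the OECD "Harmonised Applicants Names" database. Rules like these can over-merge (different firms with similar names) and under-merge (abbreviations such as "IBM" will not match), so manual review is still needed.

```python
import re
import unicodedata

# Hypothetical, deliberately short list of legal-form suffixes to strip.
LEGAL_SUFFIXES = {
    "INC", "INCORPORATED", "CORP", "CORPORATION", "CO", "COMPANY",
    "LTD", "LIMITED", "LLC", "GMBH", "AG", "SA", "KK",
}


def normalize_assignee(name):
    """Map assignee-name variants to a single comparison key."""
    # Fold accented characters to ASCII (e.g. "Société" -> "Societe").
    text = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Drop periods so "S.A." becomes "SA"; spell out ampersands.
    text = text.upper().replace(".", "").replace("&", " AND ")
    text = re.sub(r"[^A-Z0-9 ]+", " ", text)  # remaining punctuation -> space
    tokens = text.split()
    # Strip trailing legal forms such as "CORP" or "LTD", possibly several in a row.
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


variants = [
    "International Business Machines Corp.",
    "INTERNATIONAL BUSINESS MACHINES CORPORATION",
    "Procter & Gamble Co.",
    "Société Générale S.A.",
]
for variant in variants:
    print(variant, "->", normalize_assignee(variant))
```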
  29. U.S. Patents Riddled With Mistakes, Survey Finds
      • An astounding 98% of approved U.S. patent applications contain mistakes ranging from simple spelling errors to omitted claims.
      • The mistakes were uncovered by Intellevate, the world's largest patent proofreading organization. More than half of the mistakes it found at its office in India were made by the U.S. Patent and Trademark Office, according to Intellevate chief executive Leon Steinberg.
      • Mistakes on everything
        – leaving out portions of the patent claims
        – putting in the wrong drawings
        – spelling
  30. • Short, meaningless titles and abstracts
      • Patent documents notorious for vagueness
      • Language may be abstract: the patent attorney is his or her own lexicographer
        – Frisbee = levitating disk
      • Vocabulary may not be standardized or may not even exist
        – Kevlar = optically anisotropic aromatic polyamide dopes, classed with synthetic resins and not with tires or bulletproofing
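One practical response to this vocabulary gap is to pair each everyday term with its patent-drafting equivalents and OR them together before searching. The sketch below builds such a query from a small hypothetical synonym table taken from the slide's examples, plus "aramid", the common generic name for aromatic polyamide fibers. The boolean syntax is generic and would need adapting to a given platform's query language.

```python
# Hypothetical synonym table built from the slide's examples (plus "aramid").
SYNONYMS = {
    "frisbee": ["levitating disk"],
    "kevlar": ["aromatic polyamide", "aramid"],
}


def quote(term):
    """Quote multi-word terms so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def expand_query(*concepts):
    """AND together concepts, each expanded into an OR-group of its synonyms."""
    clauses = []
    for concept in concepts:
        terms = [concept] + SYNONYMS.get(concept.lower(), [])
        clauses.append("(" + " OR ".join(quote(t) for t in terms) + ")")
    return " AND ".join(clauses)


print(expand_query("kevlar", "helmet"))
```

The hard part is building the synonym table itself, which still depends on reading the patents and the classification notes.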
  31. – Too broad?
      – Too narrow?
      – Out of date?
      – Neglected?
      – Unclassifiable (U.S. Class 1/1)
      – Untested? (CPC)
      – Patented invention may be in a different technology from that in which it is eventually applied?
        • Velcro = classed in stock materials, while its applications are found in medical and amusement devices
  32. Non-textual searching…
      • Challenges
        – Figures
        – Drawings
        – Diagrams
        – Structures
        – Sequences
        – Letterforms & typography
      • Emerging solutions
        – PATSEEK
        – ImageSeeker by LTU
        – PatMedia
  33. Alternative approaches
      "The 1 click that takes 1000 clicks on other services"
      • Search and examination relied on operator quality and could not be held to an empirical standard.
      • Heuristics
      • Latent semantic analysis
      • Natural language
  34. Workflow & Collaboration Tools
  35. Working with huge data sets
  36. Powerful analytical and visualization tools
      • Clustering Tool – Quickly find valuable relationships through linguistic analysis of search terms.
      • ThemeScape Maps – Easily identify predominant concepts and see their relationship to one another.
      • Citation Maps – Trace the history of an invention.
      • Charting – Instantly create lists or charts that are meaningful to your search.
  37. Reading Content Maps
      • Documents containing similar content are drawn near each other in the map
      • Contour lines indicate relative document density
      • Tall peaks contain many documents, while the smaller peaks contain fewer documents
      • Peaks that are located closer to each other have more closely related content than peaks that are located farther away
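"Similar content" has to be measured somehow before documents can be placed near each other. The slide does not describe how ThemeScape does this, and its algorithm is not reproduced here. The sketch below uses a generic stand-in: bag-of-words vectors compared by cosine similarity. A mapping tool would then lay out high-similarity documents close together (for example by multidimensional scaling, not shown). The three toy abstracts are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy abstracts.
DOCS = {
    "D1": "aromatic polyamide fiber for protective vest",
    "D2": "protective vest with aromatic polyamide layers",
    "D3": "levitating disk toy with aerodynamic rim",
}


def vectorize(text):
    """Bag-of-words term counts (lowercased alphabetic tokens)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def most_similar_pair(docs):
    """Return the pair of document ids with the highest cosine similarity."""
    vectors = {doc_id: vectorize(text) for doc_id, text in docs.items()}
    return max(combinations(sorted(vectors), 2),
               key=lambda pair: cosine(vectors[pair[0]], vectors[pair[1]]))


print(most_similar_pair(DOCS))
```

The two body-armor abstracts share four of six terms and would sit in the same peak, while the disk abstract would land far away. This is also why the vocabulary problems on the earlier slides matter: documents that describe the same invention in different words end up in different peaks.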
  38. Showing specific data compared to the full landscape.
  39. Citation Mapping
  40. Sample map: By Time & Generation, Backward Only, 5 Generations, 10-Year Increments
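The map settings named on this slide (backward citations only, 5 generations, 10-year increments) correspond to a bounded breadth-first walk over a citation graph, followed by bucketing the results by time. A minimal sketch on a hypothetical toy graph, assuming that increments are anchored to calendar decades; a commercial tool may anchor them differently.

```python
from collections import defaultdict, deque

# Hypothetical toy citation graph: patent -> patents it cites (backward citations).
CITES = {
    "P2010": ["P2001", "P1995"],
    "P2001": ["P1988"],
    "P1995": ["P1988", "P1979"],
    "P1988": ["P1972"],
    "P1979": [],
    "P1972": [],
}
YEAR = {"P2010": 2010, "P2001": 2001, "P1995": 1995,
        "P1988": 1988, "P1979": 1979, "P1972": 1972}


def backward_generations(root, cites, max_generations=5):
    """Breadth-first walk over backward citations; returns {patent: generation}."""
    generation = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if generation[current] == max_generations:
            continue  # do not expand beyond the requested depth
        for cited in cites.get(current, []):
            if cited not in generation:  # first visit is the shortest path
                generation[cited] = generation[current] + 1
                queue.append(cited)
    return generation


def bucket_by_increment(patents, year, increment=10):
    """Group patents into time buckets, e.g. 1970, 1980, ... for 10-year steps."""
    buckets = defaultdict(list)
    for patent in patents:
        buckets[year[patent] // increment * increment].append(patent)
    return {start: sorted(members) for start, members in sorted(buckets.items())}


generations = backward_generations("P2010", CITES, max_generations=5)
print(generations)
print(bucket_by_increment(generations, YEAR))
```

A node reached by two paths (here P1988) keeps its shortest generation. Coloring nodes by assignee, as the next slide describes, would amount to one more lookup table keyed by patent.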
  41. You can assign colors to nodes according to patent record properties. By selecting Assignee from the menu, you will be able to see, by color, the records with the same assignee.
  42. Royalty Rate Data