Patstat and patstat related resources for patent data analysis

Presented @Bordeaux IV oct 2012

Published in: Business

Transcript of "Patstat and patstat related resources for patent data analysis"

  1. By Gianluca Tarasconi, KITeS Univ. Bocconi / O.S.T.
  2. About the speaker
     • Background in Management Engineering @ Politecnico of Milan
     • Database Architect @ KITeS (previously CESPRI) since 2002
     • Project manager for data production in EU Projects STI-NET, TENIA, AEGIS and EU Tenders ICT network impact, INNOVA, Highly Cited Patents, Measurement and analysis of knowledge and R&D exploitation flows, assessed by patent and licensing data
     • Collaborations on database projects with: MIT, LSE, Danish Board of Technology, Bonn Graduate School of Economics, Universität Mainz, BETA …
     • Author of the blog rawpatentdata.blogspot.com
  3. What is PATSTAT
     PATSTAT is a snapshot of the EPO database covering about 70 million applications from more than 80 application authorities, containing bibliographic data, citations and family links. It requires the data to be loaded into the customer's own database.
     + low cost of ownership
     - costs of implementation
  4. Data sources for PATSTAT
     • Source for EP data is DOCDB (EPO master documentation database)
     • Sources for other offices are files provided by the other patent authorities
     + Good coverage for US, EU states, JP, EPO, WIPO
     - For other authorities, gaps and missing data are not easy to identify
  5. Implementing the DB (I)
     • Over 20 tables in a relational DB, with the application ID as the main primary key
     • EPO adds / improves data with each edition
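As a rough illustration of how the application ID ties the tables together, the sketch below builds a toy in-memory database and joins applications to their applicants. The table and column names follow the PATSTAT naming style but are simplified assumptions, and every row is invented; a real installation would run on MySQL, Oracle or similar with the full schema.

```python
import sqlite3

# Toy stand-in for a PATSTAT installation (assumption: simplified schema,
# invented rows). Real PATSTAT tables carry many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tls201_appln (
    appln_id INTEGER PRIMARY KEY,
    appln_auth TEXT,
    appln_filing_year INTEGER
);
CREATE TABLE tls206_person (
    person_id INTEGER PRIMARY KEY,
    person_name TEXT
);
CREATE TABLE tls207_pers_appln (
    person_id INTEGER,
    appln_id INTEGER,
    applt_seq_nr INTEGER
);
INSERT INTO tls201_appln VALUES (1, 'EP', 2005), (2, 'US', 2006);
INSERT INTO tls206_person VALUES (10, 'ACME SPA'), (11, 'EXAMPLE GMBH');
INSERT INTO tls207_pers_appln VALUES (10, 1, 1), (11, 1, 2), (10, 2, 1);
""")


def applicants_per_application(connection):
    """List (appln_id, authority, applicant name), joining on appln_id."""
    query = """
        SELECT a.appln_id, a.appln_auth, p.person_name
        FROM tls201_appln a
        JOIN tls207_pers_appln pa ON pa.appln_id = a.appln_id
        JOIN tls206_person p ON p.person_id = pa.person_id
        WHERE pa.applt_seq_nr > 0
        ORDER BY a.appln_id, pa.applt_seq_nr
    """
    return connection.execute(query).fetchall()


rows = applicants_per_application(conn)
for row in rows:
    print(row)
```

Every other table is reached the same way: through `appln_id`, directly or via a link table.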
  6. Implementing the DB (II)
     + standard scripts, a growing community to exchange procedures etc. (example)
     - need a person who has both DB and patent data knowledge
  7. Plug & play extensions
     Datasets that can be added with no effort:
     • Regpat: OECD dataset giving NUTS3 for each applicant / inventor (EP only)
     • Han: OECD Harmonized Applicants' Names dataset (EP only)
     • eee_ppat: KUL/Eurostat standard names and sector allocation (all PATSTAT)
     • Tls221: EPO legal data table, allowing changes of ownership, oppositions... to be included (example)
     • ape-inv: inventor disambiguation tools and academic inventors
     Note: all tables except TLS221 are free of charge
  8. Some papers using Kites-PatstatDB
     • Lissoni, F., Llerena, P., McKelvey, M., and B. Sanditov. "Academic Patenting in Europe: New Evidence from the KEINS Database," Research Evaluation, 17(2): 87-102.
     • Bacchiocchi E., Montobbio F. (2009). Knowledge Diffusion from University and Public Research. A Comparison between US Japan and Europe using Patent Citations. Journal of Technology Transfer, vol. 34 (2), pp. 169-181.
     • Breschi S., Lissoni F., Montobbio F. (2008). University patenting and scientific productivity. A quantitative study of Italian academic inventors. European Management Review. The Journal of the European Academy of Management 5(2): 91-109.
     • Corrocher N., Malerba F., Montobbio F. (2007). Schumpeterian Patterns of Innovative Activity in the ICT Field. Research Policy, vol. 36, pp. 418-432.
     • Breschi S., Lissoni F., Montobbio F. (2007). The Scientific Productivity Of Academic Inventors: New Evidence From Italian Data. Economics of Innovation and New Technology, Vol. 16, Issue 2, pp. 101-118.
     • Della Malva A., Breschi S., Lissoni F., Montobbio F. (2007). L'attività brevettuale dei docenti universitari: L'Italia in un confronto internazionale. Economia e Politica Industriale, v. 2, pp. 43-70.
     • Montobbio F. (2008). Patenting Activity in Latin American and Caribbean Countries. In World Intellectual Property Organization (WIPO) - Economic Commission for Latin America and the Caribbean (ECLAC) - "Study on Intellectual Property Management in Open Economies: A Strategic Vision for Latin America". Forthcoming.
     • Frazzoni S., Mancusi M., Rotondi Z., Sobrero M., Vezzulli A. (2011). "Relationship with banks and access to credit for innovation and internationalization in SMEs", L'EUROPA E OLTRE. Banche e imprese nella nuova globalizzazione, XVI Rapporto sul sistema finanziario italiano, Edibank, 2011. ISBN 978-88-449-0495-1.
     • V. Sterzi. Patent quality and ownership: An analysis of UK faculty patenting, Research Policy, 2012 (forthcoming).
  9. Some advanced applications
     • OST patent applicants data quality procedure and match with ORBIS
     • OST common identifier among PATSTAT, WoS and Framework Programmes DBs
  10. Applicants data quality procedure and match with ORBIS (I)
      • The goal of the procedure is to clean and standardize (C&S) patent applicant names (i.e. removing the type of company, common misspellings etc.)
      • After name C&S, a procedure applies 5 different match algorithms in order to obtain the best matches with ORBIS company names.
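The two steps can be sketched as follows. This is a minimal illustration only, not the OST procedure: the legal-form list, the ASCII-only normalisation, the similarity measure (`difflib.SequenceMatcher`) and the 0.9 threshold are all assumptions, and the slides do not say which 5 match algorithms were used.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, deliberately short list of legal-form tokens to strip.
LEGAL_FORMS = {"SPA", "SRL", "GMBH", "AG", "LTD", "INC", "CORP", "SA", "LLC", "PLC"}


def clean_name(raw):
    """Upper-case, drop punctuation and legal-form tokens, collapse spaces.

    Assumption: names are ASCII; accented letters would need transliteration.
    """
    name = raw.upper()
    name = re.sub(r"[.,]", "", name)         # "S.P.A." becomes "SPA"
    name = re.sub(r"[^A-Z0-9 ]", " ", name)  # other punctuation becomes a space
    tokens = [t for t in name.split() if t not in LEGAL_FORMS]
    return " ".join(tokens)


def best_match(applicant, companies, threshold=0.9):
    """Return (company, score) for the best candidate, or None below threshold.

    Exact equality of cleaned names scores 1.0; otherwise a fuzzy ratio is used.
    """
    target = clean_name(applicant)
    best = None
    for company in companies:
        candidate = clean_name(company)
        if candidate == target:
            score = 1.0
        else:
            score = SequenceMatcher(None, target, candidate).ratio()
        if best is None or score > best[1]:
            best = (company, score)
    if best is not None and best[1] >= threshold:
        return best
    return None


print(clean_name("Acme S.p.A."))
print(best_match("Novartiss AG", ["Novartis AG", "Example GmbH"]))
```

Cleaning before matching is what lets a cheap exact comparison catch most pairs, leaving the fuzzy comparison for residual misspellings.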
  11. Applicants data quality procedure and match with ORBIS (II)
      • Data quality procedure developed using portable queries and tables (see Tarasconi, "Sharing names/address cleaning patterns for Patstat", from PATSTAT users day 2011)
      • Match procedure developed with the aim of being multipurpose (e.g. it has already been used to match TM vs. patent applicants @ KITeS)
      • Code and tables available for MySQL and Oracle.
      http://documents.epo.org/projects/babylon/eponet.nsf/0/92ab5eb34ff406d1c125795d0050bbcc/$FILE/PATSTAT_user_day_2011_presentations.zip
  12. Applicants data quality procedure and match with ORBIS (III)
      • C&S step results: from 12,280,000 patent applicants to about 3,800,000 companies
      • Match against: 353,294 ORBIS companies in NACE 2540, 2630, 2651, 2910, 3030, 3011, 8422 (defense)
      • Results: 94,726 patent applicants matched against 66,256 ORBIS companies
      • Benchmark: validation against a 1% sample returned a precision of 91% and a recall of 95%
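For readers less familiar with the benchmark figures: precision is the share of proposed matches that are correct, and recall is the share of true matches that were found. A minimal sketch of how both could be computed on a hand-validated sample follows; the pairs below are invented and are not the OST validation data.

```python
def precision_recall(predicted, truth):
    """Compute (precision, recall) for two collections of matched pairs."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Invented (applicant, company) pairs standing in for a validation sample.
predicted_pairs = {("a", 1), ("b", 2), ("c", 3), ("d", 9)}
validated_pairs = {("a", 1), ("b", 2), ("c", 3), ("e", 5), ("f", 6)}

precision, recall = precision_recall(predicted_pairs, validated_pairs)
print(precision, recall)
```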
  13. OST Common identifier (I)
      Data categories existing across patent, scientific publications and Framework Programmes data:

      Category             PATSTAT                         FPS                     WOS
      Geographic data      inventors/applicants addresses  participants addresses  affiliations addresses
      Individuals          inventors, applicants           contacts                authors
      Companies            applicants                      participants            affiliations
      Sci/tech taxonomies  IPC                             TPs                     subject categories
  14. OST Common identifier (II)
      1) DEFINE ATOMIC ENTITIES AND NON-AMBIGUOUS JOINS
      Even when they describe similar entities, datasets differ in the granularity of their data (e.g. in WOS affiliations may be recorded by lab / dept, while in patents they may be recorded by IP office: a different size of entity). The bridge dataset should use an entity size that allows a unique data match across the different sets. This might also require some changes in the existing databases. The bridge dataset should also allow a hierarchical structure of entities, so that joins to the main datasets are possible at different levels.
  15. OST Common identifier (III): Example
  16. OST Common identifier (IV)
      2) TIME SERIES
      2a) DATASET ASYNCHRONIES
      Data may enter the database on different time frames depending on the dataset. (E.g. PATSTAT is a full update, so a snapshot at the moment of data creation, while WOS is an incremental update; name changes or M&A could therefore make the same entity look different in the 2 datasets. Note that geographic entities also change with time: counties, countries…) Bridge tables must have a time-related dimension.
      2b) DATA TRANSFORMATIONS
      Data change over time. (E.g. companies may merge, split [the most critical case], change name, change owner…) Bridge tables must have a continuation dimension that allows following the transformation of entities.
  17. OST Common identifier (V)
      Time series example: Sarajevo changes from YU to BS in 1992

      BEFORE
      Sarajevo  YU  BS

      AFTER
      Sarajevo  YU  1800  1991
      Sarajevo  BS  1992  9999
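The "AFTER" layout can be queried by year. A minimal sketch, assuming inclusive year bounds and 9999 as the "still valid" marker as in the slide; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CityCountry:
    city: str
    country: str
    date_from: int  # first valid year, inclusive
    date_to: int    # last valid year, inclusive; 9999 means still valid


# Rows copied from the slide's AFTER table.
BRIDGE = [
    CityCountry("Sarajevo", "YU", 1800, 1991),
    CityCountry("Sarajevo", "BS", 1992, 9999),
]


def country_of(city, year, rows=BRIDGE):
    """Return the country code valid for `city` in `year`, or None."""
    for row in rows:
        if row.city == city and row.date_from <= year <= row.date_to:
            return row.country
    return None


print(country_of("Sarajevo", 1985))
print(country_of("Sarajevo", 1992))
```

With validity intervals in the bridge table, a record dated 1985 and a record dated 2005 both resolve correctly, whichever dataset they come from.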
  18. OST Common identifier (VI)
      OBJECT / PROPERTIES DATA STRUCTURE
      The proposed data structure should be a TEMPORAL DATABASE (1), able to store PROPERTIES / STATUS / EVENTS. It should, for instance, contain the following fields:
      • PROPERTYNAME (e.g. ownership, affiliation…)
      • PROPERTYVALUE (e.g. new owner, new affiliation)
      • DATEFROM
      • DATETO
      • CHGREASON (if blank, the record is still valid)
      • VALUE1…N (e.g. type of acquisition, % ownership…)
      Along with the properties, it must also be defined how properties are inherited among entities (e.g. CNRS Bordeaux inherits ownership, and probably sector of activity, from CNRS…)
      (1) See Richard T. Snodgrass, "TSQL2 Temporal Query Language". www.cs.arizona.edu. Computer Science Department of the University of Arizona
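One possible reading of this structure, with property inheritance resolved by walking up a parent link, is sketched below. The record layout mirrors the fields listed on the slide, but the parent table, the property values and the dates are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyRecord:
    entity: str
    property_name: str
    property_value: str
    date_from: int
    date_to: int = 9999   # 9999 means still valid
    chg_reason: str = ""  # blank while the record is still valid


# Hypothetical hierarchy: child entity -> parent entity.
PARENT = {"CNRS Bordeaux": "CNRS"}

# Invented example records.
RECORDS = [
    PropertyRecord("CNRS", "ownership", "public", 1939),
    PropertyRecord("CNRS Bordeaux", "city", "Bordeaux", 1800),
]


def lookup(entity, property_name, year):
    """Return the value valid in `year`, inheriting from parents if needed."""
    current = entity
    while current is not None:
        for rec in RECORDS:
            if (rec.entity == current
                    and rec.property_name == property_name
                    and rec.date_from <= year <= rec.date_to):
                return rec.property_value
        current = PARENT.get(current)  # no own record: try the parent
    return None


print(lookup("CNRS Bordeaux", "ownership", 2000))
```

An entity's own record takes precedence here; the parent is consulted only when no own record is valid for the requested year, which is one of several inheritance rules that could be chosen.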
  19. APPENDIX: Temporal database example (I): NOVARTIS
      Novartis Pharma originated from the merger of CIBA (1884), GEIGY (1758) and Sandoz (1876). Until 1970 they are 3 separate entities.

      LEGPCODE  LEGPNAME
      1         CIBA
      2         GEIGY
      3         SANDOZ
      4         CIBA SUB 1…N
      5         GEIGY SUB 1…N
      6         SANDOZ SUB 1…N

      LEGPCODE  PROPNAME        PROPVALUE  STATUSCODE2  STATUSTEXT  STATUSPERC  DATEFROM  DATETO  CHGREASON
      1         OWNERSHIP       FULLOWN    1                        100         1884      9999
      2         OWNERSHIP       FULLOWN    2                        100         1758      9999
      3         OWNERSHIP       FULLOWN    3                        100         1876      9999
      4         OWNERSHIP       FULLOWN    1                        100         1884      9999
      5         OWNERSHIP       FULLOWN    2                        100         1758      9999
      6         OWNERSHIP       FULLOWN    3                        100         1876      9999
  20. Temporal database example (II): NOVARTIS
      1970, first merge: CIBA + GEIGY = CIBA GEIGY LTD.

      LEGPCODE  LEGPNAME
      1         CIBA
      2         GEIGY
      3         SANDOZ
      4         CIBA SUB 1…N
      5         GEIGY SUB 1…N
      6         SANDOZ SUB 1…N
      7         CIBA GEIGY LTD.

      LEGPCODE  PROPNAME        PROPVALUE  STATUSCODE2  STATUSTEXT  STATUSPERC  DATEFROM  DATETO  CHGREASON
      1         OWNERSHIP       FULLOWN    1                        100         1884      1969    MERGE
      2         OWNERSHIP       FULLOWN    2                        100         1758      1969    MERGE
      3         OWNERSHIP       FULLOWN    3                        100         1876      9999
      4         OWNERSHIP       FULLOWN    1                        100         1884      1969    MERGE
      5         OWNERSHIP       FULLOWN    2                        100         1758      1969    MERGE
      6         OWNERSHIP       FULLOWN    3                        100         1876      9999
      1         TRANSFORMATION  MERGE      7                        50          1970      1970
      2         TRANSFORMATION  MERGE      7                        50          1970      1970
      7         OWNERSHIP       FULLOWN    7                        100         1970      9999
      4         OWNERSHIP       FULLOWN    7                        100         1970      9999
      5         OWNERSHIP       FULLOWN    7                        100         1970      9999
  21. Temporal database example (III): NOVARTIS
      1996, second merge: CIBA GEIGY + Sandoz = Novartis

      LEGPCODE  LEGPNAME
      3         SANDOZ
      4         CIBA SUB 1…N
      5         GEIGY SUB 1…N
      6         SANDOZ SUB 1…N
      7         CIBA GEIGY LTD.
      8         NOVARTIS

      LEGPCODE  PROPNAME        PROPVALUE  STATUSCODE2  STATUSTEXT  STATUSPERC  DATEFROM  DATETO  CHGREASON
      3         OWNERSHIP       FULLOWN    3                        100         1876      1995    MERGE
      4         OWNERSHIP       FULLOWN    1                        100         1884      1969    MERGE
      5         OWNERSHIP       FULLOWN    2                        100         1758      1969    MERGE
      6         OWNERSHIP       FULLOWN    3                        100         1876      1995    MERGE
      7         OWNERSHIP       FULLOWN    7                        100         1970      1995    MERGE
      4         OWNERSHIP       FULLOWN    7                        100         1970      1995    MERGE
      5         OWNERSHIP       FULLOWN    7                        100         1970      1995    MERGE
      3         TRANSFORMATION  MERGE      8                        50          1996      9999
      7         TRANSFORMATION  MERGE      8                        50          1996      9999
      8         OWNERSHIP       FULLOWN    8                        100         1996      9999    MERGE
      4         OWNERSHIP       FULLOWN    8                        100         1996      9999    MERGE
      5         OWNERSHIP       FULLOWN    8                        100         1996      9999    MERGE
      6         OWNERSHIP       FULLOWN    8                        100         1996      9999    MERGE
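The TRANSFORMATION rows in the appendix are what let a query follow an entity through time. The sketch below traces a legal person forward to whatever entity it had become by a given year; the entity codes and merge years are taken from the example tables, while the function name and the tuple layout are assumptions. It handles merges only; splits, which the slides call the most critical case, would need one-to-many successors.

```python
# Entity names from the LEGPCODE / LEGPNAME tables of the example.
NAMES = {
    1: "CIBA",
    2: "GEIGY",
    3: "SANDOZ",
    7: "CIBA GEIGY LTD.",
    8: "NOVARTIS",
}

# (source LEGPCODE, successor LEGPCODE, year), one per TRANSFORMATION row.
TRANSFORMATIONS = [
    (1, 7, 1970),
    (2, 7, 1970),
    (3, 8, 1996),
    (7, 8, 1996),
]


def entity_at(legpcode, year):
    """Follow MERGE transformations forward and name the entity as of `year`."""
    current = legpcode
    changed = True
    while changed:
        changed = False
        for source, successor, merge_year in TRANSFORMATIONS:
            if source == current and merge_year <= year:
                current = successor
                changed = True
                break
    return NAMES[current]


for year in (1960, 1980, 2000):
    print(year, entity_at(1, year))
```

A patent filed by CIBA in 1960 and a publication affiliated with Novartis in 2000 can then be attributed to the same lineage without rewriting either source record.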