
Analyzing trajectories of technological knowledge, Dr Arho Suominen


The 1st Annual International Conference of the IEEE Technology and Engineering Management Society

Published in: Technology

  1. 1. VTT TECHNICAL RESEARCH CENTRE OF FINLAND LTD. Analyzing trajectories of technological knowledge: Topic modelling approach to knowledge depth and breadth. The 1st Annual International Conference of the IEEE Technology and Engineering Management Society. Dr. Arho Suominen
  2. 2. KNOWLEDGE – THE CORE ASSET OF CORPORATIONS. MANAGING IT REQUIRES US TO KNOW WHAT INTERNAL AND EXTERNAL KNOWLEDGE IS AVAILABLE
  3. 3. 31/05/2017 INTRODUCTION: Technology management and planning require that we are able to quantify the knowledge embedded in and outside the organization. Depth and breadth of knowledge are the main dimensions used to do this. Knowledge depth is defined as an actor's level of expertise or sophistication. Knowledge breadth is defined as an actor's capability to exploit adjacent technologies, or the multi-dimensionality of its knowledge base. Knowledge depth and breadth have been shown to have a significant impact on company performance.
  4. 4. WHAT WE HAVE DONE BEFORE THAT MIGHT HAVE SOME LIMITATIONS: Patent data, despite its caveats, has been seen as the most practical vantage point into a company's knowledge. Previous studies have operationalized a company's knowledge structure by looking at patent classifications. This approach has significant caveats, due to classification errors, overall noisiness, challenges related to the taxonomy of patents, and the classification system's inability to represent novelty, since it forces new things into historical classes. The above is written with the understanding that recent studies have looked at keyword-based and machine-learning-based approaches to operationalizing patents.
  5. 5. TWO EXAMPLES WHY OUR APPROACH CAN ADD VALUE: OVERCOMING LIMITATIONS OF PATENT CLASSIFICATIONS AND ABSTRACT-BASED ANALYSIS
  6. 6. CLASSIFICATION VS. MACHINE LEARNING: MACHINE LEARNED TOPICS ALIGN POORLY WITH HUMAN CLASSIFICATION. [Figure: chart with a value axis from 0.000 to 0.900 over topics Topic1 to Topic73; the legend lists the classification fields Analysis of biological materials, Audio-visual technology, Basic communication processes, Basic materials chemistry, Biotechnology, and Chemical engineering.]
  7. 7. CLASSIFICATION VS. MACHINE LEARNING: MACHINE LEARNED TOPICS ALIGN POORLY WITH HUMAN CLASSIFICATION. [Figure: chart with a value axis from 0.000 to 0.900 over topics Topic1 to Topic75; the legend lists the classification fields Analysis of biological materials, Audio-visual technology, Basic communication processes, Basic materials chemistry, Biotechnology, Chemical engineering, Civil engineering, Computer technology, Control, Digital communication, Electrical machinery, apparatus, energy, Engines, pumps, turbines, Environmental technology, Food chemistry, Furniture, games, and Handling.]
  8. 8. CLASSIFICATION VS. MACHINE LEARNING: MACHINE LEARNED TOPICS ALIGN POORLY WITH HUMAN CLASSIFICATION. EXAMPLE US9185203B2, Mobile device display management. Abstract: "The display of a mobile device is managed during a voice communication session using a proximity sensor and an accelerometer. In one example, the display of a mobile device is turned off during a phone call on the mobile device when a proximity sensor detects an object is proximate the device and an accelerometer determines the device is in a first orientation." In total 62 words. Description: Background… Summary… Brief Description of Drawings… Detailed description… In total 8886 words.
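The size gap between the two text sources can be checked directly. The sketch below counts the words of the abstract as quoted on the slide and compares the result with the description length the slide reports. Counting by whitespace-separated tokens is an assumption about how the slide's figures were produced, and `word_count` is a name introduced here for illustration.

```python
# Abstract of US9185203B2 as quoted on the slide.
abstract = (
    "The display of a mobile device is managed during a voice communication "
    "session using a proximity sensor and an accelerometer. In one example, "
    "the display of a mobile device is turned off during a phone call on the "
    "mobile device when a proximity sensor detects an object is proximate the "
    "device and an accelerometer determines the device is in a first "
    "orientation."
)


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (assumed counting rule)."""
    return len(text.split())


abstract_words = word_count(abstract)
description_words = 8886  # figure quoted on the slide; the full text is not reproduced here

# Share of the full description that the abstract represents, by word count.
share = abstract_words / description_words
print(abstract_words, f"{share:.1%}")  # prints: 62 0.7%
```

By this count the abstract carries well under one percent of the words in the full description.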
  9. 9. CLASSIFICATION VS. MACHINE LEARNING: ABSTRACTS ARE A POOR DESCRIPTION OF THE KNOWLEDGE CONTENT
  10. 10. UNSUPERVISED LEARNING: Produces an outcome based on an input while not receiving any feedback from the environment. Relies on a formal framework that enables the algorithm to find patterns. Topic models "...can extract surprisingly interpretable and useful structure without any explicit "understanding" of the language by computer". As a simplification, each document in a corpus is a random mixture over latent topics, and each latent topic is characterized by a distribution over words.
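The "mixture over latent topics" idea can be illustrated with a toy generative sketch. The topics, words, and probabilities below are hypothetical and not taken from the study, and the document's topic mixture is fixed by hand here, whereas a full topic model such as LDA would draw it from a prior and infer it from data.

```python
import random

# Hypothetical toy topics: each latent topic is a distribution over words.
topics = {
    "display": {"screen": 0.5, "pixel": 0.3, "sensor": 0.2},
    "network": {"packet": 0.6, "antenna": 0.3, "sensor": 0.1},
}


def sample_document(topic_mixture, n_words, rng):
    """Generate words by drawing a topic from the mixture, then a word from that topic."""
    topic_names = list(topic_mixture)
    topic_weights = [topic_mixture[name] for name in topic_names]
    words = []
    for _ in range(n_words):
        topic = rng.choices(topic_names, weights=topic_weights, k=1)[0]
        vocabulary = list(topics[topic])
        word_weights = [topics[topic][word] for word in vocabulary]
        words.append(rng.choices(vocabulary, weights=word_weights, k=1)[0])
    return words


rng = random.Random(0)  # seeded so the sketch is repeatable
# A document that is 80% "display" and 20% "network" (hypothetical mixture).
doc = sample_document({"display": 0.8, "network": 0.2}, n_words=10, rng=rng)
print(doc)
```

Topic modelling runs this process in reverse: given only the words of many documents, it estimates the topic-word distributions and each document's mixture.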
  11. 11. DATA, PRE-PROCESSING, AND ANALYSIS. SAMPLE: From the telecommunication industry: Alcatel-Lucent, Apple, Google, Huawei, Microsoft, Nokia and Samsung Electronics. The analysis was limited to the period from 2001 to 2014. METHOD: Analyzed the sample companies' knowledge base with unsupervised learning, using patent data as a proxy. DATA SOURCE: A repository of full-text patent descriptions filed in the USPTO, containing approximately 6 million patents and owned by Teqmine Analytics Ltd. The final data contains 157 718 records.
  12. 12. DATA, PRE-PROCESSING, AND ANALYSIS. Document-topic probability matrix (columns: Topic 1, Topic 2, …, Topic N): Patent 1: 0.10, 0.24, …, 0.40; Patent 2: 0.40, 0.01, …, 0.10; …; Patent N: 0.01, 0.80, …, 0.01. [Figure: network diagram linking Patent 1, Patent 2 and Patent N to Topic 1, Topic 2 and Topic N.]
  13. 13. DATA, PRE-PROCESSING, AND ANALYSIS. ALGORITHM: LDA. The algorithm is based on an online variational Bayes algorithm for LDA [9]. The number of topics was set to 75 using a trial-and-error approach. IMPLEMENTATION: Python. The Python implementation included pre-processing. ANALYSIS: Gephi, Python, Excel. Gephi was used to create visuals from the soft classification created by the algorithm. Python was used to pivot the document-topic probability matrix by company into a sum of probabilities by company in a given year. Excel was used to calculate TD defined as:
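The pivot step described on this slide can be sketched in a few lines. This is a minimal illustration, not the author's code: the probabilities are the example values from the matrix on the previous slide, while the company and year labels attached to each row and the function name `pivot_by_company_year` are hypothetical. The TD formula is not reproduced in this transcript, so the sketch stops at the pivoted sums.

```python
# Hypothetical rows: (company, filing year, topic probabilities of one patent).
patents = [
    ("Google", 2010, [0.10, 0.24, 0.40]),
    ("Google", 2010, [0.40, 0.01, 0.10]),
    ("Nokia", 2010, [0.01, 0.80, 0.01]),
]


def pivot_by_company_year(rows):
    """Sum the topic probabilities of all patents sharing a (company, year) key."""
    totals = {}
    for company, year, probabilities in rows:
        key = (company, year)
        if key not in totals:
            totals[key] = [0.0] * len(probabilities)
        totals[key] = [a + b for a, b in zip(totals[key], probabilities)]
    return totals


topic_sums = pivot_by_company_year(patents)
for (company, year), sums in sorted(topic_sums.items()):
    print(company, year, [round(value, 2) for value in sums])
```

Each resulting row is a per-topic weight profile for one company in one year, which is the input a diversity measure would then be computed from.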
  14. 14. RESULTS
  15. 15. BI-PARTITE NETWORK OF KNOWLEDGE: MACHINE LEARNED TOPICS CREATE A NETWORK MAP OF KNOWLEDGE
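One way such a patent-topic network could be prepared for Gephi is as a weighted edge list, sketched below under stated assumptions. The probabilities are the example values from the earlier matrix slide; the cut-off of 0.05 and the function name `bipartite_edges` are hypothetical, and the slides do not say whether any cut-off was applied. The `Source`, `Target`, `Weight` header follows the column names Gephi's spreadsheet import recognizes for edge tables.

```python
import csv
import io

# Example document-topic probabilities (values from the matrix slide).
doc_topic = {
    "Patent 1": {"Topic 1": 0.10, "Topic 2": 0.24, "Topic N": 0.40},
    "Patent 2": {"Topic 1": 0.40, "Topic 2": 0.01, "Topic N": 0.10},
    "Patent N": {"Topic 1": 0.01, "Topic 2": 0.80, "Topic N": 0.01},
}


def bipartite_edges(matrix, threshold):
    """Return (patent, topic, weight) edges whose probability reaches the threshold."""
    return [
        (patent, topic, weight)
        for patent, row in matrix.items()
        for topic, weight in row.items()
        if weight >= threshold
    ]


edges = bipartite_edges(doc_topic, threshold=0.05)  # hypothetical cut-off

# Write the edge list as CSV text; a real run would write to a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Source", "Target", "Weight"])
writer.writerows(edges)
print(buffer.getvalue())
```

Because patents only connect to topics and never to each other, the resulting graph is bipartite by construction.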
  16. 16. PRACTICAL USE CASE: MACHINE LEARNED TOPICS CREATE A TEMPORAL VIEW AND ESTIMATION OF KNOWLEDGE ASSETS
  17. 17. INSIGHT: TELECOMMUNICATION INDUSTRY. Sample telecommunication companies with a decreasing technological diversity value. The x-axis is years and the y-axis is Technological Diversity (TD), calculated for each company.
  18. 18. Insight on the telecommunication industry. Sample telecommunication companies with an increasing technological diversity value. The x-axis is years and the y-axis is Technological Diversity (TD), calculated for each company. The largest increase in TD is from Google, for which the linear trend line is given with fit values.
  19. 19. INSIGHT: TELECOMMUNICATION INDUSTRY. Correlation between technological diversity and count of patents: because the p-value is higher than 0.05, the correlation was not statistically significant (r(109) = 0.17, p = 0.077). A multiple linear regression was calculated to predict technological diversity based on patent count and company. A significant regression equation was found, F(8, 102) = 35.99, p < .001, with an R2 = 0.73. Google, Huawei and Microsoft were significant predictors. Patent count, Apple, Motorola, Nokia and Samsung were not significant predictors. There is a clear trend of technological diversity. Patent count is not a significant predictor in explaining technological diversity.
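The reported correlation can be sanity-checked with the standard conversion from a Pearson r to a t statistic, t = r * sqrt(df / (1 - r^2)). The sketch below uses only the r and degrees of freedom quoted on the slide. The critical value of about 1.98 is an approximate two-tailed 5% threshold for roughly 109 degrees of freedom taken from standard t tables, not from the slides, and `t_from_r` is a name introduced here.

```python
import math


def t_from_r(r: float, df: int) -> float:
    """Convert a Pearson correlation and its degrees of freedom to a t statistic."""
    return r * math.sqrt(df / (1.0 - r * r))


t_value = t_from_r(0.17, 109)  # r and df as reported on the slide

# Approximate two-tailed 5% critical value near 109 degrees of freedom (assumption).
T_CRITICAL = 1.98

significant = abs(t_value) > T_CRITICAL
print(round(t_value, 2), significant)
```

The statistic comes out at roughly 1.80, below the critical value, which agrees with the slide's conclusion that the correlation between patent count and technological diversity is not significant at the 5% level.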
  20. 20. INSIGHT: TELECOMMUNICATION INDUSTRY. Natural language offers an important vantage point on interesting phenomena that are not directly measurable. This advantage is clear in the case of patent data analysis, where abstracts are known to carry a low information value and the use of metadata has significant limitations. The main finding is that, by using full text and LDA, we can create a Technology Diversity value independent of patent count. This analysis opens the possibility of using the approach in more in-depth studies focusing on, for example, measuring the impact of company knowledge depth and breadth on company performance.
  21. 21. THANK YOU. Dr. Arho Suominen, Senior Scientist, D.Sc. (Tech), Academy of Finland Postdoctoral Researcher stationed at VTT TECHNICAL RESEARCH CENTRE OF FINLAND, Innovations, Economy, and Policy. Vuorimiehentie 3, P.O. Box 1000, 02044 Espoo, Finland. Tel. +358 50 5050 354. www.vtt.fi, arho.suominen@vtt.fi, https://www.linkedin.com/in/arhosuominen, Twitter @ArhoSuominen
  22. 22. TECHNOLOGY FOR BUSINESS
